Journal of Information Science and Engineering, Vol. 22 No. 4, pp. 863-887 (July 2006)

Hybrid Architecture for Web Search Systems Based on Hierarchical Taxonomies*

Fidel Cacheda, Victor Carneiro, Carmen Guerrero and Angel Vina
Department of Information and Communications Technologies
University of A Coruna
A Coruna, 15071 Spain

Search systems based on hierarchical taxonomies provide a type of search functionality that conventional search engines do not offer. For instance, using a taxonomy, the user can look for documents related to just one of its categories. This paper describes a hybrid data architecture that improves the performance of searches restricted to a few categories of a taxonomy. The proposed architecture is based on a hybrid data structure composed of an inverted file with multiple integrated signature files. A detailed analysis of superimposing codes on directed acyclic graphs shows that they are well suited to a search system based on a hierarchical ontology. Two variants are presented: the hybrid architecture with complete information and the hybrid architecture with partial information. The validity of the hybrid architecture was assessed by implementing it and comparing it with a basic architecture. The performance of restricted queries is clearly improved, especially with the hybrid architecture with partial information: this variant outperformed the basic architecture by 50% in all workload environments, with a slight reduction in performance for the lower levels of the graph.

Keywords: signature files, superimposing code, inverted file, hybrid data architecture, hierarchical taxonomies, performance evaluation

Received April 13, 2004; revised August 11 & December 27, 2004; accepted January 20, 2005.
Communicated by Ming-Syan Chen.
* This paper was partially supported by the CICYT of the Spanish government, under project TIC2001-0547.