

Journal of Information Science and Engineering, Vol.19 No.4, pp.681-695 (July 2003)


An Inverted File Cache for Fast Information Retrieval*

Wann-Yun Shieh, Jean Jyh-Jiun Shann and Chung-Ping Chung
Department of Computer Science and Information Engineering
National Chiao Tung University
Hsinchu, 300 Taiwan
E-mail: {wyshieh; jjshann; cpchung}@csie.nctu.edu.tw

The inverted file is the most popular indexing mechanism for document search in an information retrieval system (IRS). However, the disk I/O required to access the inverted file becomes a performance bottleneck in an IRS. To avoid this disk I/O where possible, we propose a caching mechanism for accessing the inverted file, called the inverted file cache (IF cache). In this cache, entry lookup is accelerated by a proposed hashing scheme that resolves hash-table collisions with linked lists. Furthermore, the replacement and storage mechanisms of this cache are designed specifically for the inverted file structure. We verify the design experimentally, using documents collected from TREC (the Text REtrieval Conference) and search requests generated according to a Zipf-like distribution. Simulation results show that the IF cache improves the performance of a test IRS by about 60% in terms of average search response time.
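The abstract describes the IF cache only at a high level, so the following is a minimal Python sketch of the general idea, not the authors' design: a hash table whose buckets chain colliding entries in linked lists, with each entry holding a term's posting list. The names `InvertedFileCache` and `load_from_disk` are hypothetical, capacity is assumed to be counted in postings (since posting lists vary in length), and plain LRU replacement is an assumed stand-in for the paper's own replacement policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class _Node:
    """One cached entry: a term and its posting list, chained within a bucket."""
    term: str
    postings: List[int]
    last_used: int
    next: Optional["_Node"] = None


class InvertedFileCache:
    """Hypothetical sketch: a chained hash table that caches posting lists.

    Collisions are resolved by linking nodes into a per-bucket list.
    Capacity is counted in postings (an assumption, not from the paper).
    Replacement is plain LRU, an assumed stand-in for the paper's policy.
    """

    def __init__(self, num_buckets: int, capacity: int,
                 load_from_disk: Callable[[str], List[int]]):
        self.buckets: List[Optional[_Node]] = [None] * num_buckets
        self.capacity = capacity        # max total postings held in the cache
        self.used = 0                   # postings currently held
        self.clock = 0                  # logical time used for LRU ordering
        self.load_from_disk = load_from_disk
        self.hits = 0
        self.misses = 0

    def _bucket(self, term: str) -> int:
        return hash(term) % len(self.buckets)

    def get(self, term: str) -> List[int]:
        """Return the posting list for term, loading it on a miss."""
        self.clock += 1
        node = self.buckets[self._bucket(term)]
        while node is not None:         # walk the collision chain
            if node.term == term:
                node.last_used = self.clock
                self.hits += 1
                return node.postings
            node = node.next
        self.misses += 1
        postings = self.load_from_disk(term)  # stands in for a disk read
        self._insert(term, postings)
        return postings

    def _insert(self, term: str, postings: List[int]) -> None:
        if len(postings) > self.capacity:
            return                      # too large to cache at all
        while self.used + len(postings) > self.capacity:
            self._evict_lru()
        i = self._bucket(term)
        # Prepend the new node to the bucket's linked list.
        self.buckets[i] = _Node(term, postings, self.clock, self.buckets[i])
        self.used += len(postings)

    def _evict_lru(self) -> None:
        # Linear scan for the least recently used node (adequate for a sketch).
        victim_bucket, victim_prev, victim = -1, None, None
        for b, head in enumerate(self.buckets):
            prev, node = None, head
            while node is not None:
                if victim is None or node.last_used < victim.last_used:
                    victim_bucket, victim_prev, victim = b, prev, node
                prev, node = node, node.next
        assert victim is not None
        # Unlink the victim from its chain.
        if victim_prev is None:
            self.buckets[victim_bucket] = victim.next
        else:
            victim_prev.next = victim.next
        self.used -= len(victim.postings)
```

For example, with two buckets (so at least two of three terms share a chain) and a capacity of 7 postings, requesting "cache", "inverted", "cache", then "file" evicts "inverted" as the least recently used entry while "cache" stays resident. A real IRS would instead size buckets and capacity from its memory budget and would replace LRU with whatever policy suits its posting-list access pattern.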

Keywords: information retrieval system, inverted file, cache, hashing, memory management


Received November 13, 2001; revised July 4, 2002; accepted October 28, 2002.
Communicated by Arbee L. P. Chen
*This work was supported by the National Science Council of the R.O.C. under grant NSC-89-2213-E-009-062.