

Journal of Information Science and Engineering, Vol. 30 No. 3, pp. 553-569 (May 2014)

Enhancing Query Formulation for Spoken Document Retrieval

1Department of Computer Science and Engineering
National Taiwan Normal University
Taipei, 106 Taiwan
2Institute of Information Science
Academia Sinica
Nankang, 115 Taiwan

The popularity and ubiquity of multimedia associated with spoken documents have spurred considerable research interest in spoken document retrieval (SDR) in recent years. Beyond the substantial effort devoted to developing robust indexing and modeling techniques for representing spoken documents, a recent line of research targets the improvement of query modeling to better reflect the user's information need. Pseudo-relevance feedback is by far the most commonly used paradigm for query reformulation; it assumes that a small number of top-ranked feedback documents obtained from the initial round of retrieval are relevant and can be utilized for this purpose. Nevertheless, simply taking all of the top-ranked feedback documents obtained from the initial retrieval for query modeling does not always perform well, especially when the top-ranked documents contain much redundant or non-relevant information. In view of this, we explore in this paper the problem of how to effectively glean useful cues from the top-ranked documents so as to achieve more accurate query modeling. To this end, various sources of information cues are considered and integrated into the process of feedback document selection so as to achieve better retrieval effectiveness. Furthermore, we also investigate representing the query and documents with different granularities of index features to work in conjunction with the query and document models. A series of experiments conducted on the TDT (Topic Detection and Tracking) task seems to demonstrate the effectiveness of our query modeling framework for SDR.
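
For readers unfamiliar with the baseline paradigm mentioned above, the following is a minimal sketch of pseudo-relevance feedback with unigram language models. It is not the framework proposed in the paper: it uses all of the top-ranked documents without any selection step, and it assumes a simple linear mixture of the original query model and a feedback model. All function names, the smoothing weight `lam`, the interpolation weight `alpha`, and the cutoff `top_k` are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram distribution over a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def score(query_model, doc_tokens, background, lam=0.5):
    """Negative cross entropy between the query model and a document model
    smoothed by linear interpolation with a background (collection) model.
    lam is an assumed smoothing weight, not a value from the paper."""
    doc_model = unigram_model(doc_tokens)
    total = 0.0
    for w, q_w in query_model.items():
        p = (1 - lam) * doc_model.get(w, 0.0) + lam * background.get(w, 0.0)
        total += q_w * math.log(max(p, 1e-12))  # floor avoids log(0)
    return total


def rank(query_model, docs, background):
    """Return document indices sorted from best to worst score."""
    return sorted(range(len(docs)),
                  key=lambda i: score(query_model, docs[i], background),
                  reverse=True)


def pseudo_relevance_feedback(query_tokens, docs, top_k=2, alpha=0.5):
    """Baseline feedback loop: retrieve, assume the top_k documents are
    relevant, mix their pooled unigram model into the query, retrieve again.
    No feedback document selection is performed here (illustrative only)."""
    background = unigram_model([w for d in docs for w in d])
    query_model = unigram_model(query_tokens)

    initial_ranking = rank(query_model, docs, background)

    # Pool the tokens of the top-ranked (pseudo-relevant) documents.
    feedback_tokens = [w for i in initial_ranking[:top_k] for w in docs[i]]
    feedback_model = unigram_model(feedback_tokens)

    # Linear interpolation of the original and feedback query models.
    vocab = set(query_model) | set(feedback_model)
    expanded = {w: alpha * query_model.get(w, 0.0)
                   + (1 - alpha) * feedback_model.get(w, 0.0)
                for w in vocab}

    final_ranking = rank(expanded, docs, background)
    return initial_ranking, expanded, final_ranking
```

In this sketch every top-ranked document contributes equally to the feedback model, which is exactly the situation in which redundant or non-relevant feedback documents can degrade the reformulated query.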

Keywords: spoken document retrieval, language modeling, query modeling, pseudo-relevance feedback, speech recognition


Received February 28, 2013; accepted June 15, 2013.
Communicated by Hung-Yu Kao, Tzung-Pei Hong, Takahira Yamaguchi, Yau-Hwang Kuo, and Vincent Shin-Mu Tseng.