Journal of Information Science and Engineering, Vol. 28 No. 3, pp. 601-615 (May 2012)

A Keyword Based Prototype for Web Search Result Diversification*

School of Computer Science and Engineering
South China University of Technology
Guangzhou, 510006 China

In a web search scenario, users often submit short queries to search engines, expecting to find the information they want among the top-ranked results. However, such queries are often ambiguous, which leaves the users' actual information needs unspecified. An effective way to satisfy these different information needs is to diversify the top results retrieved for the query. In this paper, we reduce the diversification problem to maximizing the coverage of the information facets related to the query. We then introduce KED, a novel keyword-based prototype for web search result diversification. KED produces a diverse ranking by selecting documents that cover keywords belonging to the different facets underlying the retrieved documents. We evaluated the effectiveness of KED on two public test collections containing different kinds of documents. The experimental results show that KED consistently outperforms other existing implicit diversification approaches in promoting the diversity of top-ranked results. Moreover, we show that its effectiveness can be further improved by using high-quality keywords.
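To make the coverage formulation concrete, the sketch below shows a generic greedy maximum-coverage re-ranker. At each step it picks the retrieved document that adds the most not-yet-covered keywords. This is an illustrative assumption, not KED's published algorithm. The function name `greedy_keyword_coverage`, the per-document keyword sets, and the equal weighting of all keywords are hypothetical, and facets are approximated directly by keywords.

```python
from typing import Dict, List, Set


def greedy_keyword_coverage(ranking: List[str],
                            keywords: Dict[str, Set[str]],
                            k: int) -> List[str]:
    """Re-rank retrieved documents by greedily maximizing keyword coverage.

    Hypothetical sketch: `ranking` is the original retrieval order and
    `keywords` maps each document id to its (assumed) extracted keywords.
    All keywords are weighted equally; this is a generic greedy
    maximum-coverage heuristic, not the method described in the paper.
    """
    remaining = list(ranking)      # candidates, kept in original order
    covered: Set[str] = set()      # keywords covered by documents picked so far
    selected: List[str] = []
    while remaining and len(selected) < k:
        # Marginal gain = number of not-yet-covered keywords a document adds.
        # max() returns the first maximal element, so ties go to the
        # document that the original retrieval ranked higher.
        best = max(remaining, key=lambda d: len(keywords.get(d, set()) - covered))
        selected.append(best)
        covered |= keywords.get(best, set())
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy, made-up data for the ambiguous query "jaguar".
    ranking = ["d1", "d2", "d3", "d4"]
    keywords = {
        "d1": {"jaguar", "car", "price"},
        "d2": {"jaguar", "car", "engine"},
        "d3": {"jaguar", "cat", "habitat"},
        "d4": {"jaguar", "car"},
    }
    print(greedy_keyword_coverage(ranking, keywords, 3))  # ['d1', 'd3', 'd2']
```

In the toy example, the animal-related document d3 moves ahead of d2 because it covers two new keywords, while d2 adds only one. Greedy selection is the standard approximation for maximum coverage and achieves a (1 - 1/e) bound for the unweighted set-coverage objective. Any facet grouping or keyword weighting that KED applies would change the gain function used here.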

Keywords: information retrieval, search result diversification, search result re-ranking, document novelty, keyword extraction

Received June 26, 2010; revised December 22, 2010; accepted March 23, 2011.
Communicated by Jonathan Lee.
* The work described in this paper was partially supported by grants from the National Natural Science Foundation of China (Project No. 61070090, 61003174 and 60973083), a grant from the NSFC-Guangdong Joint Fund (Project No. U1035004), grants from the Natural Science Foundation of Guangdong Province, China (Project No. 9451064101003233, 10252500002000001 and 10451064101004233), and grants from the Fundamental Research Funds for the Central Universities (Project No. 2009ZM0125, 2009ZM0189 and 2009ZM0255).