My main research interests are information retrieval, Chinese speech and language processing, and Web
mining. I was one of a few pioneers dedicated to Chinese IR in the mid-1990s. To my knowledge, I was the first to publish
a Chinese text retrieval paper at a SIGIR conference. I developed an effective retrieval system called Csmart to deal with
quasi-natural language search in Chinese. The system was successfully transferred to industry in Taiwan during 1995 and 1996.
I developed an innovative PAT-tree-based approach to Chinese keyword extraction, published at SIGIR’97, which became one of the representative
works frequently cited in the area. In 1998, I received the Best Poster Presentation Award for late-breaking research
at SIGIR’98. In recent years, my research has focused mainly on mining Web data resources, especially in discovering knowledge
from Web anchor texts and query logs. My team proposed a series of Web mining approaches for exploiting the Web as live
bilingual corpora and reducing the difficulties involved in translating words/terms not included in common bilingual dictionaries.
At the same time, we developed a set of log mining techniques including query clustering and new query categorization to
create query taxonomies automatically. These techniques can organize users’ search vocabulary into a hierarchical structure,
extract relevant terms for queries, and assign subject categories to them. This research led to
papers at prestigious conferences and in journals, such as WWW’04, ICDM’01 and ICDM’02, and ACM TOIS’04. For the last 10 years,
I have chaired the SIGIR of the Association for Computational Linguistics and Chinese Language
Processing to promote IR activities in Taiwan. In 1997, I received the first K. T. Li Distinguished Young Scholar Award,
which was presented by the ACM Taipei Chapter for my contributions to Chinese information retrieval. I am also active in
professional service, especially in information retrieval and Asian language information processing. I have served six times
as a program committee member for the ACM SIGIR conference, the top IR conference. I was a steering committee member organizing the
International Workshop on Information Retrieval with Asian Languages (IRAL), an annual IR meeting in Asia since 1995. I
am currently an associate editor for both ACM Transactions on Asian Language Information Processing and the Journal of
Computational Linguistics and Chinese Language Processing. From 1996 to 2001, I was also an editorial board member of
Information Processing and Management.