Shian-Hua Lin, Meng Chang Chen, Jan-Ming Ho and Yueh-Ming Huang
In this paper, we present ACIRD, an intelligent Internet information system that uses machine learning techniques to organize and retrieve Internet Web documents. ACIRD consists of three parts: knowledge acquisition, a document classifier, and a two-phase search engine. The knowledge acquisition component automatically learns classification knowledge from classified Internet Web documents, and the classifier applies that knowledge to assign newly collected Internet Web documents to one or more classes in a class hierarchy. Our experiments show that ACIRD performs as well as or better than human experts in both knowledge extraction and document classification. Based on the learned classification knowledge and the given class hierarchy, the ACIRD two-phase search engine presents users with hierarchically navigable, structured results instead of conventional flat ranked results, which greatly helps users discover information among diverse Internet documents.
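To illustrate the difference between flat ranked results and hierarchically navigable results, the following is a minimal sketch, not ACIRD's actual implementation. It assumes that each document has already been assigned one or more class paths such as "Science/Computing"; the function name group_results, the "/" path separator, the "Unclassified" fallback, and the sample data are all hypothetical choices made for illustration.

```python
def new_node():
    """Create an empty node of the result tree."""
    return {"docs": [], "children": {}}


def group_results(ranked_hits, doc_classes):
    """Arrange a flat ranked list under a class hierarchy.

    ranked_hits: list of (doc_id, score), already sorted by descending score.
    doc_classes: dict mapping doc_id to a list of class paths (assumed format:
                 names joined by "/"). A document may belong to several classes.
    Returns a nested tree; documents keep their rank order within each class.
    """
    root = new_node()
    for doc_id, score in ranked_hits:
        # Documents without a known class go to a hypothetical fallback class.
        for path in doc_classes.get(doc_id, ["Unclassified"]):
            node = root
            for name in path.split("/"):
                node = node["children"].setdefault(name, new_node())
            node["docs"].append((doc_id, score))
    return root


# Illustrative data only; not taken from the paper's experiments.
hits = [("d1", 0.9), ("d2", 0.7), ("d3", 0.4)]
classes = {"d1": ["Science/Computing"], "d2": ["Science/Computing", "Arts"]}
tree = group_results(hits, classes)
```

A user interface could then let the user descend from "Science" to "Computing" and see only the documents ranked within that class, rather than scanning one long list.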