Research Fellow  |  Chen, Keh-Jiann  
Research Descriptions

Constructing ontologies and common-sense knowledge databases is very time-consuming. Over the past twenty-some years, we have developed an infrastructure for Chinese language processing that includes part-of-speech tagged corpora, a treebank, Chinese lexical databases, a Chinese grammar, word identification systems, and sentence parsers. We will use this infrastructure to extract linguistic and domain knowledge from corpora and Web texts in order to enhance current knowledge databases. The target databases include general-domain and special-domain ontologies as well as lexical, syntactic, and semantic knowledge databases. These databases will be interconnected to form a ConceptNet for language processing and logical inference.

For knowledge representation, we study the logical foundations of ontology and fine-grained semantic representation. We also study near-synonyms to identify the fine-grained differences between them. This work helps us better understand how meaning is represented and how it is composed. We will remodel the current ontology structures of WordNet, HowNet, and FrameNet into a better, more unified representation called E-HowNet, and we will study semantic composition based on E-HowNet.

Our focus is the conceptual processing of Chinese documents. The knowledge-based language processing system we have designed uses statistical, linguistic, and common-sense knowledge to parse the conceptual structures of sentences and interpret their meanings. The system is integrated with the knowledge bases to form an automatic learning process: the language processing system becomes more capable as the knowledge bases are enhanced, and the knowledge bases in turn evolve through the knowledge that the system extracts automatically.
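The mutual reinforcement between the processing system and its knowledge bases can be illustrated with a toy sketch. It is not the system described here. It assumes that the "knowledge base" is just a set of known words and uses greedy forward maximum matching as a stand-in word identifier. It also relies on a hypothetical extraction heuristic that promotes recurring pairs of unknown single characters to new lexicon entries. The names `segment`, `coverage`, `extract_candidates`, and `bootstrap` are illustrative only.

```python
from collections import Counter


def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching (a stand-in word identifier).

    Characters not covered by any lexicon entry become single-character tokens.
    """
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + n]
            if n == 1 or cand in lexicon:
                tokens.append(cand)
                i += n
                break
    return tokens


def coverage(corpus, lexicon):
    """Fraction of segmented tokens that are known lexicon entries."""
    toks = [t for s in corpus for t in segment(s, lexicon)]
    if not toks:
        return 0.0
    return sum(t in lexicon for t in toks) / len(toks)


def extract_candidates(corpus, lexicon, min_count=2):
    """Hypothetical extraction step: adjacent unknown single characters
    that co-occur at least `min_count` times are proposed as new words."""
    counts = Counter()
    for s in corpus:
        toks = segment(s, lexicon)
        for a, b in zip(toks, toks[1:]):
            if len(a) == 1 and len(b) == 1 and a not in lexicon and b not in lexicon:
                counts[a + b] += 1
    return {w for w, c in counts.items() if c >= min_count}


def bootstrap(corpus, lexicon, rounds=3):
    """Alternate processing and extraction: the extracted words enlarge the
    knowledge base, and the larger knowledge base improves segmentation."""
    lexicon = set(lexicon)
    history = [coverage(corpus, lexicon)]
    for _ in range(rounds):
        new_words = extract_candidates(corpus, lexicon)
        if not new_words:
            break  # nothing new was learned; stop early
        lexicon |= new_words
        history.append(coverage(corpus, lexicon))
    return lexicon, history


if __name__ == "__main__":
    corpus = ["我們研究知識", "知識庫", "語言知識"]
    lexicon, history = bootstrap(corpus, {"我們", "研究", "語言"})
    print(sorted(lexicon), history)
```

On this three-sentence corpus, the recurring pair 知識 is learned in the first round, and lexicon coverage rises from 0.3 to 6/7 (about 0.86). A real system would rely on far richer statistical and linguistic evidence before accepting a new entry.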