NER
Information Extraction (IE) is the task of extracting information of interest from unconstrained text. IE involves two main tasks: recognizing named entities and recognizing the relationships among them. Named Entity Recognition (NER) identifies proper names in text and classifies them into types of named entities (e.g., persons, organizations, locations). NER is important not only in IE but also in lexical acquisition for the development of robust NLP systems [4]. Moreover, NER has proven fruitful for tasks such as document indexing and the maintenance of databases containing identified named entities.
We concentrate on Chinese NER problems. We propose a hybrid method that combines the advantages of rule-based and machine learning (ML) based NER systems. Rule-based NER systems can explicitly encode human comprehension and can be tuned conveniently, while ML-based systems are robust, portable, and inexpensive to develop. Our hybrid system, Mencius, incorporates a rule-based knowledge representation and template-matching tool, InfoMap, into a maximum entropy (ME) framework. Named entities are represented in InfoMap as templates, which serve as ME features in Mencius. These features are edited manually, and their weights are estimated by the ME framework from the training data. To avoid errors caused by word segmentation, we model the NER problem as a character-based tagging problem. In our experiments, Mencius outperforms both pure rule-based and pure ME-based NER systems. The F-measures for person names (PER), location names (LOC), and organization names (ORG) are 94.3%, 77.8%, and 75.3%, respectively. We also compared the NER results with and without word segmentation and found only slight differences.
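To illustrate what modeling NER as a character-based tagging problem can look like, here is a minimal sketch in Python. It is not the Mencius implementation: the B-/I-/O tag scheme, the function names, and the example sentence are all assumptions made for illustration, since the text above does not specify the tag set Mencius uses. The sketch only shows how entity spans map to one tag per character, so that no word segmentation step is needed, and how tags map back to spans.

```python
from typing import List, Tuple

# (start, end_exclusive, entity_type); a hypothetical span format for this sketch
Span = Tuple[int, int, str]


def spans_to_char_tags(text: str, spans: List[Span]) -> List[str]:
    """Assign one tag per character: B-X opens an entity, I-X continues it, O is outside."""
    tags = ["O"] * len(text)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def char_tags_to_spans(tags: List[str]) -> List[Span]:
    """Recover entity spans from a per-character tag sequence."""
    spans: List[Span] = []
    start, etype = None, None
    # The trailing "O" is a sentinel that closes an entity ending at the last character.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


# Made-up example sentence: "Wang Xiaoming works in Taipei."
text = "王小明在台北工作"
spans = [(0, 3, "PER"), (4, 6, "LOC")]
tags = spans_to_char_tags(text, spans)
for ch, tag in zip(text, tags):
    print(ch, tag)
```

In a setup like the one described above, a classifier such as an ME model would predict one such tag for each character, using features computed around that character.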

Demo site URL:

Wen-Lian Hsu
Professor, IEEE Fellow
Research Fellow
Institute of Information Science,
Academia Sinica, Taipei,
Taiwan, R. O. C.
Phone: 886-2-27883799 ext.1804
Fax: 886-2-27824814
E-mail: hsu@iis.sinica.edu.tw

Ting-Yi Sung
Research Fellow
Institute of Information Science,
Academia Sinica, Taipei,
Taiwan, R. O. C.
Phone: 886-2-27883799 ext.1711
Fax: 886-2-27824814
E-mail: tsung@iis.sinica.edu.tw

Intelligent Agent Systems Lab., Institute of Information Science, Academia Sinica.
128 Academia Road, Sec.2, Nankang, Taipei, Taiwan, ROC
Tel: +886-2-2788-3799, Fax: 886-2-2782-4814, 886-2-2651-8660