AIIA Lab of IIS, Academia Sinica,
led by Chun-Nan Hsu, achieved top scores in the second BioCreative Text Mining
Challenge
The research
team from the AIIA Lab, Institute of Information Science, Academia Sinica,
Taiwan, led by Chun-Nan Hsu, and from I-fang Chung's Lab at the Institute
of Bioinformatics, National Yang-Ming University, achieved the second and
third highest scores for the two methods that they submitted to the second
BioCreative Challenge Evaluation, held in Madrid, Spain, in 2007. Twenty-one
participants submitted methods to this Challenge. The top score was achieved
by a team from the IBM T.J. Watson Research Center in the USA. However, the
organizers reported that the differences among the top three scores were not
statistically significant, so all three can be considered top scores in this
Challenge. Moreover, after reweighting the samples to correct the
sample selection bias, the first method submitted by Hsu and Chung received
the top score among those submitted by all participants.
This Challenge evaluates the performance of state-of-the-art
computer programs on the task of extracting gene and gene product mentions
from a large corpus of molecular biology literature. Such programs
can help molecular biologists search the literature related to particular genes.
They also allow researchers to collect a large number of reports on particular
molecular biology events (e.g., protein-protein interactions and reaction pathways)
from the literature without performing resource-demanding and time-consuming
experiments. Therefore, a great deal of effort has been devoted to this
research around the world. Extracting gene mentions is particularly difficult
because authors rarely use standardized gene names, and gene names naturally
co-occur with other types of names that have similar morphology and even similar
context. The Academia Sinica-Yang Ming University team applied machine learning
algorithms to train conditional random fields and support vector machines
from a corpus of 15,000 sample sentences to achieve their top scores. They
have been studying efficient training algorithms for conditional random fields
and have already achieved promising results. Those results will be published
in the near future.
This research is supported by the National Research Program for Genomic
Medicine (NRPGM), National Science Council (NSC), under the grant for the Advanced
Bioinformatics Core (ABC) facility. ABC consists of four teams from National
Yang-Ming University and Academia Sinica. ABC welcomes collaboration proposals
from biology labs to extend the impact of their research achievements.
Other participants included teams from the University of Pennsylvania,
the defending top scorer, the National Center for Biotechnology Information
(NCBI), Cambridge University, and other renowned research institutes in the
Netherlands, Spain, Germany, Korea, China, and elsewhere.
For more information, please visit http://aiia.iis.sinica.edu.tw/biocreative2.htm
Copyright © Institute of Information Science, Academia Sinica. Tel: +886-2-27883799