Modern biomedical science is data-driven and depends on many reliable databases of biomedical knowledge, or simply "knowledge bases". Knowledge bases are particularly important for precision medicine, where the idea is to treat patients with the same condition differently according to their genetic profiles. Because the number of possible genetic profiles is huge, clinicians need knowledge bases to look up which treatments are known to be effective for their patients. Examples of widely used knowledge bases for precision medicine include ClinVar, COSMIC, the GWAS Catalog, and MyCancerGenome. These knowledge bases contain known associations between genetic variants, disease conditions, and/or treatments. In the past decade, biomedical text mining and natural language processing (NLP) have been touted as the way to extract knowledge from research publications and automatically create knowledge bases that meet these needs. However, after decades of development, none of the well-known precision medicine knowledge bases uses text mining or NLP to extract knowledge. Instead, they rely on human expert “curators” to manually extract and validate knowledge from research publications, a practice that is expensive and doomed to be overwhelmed by the growth rate of new publications. At a time when we hear a lot of hype about how AI will soon take over human jobs, there is no sign that these curators’ jobs will be taken over by AI anytime soon. In this talk, I will survey the current practice of knowledge base curation, review state-of-the-art biomedical text mining and NLP and why they fail, and propose a potential solution: a hybrid human-AI approach based on web annotation, along with its use by ClinGen (curation for ClinVar) and other biomedical knowledge bases.
If time allows, I will also say more about web annotation as a new paradigm of online communication, its use in fighting fake news, and how it can help salvage both democracy and science, the foundations of modern civilization that Dr. Shih Hu, the first president of Academia Sinica in Taiwan, advocated throughout his life and that are now under siege around the world.
Chun-Nan Hsu, PhD, is an Associate Professor at the School of Medicine, University of California, San Diego. Dr. Hsu has published more than 100 highly cited peer-reviewed research articles in the fields of machine learning, data mining, and biomedical informatics. His team has developed widely used software tools for the biomedical sciences, some of which have led to commercialized products. He was named a Senior Member of the Association for Computing Machinery (ACM) in 2011 and received the IBM Faculty Award in 2012 for his distinguished contributions to biomedical text mining.