Institute of Information Science, Academia Sinica
Title: Cross-lingual Cross-media Relation and Event Structure Transfer
Speaker: Prof. Heng Ji (Department of Computer Science, University of Illinois at Urbana-Champaign)
Time: 2019-12-17 (Tue) 10:00 – 12:00
Venue: Auditorium 106, New Building, Institute of Information Science
Host: 蘇克毅
Abstract:

Identifying complex semantic graph structures such as events and entity relations in unstructured text is already a challenging Information Extraction task, and it is doubly difficult when the sources are written in under-resourced and under-annotated languages. We investigate the suitability of cross-lingual, cross-media graph structure transfer techniques for these tasks. Previous efforts on cross-lingual transfer are limited to the sequence level. In contrast, we observe that relational facts are typically expressed by identifiable structured graph patterns across multiple languages and data modalities. We exploit relation- and event-relevant language-universal and modality-universal features, leveraging both symbolic information (including part-of-speech tags and dependency paths) and distributional information (including type representations and contextualized representations). We then use graph convolutional networks to project all entity mentions, event triggers, and contexts into a complex, structured multilingual common space. In this way, all the sentences from multiple languages, along with visual objects from images, are represented in one shared, unified graph representation. We then train a relation or event extractor on source-language annotations and apply it to the target language and images. Extensive experiments on cross-lingual and cross-media relation and event transfer demonstrate that our approach achieves performance comparable to state-of-the-art supervised models trained on up to 3,000 manually annotated mentions, and dramatically outperforms methods learned from flat representations. I will show a preliminary demo of applying the resulting event knowledge base to automatic history book generation.


BIO:

Heng Ji is a professor in the Computer Science Department of the University of Illinois at Urbana-Champaign. She received her B.A. and M.A. in Computational Linguistics from Tsinghua University, and her M.S. and Ph.D. in Computer Science from New York University. Her research interests focus on Natural Language Processing, especially Information Extraction and Knowledge Base Population. She was selected as a "Young Scientist" and a member of the Global Future Council on the Future of Computing by the World Economic Forum in 2016 and 2017. Her awards include the "AI's 10 to Watch" Award from IEEE Intelligent Systems in 2013 and the NSF CAREER Award in 2009. She has coordinated the NIST TAC Knowledge Base Population task since 2010. She is an associate editor for IEEE/ACM Transactions on Audio, Speech, and Language Processing, and has served as Program Committee Co-Chair of many conferences, including NAACL-HLT 2018.