Institute of Information Science, Academia Sinica


Seminar

A Dependency Parser for Tweets

  • Lecturer: Mr. Lingpeng Kong (Language Technologies Institute, Carnegie Mellon University)
    Host: Lun-Wei Ku
  • Time: 2015-08-20 (Thu.) 10:00 ~ 11:00
  • Location: Auditorium 106 at new IIS Building

Abstract

In contrast to the edited, standardized language of traditional publications such as news reports, social media text closely represents language as it is used by people in their everyday lives. These informal texts, which account for ever larger proportions of written content, are of considerable interest to researchers. Here, we describe a new dependency parser for English tweets, TweeboParser. This work builds on several contributions: new syntactic annotations for a corpus of tweets (TweeBank), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled attachment accuracy on our new, high-quality test set and measure the benefit of our contributions.

BIO

Lingpeng Kong is a Ph.D. student in the School of Computer Science at Carnegie Mellon University, co-advised by Prof. Noah Smith and Prof. Chris Dyer. His main research interests are in designing algorithms to tackle core problems in natural language processing (NLP). His work draws on methods from machine learning, optimization, and combinatorial algorithms, with applications to syntactic parsing, machine translation, and social media. Prior to CMU, he worked at the IBM China Systems and Technology Lab.