Institute of Information Science, Academia Sinica



An Analysis of a Mandarin-English Code-switching Speech Corpus: SEAME

  • Lecturer: Dr. Dau-cheng Lyu (Nanyang Technological University, Singapore)
    Host: Dr. Ming-Tat Ko
  • Time: 2010-12-03 (Fri.) 16:00 – 18:00
  • Location: Auditorium 108 at old IIS Building
Abstract: SEAME (South East Asia Mandarin-English) is a 30-hour spontaneous Mandarin-English code-switching speech corpus recorded from speakers in Singapore and Malaysia. This talk presents a series of analyses of the recording, the processing time, and the voice activity rate (VAR) across the speech recording, transcription, validation, and language boundary labeling processes. It also describes the duration of monolingual segments within code-switching utterances and the speakers' language-switching behavior during conversation. The results show that 80% of the English and 72% of the Mandarin monolingual segments in code-switching utterances are shorter than one second. In over 80% of cases, speakers switch language directly, without any short pause or discourse particle between the two adjacent languages.