Institute of Information Science, Academia Sinica
Topic: ISCA Distinguished Lecture - Feature-Domain, Model-Domain, and Hybrid Approaches to Noise-Robust Speech Recognition
Speaker: Dr. Li Deng (Microsoft Research)
Date: 2011-06-27 (Mon) 15:30 – 17:00
Location: Auditorium 106 at the new IIS Building
Host: Dr. Hsin-Min Wang

Abstract:

Noise robustness has long been an active area of research that captures significant interest from speech recognition researchers and developers. In this lecture, we use the Bayesian framework as a common thread to connect, analyze, and categorize a number of popular approaches to noise-robust speech recognition pursued in the recent past. The topics covered in this lecture include: 1) Bayesian decision rules with unreliable features and unreliable model parameters; 2) Principled ways of computing feature uncertainty using structured speech distortion models; 3) Use of a phase factor in an advanced speech distortion model for feature compensation; 4) A novel perspective on model compensation as a special implementation of the general Bayesian predictive classification rule capitalizing on model parameter uncertainty; 5) A taxonomy of noise compensation techniques using two distinct axes: feature vs. model domain and structured vs. unstructured transformation; and 6) Noise adaptive training as a hybrid feature-model compensation framework and its various forms of extension.


BIO:

Li Deng received the Ph.D. degree from the University of Wisconsin-Madison. He joined the Department of Electrical and Computer Engineering, University of Waterloo, Ontario, Canada in 1989 as an Assistant Professor, where he became a Full Professor with tenure in 1996. From 1992 to 1993, he conducted sabbatical research at the Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass., and from 1997 to 1998, at ATR Interpreting Telecommunications Research Laboratories, Kyoto, Japan. In 1999, he joined Microsoft Research, Redmond, WA as a Senior Researcher, where he is currently a Principal Researcher. Since 2000, he has also been an Affiliate Professor in the Department of Electrical Engineering at the University of Washington, Seattle, where he teaches the graduate course on Computer Speech Processing. His current (and past) research activities include automatic speech and speaker recognition, spoken language identification and understanding, speech-to-speech translation, machine translation, language modeling, statistical methods and machine learning, neural information processing, deep-structured learning, machine intelligence, audio and acoustic signal processing, statistical signal processing and digital communication, human speech production and perception, acoustic phonetics, auditory speech processing, auditory physiology and modeling, noise-robust speech processing, speech synthesis and enhancement, multimedia signal processing, and multimodal human-computer interaction. In these areas, he has published over 300 refereed papers in leading journals and conferences, 3 books, and 15 book chapters, and has given keynotes, tutorials, and lectures worldwide. He was elected by ISCA (International Speech Communication Association) as its Distinguished Lecturer for 2010-2011. He has been granted over 40 US or international patents in acoustics/audio, speech/language technology, and other fields of signal processing. He has received awards and honors bestowed by IEEE, ISCA, ASA, Microsoft, and other organizations.

He is a Fellow of the Acoustical Society of America, and a Fellow of the IEEE. He serves on the Board of Governors of the IEEE Signal Processing Society (2008-2010), and as Editor-in-Chief for the IEEE Signal Processing Magazine (SPM) (2009-2011), which ranks consistently among the top journals with the highest citation impact. According to the Thomson Reuters Journal Citation Report, released June 2010, the SPM ranked first among all IEEE publications (125 in total) and among all publications within the Electrical and Electronics Engineering Category (245 in total) in terms of its impact factor.