Institute of Information Science, Academia Sinica





Incorporating Multiple Knowledge Sources in Feature Compensation and Acoustic Modeling Adaptation for Robust Speech Recognition

  • Lecturer: Dr. Yu Tsao (National Institute of Information and Communications Technology (NICT), Kyoto, Japan)
    Host: Dr. Hsin-Min Wang
  • Time: 2011-04-08 (Fri.) 10:30 – 12:00
  • Location: Auditorium 106 at new IIS Building

The mismatch between training and testing conditions is a major obstacle to the practical applicability of automatic speech recognition (ASR). The mismatch may stem from: 1) speaker effects, including the speaker's accent, dialect, and speaking rate; and 2) speaking-environment effects, including interfering noise and transducer and transmission-channel distortions. Many approaches have been proposed to reduce this mismatch. Among them, feature compensation and acoustic model adaptation are two popular directions. In feature compensation, a transformation function is estimated to convert the testing speech features so that they match the training condition; the converted features are then used for recognition. In acoustic model adaptation, by contrast, a transformation function is estimated and used to adapt the acoustic models from the training condition to match the testing environment; the adapted acoustic models are then used for recognition. For both feature compensation and model adaptation, the efficiency and accuracy of the transformation function estimation are crucial to the achievable performance.
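The two directions can be contrasted with a toy sketch. It is not the method presented in the talk: it assumes a one-dimensional feature, a nearest-mean classifier standing in for the acoustic model, and a mismatch that is a single additive bias. All names and numbers (TRAIN_MEANS, estimate_bias, the +3.0 offset) are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical toy "acoustic model": one mean per class, equal variances.
TRAIN_MEANS = {"a": -1.0, "b": 1.0}

def classify(x, means):
    """Return the class whose mean is closest to the feature x."""
    return min(means, key=lambda c: (x - means[c]) ** 2)

def estimate_bias(test_feats, means):
    """Crude transformation estimate: a single additive bias.
    Assumes the classes are roughly balanced in the test data."""
    return mean(test_feats) - mean(means.values())

def compensate_features(test_feats, bias):
    """Feature compensation: map test features toward the training condition."""
    return [x - bias for x in test_feats]

def adapt_model(means, bias):
    """Model adaptation: move the model toward the testing condition."""
    return {c: m + bias for c, m in means.items()}

# Simulated mismatch: clean features shifted by an offset the recognizer does not know.
clean = [-1.2, -0.8, 0.9, 1.1]
test_feats = [x + 3.0 for x in clean]

bias = estimate_bias(test_feats, TRAIN_MEANS)

# No correction: the shifted features are scored against the original model.
baseline = [classify(x, TRAIN_MEANS) for x in test_feats]
# Feature compensation: corrected features, original model.
compensated = [classify(x, TRAIN_MEANS)
               for x in compensate_features(test_feats, bias)]
# Model adaptation: original features, corrected model.
adapted = [classify(x, adapt_model(TRAIN_MEANS, bias)) for x in test_feats]
```

In this deliberately simple setting both routes recover the same labels, while the uncorrected baseline assigns every frame to class "b". Either way, the result hinges on how well the bias is estimated, which is the estimation problem the abstract emphasizes.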

In this talk, we present our recent research on incorporating multiple knowledge sources to improve the efficiency and accuracy of the transformation function estimation for feature compensation and acoustic model adaptation. We studied and proposed four effective ways of incorporating multiple knowledge sources: (1) prepare multiple sets of prior information collected from different phases; (2) derive new objective functions that consider multiple goals; (3) utilize knowledge-based or data-driven constraints; and (4) characterize the diverse acoustic information embedded in the available speech data. Our experimental results indicate that by properly incorporating multiple knowledge sources, we can improve the efficiency and accuracy of the transformation function estimation and thereby enhance the performance of feature compensation and acoustic model adaptation.