Institute of Information Science, Academia Sinica





Discrimination-Emphasized Mel-Frequency-Warping for Time-Varying Speaker Recognition

  • Lecturer: Dr. Thomas Fang Zheng (Research Professor and Vice Dean of the Research Institute of Information Technology (RIIT), Tsinghua University)
    Host: Dr. Hsin-Min Wang
  • Time: 2011-12-09 (Fri.) 10:30 – 12:00
  • Location: Auditorium 106 at new IIS Building

Performance degradation over time is a generally acknowledged phenomenon in speaker recognition, and it is widely assumed that speaker models should be updated from time to time to maintain representativeness. However, such updating is costly, user-unfriendly, and sometimes unrealistic, which hinders practical application of the technology. From a pattern recognition point of view, the time-varying issue in speaker recognition calls for features that are speaker-specific and as stable as possible across sessions recorded at different times. Therefore, after searching for and analyzing the most stable parts of the feature space, a discrimination-emphasized Mel-frequency-warping method is proposed. In the implementation, each frequency band is assigned a discrimination score that takes into account both speaker and session information, and Mel-frequency warping is applied during feature extraction to emphasize bands with higher scores. Experimental results on a time-varying voiceprint database show that this method not only improves speaker recognition performance, with an equal error rate (EER) reduction of 19.1%, but also alleviates the performance degradation caused by time variation, with a reduction of 8.9%.
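
As a rough illustration of the idea in the abstract, the sketch below scores each frequency band and then places filterbank edges more densely where scores are high. It is not the speaker's method: the scoring formula (between-speaker variance divided by across-session variance, in the style of an F-ratio), the piecewise-linear warping, and the names `discrimination_scores` and `warped_edges` are all assumptions made for this example. The frequency axis may be in Hz or in mel units.

```python
import numpy as np

def discrimination_scores(energies, speakers, sessions):
    """Assumed F-ratio-style score per band (hypothetical formula).

    energies: (n_utterances, n_bands) band energies.
    Returns between-speaker variance divided by the mean, over speakers,
    of the variance of per-session means.
    """
    energies = np.asarray(energies, dtype=float)
    speakers = np.asarray(speakers)
    sessions = np.asarray(sessions)
    speaker_means, session_vars = [], []
    for spk in np.unique(speakers):
        mask = speakers == spk
        speaker_means.append(energies[mask].mean(axis=0))
        # Mean of each session for this speaker, then variance across sessions.
        per_session = np.stack([
            energies[mask & (sessions == ses)].mean(axis=0)
            for ses in np.unique(sessions[mask])
        ])
        session_vars.append(per_session.var(axis=0))
    between = np.stack(speaker_means).var(axis=0)
    within = np.stack(session_vars).mean(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero

def warped_edges(scores, band_edges, n_filters):
    """Place n_filters + 2 filter edges so that high-score bands get more edges.

    scores: one score per band; band_edges: len(scores) + 1 increasing values.
    The cumulative score acts as a piecewise-linear warping function.
    """
    scores = np.maximum(np.asarray(scores, dtype=float), 1e-12)
    band_edges = np.asarray(band_edges, dtype=float)
    mass = np.concatenate([[0.0], np.cumsum(scores)])
    mass /= mass[-1]
    targets = np.linspace(0.0, 1.0, n_filters + 2)
    return np.interp(targets, mass, band_edges)
```

With uniform scores the edges come out evenly spaced, so the warping reduces to an ordinary filterbank layout on whatever axis `band_edges` uses. Any band receiving a larger share of the total score receives a proportionally larger share of the filters.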