Institute of Information Science, Academia Sinica

Seminar

A Deep Learning Perspective on Acoustic Signal Processing

  • Lecturer: Prof. Chin-Hui Lee (School of Electrical and Computer Engineering, Georgia Institute of Technology)
    Host: Keh-Yih Su
  • Time: 2014-11-18 (Tue.) 15:00 ~ 16:00
  • Location: Auditorium 106 at new IIS Building

Abstract

In contrast to conventional model-based acoustic signal processing, we formulate a given acoustic signal processing problem in a novel deep learning framework as finding a nonlinear mapping function between the observed signal and the desired targets. Monte Carlo techniques are often required to generate a large collection of signal pairs from which to learn the often-complicated structure of the mapping functions. In the case of speech enhancement, a large training set encompassing many possible combinations of speech and noise types is first designed so that a wide range of additive noises in real-world situations can be handled. Next, deep neural network (DNN) architectures are employed as nonlinear regression functions to ensure a powerful approximation capability. In the case of source separation, a similar simulation methodology can be adopted. In the case of speech bandwidth expansion, the target wideband signals can be filtered and down-sampled to create the needed narrowband training examples. Finally, in the case of acoustic de-reverberation, a wide variety of simulated room impulse responses is needed to generate a good training set.
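The simulation of training pairs for speech enhancement described above can be sketched roughly as follows. This is only an illustrative sketch, not the speaker's actual pipeline: the function names `mix_at_snr` and `make_training_pairs`, the SNR values, and the synthetic sine and white-noise signals standing in for real speech and noise recordings are all assumptions made for the example.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` after scaling it to the requested SNR in dB."""
    noise = noise[: len(clean)]  # assumes the noise is at least as long as the clean signal
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that makes clean_power / (gain^2 * noise_power) equal 10^(snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise


def make_training_pairs(clean_signals, noise_signals, snrs_db, rng):
    """Return (noisy, clean) pairs: one randomly chosen noise per clean signal and SNR."""
    pairs = []
    for clean in clean_signals:
        for snr_db in snrs_db:
            noise = noise_signals[rng.integers(len(noise_signals))]
            pairs.append((mix_at_snr(clean, noise, snr_db), clean))
    return pairs


# Hypothetical stand-ins for real recordings: two tones and three white-noise clips.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean_signals = [np.sin(2 * np.pi * f * t) for f in (220.0, 440.0)]
noise_signals = [rng.standard_normal(16000) for _ in range(3)]

pairs = make_training_pairs(clean_signals, noise_signals, [0, 10, 20], rng)
```

Each pair holds the simulated observation and its clean target, which is the form a regression model would be trained on. A real training set would draw from many more speech and noise types than this toy example.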

When reconstructing the desired target signals, some additional techniques may be required to estimate noise, interfering-speaker, or missing phase information in order to enhance the quality of the synthesized signals. Experimental results demonstrate that the proposed framework can achieve significant improvements in both objective and subjective measures over conventional techniques in speech enhancement, speech source separation, bandwidth expansion, and voice conversion. It is also interesting to observe that the proposed DNN approach can serve as an acoustic preprocessing front-end for robust speech recognition, improving performance with or without post-processing.