My research interests include spoken language processing, natural language processing, multimedia information retrieval, machine learning, and pattern recognition. My goal is to develop methods for analyzing, extracting, recognizing, indexing, and retrieving information from audio data, with special emphasis on speech and music.
In the speech field, our research has focused mainly on speech recognition, speaker recognition, spoken language recognition, voice conversion, and spoken document retrieval/summarization. Recent achievements include locally-linear-embedding-based approaches to voice conversion and post-filtering, discriminative autoencoders for speech/speaker recognition, and novel paragraph embedding methods for spoken document retrieval/summarization. Ongoing research includes audio-visual speaker recognition and speech enhancement, subspace neural networks for spoken language/dialect/accent recognition, many-to-one and non-parallel voice conversion, and neural-network-based spoken document retrieval/summarization and question answering.
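For readers unfamiliar with the locally-linear-embedding idea mentioned above, the sketch below illustrates the general exemplar-based flavor of such conversion: a source frame is reconstructed as a weighted combination of its nearest source exemplars, and the same weights are applied to paired target exemplars. This is a minimal textbook-style illustration under assumed inputs (paired frame dictionaries `src_dict`/`tgt_dict` and parameters `k`, `reg` are hypothetical), not a reproduction of our published method.

```python
import numpy as np

def lle_convert(x, src_dict, tgt_dict, k=3, reg=1e-5):
    """Convert one source frame x using paired source/target exemplars.

    src_dict, tgt_dict: (N, D) arrays of paired frames (illustrative).
    Sketch of the locally-linear-embedding-style mapping: reconstruct x
    from its k nearest source exemplars, then apply the same weights to
    the paired target exemplars.
    """
    # 1. Find the k nearest source exemplars.
    dists = np.linalg.norm(src_dict - x, axis=1)
    idx = np.argsort(dists)[:k]
    neighbors = src_dict[idx]                     # (k, D)

    # 2. Solve for reconstruction weights w minimizing ||x - w @ neighbors||
    #    subject to sum(w) = 1 (the standard LLE local fit, regularized).
    diff = neighbors - x                          # (k, D)
    G = diff @ diff.T                             # local Gram matrix
    G = G + reg * (np.trace(G) + 1.0) * np.eye(k)
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()

    # 3. Map: apply the same weights to the paired target exemplars.
    return w @ tgt_dict[idx]
```

Because the weights sum to one, local geometry in the source space is transferred to the target space; real systems operate on spectral features and add many refinements beyond this sketch.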
In the music field, our research has focused mainly on vocal melody extraction and automatic music video generation. Recent achievements include an acoustic-phonetic F0 modeling framework for vocal melody extraction and an emotion-oriented pseudo-song prediction and matching framework for automatic music video generation. We have implemented a complete automatic music video generation system that edits a long user-generated video into a short, professional-looking video synchronized with the music. Ongoing research includes continued improvement of our own technologies and systems, cover song identification, and automatic set-list generation for concert videos, so as to facilitate the management and retrieval of large music databases. Future research directions include singing voice synthesis, speech-to-singing voice conversion, and music structure analysis/summarization.
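Vocal melody extraction ultimately rests on frame-level F0 estimation. As background only, here is a minimal autocorrelation-based pitch estimator for a single voiced frame; it is a standard textbook baseline, not the acoustic-phonetic F0 modeling framework described above, and the function name and parameters are illustrative.

```python
import numpy as np

def autocorr_f0(frame, sr, f0_min=80.0, f0_max=1000.0):
    """Estimate the F0 of one voiced frame by autocorrelation peak picking.

    frame: 1-D array of samples; sr: sample rate in Hz.
    The search is restricted to lags corresponding to [f0_min, f0_max].
    """
    frame = frame - frame.mean()                      # remove DC offset
    # Non-negative-lag half of the full autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / f0_max)                        # shortest allowed period
    lag_max = min(int(sr / f0_min), len(ac) - 1)      # longest allowed period
    # The strongest peak in the allowed lag range gives the period estimate.
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag
```

A full melody extractor adds voicing detection, temporal smoothing, and separation of the vocal line from accompaniment; this sketch only covers the per-frame pitch estimate.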