Research Fellow  |  Wang, Hsin-Min  
 
Research Descriptions
 

My research interests include speech processing, natural language processing, multimedia information retrieval, machine learning, and pattern recognition. The research goal is to develop methods for analyzing, extracting, recognizing, indexing, and retrieving information from audio data, with special emphasis on speech and music.

In the speech field, research has focused mainly on speaker recognition, spoken language recognition, voice conversion, and spoken document retrieval/summarization. Recent achievements include a new maximum mutual information-based framework for GMM-based voice conversion, subspace-based spoken language identification, and i-vector-based language modeling for spoken document retrieval. Ongoing research includes language modeling for speech recognition, document classification, and information retrieval; subspace-based speaker and spoken language recognition; discriminative training for GMM-based voice conversion; and expressive speech synthesis.

In the music field, research has focused mainly on vocal melody extraction, automatic music tagging, music emotion recognition, and music search. Recent achievements in this field include a novel cost-sensitive multi-label (CSML) learning framework for music tagging; a novel music search scenario in which the user queries by multiple tags with multiple levels of preference (an MTML query), together with a corresponding tag cloud-based query interface; and an acoustic emotion Gaussians model for emotion-based music annotation and retrieval. Our extended work on acoustic visual emotion Gaussians modeling for automatic music video generation won the ACM Multimedia 2012 Grand Challenge First Prize. Ongoing research includes continuous improvement of our own technologies and systems, audio feature analysis, semantic visualization of music tags, and vocal separation, all aimed at facilitating the management and retrieval of a large music database. Future research directions also include singing voice synthesis, context-aware music retrieval/recommendation, and music structure analysis/summarization.

 
 