Research Fellow
Hsin-Min Wang (王新民)
Ph.D., Electrical Engineering, National Taiwan University, Taiwan
T +886-2-2788-3799 ext. 1714 E whm@iis.sinica.edu.tw
F +886-2-2782-4814 W www.iis.sinica.edu.tw/pages/whm
・ Research Fellow, IIS, Academia Sinica (2010-present)
・ Deputy Director, Academia Sinica Center for Digital Cultures, Academia Sinica
(2013-present)
・ Professor (Joint Appointment), Department of Computer Science and Information
Engineering, National Cheng Kung University (2014-present)
・ Deputy Director, IIS, Academia Sinica (2011-2018)
・ Associate Research Fellow, IIS, Academia Sinica (2002-2010)
・ Assistant Research Fellow, IIS, Academia Sinica (1996-2002)
・ President, Association for Computational Linguistics and Chinese Language Processing
(2013-2015)
・ Editorial Board Member, IJCLCLP (2004-2016), JISE (2012-2016), APSIPA TSIP
(2014-present), IEEE/ACM TASLP (2016-2020)
Research Description
My research interests are in spoken language processing, natural language processing, and multimedia information retrieval. The research
goal is to develop methods for analyzing, extracting, recognizing, indexing, and retrieving information from audio data, with special
emphasis on speech and music. In the field of speech, the recent research topics include discriminative autoencoders for speech and speaker
recognition, subspace-based models for phonotactic spoken language recognition, variational autoencoder-based voice conversion, audio-
visual speech enhancement, automatic speech quality assessment, autoencoder-based paragraph embeddings for spoken document
retrieval and summarization, and spoken question answering. We have implemented our own large vocabulary continuous speech
recognition (LVCSR) systems for Mandarin and Minnan. A Hakka LVCSR system will also be developed in the future. In the music field, my
recent research has been focused mainly on acoustic-phonetic F0 modeling for vocal melody extraction, emotion-oriented pseudo song
prediction and matching for automatic music video generation, cover song identification, automatic generation of set lists for concert videos,
music transcription and source separation, automatic melody generation, and singing voice conversion.
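One of the speech topics above, variational autoencoder-based voice conversion, can be illustrated with a minimal sketch: a VAE encodes a spectral frame into a speaker-independent content code, and the decoder reconstructs a frame from that code together with a speaker code, so decoding with a different speaker code performs the conversion. This is a toy illustration only — the dimensions, randomly initialized weights, and function names below are all hypothetical stand-ins, not the actual system described in the research or in publication 10.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): spectral frame, latent code, number of speakers.
FRAME_DIM, LATENT_DIM, N_SPEAKERS = 8, 4, 2

# Random weights stand in for trained encoder/decoder parameters.
W_enc_mu = rng.standard_normal((FRAME_DIM, LATENT_DIM)) * 0.1
W_enc_logvar = rng.standard_normal((FRAME_DIM, LATENT_DIM)) * 0.1
W_dec = rng.standard_normal((LATENT_DIM + N_SPEAKERS, FRAME_DIM)) * 0.1

def encode(frame):
    """Map a spectral frame to mean/log-variance of a speaker-independent code."""
    return frame @ W_enc_mu, frame @ W_enc_logvar

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (the VAE reparameterization trick)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def decode(z, speaker_id):
    """Reconstruct a frame from the content code plus a one-hot speaker code."""
    speaker = np.eye(N_SPEAKERS)[speaker_id]
    return np.concatenate([z, speaker]) @ W_dec

def convert(frame, target_speaker):
    """Encode the source frame, then decode with the *target* speaker code."""
    mu, logvar = encode(frame)
    z = reparameterize(mu, logvar)
    return decode(z, target_speaker)

source_frame = rng.standard_normal(FRAME_DIM)
converted = convert(source_frame, target_speaker=1)
print(converted.shape)  # (8,)
```

In a trained system the encoder is pushed (e.g., by the KL term and adversarial or cross-domain objectives) to drop speaker identity from z, so that swapping the speaker code at decoding time changes the voice while preserving the linguistic content.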
Publications
1. Hung-Yi Lo, Ju-Chiang Wang, Hsin-Min Wang, and Shou-De Lin, "Cost-sensitive multi-label learning for audio tag annotation and retrieval," IEEE Trans. on Multimedia, 13(3), pp. 518-529, June 2011.
2. Hung-Yi Lo, Shou-De Lin, and Hsin-Min Wang, "Generalized k-labelsets ensemble for multi-label and cost-sensitive classification," IEEE Trans. on Knowledge and Data Engineering, 26(7), pp. 1679-1691, July 2014.
3. Ju-Chiang Wang, Yi-Hsuan Yang, Hsin-Min Wang, and Shyh-Kang Jeng, "Modeling the affective content of music with a Gaussian mixture model," IEEE Trans. on Affective Computing, 6(1), pp. 56-68, March 2015.
4. Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang, Ea-Ee Jan, Wen-Lian Hsu, and Hsin-Hsi Chen, "Extractive broadcast news summarization leveraging recurrent neural network language modeling techniques," IEEE/ACM Trans. on Audio, Speech, and Language Processing, 23(8), pp. 1322-1334, August 2015.
5. Yu-Ren Chien, Hsin-Min Wang, and Shyh-Kang Jeng, "An acoustic-phonetic model of F0 likelihood for vocal melody extraction," IEEE/ACM Trans. on Audio, Speech, and Language Processing, 23(9), pp. 1457-1468, September 2015.
6. Yu-Ren Chien, Hsin-Min Wang, and Shyh-Kang Jeng, "Alignment of lyrics with accompanied singing audio based on acoustic-phonetic vowel likelihood modeling," IEEE/ACM Trans. on Audio, Speech, and Language Processing, 24(11), pp. 1998-2008, November 2016.
7. Shih-Hung Liu, Kuan-Yu Chen, Yu-Lun Hsieh, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, and Wen-Lian Hsu, "A position-aware language modeling framework for extractive broadcast news speech summarization," ACM Transactions on Asian and Low-Resource Language Information Processing, 16(4), Article 27, pp. 1-13, August 2017.
8. Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, and Hsin-Min Wang, "An information distillation framework for extractive summarization," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(1), pp. 161-170, January 2018.
9. Jen-Chun Lin, Wen-Li Wei, Tyng-Luh Liu, Yi-Hsuan Yang, Hsin-Min Wang, Hsiao-Rong Tyan, and Hong-Yuan Mark Liao, "Coherent deep-net fusion to classify shots in concert videos," IEEE Transactions on Multimedia, 20(11), pp. 3123-3136, November 2018.
10. Wen-Chin Huang, Hao Luo, Hsin-Te Hwang, Chen-Chou Lo, Yu-Huai Peng, Yu Tsao, and Hsin-Min Wang, "Unsupervised representation disentanglement using cross domain features and adversarial learning in variational autoencoder based voice conversion," accepted to appear in IEEE Transactions on Emerging Topics in Computational Intelligence.