Most conventional speech assessment metrics require a clean reference signal to compute an evaluation score. This requirement limits their applicability in real-world scenarios, where a clean reference is not always accessible. To address this limitation, non-intrusive speech assessment metrics, which estimate the score without a reference, have attracted considerable attention in recent years. With the emergence of deep learning and the growing availability of training data, many studies have used deep learning models to build non-intrusive speech assessment systems. However, despite the good performance these models achieve, their generalization remains a challenge. In this talk, we introduce several approaches to improving the generalization of deep learning-based speech assessment models. We also describe how these models can be integrated directly with speech enhancement systems.
Dr. Ryandhimas E. Zezario received his Ph.D. degree in Computer Science and Information Engineering from National Taiwan University in 2023. He is currently a Postdoctoral Researcher at the Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan. At the Clarity Prediction Challenge 2022, he was awarded the Gold Prize for the best non-intrusive system and first place in the Hearing Industry Research Consortium student prizes. His research interests include speech enhancement, non-intrusive quality assessment, speech processing, and speech/speaker recognition.