Crowdsourced machine translation has the advantage of lowering the monetary cost of collecting translated data. Yet, compared with translation by trained professionals, results collected from non-professional translators can be of low quality. A common solution among crowdsourcing practitioners is to employ a large workforce to gather enough redundant data and then select the best translations from it. In fact, we can save even more money by avoiding the collection of bad translations in the first place. In this talk, I will present a scoring model that estimates the quality of Turkers from their authority, computed over their existing translations, so that we can stop hiring unqualified Turkers. This approach brings both opportunities and risks to crowdsourced translation: it can make an already cheap process even cheaper, but it may come at the price of some quality loss. Our model captures the intuition that good translations and good workers reinforce each other iteratively. Empirical studies, evaluated in terms of BLEU score, Pearson correlation, and actual monetary cost, show that the model maintains performance while reducing the workforce and hence cutting cost.
I am an Assistant Professor in the Department of Statistics, National Cheng Kung University (NCKU). I received my Ph.D. degree from the Graduate Institute of Networking and Multimedia, National Taiwan University.
Before joining NCKU, I was an Assistant Research Fellow at CITI, Academia Sinica. My research interests include Social Networks, Data Mining, Social Media Analytics, and Internet-of-Things Applications in Urban Computing. The problems I aim to tackle are inspired by real-world applications involving Big Data.