A research team led by Dr. Hsin-Min Wang, Research Fellow at the Institute of Information Science, and Drs. Yi-Hsuan Yang and Yen-Yu Lin, both Assistant Research Fellows at the Research Center for Information Technology Innovation, has won first prize at the top international multimedia conference, ACM Multimedia 2012. The team received the Multimedia Grand Challenge first prize for their research entitled “Acousticvisual Emotion Guassians Model for Automatic Generation of Music Video”. In recent years, ACM Multimedia has been regarded as one of the most important international multimedia technology events in both the academic and industrial sectors. This year the conference was held from October 29 to November 2 in Nara, Japan, and was attended by more than 600 experts.
The Multimedia Grand Challenge was first presented as part of the ACM (Association for Computing Machinery) Multimedia conference in 2009 and has established itself as a prestigious competition in the multimedia community. The Grand Challenge poses a set of problems and issues from leading industry brands such as Hewlett-Packard (HP), NHK (Japan Broadcasting Corporation) and Google, intended to challenge the multimedia research community to solve relevant, interesting and difficult questions looming on the industry’s 3-5 year horizon. The challenges this year included “Automatic Music Video Generation” by Google, “Realistic Interaction In Online Virtual Environments” by 3DLife, “Understanding the Emotional Impact of Images and Videos” by HP, “Where is beauty? Video Segment Extraction Based on Aesthetic Quality Assessment” by NHK, “Event Understanding through Social Media and its Text-Visual Summarization” by NTT (Nippon Telegraph and Telephone Corporation), and “Audiovisual Recognition of Specific Events” by Technicolor.
The team from Academia Sinica, composed of Mr. Ju-Chiang Wang, Mr. I-Hong Jhuo and Dr. Hsin-Min Wang from the Institute of Information Science and Dr. Yi-Hsuan Yang from the Research Center for Information Technology Innovation, submitted an entry entitled “Acousticvisual Emotion Guassians Model for Automatic Generation of Music Video” in response to the challenge proposed by Google. Their solution was awarded first prize from among 17 finalists, all of whom came from outstanding international academic institutions, such as France’s National Institute for Research in Computer Science and Control (INRIA), the University of Illinois at Urbana-Champaign (UIUC), Institut Eurecom, Telecom ParisTech, Tsinghua University, Delft University of Technology, the Chinese University of Hong Kong, and the Chinese Academy of Sciences.
The winning solution uses the perceived emotion of multimedia content as a bridge to connect music and video. Specifically, Acousticvisual Emotion Gaussians (AVEG), a novel machine learning framework, jointly learns the tripartite relationship among music, video, and emotion from an emotion-annotated corpus of music videos. The AVEG model is applied to a piece of music (or a video sequence) to predict its emotion distribution in a continuous emotion space from the corresponding low-level acoustic (or visual) features. Because AVEG is flexible, the model can be personalized via online model adaptation to support a user-centered scenario. Finally, music and video are matched by measuring the similarity between the two corresponding emotion distributions, based on a distance measure such as Kullback-Leibler (KL) divergence. The system is effective and efficient and will likely be incorporated into mobile devices in the near future.
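The matching step described above can be illustrated with a minimal sketch. It is not the team's implementation: it assumes, purely for illustration, that each predicted emotion distribution is a single two-dimensional Gaussian (for example, over valence and arousal), and it uses the standard closed-form KL divergence between two Gaussians. The function names `kl_gauss_2d` and `best_match`, the clip names, and all numbers are hypothetical.

```python
import math

def kl_gauss_2d(m0, s0, m1, s1):
    """KL(N0 || N1) for two 2-D Gaussians with means m0, m1 and 2x2 covariances s0, s1."""
    det0 = s0[0][0] * s0[1][1] - s0[0][1] * s0[1][0]
    det1 = s1[0][0] * s1[1][1] - s1[0][1] * s1[1][0]
    if det0 <= 0 or det1 <= 0:
        raise ValueError("covariance matrices must be positive definite")
    # Inverse of the 2x2 matrix s1.
    inv = ((s1[1][1] / det1, -s1[0][1] / det1),
           (-s1[1][0] / det1, s1[0][0] / det1))
    # trace(inv(s1) @ s0)
    trace = (inv[0][0] * s0[0][0] + inv[0][1] * s0[1][0]
             + inv[1][0] * s0[0][1] + inv[1][1] * s0[1][1])
    # Squared Mahalanobis distance between the means under s1.
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    maha = (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return 0.5 * (trace + maha - 2.0 + math.log(det1 / det0))

def best_match(music, videos):
    """Return the name of the video whose emotion Gaussian is closest to the music's."""
    m0, s0 = music
    return min(videos, key=lambda name: kl_gauss_2d(m0, s0, *videos[name]))

# Hypothetical predicted emotion distributions: (mean, covariance).
cov = ((0.05, 0.0), (0.0, 0.05))
song = ((0.6, 0.4), cov)
clips = {
    "sunny_beach": ((0.5, 0.5), cov),
    "rainy_street": ((-0.6, -0.3), cov),
}
chosen = best_match(song, clips)
```

In this toy setup the song's predicted emotion lies much closer to the first clip than to the second, so the first clip is selected. KL divergence is asymmetric, so a real system might instead use a symmetrized variant or a different distance measure.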
ACM Multimedia is an international event that is held in a different country each year. It brings together multimedia experts and practitioners from across the academic and industrial sectors. ACM Multimedia has a highly selective review process with only a 20% acceptance rate for long papers.