Page 27 - 2017 Brochure
Research Description

Hong-Yuan Mark Liao, Distinguished Research Fellow
Chu-Song Chen, Research Fellow
Wen-Liang Hwang, Research Fellow
Tyng-Luh Liu, Research Fellow
Chun-Shien Lu, Research Fellow
Li Su, Assistant Research Fellow
Hsin-Min Wang, Research Fellow

an idea called cross-batch reference (CBR), which can enhance the training of CNNs. The results have been published as a long paper in ACM MM 2016. Our ongoing research also includes multi-modal deep learning for audiovisual speech enhancement and user identification. In the future, we will develop effective approaches to analyze characters in a movie and mine their relationships by combining image and natural-language information.

2. Crowd behavior analysis: A crowd can remain motionless or move over time. When a crowd moves, it does so in a non-rigid manner and may form sub-groups of people. Effective analysis of crowd behavior is therefore a non-trivial task. We aim to propose heuristic definitions for characterizing a crowd. Based on these criteria, we will propose various deep-learning modules to construct a complete framework for analyzing crowd behaviors, including sub-group detection, sub-group merging, abnormal behavior detection and other traits.

3. Music information retrieval: Most descriptions of music apply at multiple granularities, since music signals combine multiple instruments, hierarchical meter structure and mixed genres. Music information retrieval can therefore be approached with deep neural networks (DNNs) trained in a multi-task learning (MTL) setting. MTL-based DNNs have previously been shown to be applicable to musical chord recognition: such networks recognize a chord and its root note in parallel, suggesting that the approach may extend to other musical attributes as well.
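The multi-task idea behind parallel chord and root-note recognition can be sketched as follows. This is an illustrative NumPy toy, not the published model: the label sets (12 root notes, 4 chord qualities), the input dimension, and all weights are invented, and the weights are random rather than trained. A shared trunk produces one feature vector that feeds two task-specific softmax heads.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ROOTS = 12          # hypothetical label set: 12 root notes (C, C#, ..., B)
N_QUALITIES = 4       # hypothetical label set: maj, min, dim, aug
D_IN, D_HID = 24, 16  # illustrative input/hidden sizes (e.g. a chroma-like feature)

# Shared trunk and two task-specific heads. Randomly initialized here;
# in a real MTL system all three would be trained jointly.
W_shared = rng.normal(size=(D_IN, D_HID))
W_root = rng.normal(size=(D_HID, N_ROOTS))
W_qual = rng.normal(size=(D_HID, N_QUALITIES))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    """One forward pass: shared features, then both task heads in parallel."""
    h = np.tanh(x @ W_shared)        # shared representation used by both tasks
    p_root = softmax(h @ W_root)     # task 1: root-note posterior
    p_qual = softmax(h @ W_qual)     # task 2: chord-quality posterior
    return p_root, p_qual

x = rng.normal(size=(1, D_IN))       # one dummy input frame
p_root, p_qual = forward(x)
print(p_root.shape, p_qual.shape)    # (1, 12) (1, 4)
```

Training such a network would minimize the sum of the two cross-entropy losses, so the shared trunk is pushed to learn features useful for both tasks at once.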

Figure: SSDH (supervised semantics-preserving deep hashing) takes images as input and jointly learns image representations, binary codes, and a classifier by optimizing an objective function that combines a classification loss with desirable properties of hash codes. The learned codes preserve the semantic similarity between images and are compact, making them well suited to image search.
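A minimal sketch of the kind of objective the caption describes; the actual SSDH formulation is in the published paper, and everything here (sizes, weights, penalty terms, coefficients) is an invented illustration. A sigmoid "hash" layer sits between the image features and the classifier; the loss combines classification error with terms that push each activation toward 0/1 and keep each bit balanced across the batch, and the final binary codes come from thresholding.

```python
import numpy as np

rng = np.random.default_rng(1)
D_FEAT, K_BITS, N_CLS = 32, 12, 5            # illustrative sizes

W_hash = rng.normal(size=(D_FEAT, K_BITS))   # features -> relaxed hash layer
W_cls = rng.normal(size=(K_BITS, N_CLS))     # hash layer -> classifier

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def hashing_objective(feat, labels, alpha=1.0, beta=1.0):
    h = sigmoid(feat @ W_hash)               # relaxed binary codes in (0, 1)
    p = softmax(h @ W_cls)                   # class posteriors computed FROM the codes
    ce = -np.log(p[np.arange(len(labels)), labels]).mean()  # classification loss
    # desirable property 1: activations near 0 or 1 (far from 0.5)
    quant = -np.mean((h - 0.5) ** 2)
    # desirable property 2: each bit ~50% on over the batch (balanced codes)
    balance = np.mean((h.mean(axis=0) - 0.5) ** 2)
    return ce + alpha * quant + beta * balance, h

feat = rng.normal(size=(8, D_FEAT))          # a dummy batch of image features
labels = rng.integers(0, N_CLS, size=8)
loss, h = hashing_objective(feat, labels)
codes = (h >= 0.5).astype(int)               # compact binary codes by thresholding
print(codes.shape)                           # (8, 12)
```

Because the classifier reads the (relaxed) codes directly, minimizing the classification loss forces the codes themselves to carry the semantic information, which is what makes them useful for similarity search.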
