Institute of Information Science
Multimedia Technologies Laboratory

Principal Investigators:
Hong-Yuan Liao (Chair), Chu-Song Chen, Wen-Liang Hwang, Chun-Shien Lu, Tyng-Luh Liu, Su Li

[Group Profile]
Multimedia technology is widely considered one of the three most promising industries of the twenty-first century, along with biotechnology and nanotechnology. Indeed, over the past two decades, we have witnessed its influence on many aspects of daily life. Its wide spectrum of applications naturally calls for the continual development of a broad range of multimedia techniques, including those for music, video, images, text, and 3D animation.

The main research interests of the members of the Multimedia Technology Group at IIS include multimedia signal processing, computer vision, and machine learning. Beyond the individual interests of each investigator, the group conducts joint research, currently organized around two major projects: 1) Integration of Video and Audio Intelligence for Multimedia Applications, and 2) Compressive Sensing and Sparse Representation. Both projects are described in detail below.

A. Integration of Video and Audio Intelligence for Multimedia Applications
This project explores new multimedia techniques and applications that require the fusion of video and audio intelligence. Specifically, we are investigating how a system might first extract key emotion elements from a short music clip and then transfer those emotions to key targets in a video sequence. Such a task requires at least three core techniques. First, the video sequence must be processed to obtain geometric and appearance information about meaningful, representative targets. Second, a systematic method is needed to reliably identify and classify the important emotions in the music. Third, to complete the emotion transfer, we plan to use computer graphics methods to manipulate the video targets according to the extracted music emotions.

The main challenges of this project are threefold. First, efficiently extracting and then manipulating video targets from an RGB video sequence is very difficult. We are bridging the information gap between 2D and 3D by converting a 2D video target sequence into a corresponding 3D skeleton sequence, which allows systematic motion manipulation. Second, we need a well-defined formulation to “quantify” a given piece of music; that is, for an arbitrary music excerpt, we should be able to “calculate” its emotion intensity and tempo. Third, because the video target is manipulated through its 3D skeleton sequence, increasing the visual impact of the result will require seamlessly incorporating 3D texture synthesis into the system, with the synthesis adapted to the varying emotion intensity and tempo of the intended music.
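
To make the second challenge more concrete, the sketch below computes two simple descriptors from a raw audio signal: the mean RMS level in decibels as a stand-in for emotion intensity, and the tempo in beats per minute from the autocorrelation of an energy-based onset envelope. This is a minimal illustration under assumed choices (frame sizes, a 60 to 200 BPM search range, energy rather than spectral onsets); the functions `frame_signal`, `intensity_db`, and `estimate_tempo_bpm` are hypothetical and are not the group's formulation.

```python
import numpy as np

# Hypothetical descriptors for illustration only; not the group's music-emotion model.

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, dropping any trailing remainder."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def intensity_db(x, frame_len=1024, hop=512):
    """Mean frame RMS level in dB, used here as a crude proxy for emotion intensity."""
    rms = np.sqrt(np.mean(frame_signal(x, frame_len, hop) ** 2, axis=1))
    return float(20 * np.log10(np.mean(rms) + 1e-12))

def estimate_tempo_bpm(x, sr, frame_len=1024, hop=512, bpm_range=(60, 200)):
    """Tempo from the autocorrelation of an energy-based onset envelope."""
    energy = np.sum(frame_signal(x, frame_len, hop) ** 2, axis=1)
    onset = np.maximum(np.diff(energy), 0.0)   # keep only increases in frame energy
    onset = onset - onset.mean()
    ac = np.correlate(onset, onset, mode="full")[len(onset) - 1:]  # lags 0, 1, 2, ...
    frame_rate = sr / hop                      # onset-envelope samples per second
    min_lag = int(round(frame_rate * 60 / bpm_range[1]))  # fastest allowed tempo
    max_lag = int(round(frame_rate * 60 / bpm_range[0]))  # slowest allowed tempo
    lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return 60.0 * frame_rate / lag
```

On a synthetic click track with one click every 0.5 s at 8 kHz (frame length 500, hop 250), the estimate comes out at 120 BPM. Real music would call for a more robust onset function, such as spectral flux, and a learned mapping from such descriptors to emotion labels.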

B. Compressive Sensing and Sparse Representation
This project has already produced several significant results. For the signal separation problem, we designed a re-weighting scheme that appreciably outperforms competing methods. We also addressed the closely related problem of analysis operator learning, introducing a two-stage iterative method with a closed-form solution at each stage. In addition, our work on speeding up the sparse fast Fourier transform (sFFT) is promising: we proposed a novel sFFT that exploits downsampling in the time domain. The resulting algorithm is more efficient than the original sFFT and easy to implement, while yielding comparable results.
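
The aliasing principle that makes time-domain downsampling useful for a sparse FFT can be sketched in a few lines. Taking every D-th sample of a length-N signal and computing a length B = N/D FFT folds the spectrum, so that bucket j holds (1/D) times the sum of X[j + mB]; repeating this on a copy shifted by one sample multiplies each X[k] by exp(2πik/N), so the phase ratio of the two buckets reveals k whenever a bucket holds a single frequency. This is a textbook illustration, not the group's sFFT: it assumes a noiseless, exactly sparse signal whose nonzero frequencies are distinct modulo B, and `sparse_fft_by_downsampling` is a hypothetical name.

```python
import numpy as np

# Hypothetical illustration of spectral aliasing under time-domain downsampling;
# not the group's sFFT algorithm.

def sparse_fft_by_downsampling(x, D, tol=1e-9):
    """Recover a sparse spectrum from two time-subsampled copies of x.

    Assumes no two nonzero frequencies share a bucket (distinct modulo B = N // D)
    and no noise. Returns {frequency index k: X[k]} in numpy's unnormalized FFT convention.
    """
    N = len(x)
    assert N % D == 0, "downsampling factor must divide the signal length"
    B = N // D
    # Subsampling in time aliases the spectrum: Y0[j] = (1/D) * sum_m X[j + m*B].
    Y0 = np.fft.fft(x[0::D])
    # A one-sample circular shift multiplies X[k] by exp(2j*pi*k/N) before aliasing.
    Y1 = np.fft.fft(np.roll(x, -1)[0::D])
    recovered = {}
    for j in np.nonzero(np.abs(Y0) > tol)[0]:
        phase = np.angle(Y1[j] / Y0[j])              # equals 2*pi*k/N modulo 2*pi
        k = int(round(phase * N / (2 * np.pi))) % N  # locate the frequency in the bucket
        recovered[k] = D * Y0[j]                     # undo the 1/D aliasing scale
    return recovered
```

Only two FFTs of length N/D are computed. Practical sFFT algorithms add randomized spectral permutations, filtering, and repeated hashing so that colliding frequencies and noise can be handled, which this sketch deliberately omits.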

In our ongoing research, we will continue to investigate these topics in greater depth. In particular, we intend to focus on the following:

1. Studying the signal separation problem using an analysis model in which each component signal can be sparsely analyzed by only one dictionary. We are also considering the more general problem of structured sparse representation and expect to gain a deeper understanding of how to model it properly.

2. Extending analysis operator learning to the supervised setting to yield more effective features. At the same time, we are investigating how to formulate a unified approach to both analysis and synthesis operator learning.

3. Studying how to sample Fourier measurements directly within the compressive sensing framework to enable fast recovery. Because the theoretical recovery bounds and the practical performance of sparse signal recovery algorithms are still inconsistent, our research aims to close the gap between them.
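
As a generic illustration of re-weighting in sparse recovery from Fourier measurements, the sketch below recovers a sparse real signal from a random subset of its orthonormal DFT coefficients. It runs iterative soft thresholding (ISTA) inside the standard reweighted ℓ1 loop of Candès, Wakin, and Boyd, where weights 1/(|x_i| + ε) penalize small entries more heavily on each pass. The group's own re-weighting scheme is not detailed here, so this should not be read as that method; the random row selection, parameter values, and function names (`partial_fourier`, `partial_fourier_adjoint`, `reweighted_ista`) are all assumptions.

```python
import numpy as np

# Hypothetical sketch: generic reweighted l1 recovery via ISTA, not the group's
# re-weighting scheme. Measurements are randomly chosen rows of the orthonormal DFT.

def partial_fourier(x, rows):
    """Measurement operator A: selected rows of the orthonormal DFT of x."""
    return np.fft.fft(x, norm="ortho")[rows]

def partial_fourier_adjoint(y, rows, n):
    """Adjoint A^H: zero-fill the unmeasured coefficients, then apply the inverse DFT."""
    z = np.zeros(n, dtype=complex)
    z[rows] = y
    return np.fft.ifft(z, norm="ortho")

def reweighted_ista(y, rows, n, lam=0.01, outer=4, inner=1000, eps=0.1):
    """Estimate a sparse real x from y = A x.

    Each outer pass runs ISTA on 0.5*||A x - y||^2 + lam * sum_i w_i |x_i|,
    then updates w_i = 1 / (|x_i| + eps) so small entries are penalized more.
    """
    x = np.zeros(n)
    w = np.ones(n)
    for _ in range(outer):
        for _ in range(inner):
            # Gradient step with unit step size, valid because ||A|| <= 1
            # for rows of an orthonormal DFT; x is kept real.
            residual = y - partial_fourier(x, rows)
            v = x + np.real(partial_fourier_adjoint(residual, rows, n))
            # Soft thresholding with a per-entry threshold lam * w_i.
            x = np.sign(v) * np.maximum(np.abs(v) - lam * w, 0.0)
        w = 1.0 / (np.abs(x) + eps)
    return x
```

With a 5-sparse signal of length 128 and 64 randomly chosen DFT rows, this typically recovers the support exactly; how many Fourier samples are needed, and which ones, is the theory-versus-practice question raised in item 3.
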

Figure: We extract joint positions and movements from a video clip (upper-left and lower-left panels) and use this information to construct a complete body skeleton (right panel). The original pose can then be manipulated using inverse kinematics techniques.
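
The inverse kinematics step mentioned in the caption can be illustrated with its simplest case, a planar two-link chain such as shoulder, elbow, and wrist. Given the link lengths and a desired wrist position, the law of cosines gives the elbow angle and an atan2 correction gives the shoulder angle. This is a minimal sketch under that planar, two-link assumption; a full 3D body skeleton would need a general IK solver (for example, Jacobian-based or cyclic coordinate descent), and the functions `two_link_ik` and `forward_kinematics` are hypothetical.

```python
import math

# Hypothetical planar two-link IK for illustration; not the group's skeleton solver.

def forward_kinematics(theta1, theta2, l1, l2):
    """End position (e.g. the wrist) of a planar two-link chain rooted at the origin."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

def two_link_ik(x, y, l1, l2, mirror_elbow=False):
    """Joint angles (radians) that place the end of the second link at (x, y).

    The elbow angle follows from the law of cosines; the shoulder angle is the
    direction to the target minus the offset introduced by the bent elbow.
    """
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    c2 = max(-1.0, min(1.0, c2))  # clamp so acos stays defined for unreachable targets
    theta2 = math.acos(c2)
    if mirror_elbow:
        theta2 = -theta2          # the other of the two valid elbow configurations
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2), l1 + l2 * math.cos(theta2))
    return theta1, theta2
```

Calling `forward_kinematics` on the returned angles reproduces any reachable target, and `mirror_elbow` selects the other valid elbow configuration.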


Institute of Information Science, Academia Sinica