Multimedia Technologies Laboratory
Research Faculty
Li Su, Chair / Assistant Research Fellow
Hsin-Min Wang, Research Fellow
Chun-Shien Lu, Research Fellow
Jen-Chun Lin, Assistant Research Fellow
Chu-Song Chen, Research Fellow
Wen-Liang Hwang, Research Fellow
Mark Liao, Distinguished Research Fellow
Tyng-Luh Liu, Research Fellow

Multimedia shapes our future.

Multimedia technology is considered one of the three most promising industries of the twenty-first century, along with biotechnology and nanotechnology. We have all witnessed how multimedia technology has influenced various aspects of our daily lives over the past two decades. Given its wide spectrum of applications, there is a constant challenge to advance a broad range of multimedia techniques, including those related to music, video, images, text, and 3D animation.

The main research interests of the Multimedia Technology Group members at IIS include multimedia signal processing, computer vision, and machine learning. Beyond the members' individual research interests, the group's joint research activity is best characterized by two ongoing major projects: I. Integration of Video and Audio Intelligence for Multimedia Applications; and II. Deep Learning for Multimedia Information Processing.

I. Integration of Video and Audio Intelligence for Multimedia Applications

This project explores new multimedia techniques and applications that require the fusion of video and audio intelligence. With the prevalence of mobile devices, people can now easily film a live concert and create video clips of specific parts. Popular websites such as YouTube and Vimeo have further boosted this phenomenon, as data sharing becomes easier. Videos of this kind, recorded by audience members at different locations, can give those who could not attend the event the opportunity to enjoy the same performance. However, the viewing experience is usually unpleasant, since video capture is not coordinated and incompleteness or redundancy frequently occurs. To ensure a pleasant viewing and listening experience, it is indispensable to fuse such videos effectively, through a smooth "decoration" process, into a single near-professional audio/visual stream.

Video mashup, an emerging research topic in multimedia, can satisfy these needs. A successful mashup process has to deal with all videos captured at different locations and convert them into a complete, non-overlapping, seamless, and high-quality result. To make a concert video mashup succeed, we are addressing the following issues: (1) exploring the relationship between music and the visual storytelling of shots so that the mashup strikes a deep chord; (2) solving the problem of video clip ordering from visual and/or auditory cues; (3) studying generative models such as GANs in depth to improve video/audio quality; and (4) investigating how to make learned models compact yet accurate, thereby making mashup systems easier to deploy. For example, automatically generating a near-professional shot-type sequence from music is a challenging but desirable task; the hardest part is establishing and modeling the relationship between music and shots. Motivated by the concept of storyboards, we have created a probabilistic ensemble model that integrates information at different temporal resolutions to encode the relationship between music and shots, as shown in the figure below. To make the model efficient, we are further proposing a model distillation technique that learns a lightweight classifier through collaborative training with the ensemble model. A video composed in this way is attractive, as the music and visual storytelling blend naturally. We anticipate that the system could also be used in practical applications.
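The distillation step described above can be sketched with the standard temperature-scaled distillation objective, in which the ensemble would play the teacher role and the lightweight classifier the student. This is a minimal illustration only; the function names, temperature, and blending weight below are illustrative assumptions, not the group's actual implementation.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL divergence to the teacher's
    softened outputs. `alpha` weights the hard-label term; the T**2 factor
    keeps gradient magnitudes comparable across temperatures."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)), axis=-1)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    return alpha * ce.mean() + (1 - alpha) * (T ** 2) * kl.mean()
```

In training, the student's shot-type logits would be driven toward both the ground-truth labels and the ensemble's softened predictions, which is one common way to transfer an ensemble's knowledge into a single compact classifier.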
II. Deep Learning for Multimedia Information Processing
Deep learning-related research has attracted considerable attention in recent years owing to its
effectiveness for solving various challenging tasks in computer science. In the field of Multimedia
Information Processing, deep learning opens up brand new opportunities for both conventional
and modern research topics. In the next few years, we aim to rigorously re-formulate or better solve
emerging multimedia-related problems within the context of deep learning. These efforts can be
highlighted by the following three collaborative projects:
1. Un-rectified neural networks: The main obstacle to analyzing deep neural networks lies in the composition of non-linear activation functions. We are establishing an "un-rectifying" technique that replaces point-wise, piece-wise linear activation functions in neural networks with a finite number
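The description above is cut off at the page break, but the core idea of un-rectifying a piece-wise linear activation such as ReLU can be illustrated concretely: for a given input, ReLU acts as a data-dependent diagonal 0/1 matrix, so the non-linearity can be rewritten locally as a linear operator. The sketch below assumes plain ReLU and is an illustration of that identity, not the project's full analysis framework.

```python
import numpy as np

def relu(x):
    """Point-wise rectified linear unit."""
    return np.maximum(x, 0.0)

def unrectify(x):
    """Return the data-dependent diagonal matrix D with D[i, i] = 1 where
    x[i] > 0 and 0 otherwise, so that ReLU(x) == D @ x for this x."""
    return np.diag((x > 0).astype(float))

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
D = unrectify(x)
# On this input, the non-linear activation equals a linear map.
assert np.allclose(relu(x), D @ x)
```

Replacing each activation by such a data-dependent linear operator turns a ReLU network, for any fixed input, into a composition of affine maps, which is what makes the resulting analysis tractable.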