Institute of Information Science
 
Multimedia Processing and Application Labs

Principal Investigators:

Chu-Song Chen Meng-Chang Chen Wen-Liang Hwang Hong-Yuan Liao Tyng-Luh Liu
Chun-Shieh Lu Chun-Chieh Shih

[ Multimedia Processing and Application Labs (MPALs)]

Over the past few years, the Multimedia Technologies Group has made progress on several key problems and applications, and its teamwork has helped establish it as one of the leading teams in Taiwan. Our recent research efforts are highlighted below.


(1) Multimedia Image and Video Processing
  • Video Retrieval:
    In the area of content-based video retrieval, we have developed a fast, shot-based video clip retrieval system that uses statistics extracted at the shot level to speed up search. The results of this work have been published in IEEE Transactions on Circuits and Systems for Video Technology (May 2006 issue). In addition, we calculate the trajectory of a moving object at the shot level and use that trajectory to perform fast video retrieval. The technique can be applied to general monitoring systems. The core technology developed by our team has been transferred to the Advanced Technology Center (ATC) of ITRI in Taiwan. In the future, we will focus on developing more accurate retrieval algorithms.
  • Representation and Retrieval of 3D Graphical Models:
    We propose a visual salience-guided mesh decomposition strategy. Its main idea is based on a theory of part salience that originated in cognitive psychology. The theory asserts that the salience of a part is determined by three factors: the boundary strength, the degree of protrusion, and the relative size of the part. Since these factors are all conceptual, computational processes for modeling them are required. We have thus developed a systematic way to conduct 3D mesh decomposition based on visual salience. The paper describing our results on salient component decomposition has been accepted by IEEE Transactions on Multimedia (March 2006). In the future, we plan to apply this technique to extract significant components from a 3D mesh for 3D mesh retrieval.
  • Video compression:
    We study several coding techniques, such as codecs based on matching pursuits, 3D wavelet transforms, and traditional hybrid motion-compensation approaches. We also study multimedia transmission problems, such as transmission in environments with packet loss. Our goal is to improve current techniques and to produce new coding representations and techniques for the next generation of coders.
  • Digital watermarking:
    Digital watermarking is useful for digital rights management, where robustness is a critical issue affecting the practicality of a watermarking system. However, a major disadvantage of known watermarking methods is their limited resistance to extensive geometric attacks. We propose a new robust image watermarking scheme that can withstand geometric distortions and watermark-estimation attacks (WEAs) simultaneously. Extensive experimental results obtained using the standard benchmark (i.e., Stirmark) and thorough comparisons with state-of-the-art technologies confirm the excellent performance of our method in improving robustness. To our knowledge, such thorough evaluations and comparisons have not been reported in the literature before. In addition, we have proposed an asymmetric watermarking method and demonstrated its security and robustness against some attacks. Based on this method, we have developed a number of application algorithms.
  • Perceptual Hashing:
    Perceptual hashing, or digital fingerprinting, has been recognized as an alternative approach for many applications previously accomplished with watermarking. A perceptual hash is a compact representation of media data. The major disadvantage of existing media hashing technologies is their limited resistance to geometric attacks. We propose a novel geometric distortion-invariant image hashing scheme that is robust against extensive geometric distortions (e.g., those in standard benchmarks such as Stirmark 3.1). In addition, a sophisticated hash database for error-resilient and fast matching is constructed. Our future work is to study its security against forgery attacks.
  • Texture synthesis:
    We study texture synthesis problems by analyzing the performance of the patchwork-based algorithm, which has been used in a wide variety of applications. Based on our analysis, we extend the algorithm to multi-scale, multi-class, and other applications.
(2) Multimedia Networking, Coding, and Transmission
  • Error Resilient Video Encryption and Transmission:
    Media encryption technologies serve as the first line of defense in securing access to multimedia data. Traditional cryptographic encryption can achieve provable security but is unfortunately sensitive to even a single bit error, which causes the affected packet to be dropped and thus results in packet loss. To achieve robust media encryption, error resilience is considered an efficient strategy. We propose an embedded block hash searching scheme at the decoder side that performs motion estimation to recover lost packets, while maintaining format compliance and provable cryptographic security.
  • Available Bandwidth Estimation and Wireless TCP:
    Available bandwidth is an important factor that can be used to adapt the sending rate to network conditions, so that packet loss caused by congestion can be significantly reduced before error control mechanisms are employed. To this end, we propose a one-way delay jitter based scheme, "JitterPath," for available bandwidth estimation that does not rely on common assumptions, including the use of the fluid traffic model and the bottleneck link capacity. Extensive simulations and Internet experiments have been conducted, and comparisons with other methods have been made to verify the effectiveness of our method. Our ongoing work is to distinguish congestion loss from wireless loss, based on the relationship between available bandwidth and probing rate, in order to develop a reliable wireless TCP.
  • Video codec:
    Conventional video coding standards, such as MPEG-4 and H.264/AVC, usually perform motion estimation among successive frames, so the encoder is typically more complex than the decoder. However, this kind of architecture is not suitable for some emerging video coding applications that need resource-limited encoders (e.g., video sensor networks and wireless mobile video communications). Based on the Wyner-Ziv theorem, distributed video coding systems (called Wyner-Ziv video codecs) shift part of the computational burden from the encoder to the decoder, resulting in a video codec with a low-complexity encoder and a high-complexity decoder. However, this new coding paradigm still cannot be applied to applications that require both a low-complexity encoder and a low-complexity decoder (e.g., wireless mobile video communications). In view of this, we study a new media hash-based low-complexity Wyner-Ziv video codec, in which motion-compensated interpolation/extrapolation and a feedback channel are not required.

(3) Computer Vision and Pattern Recognition
  • Real-time Event Detection and Analysis:
    The aim of the project is to design effective algorithms to automatically detect, recognize, and analyze video objects and events. We have previously developed a number of computer vision and pattern recognition techniques that are useful in supporting this challenging research. In the past year, we have developed a trajectory-based real-time event detection system. The system is able to conduct on-line surveillance, and the prototype is ready for use. In the future, we will develop new algorithms that achieve better accuracy and efficiency.
  • Addressing Lighting for Vision Problems:
    Handling lighting is one of the most challenging problems in computer vision. We proposed a method, the generic intrinsic illumination subspace, which can reduce lighting effects for objects of the same class (e.g., human faces). This method learns a low-dimensional subspace of the general appearance space formed by images of pose-fixed objects of the same class under all lighting conditions. This subspace is then applied for lighting normalization. The result was published in one of the most important conferences in computer vision, ICCV 2005. In addition, we introduced a method that can perform photometric stereo under general lighting conditions with only four images.
  • Object Tracking:
    We proposed a method that uses object appearance information to assist object tracking. Object tracking can be treated as a state estimation problem of a dynamic system, and particle filtering is a typical method for object tracking. However, when the state space dimensionality is high, particle filtering usually results in drifting or local-minimum problems in tracking. Our method adds "attractors" in the state space, and uses attractors to assist tracking. We derived a particle filtering method that estimates the maximum a posteriori (MAP) solution when the transition probability is assumed to be a mixture distribution. The result was published in one of the most important conferences in computer vision, CVPR 2005. We have applied this method to 3D hand tracking and lip-contour tracking.
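
As a rough illustration of the state-estimation view of tracking described under Object Tracking, the following is a minimal sketch of a generic bootstrap particle filter on a 1D state. It is not the attractor-guided MAP method described above. The random-walk motion model, the Gaussian observation model, the function name, and all parameter values are assumptions chosen only for illustration.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_std=0.5, obs_std=1.0, seed=0):
    """Track a 1D state from observations; return posterior-mean estimates.

    Hypothetical illustration: a generic bootstrap particle filter with an
    assumed random-walk transition and Gaussian observation likelihood.
    """
    rng = random.Random(seed)
    # Initialise particles around the first observation.
    particles = [observations[0] + rng.gauss(0.0, obs_std)
                 for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: propagate each particle with the assumed random-walk model.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Update: weight each particle by the Gaussian observation likelihood.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2)
                   for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: weighted mean of the particles (posterior mean).
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Calling `bootstrap_particle_filter([5.0] * 20)` returns one estimate per observation, and the estimates stay close to 5.0. In a high-dimensional state space such as 3D hand pose, a plain filter of this kind needs many more particles, which is the setting in which the drifting and local-minimum problems mentioned above arise.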
 
Academia Sinica Institute of Information Science