Multimedia Processing and Application Labs
Principal Investigators:
[ Multimedia Processing and Application Labs (MPALs)]
Over the past few years, the Multimedia
Technologies Group has addressed several key problems and exciting
applications, and has displayed impressive teamwork in establishing itself
as one of the leading teams in Taiwan. Our recent
research efforts are highlighted as follows.
(1) Multimedia Image and Video Processing
- Video Retrieval:
In the area of content-based video retrieval,
we have developed a fast, shot-based video clip retrieval system. We
use statistics extracted at the shot level to perform fast
search. The results of this work have been published in IEEE Transactions
on Circuits and Systems for Video Technology (May issue, 2006). In addition
to the above work, we also calculate the trajectory of a moving object
at the shot level, and then use the trajectory to perform fast video retrieval.
The developed technique can be applied to general monitoring systems.
The core technology developed by our team has been transferred to the
Advanced Technology Center (ATC) of ITRI in Taiwan. In the future, we
shall put our emphasis on the development of more accurate retrieval algorithms.
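
As a rough illustration of searching with shot-level statistics, the sketch below summarizes each shot by the mean of its per-frame feature vectors and slides a query clip's shot signatures over a database sequence. The feature choice, the L1 distance, and the names `shot_signature` and `find_clip` are assumptions made for illustration; they are not taken from the published system.

```python
from typing import List, Sequence, Tuple


def shot_signature(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-frame feature vectors of one shot (a shot-level statistic)."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two signatures."""
    return sum(abs(x - y) for x, y in zip(a, b))


def find_clip(query_shots: Sequence[Sequence[float]],
              database_shots: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (start index, cost) of the best-matching run of database shots.

    Comparing one signature per shot, rather than every frame, is what keeps
    the search fast in this sketch.
    """
    q = len(query_shots)
    best_start, best_cost = -1, float("inf")
    for start in range(len(database_shots) - q + 1):
        cost = sum(l1_distance(query_shots[i], database_shots[start + i])
                   for i in range(q))
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost
```

Because the cost is computed once per shot instead of once per frame, the work grows with the number of shots rather than with the video length in frames.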
- Representation and Retrieval of 3D Graphical Models:
We propose a visual salience-guided
mesh decomposition strategy. Its main idea is based on a theory of part
salience that originated in cognitive psychology. The theory asserts that the
salience of a part is determined by three factors: the boundary strength,
the degree of protrusion, and the relative size of a part. Since the above-mentioned
factors are all conceptual, computational processes for modeling these factors
are required. We have thus developed a systematic way to conduct 3D mesh
decomposition based on visual salience. The paper describing our results of
salient component decomposition has been accepted by IEEE Transactions on
Multimedia (March 2006). In the future, we plan to apply this technique to
extracting significant components from a 3D mesh for 3D mesh retrieval.
- Video compression:
We study several coding techniques, such
as codecs based on matching pursuits, 3D wavelet transforms, and traditional
hybrid motion compensation approaches. We also study multimedia transmission
problems, such as transmission in a packet loss environment. Our goal is
to improve current techniques and to produce new coding representations and
techniques for the next generation of coders.
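
For readers unfamiliar with matching pursuits, the following is a minimal sketch of the generic greedy algorithm on a one-dimensional signal, assuming a dictionary of unit-norm atoms. The function name `matching_pursuit` and the toy dictionary are illustrative assumptions; a video codec would apply the same idea to motion-compensated residual images with a much larger dictionary.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal: Sequence[float],
                     dictionary: Sequence[Sequence[float]],
                     n_atoms: int) -> Tuple[List[Tuple[int, float]], List[float]]:
    """Greedily approximate `signal` with `n_atoms` dictionary atoms.

    Assumes every atom has unit norm. Each iteration picks the atom whose
    inner product with the residual has the largest magnitude, records
    (atom index, coefficient), and subtracts that projection.
    """
    residual = list(signal)
    picks: List[Tuple[int, float]] = []
    for _ in range(n_atoms):
        coeffs = [dot(residual, atom) for atom in dictionary]
        k = max(range(len(dictionary)), key=lambda i: abs(coeffs[i]))
        picks.append((k, coeffs[k]))
        residual = [r - coeffs[k] * a for r, a in zip(residual, dictionary[k])]
    return picks, residual
```

Only the list of (index, coefficient) pairs needs to be coded, which is why the representation can be compact when a few atoms capture most of the signal energy.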
- Digital watermarking:
The technique is useful for digital rights management,
where robustness is a critical issue affecting the practicality of the
watermarking system. However, the major disadvantage of known watermarking
methods is their limited resistance to extensive geometric attacks. We
propose a new robust image watermarking scheme that can withstand geometric
distortions and WEAs simultaneously. Extensive experimental results obtained
using the standard benchmark (i.e., Stirmark) and thorough comparisons with
state-of-the-art technologies confirm the excellent performance of our
method in improving robustness. To our knowledge, such thorough evaluations
and comparisons have not been reported in the literature before. In addition,
we have proposed an asymmetric watermarking method and demonstrated its
security and robustness against some attacks. Based on this method, we have
developed a number of application algorithms.
- Perceptual Hashing:
Perceptual hashing, or digital fingerprinting,
has been recognized as an alternative approach for many applications previously
accomplished with watermarking. A perceptual hash is a compact representation
of media data. The major disadvantage of existing media hashing technologies
is their limited resistance to geometric attacks. We propose a novel geometric
distortion-invariant image hashing scheme that achieves
robustness against extensive geometric distortions (e.g., the standard benchmark
Stirmark 3.1). In addition, we have constructed a hash database that supports
error-resilient and fast matching. Our future work is to study the scheme's security
against forgery attacks.
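
To make the notion of a compact, content-based representation concrete, here is a minimal sketch of a generic average hash compared by Hamming distance. It is a simple stand-in chosen for illustration, not the geometric distortion-invariant scheme described above: it tolerates a uniform brightness shift but offers no robustness to rotation, scaling, or cropping. The names `average_hash` and `hamming` are assumptions.

```python
from typing import List, Sequence


def average_hash(pixels: Sequence[Sequence[float]]) -> List[int]:
    """One bit per pixel of a small grayscale image: 1 if above the mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of bit positions in which two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


# Usage: a brightened copy keeps the same hash, a different image does not.
image = [[10, 200], [30, 220]]
brighter = [[p + 20 for p in row] for row in image]
other = [[200, 10], [220, 30]]
same_content = hamming(average_hash(image), average_hash(brighter))
different_content = hamming(average_hash(image), average_hash(other))
```

Matching against a database then reduces to finding stored hashes within a small Hamming distance of the query hash.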
- Texture synthesis:
We study texture synthesis problems by analyzing
the performance of the patchwork-based algorithm, which has been used in
a wide variety of applications. Based on our analysis, we extend the algorithm
to multi-scale, multi-class, and other applications.
(2) Multimedia Networking, Coding, and Transmission
- Error Resilient Video Encryption and Transmission:
Media encryption technologies serve as
the first line of defense in securing access to multimedia data.
Traditional cryptographic encryption can achieve provable security but is
unfortunately sensitive to even a single bit error, which causes the affected
packet to be dropped and thus results in packet loss. To achieve robust media
encryption, error resilience is considered an efficient strategy. We propose
an embedded block hash searching scheme at the decoder side that performs motion
estimation to recover lost packets while maintaining format compliance
and provable cryptographic security.
- Available Bandwidth Estimation and Wireless TCP:
Available bandwidth is an important factor that can be used
to adapt the sending rate to network conditions, so that packet loss caused
by congestion can be significantly reduced before error control mechanisms
are employed. To this end, we propose a scheme based on one-way delay jitter,
"JitterPath," for available bandwidth estimation that does not rely on common
assumptions, such as the use of the fluid traffic model and of the bottleneck link
capacity. Extensive simulations and Internet experiments have been conducted,
and comparisons with other methods have been made to verify the effectiveness
of our method. Our ongoing work is to distinguish congestion loss from
wireless loss, based on the relationship between available bandwidth and
probing rate, in order to develop a reliable wireless TCP.
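
The quantity such a scheme measures can be sketched as follows, using the common definition of one-way delay jitter as the change in packet spacing between sender and receiver. This shows only the measurement, not the JitterPath estimator itself; the function names and the millisecond timestamps are assumptions for illustration.

```python
from typing import List, Sequence


def one_way_delay_jitter(send_times: Sequence[float],
                         recv_times: Sequence[float]) -> List[float]:
    """Jitter between consecutive probe packets.

    jitter[i] = (recv[i+1] - recv[i]) - (send[i+1] - send[i]).
    Only differences of timestamps taken on the same host are used, so a
    constant clock offset between sender and receiver cancels out.
    """
    return [
        (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        for i in range(1, len(send_times))
    ]


def fraction_queued(jitters: Sequence[float]) -> float:
    """Fraction of packet pairs whose spacing grew in transit (positive jitter)."""
    if not jitters:
        return 0.0
    return sum(1 for j in jitters if j > 0) / len(jitters)


# Usage with hypothetical timestamps in milliseconds.
send = [0, 10, 20, 30]
recv = [50, 60, 73, 88]
jitters = one_way_delay_jitter(send, recv)
```

Persistently positive jitter suggests that probes are queuing behind other traffic, which is the kind of signal a probing rate can be compared against when estimating available bandwidth.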
- Video codec:
Conventional video coding standards, such
as MPEG-4 and H.264/AVC, usually perform motion estimation among successive
frames, so the encoder is typically more complex than the decoder.
However, this kind of architecture is not suitable for some emerging video
coding applications that need resource-limited encoders (e.g., video sensor
networks and wireless mobile video communications). Based on the Wyner-Ziv
theorem from information theory, distributed video coding systems (called Wyner-Ziv video
codecs) shift part of the computational burden from the encoder to the decoder,
resulting in a video codec with a low-complexity encoder and a high-complexity
decoder. However, this coding paradigm still cannot be applied to applications
that require both a low-complexity encoder and a low-complexity decoder (e.g., wireless
mobile video communications). In view of this, we study a new media hash-based
low-complexity Wyner-Ziv video codec in which motion-compensated interpolation/extrapolation
and a feedback channel are not required.
(3) Computer Vision and Pattern Recognition
- Real-time Event Detection and Analysis:
The aim of the project is to design effective algorithms to
automatically detect, recognize, and analyze video objects and events.
In the past, we have developed a number of computer vision and pattern
recognition techniques that would be useful in supporting this challenging
research. In the past year, we have developed a trajectory-based real-time
event detection system. The system can conduct online
surveillance, and the prototype is ready for use. In the future,
we shall develop new algorithms which will result in better accuracy and
efficiency.
- Addressing Lighting for Vision Problems:
Handling lighting is one of the most challenging problems in
computer vision. We proposed a method, the generic intrinsic illumination
subspace, which can reduce lighting effects for objects of the same class
(e.g., human faces). This method learns a low-dimensional subspace of the general
appearance space formed by images, under all lighting conditions, of pose-fixed
objects of the same class. This subspace is then applied for lighting normalization.
The result was published in one of the most important conferences in computer
vision, ICCV 2005. In addition, we introduced a method that performs photometric
stereo under general lighting conditions with only four images.
- Object Tracking:
We proposed a method that uses object appearance information to assist object tracking. Object tracking can be treated as a state estimation problem of a dynamic system, and particle filtering is a typical method for object tracking. However, when the dimensionality of the state space is high, particle filtering usually suffers from drifting or local-minimum problems in tracking. Our method adds "attractors" to the state space and uses them to assist tracking. We derived a particle filtering method that estimates the maximum a posteriori (MAP) solution when the transition probability is assumed to be a mixture distribution. The result was published in one of the most important conferences in computer vision, CVPR 2005. We have applied this method to 3D hand tracking and lip-contour tracking.
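
The idea of a mixture transition with attractors can be sketched with a one-dimensional bootstrap particle filter, shown below. Each particle either follows a random-walk motion model or jumps near an attractor. This is a simplified illustration under assumed Gaussian models: all parameter names and values are hypothetical, and the point estimate is a plain weighted mean rather than the MAP estimator derived in the published work.

```python
import math
import random
from typing import List, Sequence, Tuple


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def particle_filter_step(particles: Sequence[float], observation: float,
                         attractors: Sequence[float], rng: random.Random,
                         mix: float = 0.3, motion_sigma: float = 0.5,
                         attractor_sigma: float = 0.2,
                         obs_sigma: float = 0.5) -> Tuple[List[float], float]:
    """One predict/weight/resample step; returns (new particles, estimate)."""
    proposed = []
    for x in particles:
        if attractors and rng.random() < mix:
            # Mixture component 1: jump close to a randomly chosen attractor.
            proposed.append(rng.gauss(rng.choice(list(attractors)), attractor_sigma))
        else:
            # Mixture component 2: ordinary random-walk motion model.
            proposed.append(rng.gauss(x, motion_sigma))
    # Weight each proposed particle by the likelihood of the observation.
    weights = [gaussian_pdf(observation, x, obs_sigma) for x in proposed]
    total = sum(weights)
    if total == 0.0:  # all likelihoods underflowed: fall back to uniform weights
        weights = [1.0] * len(proposed)
        total = float(len(proposed))
    estimate = sum(w * x for w, x in zip(weights, proposed)) / total
    resampled = rng.choices(proposed, weights=weights, k=len(proposed))
    return resampled, estimate


# Usage: particles start spread out; an attractor sits near the observed state.
rng = random.Random(0)
particles = [rng.uniform(-5.0, 5.0) for _ in range(200)]
for _ in range(5):
    particles, estimate = particle_filter_step(particles, 2.0, [2.0], rng)
```

The attractor component keeps some particles in plausible regions even when the random-walk component drifts, which is the intuition behind using attractors in high-dimensional state spaces.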
