Institute of Information Science, Academia Sinica

Library

Print

2009 Technical Report

Code
Subject / Author / Abstract
View

Code

TR-IIS-09-001

Subject / Author / Abstract

Playing GWAP With Strategies
Ling-Jyh Chen, Bo-Chun Wang, Kuan-Ta Chen

“Human Computation” represents a new paradigm of applications that exploit people’s desire to be entertained by outsourcing certain steps of the computational process to the players. Such applications also produce useful metadata as a by-product. Games With A Purpose (GWAP) demonstrate the potential of human computation to solve a variety of problems that computer computation cannot currently resolve completely. In this paper, we propose a metric, called system gain, for evaluating the performance of human computation systems, and also use analysis to study the properties of GWAP systems. We argue that GWAP systems should be played with strategies. Therefore, based on our analysis, we implement an Optimal Puzzle Selection Strategy (OPSA) to improve human computation. Using a comprehensive set of simulations, we demonstrate that the proposed OPSA approach can effectively improve the system gain of GWAP systems, as long as the number of puzzles in the system is sufficiently large.

View

Fulltext

Code

TR-IIS-09-002

Subject / Author / Abstract

Tree Decomposition for Large-Scale SVM Problems: Experimental and Theoretical Results
Fu Chang, Chien-Yang Guo, Xiao-Rong Lin, Chi-Jen Lu

To handle problems created by large data sets, we propose a method that uses a de-cision tree to decompose a data space and trains SVMs on the decomposed regions. Al-though there are other means of decomposing a data space, we show that the decision tree has several merits for large-scale SVM training. First, it can classify some data points by its own means, thereby reducing the cost of SVM training applied to the remaining data points. Second, it is efficient for seeking the parameter values that maximize the valida-tion accuracy, which helps maintain good test accuracy. Third, we can provide a genera-lization error bound for the classifier derived by the tree decomposition method. For ex-periment data sets whose size can be handled by current non-linear, or kernel-based SVM training techniques, the proposed method can speed up the training by a factor of thou-sands, and still achieve comparable test accuracy.

View

Fulltext

Code

TR-IIS-09-003

Subject / Author / Abstract

A Binarization Method with Learning-Built Rules for Document Images Produced by Cameras
Chien-Hsing Chou, Wen-Hsiung Lin, and Fu Chang

In this paper, we propose a novel binarization method for document images produced by cameras. Such images often have varying degrees of brightness and require more careful treatment than merely applying a statistical method to obtain a threshold value. To resolve the problem, our method divides an image into several regions and decides how to binarize each region. The decision rules are derived from a learning process that takes training images as input. Tests on images produced under normal and inadequate illumination conditions show that our method yields better visual quality and better OCR performance than three global binarization methods and four locally adaptive binarization methods.

View

Fulltext

Code

TR-IIS-09-004

Subject / Author / Abstract

Power-Rate-Distortion Optimized Resource-Scalable Low-Complexity Video Coding in Wireless Multimedia Sensor Networks
Li-Wei Kang and Chun-Shien Lu

Wireless multimedia sensor networks (WMSNs) have been potentially applicable for several emerging applications. However, the available resources, i.e., power and rate, of visual sensors in a WMSN are very limited. Hence, it is important but challenging to achieve efficient resource allocation and optimal video compression while maximizing the overall network lifetime. In this paper, a power-rate-distortion (PRD) optimized resource-scalable low-complexity multiview video encoding scheme is proposed. In our video encoder, both the temporal and interview information can be efficiently exploited based on the comparisons of extracted media hashes without performing motion and disparity estimations, which are known to be time-consuming. We present a PRD model to characterize the relationship between the available resources and the RD performance of our encoder. More specifically, an RD function in terms of the percentages for different coding modes of blocks and the target bit rate under the available resource constraints is derived for optimal coding mode decision. Analytic results are provided to verify the resource scalability and accuracy of our PRD model, which can provide a theoretical guideline for performance optimization under limited resource constraints. Both the analytic and simulation results have shown the applicability of our video coding scheme for WMSNs.

View

Fulltext

Code

TR-IIS-09-005

Subject / Author / Abstract

Protocols for Secure Multi-party Computation: Design, Implementation and Performance Evaluation
I-Cheng Wang, Kung Chen, Tsan-sheng Hsu, Churn-Jung Liau, Chih-Hao Shen, and Da-Wei Wang

Protocols for secure multi-party computation allow participants to share a computation while each party learns only what can be inferred from their own inputs and the output of the computation. However, the execution time of a secure protocol may be too high so that it is not practical unless some tradeoffs being made between data access and confidentiality. In this technical report, we propose a set of information theoretically secure protocols based on scalar product protocol and aim to provide some empirical basis for making such tradeoffs in computing exponentiation. A detailed performance evaluation was carried out by taking advantage of the compositional nature of our protocols. We come up with a time function which provides good prediction of the execution time of the proposed exponentiation protocols based on the execution time of scalar products. Using the time function, we obtain several interesting tradeoffs between execution time and privacy. In particular, compromising some private information enables a reduction in the execution time from years, if not centuries, to days or even minutes. Based on our results, we argue that there are indeed reasonable tradeoffs between privacy and execution time. Furthermore, our study indicates that a system intelligently offering users possible tradeoff options will make secure multi-party computation a more attractive approach to enhance privacy in practice.

View

Fulltext

Code

TR-IIS-09-006

Subject / Author / Abstract

Data-bandwidth-aware Job Scheduling Techniques in Distributed Systems
De-Yu Chen, Pangfeng Liu, Jan-Jan Wu

This paper introduces techniques in scheduling jobs on a master/workers platform where the bandwidth is shared by all workers. The jobs are independent and each job requires a fixed amount of bandwidth to download input data before execution. The master can communicate with multiple workers simultaneously, provided that the bandwidth used by the master and the workers do not exceed their bandwidth limits. We proposed two models for this limited-bandwidth problem. If the data transfer cannot be interrupted, then we prove that the scheduling problem is NP-complete. Nevertheless we propose heuristic algorithms and experimentally test their performance. If the data transfer can be interrupted, we propose an algorithm that produces optimal makespan. The algorithm is based on a binary search on the completion time, and an efficient feasibility verification process for a given completion time.

View

Fulltext

Code

TR-IIS-09-007

Subject / Author / Abstract

A Multi-Camera Tracking System That Can always Select A Better View to Perform Tracking
Shih-Wei Sun, Hsing-Yuan Lo, Hong-Ju Lin, Yong-Sheng Chen, Fay Huang, Hong-Yuan Mark Liao

In this paper, we propose a new multiple-camera people tracking system that is equipped with the following functions: (1) can handle long-term occlusions, complete occlusions, and unpredictable motions; (2) can detect arbitrary sized foreground objects; (3) can detect objects with much faster speed. The main contribution of our method is twofold: 1) An M-to-one relationship with only point homography matching for occlusion detection can achieve efficiency; 2) A view-hopping technique based on object motion probability (OMP) is proposed to automatically select an appropriate observation view for tracking a human subject.

View

Fulltext

Code

TR-IIS-09-008

Subject / Author / Abstract

Adaptive Local Thresholding for Fluorescence Cell Micrographs
Jyh-Ying Peng and Chun-Nan Hsu

An adaptive local thresholding method for microscope based high content analysis (HCA) is proposed. Cell micrographs of HCA contain detailed objects both in and out of focus, which cannot be correctly segmented by global thresholding methods. The proposed method utilizes adaptive local neighborhood size and double thresholding, and is able to produce segmentations that conform closely to perceptually relevant structures in the original image, robust to background noise and variation. The proposed method is applied to the segmentation of mitochondria in fluorescence cell micrographs. Comparison with both hand segmentation and other global and local thresholding methods shows that the proposed method produces results of comparable quality to hand segmentation and discovers much more detailed structure than any previous thresholding methods.

View

Fulltext

Code

TR-IIS-09-009

Subject / Author / Abstract

EMWF: A Middleware for Flexible Automation and Assistive Devices
J.W. S. Liu, C. S. Shin, T. S. Chou, Y.C. Wang, H. Y. Huang, W. S. Chen, K. C. Chuang, W. C. Wang and T. Y. Chen

EMWF (embedded workflow framework) is an open source middleware for flexible (i.e., configurable, customizable and adaptable) automation and assistive devices and systems, referred to collectively as SISARL (Sensor Information Systems for Active Retirees and Assisted Living). Examples include smart medication dispensers, autonomous appliances, service robots and robotic helpers for personal and home use, as well as automation tools for use in hospitals and long-term care facilities. EMWF 1.0 provides a light-weight workflow manager and engines on Windows CE, Windows XP Embedded, and Linux It is for small embedded automation devices. EMWF 2.0 will include basic message passing mechanism, real-time scheduling capability and workflow communication facility. This paper first gives an overview of EMWF 1.0 and then describes these extensions.

View

Fulltext

Code

TR-IIS-09-010

Subject / Author / Abstract

An Adaptive Multiple Feature Subset Method for Feature Ranking and Feature Selection
Fu Chang and Jen-Cheng Chen

In this paper, we propose a new feature evaluation method that forms the basis for feature ranking and feature selection. The method starts by generating a number of feature subsets in a random fashion and evaluates features based on the derived subsets. It then proceeds in a number of stages. In each stage, it inputs the features whose ranks in the previous stage were above the median rank and re-evaluates those features in the same fashion as it did in the first stage. When the number of features is high, the method has a computational advantage over recursive feature elimination (RFE), a state-of-art method that ranks features by identifying the least valuable feature in each stage. It also achieves better results than RFE in terms of classification accuracy and some other measures introduced in this paper, especially when the size of the training data is small or the number of irrelevant features is large.

View

Fulltext

Code

TR-IIS-09-011

Subject / Author / Abstract

GAPM–A Robust Algorithm for the Physical Mapping Problem
Hsin-Nan Lin, Yen-Yi Lin, Shinn-Ying Ho, Ting-Yi Sung, and Wen-Lian Hsu

A major challenge for next generation sequencing technology is genome assembly. A physical map could be used as a preliminary step towards genome sequencing in a hybrid approach. In this paper, we illustrate a robust physical mapping algorithm, GAPM, which could well complement with the assembly of short fragments. The physical mapping problem (PMP) is to determine the relative positions of genetic markers (called probes) along the DNA sequences. The presence and absence of probes in clones can be represented by a 0-1 matrix with rows corresponding to clones and columns corresponding to probes. A 0-1 matrix satisfies the consecutive ones property (COP) for the rows if there exists a column permutation such that the ones in each row of the resulting matrix are consecutive. In the error-free case, the PMP can be reduced to testing the COP of a 0-1 matrix. Lu and Hsu proposed an iterative clustering algorithm to deal with the following four types of errors: false positives, false negatives, chimerical clones, and non-unique probes. In this paper, we present a novel genetic algorithm, called GAPM, with a much better performance. GAPM can be run in parallel and generate approximate optimal physical maps regardless of the error rates and matrix sizes. Moreover, GAPM is very flexible in dealing with unknown data. We test 9,000 different cases and compare GAPM with L&H's method. The results indicate that GAPM is more robust and reliable for most data.

View

Fulltext

Code

IM-IIS-09-001

Subject / Author / Abstract

IDEAL-Q: An automated tool for label-free quantitation analysis using an efficient peptide alignment approach and spectral data validation
Chih-Chiang Tsou, Chia-Feng Tasi, Ying-Hao Tsui, Putty-Reddy Sudhir, Yi-Ting Wang, Yu-Ju Chen, Jeou-Yuan Chen, Ting-Yi Sung, Wen-Lian Hsu

In this study, we present a fully automated tool, called IDEAL-Q, for label-free quantitation analysis. It accepts raw data in the standard mzXML format as well as search results from major search engines, including Mascot, SEQUEST, and X!Tandem, as input data. To quantify as many identified peptides as possible, IDEAL-Q uses an efficient algorithm to predict the elution time of a peptide unidentified in a specific LC-MS/MS run but identified in other runs. Then, the predicted elution time is used to detect peak clusters of the assigned peptide. Detected peptide peaks are processed by statistical and computational methods and further validated by signal-to-noise ratio, charge state, and isotopic distribution criteria (SCI validation) to filter out noisy data. The performance of IDEAL-Q has been evaluated by several experiments. First, a serially diluted protein mixed with Escherichia coli lysate showed a high correlation with expected ratios and demonstrated good linearity (R(2) = 0.996). Second, in a biological replicate experiment on the THP-1 cell lysate, IDEAL-Q quantified 87% (1,672 peptides) of all identified peptides, surpassing the 45.7% (909 peptides) achieved by the conventional identity-based approach, which only quantifies peptides identified in all LC-MS/MS runs. Manual validation on all 11,940 peptide ions in six replicate LC-MS/MS runs revealed that 97.8% of the peptide ions were correctly aligned, and 93.3% were correctly validated by SCI. Thus, the mean of the protein ratio, 1.00 +/- 0.05, demonstrates the high accuracy of IDEAL-Q without human intervention. Finally, IDEAL-Q was applied again to the biological replicate experiment but with an additional SDS-PAGE step to show its compatibility for label-free experiments with fractionation. For flexible workflow design, IDEAL-Q supports different fractionation strategies and various normalization schemes, including multiple spiked internal standards. User-friendly interfaces are provided to facilitate convenient inspection, validation, and modification of quantitation results. In summary, IDEAL-Q is an efficient, user-friendly, and robust quantitation tool. It is available at http://ms.iis.sinica.edu.tw/IDEAL-Q/.

View

Not for public 原文請洽圖書室