Recent Research Results
"Active Disaster Response System for Smart Buildings," Proceedings of International Symposium on Computer, Consumer and Control, June 2014.
Authors: C.-Y. Lin, E. T. H. Chu, L.-W. Ku and J. W. S. Liu

Abstract:
In recent years, major natural disasters have made it increasingly challenging to protect human lives. Examples include the 2011 Tohoku earthquake in Japan and Typhoon Morakot in Taiwan in 2009. However, modern disaster warning systems cannot automatically respond to disasters efficiently and effectively. It is therefore necessary to develop an automatic response system that prevents, diminishes, or eliminates the damage caused by disasters. In this paper, we develop an active emergency response system that automatically processes standard warning messages, such as CAP (Common Alerting Protocol) messages. After receiving official earthquake warning messages, our system automatically shuts down natural gas lines to prevent fires in buildings, and opens the building doors for easy evacuation. Our system also stops elevators once they reach the nearest floor. In addition, our system can be applied in hospitals to tell surgeons to pause ongoing operations, or in supermarkets to direct consumers to the building exits. According to our experimental results, the proposed system can avert dangers and save human lives during major disasters.
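The message-driven core of such a system can be sketched in a few lines: parse an incoming CAP alert and map its event type and severity to building actions. The element names follow the CAP 1.2 schema, but the `plan_response` function and the action names are hypothetical illustrations, not the paper's implementation.

```python
# Hypothetical sketch: dispatch building-safety actions from a CAP alert.
import xml.etree.ElementTree as ET

CAP_NS = "{urn:oasis:names:tc:emergency:cap:1.2}"

CAP_XML = """<alert xmlns="urn:oasis:names:tc:emergency:cap:1.2">
  <info>
    <event>Earthquake</event>
    <severity>Severe</severity>
  </info>
</alert>"""

def plan_response(cap_xml):
    """Return the list of building actions for an official CAP warning."""
    info = ET.fromstring(cap_xml).find(CAP_NS + "info")
    event = info.findtext(CAP_NS + "event")
    severity = info.findtext(CAP_NS + "severity")
    actions = []
    if event == "Earthquake" and severity in ("Severe", "Extreme"):
        actions += ["shut_gas_valves",                   # prevent post-quake fires
                    "unlock_exit_doors",                 # ease evacuation
                    "stop_elevators_at_nearest_floor"]
    return actions

print(plan_response(CAP_XML))
```

In a deployment, the same dispatch step would sit behind a CAP message feed and drive the actual building controllers.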
"Space-Efficient Multiversion Index Scheme for PCM-based Embedded Database Systems," ACM/IEEE Design Automation Conference (DAC), June 2014.
Authors: Yuan-Hung Kuan, Yuan-Hao Chang, Po-Chun Huang, and Kam-Yiu Lam

Abstract:
Embedded database systems are widely adopted in various control and monitoring systems, e.g., cyber-physical systems. To support access to historical data, a multiversion index is adopted to simultaneously maintain multiple versions of data items and their index information. However, cyber-physical systems are usually battery-powered embedded systems with limited energy, computing power, and storage space. In this work, we consider systems with phase-change memory (PCM) as their storage, due to its non-volatility and low energy consumption. To resolve the problem of limited storage space and the fact that existing multiversion index designs lack space efficiency, we propose a space-efficient multiversion index scheme that enhances the space utilization and access performance of embedded multiversion database systems on PCM by exploiting the byte-addressability and write asymmetry of PCM. A series of experiments was conducted to evaluate the efficacy of the proposed scheme. The results show that the proposed scheme achieves very high space utilization and performs well on update transactions and range queries.
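The behavior a multiversion index must provide can be sketched minimally as follows: every update appends a timestamped version, and a read at time t returns the newest version no later than t. This toy keeps everything in Python dictionaries; the paper's contribution lies in laying such structures out space-efficiently on PCM, which this sketch does not attempt.

```python
# Toy multiversion store: append-only versions, read-as-of-time semantics.
class MultiversionStore:
    def __init__(self):
        self.versions = {}                     # key -> [(timestamp, value)]

    def put(self, key, value, ts):
        # Updates arrive in commit-time order, so we simply append.
        self.versions.setdefault(key, []).append((ts, value))

    def get(self, key, ts):
        """Return the value of `key` as of time `ts`, or None."""
        best = None
        for t, v in self.versions.get(key, []):
            if t <= ts:
                best = v                       # latest version not newer than ts
        return best

store = MultiversionStore()
store.put("sensor1", 20.5, ts=1)
store.put("sensor1", 21.0, ts=5)
print(store.get("sensor1", ts=3))   # the ts=5 version is too new here
print(store.get("sensor1", ts=9))
```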
"Garbage Collection for Multi-version Index on Flash Memory," ACM/IEEE Design, Automation and Test in Europe (DATE), March 2014.
Authors: Kam-Yiu Lam, Jian-Tao Wang, Yuan-Hao Chang, Po-Chun Huang, Jen-Wei Hsieh, Chung Keung Poon, and ChunJiang Zhu

Abstract:
In this paper, we study the key performance issues in using the purging-range query to reclaim old data versions as free blocks in a flash-based multi-version database. To reduce the overhead of processing the purging-range query, we propose the physical block labeling (PBL) scheme, which provides a better estimate of the purging version number used to purge old data versions. With the frequency-based placement (FBP) scheme for placing data versions in a block, the efficiency of garbage collection can be further enhanced by increasing the deadspans of data versions and reducing reallocation cost, especially when the flash-memory space available to the databases is limited.
Current Research Results
Authors: Jia-Ming Chang, Jean-Francois Taly, Ionas Erb, Ting-Yi Sung, Wen-Lian Hsu, Chuan Yi Tang, Cedric Notredame, Emily Chia-Yu Su

Abstract:
Predicting protein functional classes such as localization sites and modifications plays a crucial role in function annotation. Given the tremendous amount of sequence data yielded by high-throughput sequencing experiments, the need for efficient and interpretable prediction strategies has rapidly grown. Our previous approach for subcellular localization prediction, PSLDoc, achieves high overall accuracy for Gram-negative bacteria. However, PSLDoc is computationally intensive due to its use of homology extension in feature extraction and probabilistic latent semantic analysis in feature reduction. Moreover, prediction results generated by support vector machines are accurate but generally difficult to interpret. In this work, we incorporate three new techniques to improve efficiency and interpretability. First, homology extension is performed against a compact non-redundant database using a fast search model to reduce running time. Second, correspondence analysis (CA) is incorporated as an efficient feature reduction that generates a clear visual separation of different protein classes. Finally, functional classes are predicted by a combination of an accurate compact set (CS) relation and an interpretable one-nearest neighbor (1-NN) algorithm. Besides localization data sets, we also apply a human protein kinase set to validate the generality of our proposed method. Experimental results demonstrate that our method makes accurate predictions in a more efficient and interpretable manner. First, homology extension using a fast search on a compact database accelerates the traditional approach by up to twenty-five times without sacrificing prediction performance. This suggests that the computational costs of many other predictors that also incorporate homology information can be largely reduced. In addition, CA can not only efficiently identify discriminative features but also provide a clear visualization of different functional classes. Moreover, predictions based on CS achieve 100% precision. When combined with 1-NN on targets left unpredicted by CS, our method attains slightly better or comparable performance compared with state-of-the-art systems.
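The two-stage prediction idea, a high-precision rule backed by a 1-NN fallback, can be illustrated as below. The radius-based agreement test here is our own stand-in for the paper's compact set (CS) relation, so treat it as a hypothetical simplification.

```python
# Two-stage classifier sketch: a confident "CS-style" rule that fires only
# when all close neighbours agree, with a 1-NN fallback for the rest.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(query, train, radius=1.0):
    """train: list of (feature_vector, label) pairs."""
    close = [lab for vec, lab in train if dist(query, vec) <= radius]
    if close and len(set(close)) == 1:
        return close[0], "CS"                # confident, interpretable decision
    # fallback: label of the single nearest training example
    nearest = min(train, key=lambda p: dist(query, p[0]))
    return nearest[1], "1-NN"

train = [((0.0, 0.0), "cytoplasm"), ((0.2, 0.1), "cytoplasm"),
         ((5.0, 5.0), "membrane")]
print(predict((0.1, 0.0), train))   # all close neighbours agree
print(predict((2.5, 2.5), train))   # nothing within the radius: 1-NN fallback
```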
Current Research Results
Authors: Chia-Feng Tsai, Chuan-Chih Hsu, Jo-Nan Hung, Yi-Ting Wang, Wai-Kok Choong, Ming-Yao Zeng, Pei-Yi Lin, Ruo-Wei Hong, Ting-Yi Sung, and Yu-Ju Chen

Abstract:
Methodologies to enrich heterogeneous types of phosphopeptides are critical for comprehensive mapping of the under-explored phosphoproteome. Taking advantage of the distinct binding affinities of Ga3+ and Fe3+ for phosphopeptides, we designed a metal-directed immobilized metal ion affinity chromatography for the sequential enrichment of phosphopeptides. In Raji B cells, the sequential Ga3+-Fe3+-immobilized metal affinity chromatography (IMAC) strategy displayed a 1.5−3.5-fold superior phosphoproteomic coverage compared to single IMAC (Fe3+, Ti4+, Ga3+, and Al3+). In addition, up to 92% of the 6283 phosphopeptides were uniquely enriched in either the first Ga3+-IMAC (41%) or the second Fe3+-IMAC (51%). The complementary properties of Ga3+ and Fe3+ were further demonstrated by the exclusive enrichment of almost all of the 1214 multiply phosphorylated peptides (99.4%) in the Ga3+-IMAC, whereas only 10% of the 5069 monophosphorylated peptides were commonly enriched in both fractions. The application of sequential Ga3+-Fe3+-IMAC to human lung cancer tissue allowed the identification of 2560 unique phosphopeptides with only 8% overlap. In addition to the above-mentioned mono- and multiply phosphorylated peptides, this fractionation ability was also demonstrated on basic and acidic phosphopeptides: acidophilic phosphorylation sites were predominantly enriched in the first Ga3+-IMAC (72%), while Pro-directed (85%) and basophilic (79%) phosphorylation sites were enriched in the second Fe3+-IMAC. This strategy provided complementary mapping of different kinase substrates in multiple cellular pathways related to cancer invasion and metastasis of lung cancer. Given the fractionation ability and ease of tip preparation of this Ga3+-Fe3+-IMAC, we propose that this strategy allows more comprehensive characterization of the phosphoproteome both in vitro and in vivo.

Current Research Results
Authors: Jun Ming Chen, Meng Chang Chen and Yeali S. Sun

Abstract:
Prior knowledge is an important issue in the study of concept acquisition among students. Traditional studies on prior knowledge generation during reading activities have focused on extracting sentences from reading materials that are manually prepared by website administrators and educators. This is time-consuming and strenuous, and hence personalized prior knowledge recommendation is difficult to perform. To cope with this problem, we combine the concept of prior knowledge with social tagging methods to assist the reading comprehension of students studying English. We incorporate tags into a tag-based learning approach, which then identifies suitable supplementary materials for quickly constructing a student's prior knowledge reservoir. The experimental results demonstrate that the proposed approach benefits students by embedding the additional information in social knowledge, and hence significantly improves their on-line reading efficiency.
Current Research Results
"Example-based Human Motion Extrapolation and Motion Repairing Using Contour Manifold," IEEE Transactions on Multimedia, January 2014.
Authors: Nick C. Tang, C. T. Hsu, M. F. Weng, T. Y. Lin, and H. Y. Mark Liao

Abstract:
We propose a human motion extrapolation algorithm that synthesizes new motions of a human object in a still image from a given reference motion sequence. The algorithm is implemented in two major steps: contour manifold construction and object motion synthesis. Contour manifold construction searches for low-dimensional manifolds that represent the temporal-domain deformation of the reference motion sequence. Since the derived manifolds capture the motion information of the reference sequence, the representation is more robust to variations in shape and size. With this compact representation, we can easily modify and manipulate human motions through interpolation or extrapolation in the contour manifold space. In the object motion synthesis step, the proposed algorithm generates a sequence of new shapes of the input human object in the contour manifold space and then renders the textures of those shapes to synthesize a new motion sequence. We demonstrate the efficacy of the algorithm on different types of practical applications, namely, motion extrapolation and motion repair.

Current Research Results
"A Resource-Driven DVFS Scheme for Smart Handheld Devices," ACM Transactions on Embedded Computing Systems, December 2013.
Authors: Yu-Ming Chang, Pi-Cheng Hsiu, Yuan-Hao Chang, and Che-Wei Chang

Abstract:
Reducing the energy consumption of the emerging genre of smart handheld devices while continuing to support mobile applications and services is a major challenge. This work is inspired by an observation on the resource usage patterns of mobile applications. In contrast to existing DVFS scheduling algorithms and history-based prediction techniques, we propose a resource-driven DVFS scheme, in which resource state machines are designed to model the resource usage patterns in an online fashion to guide DVFS. We have implemented the proposed scheme on Android smartphones and conducted experiments based on real-world applications. The results are very encouraging and demonstrate the efficacy of the proposed scheme.
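The notion of a resource state machine guiding DVFS can be sketched as follows: observed utilization of one resource drives state transitions (with a little hysteresis), and each state maps to a frequency level. The states, thresholds, and frequency values are invented for illustration and are not the paper's models.

```python
# Toy per-resource state machine driving CPU frequency selection.
FREQ_LEVELS = {"idle": 300_000, "waiting": 600_000, "busy": 1_200_000}  # kHz

def next_state(state, u):
    """One transition on a utilization sample u in [0, 1], with hysteresis."""
    if state == "busy":
        # Stay busy until utilization drops clearly below the busy band.
        return "busy" if u >= 0.3 else ("waiting" if u >= 0.05 else "idle")
    if u >= 0.5:
        return "busy"
    return "waiting" if u >= 0.05 else "idle"

def pick_frequency(samples, state="idle"):
    """Replay utilization samples and return the chosen frequency trace."""
    trace = []
    for u in samples:
        state = next_state(state, u)
        trace.append(FREQ_LEVELS[state])
    return trace

print(pick_frequency([0.0, 0.2, 0.9, 0.9, 0.1]))
```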
Current Research Results
"DEEP: Density-Aware Emergency Message Extension Protocol for VANETs," IEEE Transactions on Wireless Communications, October 2013.
Authors: Ming-Chin Chuang and Meng Chang Chen

Abstract:
With the rapid developments in vehicular communication technology, academics and industry researchers are paying increasing attention to vehicular ad hoc networks (VANETs). In VANETs, dissemination delay and reliability are important criteria for many applications, especially for emergency messages. Existing approaches have difficulty satisfying both requirements simultaneously because they conflict with one another. In this paper, we propose a novel mechanism, called the Density-aware Emergency message Extension Protocol (DEEP) to disseminate emergency messages in VANETs. DEEP resolves the broadcast storm problem, achieves low dissemination delay, and provides high reliability over a realistic multi-lane freeway scenario. The mechanism delivers emergency messages to a specific area (e.g., the area before the exit) in a timely manner and guarantees that all relevant vehicles in that area will receive the messages. Drivers can then change their routes and avoid getting caught in a traffic jam. Performance evaluations via NS-2 simulations demonstrate that DEEP achieves both lower dissemination delay and higher reliability than existing approaches.
"Kylin: An Efficient and Scalable Graph Data Processing System," 2013 IEEE International Conference on Big Data, October 2013.
Authors: Li-Yung Ho, Tsung-Han Li, Jan-Jan Wu, Pangfeng Liu

Abstract:
We introduce Kylin, an efficient and scalable graph data processing system. Kylin is based on the bulk synchronous parallel (BSP) model for processing graph data. Although there have been other BSP-based graph processing systems, Kylin differs from them in two aspects. First, Kylin cooperates with HBase to achieve scalable graph data management. Second, we propose three techniques to optimize the performance of Kylin: pull messaging, lazy vertex loading, and vertex-weighted partitioning. Our experimental results demonstrate that Kylin outperforms other BSP-based systems, such as Apache Hama and Giraph.
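The BSP model that Kylin builds on can be illustrated with a classic toy vertex program that propagates the maximum vertex value: in each superstep every active vertex reads its inbox, updates its value, and sends messages to its neighbours, with a global barrier between supersteps. This is generic BSP, not Kylin's API.

```python
# Toy BSP vertex program: propagate the maximum value through the graph.
def bsp_max(graph, values):
    """graph: {vertex: [neighbours]}; values: {vertex: number}.
    Returns values with every vertex holding the global maximum."""
    values = dict(values)
    active = set(graph)                      # all vertices start active
    while active:
        outbox = {v: [] for v in graph}
        for v in active:                     # superstep: compute, then send
            for n in graph[v]:
                outbox[n].append(values[v])
        # barrier, then deliver: a vertex reactivates only on a larger value
        active = set()
        for v, msgs in outbox.items():
            if msgs and max(msgs) > values[v]:
                values[v] = max(msgs)
                active.add(v)
    return values

g = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(bsp_max(g, {"a": 1, "b": 7, "c": 3}))
```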
"Sum-Rate Maximization and Energy-Cost Minimization for Renewable Energy Empowered Base-Stations using Zero-Forcing Beamforming," Proceedings of APSIPA ASC 2013 - the fifth Annual Summit and Conference of Asia-Pacific Signal and Information Processing Association, October 2013.
Authors: Yung-Shun Wang, Yao-Win Peter Hong, and W. T. Chen

Abstract:
Zero-forcing (ZF) beamforming is a practical linear transmission scheme that eliminates inter-user interference in the downlink of a multiuser multiple-input single-output (MISO) wireless system. By considering base-stations (BSs) that are supported by renewable energy, this work examines offline and online ZF beamforming designs based on two different objectives, namely, sum-rate maximization and energy-cost minimization. For offline policies, the channel states and the energy arrivals are assumed to be known a priori for all time instants whereas, in the online policies, only causal information is available. The designs are subject to energy causality and energy storage constraints, i.e., the constraint that energy cannot be used before it arrives and the constraint that the stored energy cannot exceed the maximum battery storage capacity. In the offline sum-rate maximization problem, the base-station is assumed to be supported only by renewable energy and the goal is to maximize the sum rate of all users by a predetermined deadline. The optimization over the ZF beamforming direction and power allocation can be decoupled, and the solutions can be found exactly. In the energy-cost minimization problem, the base-station is assumed to be supported by both renewable and power-grid energy, and the goal is to minimize the cost of purchasing grid energy subject to quality-of-service constraints at the users. The problem can be formulated as a convex optimization problem and can be solved efficiently using off-the-shelf solvers. Offline solutions are first obtained and used as the basis for the proposed online policies. The effectiveness of the proposed policies is demonstrated through computer simulations.
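The ZF part of the design can be shown numerically: stacking the users' channel row-vectors into H and taking the beamformers from the pseudoinverse of H removes all cross-user leakage. The channel values below are arbitrary, and power allocation, the other half of the decoupled optimization, is omitted.

```python
# Numerical sketch of zero-forcing beamforming via the pseudoinverse.
import numpy as np

H = np.array([[1.0, 0.5, 0.2],     # channel row-vector of user 1 (1 x Nt)
              [0.3, 1.0, 0.4]])    # channel row-vector of user 2
W = np.linalg.pinv(H)              # ZF beamformers, one column per user

effective = H @ W                  # effective channel after beamforming
print(np.round(effective, 6))      # ~identity: no inter-user interference
```

Because H @ pinv(H) is the identity whenever H has full row rank, user j's beam contributes nothing to user k's received signal for j != k, which is exactly the ZF property the abstract relies on.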
Current Research Results
"On the Quality of Service of Cloud Gaming Systems," IEEE Transactions on Multimedia, February 2014.
Authors: Kuan-Ta Chen, Yu-Chun Chang, Hwai-Jung Hsu, De-Yu Chen, Chun-Ying Huang, and Cheng-Hsin Hsu

Abstract:
Cloud gaming, i.e., real-time game playing via thin clients, relieves users from being forced to upgrade their computers and resolve the incompatibility issues between games and computers. As a result, cloud gaming is generating a great deal of interest among entrepreneurs, venture capitalists, the general public, and researchers. However, given the large design space, it is not yet known which cloud gaming system delivers the best user-perceived Quality of Service (QoS) and what design elements constitute a good cloud gaming system.

This study is motivated by the question: How good is the QoS of current cloud gaming systems? Answering the question is challenging because most cloud gaming systems are proprietary and closed, and thus their internal mechanisms are not accessible to the research community. In this paper, we propose a suite of measurement techniques to evaluate the QoS of cloud gaming systems and prove the effectiveness of our schemes using a case study comprising two well-known cloud gaming systems: OnLive and StreamMyGame. Our results show that OnLive performs better, because it provides adaptable frame rates, better graphic quality, and shorter server processing delays, while consuming less network bandwidth. Our measurement techniques are general and can be applied to any cloud gaming system, so that researchers, users, and service providers may systematically quantify the QoS of these systems. To the best of our knowledge, the proposed suite of measurement techniques has never been presented in the literature.

Current Research Results
"Reliability Enhancement of Flash-Memory Storage Systems: An Efficient Version-Based Design," IEEE Transactions on Computers, December 2013.
Authors: Yuan-Hao Chang, Po-Chun Huang, Pei-Han Hsu, Lue-Jane Lee, Tei-Wei Kuo and David Hung-Chang Du

Abstract:
In recent years, reliability has become a critical issue in the design of flash-memory file/storage systems, due to the growing unreliability of advanced flash-memory chips. In this paper, a version-based strategy is proposed to effectively and efficiently maintain consistency among the page versions of a file for potential recovery needs. In particular, a two-version scheme for a native file system is presented with minimal version-maintenance overhead. A recovery scheme is then presented to restore a corrupted file to its latest consistent version. The strategy is later extended to maintain multiple data versions in consideration of the write constraints of multi-level-cell flash memory. It is shown that the proposed strategy can significantly improve the reliability of flash memory with limited management and space overheads.
Current Research Results
"Automatic Training Image Acquisition and Effective Feature Selection from Community-Contributed Photos for Facial Attribute Detection," IEEE Transactions on Multimedia, October 2013.
Authors: Yan-Ying Chen, Winston Hsu, and H. Y. Mark Liao

Abstract:
Facial attributes are shown effective for mining specific persons and profiling human activities in large-scale media such as surveillance videos or photo-sharing services. Comprehensive analysis requires a large number of facial attributes. Generally, each attribute detector is obtained by supervised learning over large training data. It is promising to leverage the exponentially growing community-contributed photos and their associated informative contexts to ease the burden of manual annotation; however, such huge noisy data from the Internet still pose great challenges. We propose to measure the quality of training images by discriminable visual features, which are verified by the relative discrimination between the unlabeled images and the pseudo-positives (pseudo-negatives) retrieved by textual relevance. The proposed feature selection requires no heuristic threshold and can therefore be generalized to multiple feature modalities. We further exploit the rich context cues (e.g., tags, geo-locations, etc.) associated with publicly available photos to mine training images from around the world that are semantically consistent but visually diverse. Through experiments on benchmarks, we demonstrate that our work can successfully acquire effective training images for learning generic facial attributes, where the classification error is relatively reduced by up to 23.35% compared with that of the text-based approach and is comparable with that of costly manual annotation.
Current Research Results
"An Index-Based Management Scheme with Adaptive Caching for Huge-Scale Low-Cost Embedded Flash Storages," ACM Transactions on Design Automation of Electronic Systems, October 2013.
Authors: Po-Chun Huang, Yuan-Hao Chang, and Tei-Wei Kuo

Abstract:
Due to its remarkable access performance, shock resistance, and low cost, NAND flash memory is now widely adopted in a variety of computing environments, especially in mobile devices such as smart phones, media players, and electronic book readers. For cost reasons, low-cost embedded flash storages such as flash memory cards are often employed in such devices. Unlike solid-state disks, the RAM buffer equipped on low-cost embedded flash storages is very small, often limited to tens of kilobytes, despite the rapidly growing capacity of the storages. This highlights the importance of effectively utilizing the very limited on-device RAM buffers of embedded flash storages, and a novel design of scalable flash management schemes is needed to tackle the new access constraints of MLC NAND flash memory. In this work, a highly scalable design of the flash translation layer is presented that considers the on-device RAM size, user access patterns, address-mapping-information caching, and MLC access constraints. A series of experiments verifies that, with appropriate cache sizes, the proposed management scheme provides performance comparable to prior art with much lower on-device RAM requirements. In other words, the proposed scheme makes better use of the on-device RAM and is well suited to embedded flash storages.
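The caching idea at the heart of such a scheme can be sketched with a small LRU cache sitting in front of the full logical-to-physical mapping table; misses stand in for flash reads of mapping information. Sizes and names are illustrative, not the paper's design.

```python
# Toy LRU cache for flash-translation-layer address mappings.
from collections import OrderedDict

class MappingCache:
    def __init__(self, table, capacity):
        self.table = table            # full L2P table (kept on flash)
        self.cache = OrderedDict()    # small RAM-resident subset
        self.capacity = capacity
        self.flash_reads = 0          # counts mapping fetches from flash

    def lookup(self, lpn):
        if lpn in self.cache:
            self.cache.move_to_end(lpn)      # mark as recently used
            return self.cache[lpn]
        self.flash_reads += 1                # miss: fetch mapping from flash
        ppn = self.table[lpn]
        self.cache[lpn] = ppn
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
        return ppn

ftl = MappingCache({lpn: lpn + 100 for lpn in range(8)}, capacity=2)
for lpn in [0, 1, 0, 2, 0]:
    ftl.lookup(lpn)
print(ftl.flash_reads)   # 0 and 1 miss; 0 hits; 2 misses and evicts 1; 0 hits
```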
"Selective Profiling for OS Scalability Study on Multicore Systems," IEEE International Conference on Service-Oriented Computing and Applications (SOCA), December 2013.
Authors: Kuo-Yi Chen, Yuan-Hao Chang, Pei-Shu Liao, Pen-Chung Yew, Sheng-Wei Cheng, and Tien-Fu Chen

Abstract:
With the wide deployment of multi-core processors, scalability is becoming an important issue for modern computers. To pinpoint the scalability issues of an operating system, performance profilers are usually used to detect scalability bottlenecks. However, profilers usually come with significant overheads, which not only reduce the accuracy of scalability profiling but also obscure the actual scalability bottlenecks. To address this issue, we propose a selective-profiling approach that analyzes the scalability of an operating system precisely with minimal overhead. In the proposed approach, tracers first locate possible scalability bottleneck hotspots, and the information about these hotspots is then passed to a sampler to detect the actual bottlenecks. Since the sampler concentrates only on the possible bottleneck hotspots instead of the whole system, the overhead can be reduced significantly. The proposed selective-profiling approach thus combines the advantages of performance tracers and samplers, reducing overhead while still achieving accurate bottleneck analysis of an operating system on multi-core systems.
"Interactive Coding, Revisited," The 54th Annual IEEE Symposium on Foundations of Computer Science (FOCS), October 2013.
Authors: Kai-Min Chung and Rafael Pass and Sidharth Telang

Abstract:
How can we encode a communication protocol between two parties to become resilient to adversarial errors on the communication channel? If we encode each message in the communication protocol with a "good" error-correcting code (ECC), the error rate of the encoded protocol becomes poor (namely O(1/m) where m is the number of communication rounds). Towards addressing this issue, Schulman (FOCS'92, STOC'93) introduced the notion of *interactive coding*.

We argue that whereas the method of separately encoding each message with an ECC ensures that the encoded protocol carries the same amount of information as the original protocol, this may no longer be the case if using interactive coding. In particular, the encoded protocol may completely leak a player's private input, even if it would remain secret in the original protocol. Towards addressing this problem, we introduce the notion of *knowledge-preserving interactive coding*, where the interactive coding protocol is required to preserve the "knowledge" transmitted in the original protocol. Our main results are as follows.

- The method of separately applying ECCs to each message has essentially optimal error rate: No knowledge-preserving interactive coding scheme can have an error rate of 1/m, where m is the number of rounds in the original protocol.

- If restricting to computationally-bounded (polynomial-time) adversaries, then, assuming the existence of one-way functions (resp. subexponentially-hard one-way functions), for every ϵ>0 there exists a knowledge-preserving interactive coding scheme with constant error rate and information rate n^{-ϵ} (resp. 1/polylog(n)), where n is the security parameter; additionally, achieving an error rate of even 1/m requires the existence of one-way functions.

- Finally, even if we restrict to computationally-bounded adversaries, knowledge-preserving interactive coding schemes with constant error rate can have an information rate of at most o(1/log n). This result applies even to *non-constructive* interactive coding schemes.
"Simultaneous Resettability from One-Way Functions," The 54th Annual IEEE Symposium on Foundations of Computer Science (FOCS), October 2013.
Authors: Kai-Min Chung and Rafail Ostrovsky and Rafael Pass and Ivan Visconti

Abstract:
Resettable-security, introduced by Canetti, Goldreich, Goldwasser and Micali (STOC'00), considers the security of cryptographic two-party protocols (in particular zero-knowledge arguments) in a setting where the attacker may "reset" or "rewind" one of the players. The strongest notion of resettable security, simultaneous resettability, introduced by Barak, Goldreich, Goldwasser and Lindell (FOCS'01), requires resettable security to hold for both parties: in the context of zero-knowledge, both the soundness and the zero-knowledge conditions remain robust to resetting attacks.

To date, all known constructions of protocols satisfying simultaneous resettable security rely on the existence of ZAPs; constructions of ZAPs are only known based on the existence of trapdoor permutations or number-theoretic assumptions.

In this paper, we provide a new method for constructing protocols satisfying simultaneous resettable security while relying only on the minimal assumption of one-way functions. Our key results establish, assuming only one-way functions:

- Every language in NP has an ω(1)-round simultaneously resettable witness indistinguishable argument system.

- Every language in NP has a (polynomial-round) simultaneously resettable zero-knowledge argument system.

The key conceptual insight in our technique is relying on black-box impossibility results for concurrent zero-knowledge to achieve resettable-security.
"Constant-Round Concurrent Zero Knowledge From P-Certificates," The 54th Annual IEEE Symposium on Foundations of Computer Science (FOCS), October 2013.
Authors: Kai-Min Chung and Huijia Lin and Rafael Pass

Abstract:
We present a constant-round concurrent zero-knowledge protocol for NP. Our protocol relies on the existence of families of collision-resistant hash functions and a new, but in our eyes natural, complexity-theoretic assumption: the existence of P-certificates---that is, "succinct" non-interactive proofs/arguments for P. As far as we know, our results yield the first constant-round concurrent zero-knowledge protocol for NP with an explicit zero-knowledge simulator based on any assumption.
"CloudRS: An Error Correction Algorithm of High-Throughput Sequencing Data," Proceedings IEEE BigData 2013, October 2013.
Authors: Chien-Chih Chen, Yu-Jung Chang, Wei-Chun Chung, Der-Tsai Lee, and Jan-Ming Ho

Abstract:
Next-generation sequencing (NGS) technologies produce huge amounts of data. These sequencing data are unavoidably accompanied by sequencing errors, which constitute one of the major problems for further analyses. Error correction is indeed one of the critical steps for the success of NGS applications such as de novo genome assembly and DNA resequencing, as illustrated in the literature. However, it demands substantial computing time and memory. Designing an algorithm that improves data quality by efficiently utilizing on-demand computing resources in the cloud is a challenge for biologists and computer scientists. In this study, we present an error-correction algorithm, called CloudRS, for correcting errors in NGS data. CloudRS emulates the error-correction algorithm of ALLPATHS-LG on the Hadoop/MapReduce framework. It is conservative in correcting sequencing errors to avoid introducing false decisions, e.g., when dealing with reads from repetitive regions. We also illustrate several probabilistic measures introduced in CloudRS to make the algorithm more efficient without sacrificing its effectiveness. Running times using up to 80 instances, each with 8 computing units, show satisfactory speedup. Comparisons with other error-correction programs show that CloudRS achieves a lower false positive rate on most evaluation benchmarks and higher sensitivity on the S. cerevisiae genome. We demonstrate that CloudRS significantly improves the quality of the resulting contigs on NGS de novo assembly benchmarks.
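The general family CloudRS belongs to, spectrum-based read error correction, can be sketched as follows: k-mers that occur frequently across reads are trusted, and a base is changed only when exactly one substitution makes every covering k-mer trusted, mirroring the conservative spirit described above. This toy is not the CloudRS algorithm itself.

```python
# Toy spectrum-based error correction: trust frequent k-mers, correct a base
# only when the fix is unambiguous.
from collections import Counter

def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def correct_read(read, trusted, k):
    for i, base in enumerate(read):
        fixes = []
        for b in "ACGT":
            if b == base:
                continue
            cand = read[:i] + b + read[i + 1:]
            if all(km in trusted for km in kmers(cand, k)):
                fixes.append(cand)
        if len(fixes) == 1:          # unambiguous: accept the correction
            return fixes[0]
    return read                      # otherwise leave the read untouched

k = 3
reads = ["ACGTACG"] * 5 + ["ACGAACG"]        # one read with a T->A error
counts = Counter(km for r in reads for km in kmers(r, k))
trusted = {km for km, c in counts.items() if c >= 3}
print(correct_read("ACGAACG", trusted, k))
```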
"A Cooperative Botnet Profiling and Detection in Virtualized Environment," IEEE Conference on Communications and Network Security (CNS), October 2013.
Authors: Shun-Wen Hsiao, Yi-Ning Chen, Yeali S. Sun, Meng Chang Chen

Abstract:
Cloud security has become an important topic in recent years, as overcoming botnets in a virtualized environment is a critical task for cloud providers. Although numerous intrusion detection systems are available, it is not practical to install an IDS in every virtual machine. In this paper, we argue that a virtual machine monitor (VMM) can support certain security functions: our proposed design actively collects information directly from the VMM without installing an agent in the guest OS, so a bot cannot be aware of the existence of such a detection agent. The proposed detection mechanism takes both passive and active approaches. The passive detection agent resides in the VMM and examines the tainted data used by a bot against bot behavior profiles, while the active detection agent performs active bot fingerprinting by sending specific stimuli to a guest and examining whether the expected triggered behavior occurs. In real-world bot experiments, we show that the passive detection agent can distinguish between bots and benign processes with low false positive and false negative rates. The results also show that the active detection agent can detect a bot even before it performs its malicious jobs. The proposed mechanism is well suited to enterprises with cloud environments for defeating malware.
Current Research Results
Authors: Jhih-Siang Lai, Cheng-Wei Cheng, Allan Lo*, Ting-Yi Sung*, and Wen-Lian Hsu

Abstract:
Background: Since membrane protein structures are challenging to crystallize, computational approaches are essential for elucidating sequence-to-structure relationships. Structural modeling of membrane proteins requires a multidimensional approach, and one critical geometric parameter is the rotational angle of transmembrane helices. Rotational angles of transmembrane helices are characterized by their folded structures and could be inferred from the hydrophobic moment; however, the folding mechanism of membrane proteins is not yet fully understood. The rotational angle of a transmembrane helix is related to its exposed surface, since lipid exposure gives the degree of accessibility of each residue in the lipid environment. To the best of our knowledge, there have been few investigations of whether an environment descriptor of lipid exposure can be used to infer the geometric parameter of rotational angle.

Results: Here, we present an analysis of the relationship between rotational angles and lipid exposure, and a support-vector-machine method, called TMexpo, for predicting both structural features from sequences. First, we observed from the development set of 89 protein chains that the lipid exposure, i.e., the relative accessible surface area (rASA) of residues in the lipid environment, generated from high-resolution protein structures could infer the rotational angles with a mean absolute angular error (MAAE) of 46.32°. More importantly, the predicted rASA from TMexpo achieved an MAAE of 51.05°, better than the 71.47° obtained by the best of the compared hydrophobicity scales. Lastly, TMexpo outperformed the compared methods in rASA prediction on the independent test set of 21 protein chains, achieving an overall Matthews correlation coefficient, accuracy, sensitivity, specificity, and precision of 0.51, 75.26%, 81.30%, 69.15%, and 72.73%, respectively. TMexpo is publicly available at http://bio-cluster.iis.sinica.edu.tw/TMexpo.

Conclusions: TMexpo can better predict rASA and rotational angles than the compared methods. When rotational angles can be accurately predicted, free modeling of transmembrane protein structures may in turn benefit from reduced complexity in ensembles with a significantly smaller number of packing arrangements. Furthermore, sequence-based prediction of both rotational angle and lipid exposure can provide essential information when high-resolution structures are unavailable and contribute to experimental design for elucidating transmembrane protein functions.
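The MAAE figures quoted above are averages of angular deviations. As a hedged illustration only (not the authors' code), the metric could be computed as follows, assuming rotational angles are given in degrees and wrap around at 360°:

```python
import math

def angular_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles in degrees."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)

def maae(predicted, observed):
    """Mean absolute angular error over paired rotational angles."""
    assert len(predicted) == len(observed) and predicted
    return sum(angular_error(p, o)
               for p, o in zip(predicted, observed)) / len(predicted)

# Hypothetical rotational angles (degrees) for three helices;
# note 350° vs. 10° counts as a 20° error, not 340°.
print(maae([10.0, 350.0, 180.0], [20.0, 10.0, 170.0]))  # → 13.333...
```

The wrap-around handling is the only subtlety; a naive `abs(p - o)` would grossly overstate errors near the 0°/360° boundary.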
Current Research Results
Authors: Tsai, Z.T.Y., Chu, W.Y., Cheng, J.H., and Tsai, H.K.*

Abstract:
Non-B DNA structures are abundant in the genome and are often associated with critical biological processes, including gene regulation, chromosome rearrangement, and genome stabilization. In particular, G-quadruplex (G4) may affect alternative splicing through its ability to impede the activity of RNA polymerase II. However, the specific role of non-B DNA structures in splicing regulation still awaits investigation. Here, we provide a genome-wide and cross-species investigation of the associations between five non-B DNA structures and exon skipping. Our results indicate a statistically significant correlation of each examined non-B DNA structure with exon skipping in both human and mouse. We further show that the contributions of non-B DNA structures to exon skipping are influenced by the region in which they occur. These correlations and contributions also differ significantly between human and mouse. Finally, we detailed the effects of G4 by showing that occurrence on the template strand and the length of the G-run, which is highly related to the stability of a G4 structure, are significantly correlated with exon skipping activity. We thus show that, in addition to the well-known effects of RNA and protein structure, the relative positional arrangement of intronic non-B DNA structures may also impact exon skipping.
Current Research Results
"What Distinguish One from Its Peers in Social Networks," Data Mining and Knowledge Discovery, November 2013.
Authors: Yi-Chen Lo, Jhao-Yin Li, Mi-Yen Yeh, Shou-De Lin, Jian Pei

Abstract:
Being able to discover the uniqueness of an individual is a meaningful task in social network analysis. This paper proposes two novel problems in social network analysis: how to identify the uniqueness of a given query vertex, and how to identify a group of vertices that can mutually identify each other. We propose intuitive yet effective methods to identify the uniqueness identification sets and the mutual identification groups of different properties. We also conduct extensive experiments on both real and synthetic datasets to demonstrate the effectiveness of our model.
"A Systematic Methodology for OS Benchmark Characterization," ACM Research in Adaptive and Convergent Systems (RACS), October 2013.
Authors: Shuo-Hung Chen, Hsiao-Mei Lin, Kuo-Yi Chen, Yuan-Hao Chang, Pen-Chung Yew, and Chien-Chung Ho

Abstract:
Using benchmarks to evaluate operating systems is a common and important approach. However, determining which benchmarks to use for such evaluation requires very careful consideration. It has been found that a seemingly naive change of system configuration or input set could lead to a drastic change in benchmark characteristics, and could also lead to misleading or incorrect results. Some OS benchmark suites may also include too many benchmark programs with very similar characteristics, which could give results biased against, or in favor of, certain kernel behavior. Hence, we need to determine the characteristics of benchmark programs in order to come up with an appropriate benchmark suite for such evaluation, and to interpret the measured results more precisely and correctly. Although many tools have been developed to help profile an OS and characterize its run-time behavior, the data collected by those tools are often very large and complex. It is extremely time-consuming, labor-intensive, and error-prone to analyze the large volume of measured results and to determine the characteristics of a suite of benchmark programs. In this work, we propose to use machine-learning techniques to help analyze and characterize OS benchmark programs based on traced OS kernel events. A systematic methodology is proposed to automatically characterize benchmarks. We found that the characterized OS behavior could help developers choose appropriate applications to benchmark operating systems.
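The redundancy problem described above — benchmark programs with very similar characteristics — can be made concrete with a small sketch. The following is a hypothetical illustration, not the paper's actual machine-learning pipeline: benchmarks are represented by kernel-event count vectors (invented names and numbers), and pairs whose profiles are nearly identical are flagged by cosine similarity:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def redundant_pairs(profiles, threshold=0.95):
    """Flag benchmark pairs whose kernel-event profiles nearly coincide."""
    names = list(profiles)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if cosine(profiles[names[i]], profiles[names[j]]) >= threshold:
                pairs.append((names[i], names[j]))
    return pairs

# Hypothetical per-benchmark counts of traced kernel events
# (e.g., syscalls, context switches, page faults).
profiles = {
    "io_bench_a": [900, 120, 30],
    "io_bench_b": [880, 118, 29],   # near-duplicate of io_bench_a
    "sched_bench": [50, 700, 10],
}
print(redundant_pairs(profiles))  # → [('io_bench_a', 'io_bench_b')]
```

Dropping one benchmark from each flagged pair yields a smaller suite that covers the same kernel behaviors with less bias.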
"Non-Reference Audio Quality Assessment for Online Live Music Recordings," Proc. ACM International Conference on Multimedia (MM), October 2013.
Authors: Zhonghua Li, Ju-Chiang Wang, Jingli Cai, Zhiyan Duan, Hsin-Min Wang and Ye Wang

Abstract:
Immensely popular video-sharing websites such as YouTube have become the most important sources of music information for Internet users and the most prominent platform for sharing live music. The audio quality of this huge amount of live music recordings, however, varies significantly due to factors such as environmental noise, location, and recording device. Yet most video search engines do not take audio quality into consideration when retrieving and ranking results. Given that most users prefer live music videos with better audio quality, we propose the first automatic, non-reference audio quality assessment framework for online live music video search. We first construct two annotated datasets of live music recordings. The first dataset contains 500 human-annotated pieces, and the second contains 2,400 synthetic pieces systematically generated by adding noise effects to clean recordings. We then formulate the assessment task as a ranking problem and solve it with a learning-based scheme. To validate the effectiveness of our framework, we perform both objective and subjective evaluations. Results show that our framework significantly improves the ranking performance of live music recording retrieval and can prove useful for various real-world music applications.
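The synthetic portion of the dataset is described as clean recordings systematically degraded with noise effects. One plausible sketch of such degradation (hypothetical parameters and function names, not the authors' actual tooling) mixes white noise into a clean signal at a controlled signal-to-noise ratio:

```python
import math
import random

def add_noise_at_snr(clean, snr_db, seed=0):
    """Mix white Gaussian noise into a clean signal at a target SNR (dB)."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in clean) / len(clean)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    noise_power = sum(n * n for n in noise) / len(noise)
    # Scale noise so that 10*log10(signal_power / scaled_noise_power) == snr_db.
    scale = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(clean, noise)]

# A hypothetical 440 Hz tone sampled at 8 kHz, degraded at 10 dB SNR.
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noisy = add_noise_at_snr(clean, snr_db=10.0)
```

Sweeping `snr_db` over a grid of values would produce a family of graded degradations from one clean recording, which is one way a dataset of systematically noised pieces could be built.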
"Joint Management of Performance-predictable Virtualized Storage Devices with Hard Disk and Flash Memory," ACM Research in Adaptive and Convergent Systems (RACS), October 2013.
Authors: Po-Chun Huang, Yuan-Hao Chang, Tei-Wei Kuo, Chien-Chung Ho, and Hyunseung Choo

Abstract:
Recently, the significance of storage virtualization has been highlighted due to the growing performance demands of next-generation applications. However, with its unpredictably long seek time and rotational delays, the hard disk fails to provide sufficient performance guarantees to fulfill the service-level objectives of these applications, especially on random accesses. To resolve the random-access problem of hard disks, a common approach is to adopt flash memory in a hard-disk-based storage system. Nevertheless, flash memory requires garbage collection to reclaim the space occupied by invalid data, and the garbage collection process might introduce long and unpredictable time overheads that considerably reduce the performance benefits of flash memory. This work proposes a joint management strategy for virtualized storage systems with hard disk and flash memory. The major idea is to exploit flash memory to enhance the random-access performance of the hard disk, while exploiting the hard disk to help flash memory avoid the worst-case delays incurred by garbage collection. Analytical and experimental studies showed that the resulting virtualized storage device exhibits the surprising behavior that its worst-case performance is higher than that of both the hard disk and flash memory alone.
"Spatial Search for K Diverse-Near Neighbors," ACM Conference on Information and Knowledge Management (ACM CIKM), October 2013.
Authors: G. Ference, W.-C. Lee, H.-J. Hung, and D.-N. Yang

Abstract:
To many location-based service applications that prefer diverse results, finding locations that are spatially diverse and close in proximity to a query point (e.g., the current location of a user) can be more useful than finding the nearest neighbors/locations. In this paper, we investigate the problem of searching for the Diverse-Near Neighbors (DNNs) in a spatial space based upon the spatial diversity and proximity of candidate locations to the query point. While employing a conventional distance measure for proximity, we develop a new and intuitive diversity metric based upon the variance of the angles among the candidate locations with respect to the query point. Accordingly, we create a dynamic programming algorithm that finds the optimal DNNs. Unfortunately, the dynamic programming algorithm, with a time complexity of O(kn^3), incurs excessive computational cost. Therefore, we further propose two heuristic algorithms, namely Distance-based Browsing (DistBrow) and Diversity-based Browsing (DivBrow), that provide high effectiveness while being efficient by exploring the search space prioritized by proximity to the query point and by spatial diversity, respectively. Using real and synthetic datasets, we conduct a comprehensive performance evaluation. The results show that DistBrow and DivBrow have superior effectiveness compared to state-of-the-art algorithms while maintaining high efficiency.
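The diversity metric is described as the variance of candidate angles around the query point. The abstract does not give the exact formula, so the following is one hedged reading — standard circular variance, which may differ from the authors' precise formulation:

```python
import math

def circular_variance(angles):
    """Circular variance of angles in radians: 1 minus the mean resultant length."""
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return 1.0 - math.hypot(c, s)

def diversity(query, candidates):
    """Angular spread of candidate locations as seen from the query point."""
    angles = [math.atan2(y - query[1], x - query[0]) for x, y in candidates]
    return circular_variance(angles)

# Candidates spread evenly around the query are maximally diverse...
spread = diversity((0, 0), [(1, 0), (0, 1), (-1, 0), (0, -1)])
# ...while candidates clustered in one direction are not.
clustered = diversity((0, 0), [(1, 0), (1, 0.1), (1, -0.1)])
print(spread > clustered)  # → True
```

A circular measure is used rather than the ordinary variance of raw angle values, because angles near 0° and 360° point in nearly the same direction and should not count as diverse.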
"A Disturb-Alleviation Scheme for 3D Flash Memory," ACM/IEEE International Conference on Computer-Aided Design (ICCAD), November 2013.
Authors: Yu-Ming Chang, Yuan-Hao Chang, Tei-Wei Kuo, Hsiang-Pang Li, and Yung-Chun Li

Abstract:
Even though 3D flash memory presents a grand opportunity for huge-capacity non-volatile memory, it suffers from serious program disturb problems. Unlike past efforts on error correction codes or work that trades space utilization for reliability, we propose a disturb-alleviation scheme that can alleviate the negative effects caused by program disturb, especially inside a block, without introducing extra overheads for encoding or storing redundant data. In particular, a methodology is proposed to reduce the data error rate by distributing unavoidable disturb errors over the flash-memory space occupied by invalid data, taking into consideration the physical organization of 3D flash memory. A series of experiments was conducted on real multi-layer 3D flash chips, and the results show that the proposed scheme can significantly enhance the reliability of 3D flash memory.
"Size Does Matter: How Does Image Display Size Affect Aesthetic Perception?," Proceedings of ACM Multimedia 2013, October 2013.
Authors: Wei-Ta Chu, Yu-Kuang Chen, and Kuan-Ta Chen

Abstract:
Undoubtedly, an image's content determines how people assess the image's aesthetic level. Previous works have shown that image contrast, saliency features, and the composition of objects may jointly decide whether an image looks good or not. In addition to the "content" of an image, however, the way an image is "presented" to viewers may also affect how much it is appreciated. For example, one might expect a picture to always look better when it is displayed in a larger size. Is this "the-bigger-the-better" rule always true? If not, under what circumstances does it become invalid?

This paper analyzes how an image's resolution (pixels) and physical dimensions (inches) affect how much viewers appreciate the image. Based on large-scale aesthetic assessments of 100 images displayed in a variety of resolutions and physical dimensions, we show that an image's display size significantly affects its aesthetic rating in a complicated way: normally a picture looks better at a bigger display size, but it may look relatively worse depending on its content. We develop a set of regression models to predict a picture's absolute and relative aesthetic levels at a given display size based on its content and compositional features, and we analyze the essential features that lead to the size-dependent property of image aesthetics. We hope that this work will shed some light on future research by revealing that both content and presentation should be considered in image aesthetics evaluation.
Current Research Results
"Multicast with Intra-Flow Network Coding in Multi-Rate Multi-Channel Wireless Mesh Networks," IEEE Trans. on Vehicular Technology, November 2013.
Authors: C.-J Lin and D.-N. Yang

Abstract:
It has been shown that intra-flow network coding can improve multicast throughput in single-channel, single-rate wireless mesh networks. Most existing studies focus on developing a practical intra-flow network-coding multicast protocol. With the maturity of wireless products supporting multi-rate, multi-channel communications, it is becoming increasingly important to improve the network coding gain by utilizing multiple bit-rates and non-overlapping channels, but new challenges also arise due to various tradeoffs. In a multi-rate, multi-channel, multi-hop network, overhearing opportunities and bandwidth utilization are highly correlated with the selection of the channel and rate, which in turn determines the achievable throughput of multicast with intra-flow network coding. Specifically, wireless nodes can transmit at higher bit-rates over orthogonal channels to increase forwarding throughput, but this strategy reduces overhearing opportunities and the coding gain of multicast with intra-flow network coding. To achieve the best trade-off between these two aspects, we formulate a new optimization model called multi-Rate multi-Channel Multicast with intra-flow Network Coding (RCMNC), which solves the joint channel assignment, rate selection, and flow allocation problems for multi-hop intra-flow network coding multicast. We then reformulate the primal problem as its Lagrangian dual problem and propose an algorithm to approximate the solution of the dual problem in polynomial time. Simulation results show that, by assigning a suitable rate and channel to each network interface, multicast throughput with intra-flow network coding under RCMNC is up to 3.3 times higher than that achieved by the existing approach in a multi-rate, multi-channel wireless mesh network.
Current Research Results
"Structural Diversity for Resisting Community Identification in Published Social Networks," IEEE Trans. on Knowledge and Data Engineering, October 2013.
Authors: C.-H. Tai, P. S. Yu, D.-N. Yang, and M.-S. Chen

Abstract:
Recently, privacy issues concerning individuals in social networks have become serious concerns. Vertex identification is one of the most important problems that has been addressed. In reality, however, each individual in a social network is inclined to be associated with not only a vertex identity but also a community identity, which could represent personal privacy information sensitive to the public, such as political party affiliation. This paper first addresses a new privacy issue, referred to as community identification, by showing that the community identity of a victim can still be inferred even though the social network is protected by existing anonymity schemes. For this problem, we then propose the concept of structural diversity to provide anonymity of community identities. k-Structural Diversity Anonymization (k-SDA) ensures that sufficient vertices with the same vertex degree appear in at least k communities in a social network. We propose an Integer Programming formulation to find optimal solutions to k-SDA and also devise scalable heuristics to solve large-scale instances of k-SDA from different perspectives. Performance studies on real data sets from various perspectives demonstrate the practical utility of the proposed privacy scheme and our anonymization approaches.
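The k-SDA requirement — vertices of each degree spread over at least k communities — can be sketched as a simple verification check. This is a simplified, hypothetical reading of the definition (the paper's formal constraints may be richer), using invented vertex and community labels:

```python
from collections import defaultdict

def satisfies_k_sda(degrees, communities, k):
    """Check that every vertex degree appears in at least k communities,
    so observing a vertex's degree cannot pin down its community."""
    comms_by_degree = defaultdict(set)
    for vertex, degree in degrees.items():
        comms_by_degree[degree].add(communities[vertex])
    return all(len(comms) >= k for comms in comms_by_degree.values())

# Hypothetical anonymized network: vertex -> degree, vertex -> community.
degrees = {"a": 2, "b": 2, "c": 3, "d": 3}
communities = {"a": 1, "b": 2, "c": 1, "d": 2}
print(satisfies_k_sda(degrees, communities, k=2))  # → True
print(satisfies_k_sda(degrees, communities, k=3))  # → False
```

An anonymization algorithm would then edit edges until this check passes for the desired k, which is the role the Integer Programming formulation and heuristics play in the paper.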