Institute of Information Science, Academia Sinica


2012 Technical Report

Code
Subject / Author / Abstract
View

TR-IIS-12-001

Construction of Gene Clusters Resembling Genetic Causal Mechanisms for Common Complex Disease with an Application to Young-Onset Hypertension
Ke-Shiuan Lynn, Chen-Hua Lu, Han-Ying Yang, Wen-Lian Hsu and Wen-Harn Pan

Motivation: Lack of power and poor reproducibility are common shortcomings of genetic association studies of common complex diseases. Indeed, the heterogeneity of disease etiology demands causal models that consider the simultaneous involvement of multiple genes. Rothman’s sufficient-cause model, which is well known in epidemiology, provides a framework for such a concept. In the present work, we developed a three-stage algorithm that constructs gene clusters resembling Rothman’s causal model for a complex disease, first finding influential gene pairs and then grouping homogeneous pairs into clusters.

Result: The algorithm was trained and tested on 2,772 hypertensives and 6,515 normotensives extracted from four large Caucasian and Taiwanese databases. Each constructed cluster features a major gene that interacts with many other genes and identifies a distinct group of patients; the clusters were reproduced in both ethnic populations and across three genotyping platforms. We present the 14 largest gene clusters, which identified 19.3% of the hypertensives across all datasets, and 41.8% when one dataset was excluded for lack of phenotype information. Although the gene clusters also identified a few normotensives, these individuals usually carried fewer risky combinatory genotypes (insufficient causes) than their hypertensive counterparts. After establishing a cut-off percentage of risky combinatory genotypes for each gene cluster, the 14 gene clusters achieved a classification accuracy of 82.8% on all datasets and 98.9% when the information-short dataset was excluded. Furthermore, not only 9 of the 14 major genes but also many other contributing genes in the clusters are associated with hypertension-related functions. Our results provide insights into the polygenic aspect of hypertension etiology.

Availability: Supplementary Data Files and MATLAB files that generate Figs. 3-5 are available at http://ms.iis.sinica.edu.tw/genetic_causal_pies/index.htm.

Contact: pan@ibms.sinica.edu.tw or hsu@iis.sinica.edu.tw

Keywords: genetic causal pie, sufficient cause, data-mining, young-onset hypertension, complex disease

TR-IIS-12-002

A Dynamic Binary Translation System in a Client/Server Environment
Ding-Yong Hong, Chun-Chen Hsu, Chao-Rui Chang, Jan-Jan Wu, Pen-Chung Yew, Wei-Chung Hsu, Pangfeng Liu, Chien-Min Wang, Yeh-Ching Chung.

With rapid advances in mobile computing, multi-core processors and expanded memory resources are being made available in new mobile devices. This trend will enable a wider range of existing applications to be migrated to mobile devices, for example, transparently running desktop applications in IA-32 (x86) binaries on ARM-based mobile devices using dynamic binary translation (DBT). However, DBT performance significantly affects the energy consumption of a mobile device, because it is directly linked to the number of instructions executed and the overall execution time of the translated code. Hence, even though the capability of today’s mobile devices will continue to grow, concerns over translation efficiency and energy consumption place more constraints on a DBT for mobile devices, in particular for thin mobile clients, than on a DBT for servers. Meanwhile, increasing network accessibility and bandwidth in various environments make many network servers, usually equipped with a substantial amount of resources, highly reachable to thin mobile clients. This opens an opportunity for a DBT on thin clients to leverage such powerful servers. However, designing such a DBT for a client/server environment requires many critical considerations.

In this work, we examined those design issues and developed a distributed DBT system based on the client/server model. The proposed system consists of two dynamic binary translators: an aggressive dynamic binary translator/optimizer that runs on the server and serves translation/optimization requests from thin clients, and a thin DBT that performs light-weight binary translation and basic emulation functions on each thin client. With this two-translator client/server approach, we successfully offload the DBT overhead of the thin client to the server and achieve significant performance improvement over the non-client/server model. Experimental results show that the client/server DBT achieves a 14% speedup over the non-client/server model for x86-32 to ARM emulation using SPEC CINT2006 benchmarks with test inputs, and is only 3.4X and 2.2X slower than native execution with test and reference inputs, respectively, as opposed to the 7.1X and 5.1X slowdowns on QEMU.
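As a rough illustration of the two-translator split described above, the following Python sketch models the idea: the thin client emulates a region locally until it becomes hot, then offloads it to the server-side optimizer and caches the returned translation. All names, the hot threshold, and the string "code" stand-ins are hypothetical; the real system operates on x86 binary regions, not strings.

```python
class ServerTranslator:
    """Stands in for the aggressive translator/optimizer on the server."""
    def translate(self, region_id, code):
        # Placeholder "optimization": in the real system an optimizing
        # compiler would translate the guest region to host code.
        return f"optimized({code})"

class ThinClient:
    """Stands in for the light-weight DBT on the mobile device."""
    HOT_THRESHOLD = 3   # hypothetical execution count before offloading

    def __init__(self, server):
        self.server = server
        self.exec_counts = {}    # region_id -> times executed locally
        self.cache = {}          # region_id -> optimized translation

    def execute(self, region_id, code):
        if region_id in self.cache:
            return self.cache[region_id]          # run cached optimized code
        n = self.exec_counts.get(region_id, 0) + 1
        self.exec_counts[region_id] = n
        if n >= self.HOT_THRESHOLD:               # hot: offload to server
            self.cache[region_id] = self.server.translate(region_id, code)
            return self.cache[region_id]
        return f"interpreted({code})"             # cold: basic emulation
```

The point of the split is visible in the control flow: the client only ever pays for light-weight emulation and a cache lookup, while all expensive optimization work happens on the server.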

TR-IIS-12-003

Improving Region Selection Through Early-Exit Detection
Chun-Chen Hsu, Pangfeng Liu, Jan-Jan Wu, Chien-Min Wang, Ding-Yong Hong, Wei-Chung Hsu

Many dynamic binary translation (DBT) systems and just-in-time compilers target traces, i.e., frequently taken execution paths, as the code regions to be translated and optimized. The Next-Executing-Tail (NET) trace selection method used in HP Dynamo is an early example of such techniques, and many current trace optimization schemes are variations of NET. These NET-like trace optimizations work very well for most traces, but they suffer from a common problem: the selected traces may contain a large number of early exits that branch out of the middle of a trace. If early exits are taken frequently during program execution, the benefit of trace optimization can be lost to the overhead of costly compensation code in the trace epilogue. We refer to traces/regions with frequently taken early exits as delinquent traces/regions. Our empirical study shows that at least 9 of the 12 SPEC CPU2006 integer benchmarks have delinquent traces; that is, if we use NET to select traces, each of these nine benchmarks takes more than 100 early exits per million executed instructions in its traces.

In this paper, we significantly improve the performance of NET by merging delinquent traces into larger code regions. We propose a light-weight region formation technique called Early-Exit Guided region selection (EEG), which improves performance by iteratively detecting delinquent regions and merging them into larger code regions. Hardware-assisted dynamic profiling is first used to identify hot code regions without incurring significant runtime overhead. Software counters are then instrumented at the exit points of the hot regions to detect early exits. When a counter exceeds a certain threshold, the code region that begins at the branch target of that early exit is merged into the main code region. We also employ a heuristic to decide whether merging the selected regions is beneficial: two regions are not merged if the cost of spill code in the merged code would be too high.
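A minimal sketch of the counter-based merging idea follows. The `Region` bookkeeping, the `THRESHOLD` constant, and the `spill_cost` callback are hypothetical stand-ins for the paper's actual instrumentation and register-pressure heuristic; this only illustrates the control flow of detect-then-merge.

```python
# Illustrative EEG-style delinquent-region merging (hypothetical interfaces).

THRESHOLD = 100  # early-exit count beyond which an exit is considered hot

class Region:
    def __init__(self, name, entry):
        self.name = name
        self.entry = entry        # entry address of the region
        self.blocks = [entry]     # basic blocks belonging to the region
        self.exit_counters = {}   # early-exit target address -> times taken

    def record_early_exit(self, target):
        # Incremented by the software counters instrumented at exit points.
        self.exit_counters[target] = self.exit_counters.get(target, 0) + 1

def merge_delinquent(main, regions_by_entry, spill_cost, max_spill=8):
    """Merge regions reached through hot early exits into `main`, unless
    the estimated spill cost of the merged code is too high."""
    merged = []
    for target, count in list(main.exit_counters.items()):
        if count > THRESHOLD and target in regions_by_entry:
            victim = regions_by_entry[target]
            if spill_cost(main, victim) <= max_spill:  # heuristic guard
                main.blocks.extend(victim.blocks)      # grow the main region
                del main.exit_counters[target]         # exit now internal
                merged.append(victim.name)
    return merged
```

Iterating this step lets a main region absorb, one by one, the regions that its hottest early exits escape into, which is the essence of the EEG loop described above.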

We implemented our EEG algorithm in two LLVM-based parallel dynamic binary translators, targeting the ARM and IA-32 instruction set architectures (ISAs) respectively; both use multiple compilation threads to compile different code regions concurrently. We evaluated the performance of EEG with two benchmark suites: the single-threaded SPEC CPU2006 benchmarks with reference inputs, and the multi-threaded PARSEC benchmarks with native inputs. The experimental results show that, compared to NET, EEG achieves a performance improvement of up to 67% (13% on average) for the SPEC CPU2006 integer benchmarks, and up to 20% (10% on average) for the PARSEC multi-threaded benchmarks.

TR-IIS-12-004

Ubiquitous Smart Devices and Applications for Disaster Preparedness
W. P. Liao, Y. Z. Ou, E. T. H. Chu, C. S. Shih and J. W. S. Liu

Recent advances in disaster prediction and detection technologies and ICT support infrastructures have enabled the generation and reliable delivery of machine-readable early disaster alerts over all communication pathways. A natural next step in the advancement of disaster management technologies is the emergence of ubiquitous smart devices and applications that can receive, authenticate and process standard-conforming disaster alert messages and respond by taking appropriate actions to help us be better prepared for natural disasters. Such smart devices and applications are called iGaDs (intelligent Guards against Disasters). This paper describes the reference architecture, key components and design of iGaDs in general, as well as an ASIC enhancement for battery-powered iGaDs.

TR-IIS-12-005

Ranking and Selecting Features Using an Adaptive Multiple Feature Subset Method
Fu Chang and Chan-Cheng Liu

We propose a method called adaptive multiple feature subset (AMFES), which ranks and selects features at a reasonable computational cost. The AMFES ranking procedure is iterative. At the initial stage, we compute each feature’s strength (i.e., degree of usefulness) based on various subsets drawn from the pool of all features, and then rank the features according to the strength thus derived. At each subsequent stage, we take the top-ranked half of the features from the previous stage and re-rank them in the same fashion as at the first stage. In the AMFES selection procedure, we conduct a sequential batch search that greatly reduces the computation cost. Experiments show that, compared with several other methods, AMFES achieves higher or comparable test accuracy rates; when the rates are comparable, AMFES selects a smaller number of features. We argue that the use of multiple feature subsets diminishes the ill effect of feature correlation, which explains the advantage of AMFES over the other methods on the experimental data sets.

Keywords: AMFES, adaptive multiple feature subset, feature correlation, feature ranking, feature selection
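The iterative-halving structure of the ranking procedure can be sketched as below. The `strength` callback and the random-subset sampling policy are placeholders for the paper's actual strength measure and subset design; only the stage-by-stage halving is taken from the description above.

```python
import random

def amfes_rank(features, strength, subsets=20, seed=0):
    """AMFES-style ranking sketch: score features over random subsets,
    fix the bottom half's order, and re-rank the top half next stage."""
    rng = random.Random(seed)
    settled = []                  # features whose final rank is fixed
    pool = list(features)
    while pool:
        totals = {f: 0.0 for f in pool}
        counts = {f: 0 for f in pool}
        for _ in range(subsets):
            subset = rng.sample(pool, max(1, len(pool) // 2))
            for f in subset:
                totals[f] += strength(f, subset)   # user-supplied measure
                counts[f] += 1
        # Rank by mean strength over the subsets each feature appeared in.
        pool.sort(key=lambda f: totals[f] / max(counts[f], 1), reverse=True)
        half = len(pool) // 2
        if half == 0:
            settled = pool + settled   # last feature: rank fixed
            break
        settled = pool[half:] + settled    # bottom half: positions fixed
        pool = pool[:half]                 # top half goes to next stage
    return settled                         # best feature first
```

Because each stage halves the pool, the number of strength evaluations shrinks geometrically, which is one plausible reading of the "reasonable computation cost" claim.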

TR-IIS-12-006

A Novel Approach for Efficient Big Data Broadcasting
Chi-Jen Wu, Chin-Fu Ku, Jan-Ming Ho and Ming-Syan Chen

Big-Data Computing is a new critical challenge for the ICT industry. Engineers and researchers are dealing with data sets of petabyte scale in the cloud computing paradigm, so the demand for building a service stack to distribute, manage and process massive data sets has risen drastically. In this paper, we investigate the Big Data Broadcasting problem, in which a single source node broadcasts a big chunk of data to a set of nodes with the objective of minimizing the maximum completion time. These nodes may be located in the same datacenter or across geo-distributed datacenters. This problem is one of the fundamental problems in distributed computing and is known to be NP-hard in heterogeneous environments. We model the Big Data Broadcasting problem as a LockStep Broadcast Tree (LSBT) problem.

The main idea of the LSBT model is to define a basic unit of upload bandwidth, r, such that a node with capacity c broadcasts data to a set of ⌊c/r⌋ children at rate r. Note that r is a parameter to be optimized as part of the LSBT problem. We further divide the broadcast data into m chunks, which can then be broadcast down the LSBT in a pipelined manner. In a homogeneous network environment in which each node has the same upload capacity c, we show that the optimal uplink rate r of the LSBT is either c/2 or c/3, whichever gives the smaller maximum completion time. For heterogeneous environments, we present an O(n log² n) algorithm to select an optimal uplink rate r and to construct an optimal LSBT. Numerical results show that our approach achieves a smaller maximum completion time and lower computational complexity than other efficient solutions in the literature.
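A back-of-the-envelope model of the pipelined completion time illustrates the c/2-versus-c/3 trade-off in the homogeneous case: a larger r means faster chunk transfers but fewer children per node and hence a deeper tree. The formula below (complete tree of fanout ⌊c/r⌋, latency ignored) and all the constants are illustrative assumptions, not the paper's exact analysis.

```python
import math

def lsbt_completion_time(n, c, r, data_size, m):
    """Rough pipelined completion time of a lock-step broadcast tree:
    fanout d = floor(c / r); a chunk of size data_size/m takes
    (data_size/m)/r per level; roughly ceil(log_d n) levels plus
    m - 1 pipeline steps are needed."""
    d = int(c // r)
    if d < 1:
        return math.inf                  # rate exceeds node capacity
    if n <= 1:
        depth = 0
    elif d == 1:
        depth = n - 1                    # degenerate chain
    else:
        depth = math.ceil(math.log(n, d))
    chunk_time = (data_size / m) / r
    return (depth + m - 1) * chunk_time

# Compare the two candidate uplink rates from the homogeneous analysis
# (constants made up for illustration):
c, n, data, m = 90.0, 1000, 9000.0, 50
t_half  = lsbt_completion_time(n, c, c / 2, data, m)   # fanout 2
t_third = lsbt_completion_time(n, c, c / 3, data, m)   # fanout 3
```

Under these particular constants the fanout-2 tree wins; with other parameter choices (fewer chunks, more nodes) the deeper-but-faster trade can tip the other way, which is why r must be optimized rather than fixed.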

TR-IIS-12-007

A Framework for Fusion of Symbiotic Human Sensor and Physical Sensor Data
J. W. S. Liu, E. T.-H. Chu, and P. H. Tsai

Past experience tells us that a disaster warning and response system can improve its surveillance coverage of the threatened area and its situation awareness by supplementing in-situ and remote sensor data with human sensor data captured and sent by people in the area. This paper is concerned with fusion and processing methods with which the system can use human sensor data and physical sensor data synergistically to speed up the decision process and improve the quality of its decisions. We formulate the problem in a statistical detection and estimation framework, within which value fusion and decision fusion of human sensor data and physical sensor data can be treated in a coherent way.
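As a toy illustration of the two fusion styles named above (not the paper's statistical formulation), value fusion might combine raw readings weighted by per-source reliability, while decision fusion combines each source's local alarm by a weighted vote; the weighting scheme and threshold here are invented for the example.

```python
def value_fusion(readings):
    """readings: (value, reliability-weight) pairs -> weighted mean."""
    total_w = sum(w for _, w in readings)
    return sum(v * w for v, w in readings) / total_w

def decision_fusion(decisions, threshold=0.5):
    """decisions: (alarm: bool, reliability-weight) pairs.
    Raise the fused alarm if the weighted vote clears the threshold."""
    total_w = sum(w for _, w in decisions)
    vote = sum(w for alarm, w in decisions if alarm) / total_w
    return vote >= threshold
```

The distinction matters because human reports and physical sensors differ in reliability: treating both kinds of input in one framework, as the paper proposes, lets the same weights drive either fusion style.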
