Institute of Information Science
近期研究成果
Current Research Results
"A Mobile Proxy Architecture for Video Services over High-Speed Rail Environments in LTE-A Networks," IEEE Systems Journal, To Appear.
Authors: Ming-Ching Chuang, Meng Chang Chen

Meng ChangChenImageAbstract:
With the rapid developments in wireless communication technology, people access the Internet applications anytime and anywhere. Recently, academics and industry researchers are paying increasing attention to the video streaming services over high speed rail environments. For providing the quality-of-service (QoS) of video streaming, we present a novel mobile proxy architecture for improving the system performance over high-speed trains in LTE-A networks. The mobile proxy not only saves the wireless bandwidth consumption between the base station and the high speed train but also reduces the starting delay time of the video. Moreover, we propose a score-based replacement (SBR) scheme in the memory management of the proxy server based on the feature of the video. Our simulation results demonstrate that the proposed scheme provides a better performance than other existing cache management schemes.
Current Research Results
"Slow-Paced Persistent Network Attacks Analysis and Detection Using Spectrum Analysis," IEEE Systems Journal, To Appear.
Authors: Li-Ming Chen, Meng Chang Chen, WanJiun Liao

Meng ChangChenAbstract:
A slow-paced persistent attack, such as slow worm or bot, can bewilder the detection system by slowing down their attack. Detecting such attacks based on traditional anomaly detection techniques may yield high false alarm rates. In this paper, we frame our problem as detecting slow-paced persistent attacks from a time series obtained from network trace. We focus on time series spectrum analysis to identify peculiar spectral patterns that may represent the occurrence of a persistent activity in the time domain. We propose a method to adaptively detect slow-paced persistent attacks in a time series and evaluate the proposed method by conducting experiments using both synthesized traffic and real-world traffic. The results show that the proposed method is capable of detecting slow-paced persistent attacks even in a noisy environment mixed with legitimate traffic.
"Warranty-Aware Page Management for PCM-Based Embedded Systems," ACM/IEEE International Conference on Computer-Aided Design (ICCAD), November 2014.
Authors: Sheng-Wei Cheng, Yu-Fen Chang, Yuan-Hao Chang, Shin-Wen Wei, and Wei-Kuan Shih

Yuan-HaoChangAbstract:
The thriving growth in mobile consumer electronics keeps the energy efficiency in the embedded system design a recurring theme. PCM main memory has shown its potential in replacing DRAM due to the huge amount of energy reduction, e.g. 65%. When considering the usage of PCM main memory, its write endurance becomes a critical issue, and wear leveling design is a common approach to resolve this issue. Even though the wear leveling design should emphasize on the operation efficiency and overhead reduction, existing wear leveling strategies designed for PCM main memory are usually enforced to prolong the lifetime of PCM in best effort. In this paper, we propose an idea that, instead of valuing PCM lifetime exploitation at first priority, we should turn to satisfy the least product time guarantee. To this end, further overhead reduction and operation efficiency enhancement are possible. We propose a warranty-aware page management design to mitigate the operation overhead required for managing the endurance issue in PCM. For overloaded pages, a cooling-based wear leveling is proposed to keep the pages resting for a period of time. In addition, an auxiliary wear-leveling is proposed to proactively free pages that are possessed by some processes but are rarely written. To show the effectiveness of the proposed design, we collect real traces on QEMU by running SPEC2006 benchmarks with different write intensity workloads. The experiment results showed the cooling-based design reduced the overhead to one third of that of the state-of-the-art designs while having the same level of performance.
"Verifying Curve25519 Software," ACMCCS 2014, November 2014.
Authors: Yu-Fang Chen, Chang-Hong Hsu, Hsin-Hung Lin, Peter Schwabe, Ming-Hsien Tsai, Bow-Yaw Wang, Bo-Yin Yang, and Shang-Yi Yang

Bo-YinYangBow-YawWangYu-FangChenAbstract:
This paper presents results on formal verification of high-speed cryptographic software.
We consider speed-record-setting hand-optimized assembly software for Curve25519 elliptic-curve key exchange   software was presented by Bernstein, Duif, Lange, Schwabe, and Yang in 2011 Two versions for different microarchitectures are available.
We successfully verify the core part of computation, and reproduce a bug in a previously published edition.
An SMT solver for the bit-vector theory is used to establish almost all properties. Remaining properties are verified in a proof assistant.
We also exploit the compositionality of Hoare logic to address the scalability issue. Essential differences between both versions of the software are discussed from a formal-verification perspective.
"Leveraging Effective Query Modeling Techniques for Speech Recognition and Summarization," the Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), October 2014.
Authors: Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Ea-Ee Jan, Hsin-Min Wang, Wen-Lian Hsu, and Hsin-Hsi Chen

Wen-LianHsuHsin-MinWangAbstract:
Statistical language modeling (LM) that purports to quantify the acceptability of a given piece of text has long been an interesting yet challenging research area. In particular, language modeling for information retrieval (IR) has enjoyed remarkable empirical success; one emerging stream of the LM approach for IR is to employ the pseudo-relevance feedback process to enhance the representation of an input query so as to improve retrieval effective-ness. This paper presents a continuation of such a general line of research and the main contribution is three-fold. First, we propose a principled framework which can unify the relationships among several widely-used query modeling for-mulations. Second, on top of the successfully de-veloped framework, we propose an extended query modeling formulation by incorporating critical query-specific information cues to guide the model estimation. Third, we further adopt and formalize such a framework to the speech recognition and summarization tasks. A series of empirical exper-iments reveal the feasibility of such an LM framework and the performance merits of the de-duced models on these two tasks.
Authors: Wei-Chun Chung, Yu-Jung Chang, D. T. Lee, and Jan-Ming Ho

Jan-MingHoDer- TsaiLeeImageImageAbstract:
Next-generation sequencing (NGS) data is a typical type of big data in science. Data growth rapidly since the throughput of NGS doubles about every five months. NGS reads are getting longer with unavoidably sequencing errors. Error correction is one of the critical steps to the success of NGS applications such as de novo genome assembly and DNA resequencing. New improvements are demanded on both efficiency and effectiveness, especially for correcting the long reads. In this
paper, we study the design of efficient and effective computational strategies to improve performance of CloudRS, a open-sourc e MapReduce application designed to correct sequencing errors of NGS data. We introduce the RM diagram to represent the key-value pairs generated on each NGS read. We also present several schemes to trim off the RM diagram, and thus to reduce the size of each message. The experimental results show that our proposed schemes reduce the execution time, and improve the quality of the de novo genome assembly.
"Distributed Algorithms for the Lovász Local Lemma and Graph Coloring," ACM Symposium on Principles of Distributed Computing (PODC), July 2014.
Authors: Kai-Min Chung and Seth Pettie and Hsin-Hao Su

Kai-MinChungAbstract:

The Lovász Local Lemma (LLL), introduced by Erdos and Lovász in 1975, is a powerful tool of the probabilistic method that allows one to prove that a set of n "bad" events do not happen with non-zero probability, provided that the events have limited dependence.  However, the LLL itself does not suggest how to find a point avoidingall bad events. Since the work of Beck (1991) there has been a sustained effort in finding a constructive proof (i.e. an algorithm) for the LLL or weaker versions of it.  In a major breakthrough Moser and Tardos (2010) showed that a point avoiding all bad events can be found efficiently. They also proposed a distributed/parallel version of their algorithm that requires $O(log^2 n)$  rounds of communication in a distributed network.
 
In this paper we provide two new distributed algorithms for the LLL that improve on both the efficiency and simplicity 
of the Moser-Tardos algorithm.  For clarity we express our results in terms of the symmetric LLL though both algorithms deal with the asymmetric version as well. Let p bound the probability of any bad event and d be the maximum degree in the dependency graph of the bad events. When $epd^2 < 1$ we give a truly simple LLL algorithm running in $O(log_{1/epd^2} n)$ rounds. Under the tighter condition ep(d+1)<1, we give a slightly slower algorithm running in O(\log^2 d \cdot \log_{1/ep(d+1)} n)  rounds. Furthermore, we gave an algorithm that runs in sublogarithmic  rounds under the condition p \cdot  f(d) < 1, where $f(d)$ is an exponential function of d. Although the conditions of the LLL are locally verifiable, we prove that any distributed LLL algorithm requires \Omega(\log^* n) rounds.
 
In many graph coloring problems the existence of a valid coloring is established by one or more applications of the LLL. Using our LLL algorithms, we give logarithmic-time distributed algorithms for frugal coloring, defective coloring, coloring girth-4 (triangle-free) and girth-5 graphs, edge coloring, and list coloring.
"Screencast in the Wild: Performance and Limitations," Proceedings of ACM Multimedia 2014, November 2014.
Authors: Chih-Fan Hsu, De-Yu Chen, Chun-Ying Huang, Cheng-Hsin Hsu, and Kuan-Ta Chen

Sheng-Wei(Kuan-Ta)ChenAbstract:
Displays without associated computing devices are increasingly more popular, and the binding between computing devices and displays is no longer one-to-one but more dynamic and adaptive.
 
Screencast technologies enable such dynamic binding over ad hoc one-hop networks or Wi-Fi access points. In this paper, we design and conduct the first detailed measurement study on the performance of state-of-the-art screencast technologies. By varying the user demands and network conditions, we find that Splashtop and Miracast outperform other screencast technologies under typical setups. Our experiments also show that the screencast technologies either: (i) do not dynamically adjust bitrate or (ii) employ a suboptimal adaptation strategy. The developers of future screencast technologies are suggested to pay more attentions on the bitrate adaptation strategy, e.g., by leveraging cross-layer optimization paradigm. 
"An Engergy-efficient Task Scheduler for Multicore Platforms with per-core DVFS Based on Task Characteristics," International Conference on Parallel Processing (ICPP), September 2014.
Authors: Ching-Chi Lin, Chao-Jui Chang, You-Chen Syu, Jan-Jan Wu, Po-Wen Cheng, Wei-Te Hsu

Jan-JanWuImageImageAbstract:
Abstract—Energy-efficient task scheduling is a fundamental issue in many application domains, such as energy conservation for mobile devices and the operation of green computing data centers. Modern processors support dynamic voltage and frequency scaling (DVFS) on a per-core basis, i.e., the CPU can adjust the voltage or frequency of each core. As a result, the core in a processor may have different computing power and energy consumption. To conserve energy in multi-core platforms, we propose task scheduling algorithms that leverage per-core DVFS and achieve a balance between performance and energy consumption. We consider two task execution modes: the batch mode, which runs jobs in batches; and the online mode in which jobs with different time constraints, arrival times, and computation workloads co-exist in the system.
For tasks executed in the batch mode, we propose an algorithm that finds the optimal scheduling policy; and for the online mode, we present a heuristic algorithm that determines the execution order and processing speed of tasks in an online fashion. The heuristic ensures that the total cost is minimal for every time interval during a task’s execution.
"Comparing Profitability of Day Trading Using ORB Strategies on Index Futures Markets in Taiwan, Hong-Kong and USA," The 10th Annual Conference of the Asia-Pacific Association of Derivatives, August 2014.
Authors: Yi-Cheng Tsai, Mu-En Wu, Chin-Laung Lei, Chung-Shu Wu, and Jan-Ming Ho

Jan-MingHoAbstract:
In literature, it has been shown that market behavior is the aggregation of investors’ responses to information. Although the markets do not respond in real time to events occur after the market, it seems that information is picked up by the market in early stage of market opening, and has an impact on market dynamics over the same day. In this paper, we present our studies on the design of timing parameters of ORB strategies in index futures markets. We first show that, based on cross-sectional analysis on 10-year historical data, trading volume and fluctuation of return on each one-minute interval of trading hours of the futures markets reach their peaks at the opening and closing of the underlying stock markets. Based on these observations, we define the active period of an index futures market to align with the underlying stock market. We test ORB strategies on the index futures of DJIA, S&P, NASDAQ, HIS, and TAIEX from 2003 to 2013. We find that by optimizing the trading timing, ORB achieves over 10% annual return with p-value no greater than 2% in each futures market. The best performance, 22.6% annual return at p-value of 1×10-8, occurs in TAIEX.
Current Research Results
"Generalized k-Labelsets Ensemble for Multi-Label and Cost-Sensitive Classification," IEEE Transactions on Knowledge and Data Engineering, July 2014.
Authors: Hung-Yi Lo, Shou-De Lin, and Hsin-Min Wang

Hsin-MinWangAbstract:
Label powerset (LP) method is one category of multi-label learning algorithm. This paper presents a basis expansions
model for multi-label classification, where a basis function is an LP classifier trained on a random k-labelset. The expansion coefficients are learned to minimize the global error between the prediction and the ground truth. We derive an analytic solution to learn the coefficients efficiently. We further extend this model to handle the cost-sensitive multi-label classification problem, and apply it in social tagging to handle the issue of the noisy training set by treating the tag counts as the misclassification costs. We have conducted experiments on several benchmark datasets and compared our method with other state-of-the-art multi-label learning methods. Experimental results on both multi-label classification and cost-sensitive social tagging demonstrate that our method has better performance than other methods.
Current Research Results
"Garbage Collection for Multiversion Index in Flash-based Embedded Databases," ACM Transactions on Design Automation of Electronic Systems, June 2014.
Authors: Po-Chun Huang, Yuan-Hao Chang, Kam-Yiu Lam, Jian-Tao Wang, and Chien-Chin Huang

Yuan-HaoChangAbstract:
Recently, flash-based embedded databases have gained their momentum in various control and monitoring systems, such as cyber-physical systems (CPSes). To support the functionality to access the historical data, a multiversion index is adopted to simultaneously maintain multiple versions of data items, as well as their index information. However, maintaining a multiversion index on flash memory incurs considerable performance overheads on garbage collection, which is to reclaim the space occupied by the outdated/invalid data items and their index information on flash memory. In this work, we propose an efficient garbage collection strategy to solve the garbage collection issues of flash-based multiversion databases. In particular, a version-tracking method is proposed to accelerate the performance on the process on identifying/reclaiming the space of invalid data and their indexes, and a pre-summary method is also designed to solve the cascading update problem that is caused by the write-once nature of flash memory and is worsened when more versions refer to the same data item. The capability of the proposed strategy is then verified through a series of analysis.
Current Research Results
"Exploring Sequential Probability Tree for Movement-based Community Discovery," IEEE Transactions on Knowledge and Data Engineering, To Appear.
Authors: Wen-Yuan Zhu, Wen-Chih Peng, Po-Ruey Lei, and Ling-Jyh Chen

Ling-JyhChenAbstract:
In this paper, we tackle the problem of discovering movement-based communities of users, where users in the same community have similar movement behaviors. Note that the identification of movement-based communities is beneficial to location-based services and trajectory recommendation services. Specifically, we propose a framework to mine movementbased communities which consists of three phases: 1) constructing trajectory profiles of users, 2) deriving similarity between trajectory profiles, and 3) discovering movement-based communities. In the first phase, we design a data structure, called the Sequential Probability tree (SP-tree), as a user trajectory profile. SP-trees not only derive sequential patterns, but also indicate transition probabilities of movements. Moreover, we propose two algorithms: BF (standing for Breadth-First) and DF (standing for Depth-First) to construct SP-tree structures as user profiles. To measure the similarity values among users’ trajectory profiles, we further develop a similarity function that takes SP-tree information into account. In light of the similarity values derived, we formulate an objective function to evaluate the quality of communities. According to the objective function derived, we propose a greedy algorithm Geo-Cluster to effectively derive communities. To evaluate our proposed algorithms, we have conducted comprehensive experiments on two real datasets. The experimental results show that our proposed framework can effectively discover movement-based user communities.
Current Research Results
Authors: Chia-Hao Chin, Shu-Hwa Chen, Hsin-Hung Wu, Chin-Wen Ho, Ming-Tat Ko *, and Chung-Yen Lin

Chung-YenLinMing-TatKoSophiaJoviceAbstract:
Background
Network is a useful way for presenting many types of biological data including protein-protein interactions, gene regulations, cellular pathways, and signal transductions. We can measure nodes by their network features to infer their importance in the network, and it can help us identify central elements of biological networks.
Results
We introduce a novel Cytoscape plugin cytoHubba for ranking nodes in a network by their network features. CytoHubba provides 11 topological analysis methods including Degree, Edge Percolated Component, Maximum Neighborhood Component, Density of Maximum Neighborhood Component, Maximal Clique Centrality and six centralities (Bottleneck, EcCentricity, Closeness, Radiality, Betweenness, and Stress) based on shortest paths. Among the eleven methods, the new proposed method, MCC, has a better performance on the precision of predicting essential proteins from the yeast PPI network.
Conclusions
CytoHubba provide a user-friendly interface to explore important nodes in biological networks. It computes all eleven methods in one stop shopping way. Besides, researchers are able to combine cytoHubba with and other plugins into a novel analysis scheme. The network and sub-networks caught by this topological analysis strategy will lead to new insights on essential regulatory networks and protein drug targets for experimental biologists. Cytohubba is available as cytoscape plug-in and can be accessed freely at http://hub.iis.sinica.edu.tw/cytohubba/ for more detail. According to cytoscape plugin download statistics, the accumulated number of cytohubba is around 6,500 times since 2010.
"A PCM Translation Layer for Integrated Memory and Storage Management," ACM/IEEE International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), October 2014.
Authors: Bing-Jing Chang, Yuan-Hao Chang, Hung-Sheng Chang, Tei-Wei Kuo, and Hsiang-Pang Li

Yuan-HaoChangAbstract:
Phase change memory (PCM) is known for its potentials as main memory and as storage. In contrast to the past work, this work presents a PCM translation layer that considers how the main memory is used by the operating system together with the usage patterns of storage by trading their performance and reliability. In particular, a joint management scheme is proposed to improve the capability of PCM as both main memory and storage so as to enhance the performance of the entire system. The endurance issue of PCM for main memory is resolved by utilizing the potentially large capacity of the PCM storage space with limited modifications to the existing operating system implementations. Moreover, three commands are proposed to help operating system engineers to take advantage of PCM as main memory and storage by reducing I/O overheads and speeding up both the initialization and termination of program executions. The experimental results show that the performance of PCM as both main memory and storage can be significantly improved with reasonable lifetime, while the system overhead is very limited under coarse-grained wear leveling.
"Current-Aware Scheduling for Flash Storage Devices," IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), August 2014.
Authors: Tzu-Jung Huang, Chien-Chung Ho, Po-Chun Huang, Yuan-Hao Chang, Che-Wei Chang, and Tei-Wei Kuo

Yuan-HaoChangAbstract:
As NAND flash memory has become a major choice of storage media in diversified computing environments, the performance issue of flash memory has been extensively addressed, and many excellent designs have been proposed. Among such designs, it is an effective strategy to adopt multiple channels and flash-memory chips to facilitate parallelized data accesses, so as to improve the throughput of the flash-memory storage device. However, the degree of data access parallelism cannot be increased by simply increasing the number of channels and chips in the storage device, because it is seriously limited by the maximum current constraint of the bus and affected by the access patterns of user data. As a consequence, to maximize the degree of access parallelism, it is of paramount significance to have a proper scheduling strategy to determine the order that read/write requests are served. In this paper, a current-aware scheduling strategy for read/write requests is proposed to maximize the performance of read requests, provided that the maximum current constraint of the bus is not violated and the deadline violations of write requests are minimized. The proposed strategy is then evaluated through a series of experiments, where the results are quite encouraging.
"Content-based Feature Matching for Fragment Reassembly of Ceramic Reconstruction," Proceedings International Agent Technology, IAT 2014 - The 2014 Web Intelligence Congress., August 2014.
Authors: Shu-Yu Lin, Han-Jen Ku, Chao-Lung Ting, Ray-I Chang, Yu-Chun Wang, and Jan-Ming Ho

Jan-MingHoAbstract:
A large amount of fragments of different ceramic objects are usually found during archaeological excavation.
Finding the relation between these fragments is time-consuming and demands expert knowledge. In this paper, we study a related problem known as ceramic reconstruction. We present a novel content-based feature matching method and develop the ceramic fragment reassembly system (CFRS) for ceramic reconstruction. CFRS can automatically reassemble fragments to make it easier for human experts to further improve the quality of reconstruction if necessary. CFRS consists of three modules: fragment preprocessing, candidate matching, and matching-candidate selection. The fragment preprocessing module extracts features from images of fragments. Then, the candidate-matching module picks up matching candidates based on curve matching. Finally, in the matching-candidate selection module, Binary Particle Swarm Optimization (BPSO) is applied to select the final matching results. Experimental results indicate that the proposed method attains not only higher precision but also less computation time than the previous approaches.
"An In-depth Study of Forecasting Household Electricity Demand using Realistic Datasets," ACM International Conference on Future Energy Systems (e-Energy'14), June 2014.
Authors: Chien-Yu Kuo, Ming-Feng Lee, Chia-Lin Fu, Yao-Hua Ho, and Ling-Jyh Chen

Ling-JyhChenAbstract:
Data analysis and accurate forecasts of electricity demand are crucial to help both suppliers and consumers understand their detailed electricity footprints and improve their awareness about their impacts to the ecosystem. Several studies of the subject have been conducted in recent years, but they are either comprehension-oriented without practical merits; or they are forecast-oriented and do not consider per-consumer cases. To address this gap, in this paper, we conduct data analysis and evaluate the forecasting of household electricity demand using three realistic datasets of geospatial and lifestyle diversity. We investigate the correlations between household electricity demand and different external factors, and perform cluster analysis on the datasets using an exhaustive set of parameter settings. To evaluate the accuracy of electricity demand forecasts in different datasets, we use the support vector regression method. The results demonstrate that the medium mean absolute percentage error (MAPE) can be reduced to 15.6% for household electricity demand forecasts when proper configurations are used.
Current Research Results
Authors: Nicholas C Wu, Arthur P Young, Laith Q Al-Mawsawi, C Anders Olson, Jun Feng, Hangfei Qi, Shu-Hwa Chen, I-Hsuan Lu, Chung-Yen Lin, Robert G Chin, Harding H Luan, Nguyen Nguyen, Stanley F Nelson, Xinmin Li, Ting-Ting Wu, Ren Sun

Chung-YenLinImageImageAbstract:
Genetic research on influenza virus biology has been informed in large part by nucleotide variants present in seasonal or pandemic samples, or individual mutants generated in the laboratory, leaving a substantial part of the genome uncharacterized. Here, we have developed a single-nucleotide resolution genetic approach to interrogate the fitness effect of point mutations in 98% of the amino acid positions in the influenza A virus hemagglutinin (HA) gene. Our HA fitness map provides a reference to identify indispensable regions to aid in drug and vaccine design as targeting these regions will increase the genetic barrier for the emergence of escape mutations. This study offers a new platform for studying genome dynamics, structure-function relationships, virus-host interactions, and can further rational drug and vaccine design. Our approach can also be applied to any virus that can be genetically manipulated.
Current Research Results
"Outstanding Principal as Prepayment Value: A Closed-Form Formula for Mortgage Pricing," Journal of Information Science and Engineering, 2014.
Authors: Yi-Cheng Tsai, Chin-Laung Lei, Jan-Ming Ho, Ming-Yang Kao, and Szu-Lang Liao

ImageImageJan-MingHoImageAbstract:
Mortgage is one of the most popular instruments in the financial markets. In this study, we consider three actions, i.e., to default, to prepay, and to maintain the mortgage, that a borrower may have in mortgage horizon. We provide an effective pricing formula, which not only considers the effect that default might affect the mortgage value, but also accurately computes the impact due to prepayment risk. Our model defines the prepayment value of the mortgage as the amount of outstanding principal, in contrast to defining prepayment value as a constant proportion of maintaining value of the mortgage. We present a new closed-form pricing
formula of risky mortgage and also derive its yield to maturity, duration and convexity to provide a framework for risk management.
"Dynamic Tail Packing to Optimize Space Utilization of File Systems in Embedded Computing Systems," IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), August 2014.
Authors: Nien-I Hsu, Tseng-Yi Chen, Yuan-Hao Chang, Hsin-Wen Wei, Wei-Kuan Shih, and Norman Chang

Yuan-HaoChangAbstract:
Embedded computing systems usually have limited computing power, RAM space, and storage capacity due to the consideration of their cost, energy consumption, and physical size. Some of them such as sensor nodes and embedded consumer electronics only have a small-sized flash memory as their storage with a (simple) file system to manage their data, which are usually of small sizes. However, the existing file systems usually have low space utilization on managing small files and the tail data of large files. In this work, we propose a dynamic tail packing scheme to optimize the space utilization of file systems by dynamically aggregating/packing the tail data of (small) files together. The proposed scheme was implemented in the file system of Linux operating systems to evaluate its capability. The results demonstrate that the proposed scheme could significantly improve the space utilization of existing file systems.
"Willingness Optimization for Social Group Activity," International Conference on Very Large Data Bases (VLDB), August 2014.
Authors: H.-H. Shuai, D.-N. Yang, P. S. Yu, and M.-S. Chen

Ming-SyanChenDe-NianYangAbstract:
Studies show that a person is willing to join a social group activity if the activity is interesting, and if some close friends also join the activity as companions. The literature has demonstrated that the interests of a person and the social tightness among friends can be effectively derived and mined from social networking websites. However, even with the above two kinds of information widely available, social group activities still need to be coordinated manually, and the process is tedious and time-consuming for users, especially for a large social group activity, due to complications of social connectivity and the diversity of possible interests among friends. To address the above important need, this paper proposes to automatically select and recommend potential attendees of a social group activity, which could be very useful for social networking websites as a value-added service. We first formulate a new problem, named Willingness mAximization for Social grOup (WASO). This paper points out that the solution obtained by a greedy algorithm is likely to be trapped in a local optimal solution. Thus, we design a new randomized algorithm to effectively and efficiently solve the problem. Given the available computational budgets, the proposed algorithm is able to optimally allocate the resources and find a solution with an approximation ratio. We implement the proposed algorithm in Facebook, and the user study demonstrates that social groups obtained by the proposed algorithm significantly outperform the solutions manually configured by users.
Current Research Results
Authors: Wei-Chun Chung, Chien-Chih Chen, Jan-Ming Ho, Chung-Yen Lin, Wen-Lian Hsu, Yu-Chun Wang, Der-Tsai Lee, Feipei Lai, Chih-Wei Huang and Yu-Jung Chang

ImageJan-MingHoImageImageImageImage
ImageDer- TsaiLeeWen-LianHsuAbstract:
Background:
Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and consequent computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce.
 
Results:
We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus liberating scientists from having to perform complicated operational procedures. Users are guided through the user interface to deploy a Hadoop cloud within in-house computing environments and to run applications specifically targeted for bioinformatics, including CloudBurst, CloudBrush, and CloudRS. One may also use CloudDOE on top of a public cloud. CloudDOE consists of three wizards, i.e., Deploy, Operate, and Extend wizards. Deploy wizard is designed to aid a system administrator to deploy a Hadoop cloud. It installs Java runtime environment version 1.6 and Hadoop version 0.20.203, and initiates the service automatically. Operate wizard allows a user to run a MapReduce application on the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using Extend wizard.
Conclusions:
CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management. Interested users may collaborate to improve the source code of CloudDOE to further incorporate more MapReduce bioinformatics tools into CloudDOE and support next-generation big data open source tools, e.g., Hadoop BigTop and Spark.
Availability: CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/.
"Multiple Two-Phase Data Processing with MapReduce," Proceedings of the 2014 IEEE International Conference on Cloud Computing, June 2014.
Authors: Hsiang-Huang Wu, Tse-Chen Yeh, and Chien-Min Wang

Chien-MinWangAbstract:
MapReduce, proposed as a programming model, has been widely adopted in the field of text processing over large datasets with the capability of exploiting the distributed resources and processing the large-scale data. Attributed to its simplicity and scalability, the success seems to have the potential to make big data processing by cloud computing available. Nevertheless, such promise is accompanied by the difficulty of fitting the applications into MapReduce. This is
because MapReduce is limited to the kind of applications that every input key-value pair is independent of each other. In this paper, we extend the general applicability of MapReduce by allowing the dependence within a set of input key-value pairs but preserving independence among all sets. Such this new modeling paradigm intends MapReduce to shift processing the independent input key-value pairs to processing the independent sets. However, the advancement in the applicability brings the intricate problem of how two-stage processing structure, inherent in MapReduce, handles the dependence within a set of input key-value pairs.
To tackle this problem, we propose the design pattern called two-phase data processing. It expresses the application in two phases not only to match the two-stage processing structure but to exploit the power of MapReduce through the cooperation between the mappers and reducers. In addition, we present the design methodology—multiple two-phase data processing—to offer advice on processing the independent sets. The experiment of background subtraction, a part of video surveillance, proves that the new modeling paradigm broadens the possibilities of MapReduce and demonstrates how our design methodology guides the applications to the implementation.
Current Research Results
Authors: Shu-Hwa Chen, Kun-Lin Li, I-Hsuan Lu, Yu-Bin Wang,Che-Huang Tung, Hsiu-Chi Ting, Ching-Yi Lin, Chung-Yen Lin*, Yi-Hsien Su*, Jr-Kai Sky Yu

ImageImageImageImageImageAbstract:
Hemichordates are the sister group of echinoderms, and together they are closely related to  chordates within the deuterostome lineage. Therefore, hemichordates represent an important animal  group for the understanding of both the evolution of developmental mechanisms in deuterostome  animals and the origin of chordates. Recently, the majority of studies investigating hemichordates have  focused on the direct-developing enteropneust hemichordate Saccoglossus kowalevskii; few have  focused on the indirect-developing hemichordates, partly because of the lack of extensive genomic  resources in these animals. In this study, we report the sequencing and analysis of a transcriptome  from an indirect-developing enteropneust hemichordate Ptychodera flava. We sequenced a mixed  cDNA library from six developmental stages using the Roche GS FLX Titanium System to generate more  than 879,000 reads. These reads were assembled into 17,990 contigs with an average length of 1,316  bp. We found 60% of the assembled contigs, along with 28% of the unassembled singleton reads, had  significant hits to sequences in the NCBI database by a BLASTx search, and we also annotated these  sequences and obtained Gene Ontology (GO) terms for 6,744 contigs and 5,802 singletons. We further  identified candidate P. flava transcripts corresponding to genes involved in major developmental  signaling pathways, including the Wnt, Notch and TGF-β signaling pathways. Using available  genome/transcriptome datasets from the direct-developing hemichordate S. kowalevskii, the  echinoderm Strongylocentrotus purpuratus and the chordate Branchiostoma floridae, we found that  90%, 80% and 73% of the annotated protein sequences in these respective species matched our P.  flava transcriptome in a homology search. We also constructed a database for the P. flava  transcriptome, and researchers can easily access this dataset online. Our dataset significantly increases  the amount of available P. flava sequence data and can serve as a reference transcriptome for future  studies using this species. 
Current Research Results
Authors: Bryant Chen, William W.Y. Hsu, Jan-Ming Ho, Ming-Yang Kao

Jan-MingHoImageAbstract:
This paper proposes novel lattice algorithms to compute tail conditional expectation of European calls and puts in linear time. We incorporate the technique of prefix-sum into tilting, trinomial, and extrapolation algorithms as well as some syntheses of these algorithms. Furthermore, we introduce fractional-step lattices to help reduce interpolation error in the extrapolation algorithms. We demonstrate the efficiency and accuracy of these algorithms with numerical results. A key finding is that combining the techniques of tilting lattice, extrapolation, and fractional steps substantially increases speed and accuracy.
Current Research Results
"A Framework for Fusion of Human Sensor and Physical Sensor Data," IEEE Transactions on Systems, Man and Cybernetics: Systems, October 2014.
Authors: P. H. Tsai, Y.-C. Lin, Y. Z. Ou, E. T.-H. Chu, and J. W. S. Liu

Jane Win ShihLiuAbstract:

Many disaster warning and response systems can improve their surveillance coverage of the threatened area by supplementing in-situ and remote physical sensor data with crowdsourced human sensor data captured and sent by people in the area. This paper presents fusion methods which enable a crowdsourcing enhanced system to use human sensor data and physical sensor data synergistically to improve its sensor coverage and the quality of its decisions. The methods are built on results of classical statistical detection and estimation theory and use value fusion and decision fusion of human sensor data and physical sensor data in a coherent way. They are building blocks of a central fusion unit in a crowdsourcing support system for disaster surveillance and early warning applications.

"Human Action Recognition using Associated Depth and Skeleton Information," Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, May 2014.
Authors: Nick C. Tang, Yen-Yu Lin, Ju-Hsuan Hua, Ming-Fang Weng, H. Y. Mark Liao

MarkLiaoAbstract:
The recent advances in imaging devices have opened the opportunity of better solving computer vision tasks. The nextgeneration cameras, such as the depth or binocular cameras, capture diverse information, and complement the conventional 2D RGB cameras. Thus, investigating the yielded multi-modal images generally facilitates the accomplishment of related applications. However, the limitations of these devices, such as short effective distances, expensive costs, or long response time, degrade their applicability in practical use. Addressing this problem in this work, we aim at action recognition in RGB videos with the aid of Kinect. We improve recognition accuracy by leveraging information derived from an offline collected database, in which not only the RGB but also the depth and skeleton images of actions are available. Our approach adapts the inter-database variations, and enables the sharing of visual knowledge across different image modalities. Each action instance for recognition in RGB representation is then augmented with the borrowed depth and skeleton features.
Current Research Results
"Booting Time Minimization for Real-Time Embedded Systems with Non-Volatile Memory," IEEE Transactions on Computers, April 2014.
Authors: Che-Wei Chang, Chuan-Yue Yang, Yuan-Hao Chang, and Tei-Wei Kuo

Yuan-HaoChangAbstract:
Minimizing the booting time of an embedded system has become a major technical issue for the success of many consumer electronics. In this paper, the booting time minimization problem for real-time embedded systems with the joint consideration of DRAM and non-volatile memory is formally formulated. We show this is an ${\cal NP}$-hard problem, and propose an optimal but pseudo-polynomial-time algorithm with dynamic programming techniques. In considering polynomial-time solutions, a $0.25$-approximation greedy algorithm is provided, and a polynomial-time approximation scheme is developed to trade the optimality of the derived solution for the time complexity according to a user-specified error bound. The proposed algorithms can manage real-time embedded systems consisting of not only real-time tasks, but also initialization tasks that are executed only once during system booting. The proposed algorithms were then evaluated with $65$ real benchmarks from the MRTC and DSPstone benchmark suites, and the results showed that all of the proposed algorithms can reduce booting time for each benchmark set by more than $29\%$. Moreover, extensive simulations were conducted to show the capability of the proposed approaches when used with various hardware resources and software workloads. 
"On Trading Wear-leveling with Heal-leveling," ACM/IEEE Design Automation Conference (DAC), June 2014.
Authors: Yu-Ming Chang, Yuan-Hao Chang, Jian-Jia Chen, Tei-Wei Kuo, Hsiang-Pang Li, and Hang-Ting Lue

Yuan-HaoChangAbstract:
The density of flash memory keeps increasing because of the strong demand on the storage capacity. However, such a development trend significantly reduces the reliability and endurance of flash memory in terms of erase cycles, even when the wear leveling technology is adopted to extend the lifetime of flash memory by evenly distributing the erase cycles to every flash block. To address this issue, the self-healing technology is proposed to recover a flash block before the flash block is worn out, but such a technology still has a limitation on recovering flash blocks. In contrast to the existing wear leveling designs, we adopt the self-healing technology to propose a heal-leveling design that evenly distribute the healing cycles to flash blocks so as to ultimately extend the lifetime of flash memory without the huge amount of live-data copying overheads imposed by existing wear leveling designs. A series of experiments was conducted to evaluate the capability of the proposed design. The results show that our design can significantly improve the access performance and the effective lifetime of flash memory without the unnecessary overheads caused by wear leveling technology.
"A Content Privacy-Preserving Protocol for Energy-Efficient Access to Commercial Online Social Networks," Proceedings of IEEE International Conference on Communications (ICC 2014), June 2014.
Authors: Yi-Hui Lin, Chih-Yu Wang, and W. T. Chen

Wen-TsuenChenImageAbstract:
Abstract—The privacy issue of online social networks (OSNs) has been getting attention from the public, especially when data privacy has caused the disagreement between users and OSN providers. While the providers utilize users’ data as a commercial usage to make profit; on the other hand, users feel their privacy has been violated by such behavior. In this paper, we propose a privacy preserving protocol for users’data sharing in OSNs, where the OSN provider cannot retrieve the users’ social content while the users can efficiently add or remove a social contact and flexibly perform the data access control. Moreover, we prove that the users would allow the OSN provider to perform keyword search over the encrypted content for advertising profit, so that the OSN provider can commercialize its products without the knowledge of content.
Current Research Results
"A quantitative high-resolution genetic profile rapidly identifies sequence determinants of viral fitness and drug sensitivity," PLoS Pathogens, July 2014.
Authors: Hangfei Qi, C. Anders Olson, Nicholas C. Wu, Ruian Ke, Claude Loverdo, Roland Remenyi, Virginia Chu, Shawna Truong, Zugen Chen, Yushen Du, Sheng-Yao Su, Laith Q. Al-Mawsawi, Ting-Ting Wu, Shu-Hua Chen, Chung-Yen Lin, Weidong Zhong, James O. Lloyd-Smith, Ren Sun

Chung-YenLinSophiaDanielAbstract:
Widely used chemical genetic screens have greatly facilitated the identification of many antiviral agents. However, the regions of interaction and inhibitory mechanisms of many therapeutic candidates have yet to be elucidated. Previous chemical screens identified Daclatasvir (BMS- 790052) as a potent nonstructural protein 5A (NS5A) inhibitor for Hepatitis C virus (HCV)  infection with an unclear inhibitory mechanism. Here we have developed a quantitative high resolution genetic (qHRG) approach to systematically map the drug-protein interactions between  Daclatasvir and NS5A and profile genetic barriers to Daclatasvir resistance. We implemented  saturation mutagenesis in combination with next-generation sequencing technology to  systematically quantify the effect of every possible amino acid substitution in the drug-targeted  region (domain IA of NS5A) on replication fitness and sensitivity to Daclatasvir. This enabled  determination of the residues governing drug-protein interactions. The relative fitness and drug  sensitivity profiles also provide a comprehensive reference of the genetic barriers for all possible  single amino acid changes during viral evolution, which we utilized to predict clinical outcomes  using mathematical models. We envision that this high-resolution profiling methodology will be  useful for next-generation drug development to select drugs with higher fitness costs to  resistance, and also for informing the rational use of drugs based on viral variant spectra from  patients.
"Active Disaster Response System for Smart Buildings," Proceedings of International Symposium on Computer, Consumer and Control, June 2014.
Authors: C.-Y. Lin, E. T. H. Chu, L.-W. Ku and J. W. S. Liu

Jane Win ShihLiuAbstract:
 In recent years, major natural disasters have made it more challenging for us to protect human lives. Examples include the 2011 Japan Tohoku earthquake and Typhoon Morakot in 2009 in Taiwan. However, modern disaster warning systems cannot automatically respond efficiently and effectively to disasters. As a result, it is necessary to develop an automatic response system to prevent, diminish or eliminate the damages caused by disasters. In this paper, we develop an active emergency response system to automatically process standard warning messages, such as CAP (Common Alerting Protocol) messages. After receiving official warning messages of earthquakes, our system automatically shuts down natural gas lines in order to prevent buildings from fire and opens the building doors for easy evacuation. Our system also stops elevators when they reach the closest floor. In addition, our system can be applied to hospitals to tell surgeons to pause on-going operations, or to supermarkets to inform consumers of the way to exit the building. According to our experiment results, the proposed system can avoid possible dangers and save human lives during major disasters. 
"Space-Efficient Multiversion Index Scheme for PCM-based Embedded Database Systems," ACM/IEEE Design Automation Conference (DAC), June 2014.
Authors: Yuan-Hung Kuan, Yuan-Hao Chang, Po-Chun Huang, and Kam-Yiu Lam

Yuan-HaoChangAbstract:
Embedded database systems are widely adopted in various control and motoring systems, e.g., cyber-physical systems. To support the functionality to access the historical data, a multiversion index is adopted to simultaneously maintain multiple versions of data items and their index information. However, cyber-physical systems are usually battery-powered embedded systems that have limited energy, computing power, and storage space. In this work, we consider the systems with phase-change memory (PCM) as their storage due to its non-volatility and low energy consumption. In order to resolve the problem of the limited storage space and the fact that existing multiversion index designs are lack of space efficiency, we propose a space-efficient multiversion index scheme to enhance the space utilization and access performance of embedded multiversion database systems on PCM by utilizing the byte-addressability and write asymmetry of PCM. A series of experiments was conducted to evaluate the efficacy of the proposed scheme. The results show that the proposed scheme achieves very high space utilization and has good performance on serving update transactions and range queries.
Current Research Results
"Rating Equity Funds against Return of Random Traders," ASE Science Journal, September 2014.
Authors: Ta-Wei Hung, Mu-En Wu, Hsueh-I Lu, Jan-Ming Ho

Jan-MingHoImageImageAbstract:
In this paper, we study the problem of benchmarking returns of equity funds. We present a novel approach, the random-trader scheme, to benchmark return of an equity fund during a specific period. A random trader uses all-in-all-out strategy to trade inthe market at random timing with capital being negligible as compared with the market size.

Let DRRT denote the distribution of returns of randomtraders, and Rrt be a random variable sampled from DRRT. In this paper we model DRRT as alog-normal distribution, denoted as sDRRT, and provide an efficient algorithm to compute the mean and variance of sDRRT, denoted as μDRRT and σDRRT, respectively. Using TAIEX index as dataset, our experiments show that sDRRT approximates DRRT well when the length of given period is one month.

We then score each equity fund by the cumulative distribution function of sDRRT i.e., s(m;DRRT)=Pr[RrtR(m)], where R(m) denotes the return of the equity fund mduring the given period. Using the historical data on equity funds in Taiwan, we observed interesting characteristics. When analyzing monthly returns, although there are some winners who are able to achieve scores higher than .9 sometimes, it is difficult for them to always keep up with the high scores. Moreover, few equity funds showed the stability of scores higher than .7, when analyzing long-term returns. Furthermore, there are times when most funds obtain high scores. There are also times when no funds perform well.

相關資訊

Academia Sinica 資訊科學研究所 Academia Sinica