Institute of Information Science
近期研究成果
"Optimum Quantizing of Monotonic Nondecreasing Arrays," Proceedings IEEE Symposium on Computational Intelligence for Financial Engineering & Economics, 2013, April 2013.
Authors: William W.Y. Hsu, Cheng-Yu Lu, Ming-Yang Kao, and Jan-Ming Ho

Jan-MingHoAbstract:
This paper presents an efficient algorithm for finding the optimal k-cuts of a nondecreasing array of size n that produces the maximum area under the points. The naive approach uses a dynamic programming algorithm which requires O(kn^2) time, where n is the size of the array. This algorithm is time consuming for large n or k and thus inappropriate. We design faster algorithms by discovering and proving some nice properties of the nondecreasing arrays, finding convex hull, and by continuous-to-discrete transformation. We believe that an O(kn) time algorithm exists and show a possible heuristic algorithm.
"Performance Enhancement of Garbage Collection for Flash Storage Devices: An Efficient Victim Block Selection Design," ACM/IEEE Design Automation Conference (DAC), June 2013.
Authors: Che-Wei Tsao, Yuan-Hao Chang, and Ming-Chang Yang

Yuan-HaoChangAbstract:
Motivated by the needs to enhance the performance of garbage collection in low-cost flash storage devices, we propose a victim block selection design to efficiently identify the blocks for erases and reclaim the space of invalid data without extensively scanning flash memory for the status of data stored in the storage, so as to achieve improved performance of garbage collection on reclaiming space of invalid data. At the same time, this design could also easily identify and reclaim the space released by file systems.  A series of experiments based on benchmark traces demonstrates the significantly improved performance of garbage collection with limited system overheads.
Current Research Results
"Localized Algorithms for Detection of Node Replication Attacks in Mobile Sensor Networks," IEEE Trans. on Information Forensics and Security, May 2013.
Authors: Chia-Mu Yu, Yao-Tung Tsou, Chun-Shien Lu, and Sy-Yen Kuo

Chun-ShienLuAbstract:
We deal with the challenging problem of node replication detection. Although defending against node replication attacks demands immediate attention, compared to the extensive exploration on the defense against node replication attacks in static networks, only a few solutions in mobile networks have been presented. Moreover, while most of the existing schemes in static networks rely on the witness-finding strategy, which cannot be applied to mobile networks, the velocity-exceeding strategy used in existing schemes in mobile networks incurs efficiency and security problems. Therefore, based on our devised challenge-and-response and encounter-number approaches, localized algorithms are proposed to resist node replication attacks in mobile sensor networks. The advantages of our proposed algorithms include (1) Localized Detection; (2) Efficiency and Effectiveness; (3) Network-Wide Synchronization Avoidance; (4) Network-Wide Revocation Avoidance. Performance comparison with known methods are provided to demonstrate the efficiency of our proposed algorithms. Prototype implementation on TelosB mote demonstrates the practicality of our proposed methods.
"On Shortest Unique Substring Queries," Proc. of the 29th IEEE International Conference on Data Engineering (ICDE-2013), April 2013.
Authors: Jian Pei, Wush Chi-Hsuan Wu, and Mi-Yen Yeh

Mi-YenYehAbstract:
In this paper, we tackle a novel type of interesting queries — shortest unique substring queries. Given a (long) string S and a query point q in the string, can we find a shortest substring containing q that is unique in S? We illustrate that shortest unique substring queries have many potential applications, such as information retrieval, bioinformatics, and event context analysis. We develop efficient algorithms for online query answering. First, we present an algorithm to answer a shortest unique substring query in O(n) time using a suffix tree index, where n is the length of string S. Second, we show that, using O(n · h) time and O(n) space, we can compute a shortest unique substring for every position in a given string, where h is variable theoretically in O(n) but on real data sets often much smaller than n and can be treated as a constant. Once the shortest unique substrings are pre-computed, shortest unique substring queries can be answered online in constant time. In addition to the solid algorithmic results, we empirically demonstrate the effectiveness and efficiency of shortest unique substring queries on real data sets.
"Improving Dynamic Binary Optimization Through Early-Exit Guided Code Region Formation," ACM 9th International Conference on Virtual Execution Environments (VEE), March 2013.
Authors: Chun-Chen Hsu, Pangfeng Liu, Jan-Jan Wu, Pen-Chung Yew, Wei Chung Hsu, Chien-Min Wang, Ding-Yong Hong

Chien-MinWangPen-ChungYewJan-JanWuAbstract:
Most dynamic binary translators (DBT) and optimizers (DBO) target binary traces, i.e. frequently executed paths, as code regions to be translated and optimized. Code region formation is the most important first step in all DBTs and DBOs. The quality of the dynamically formed code regions determines the extent and the types of optimization opportunities that can be exposed to DBTs and DBOs, and thus, determines the ultimate quality of the final optimized code. The Next-Executing-Tail (NET) trace formation method used in HP Dynamo is an early example of such techniques. Many existing trace formation schemes are variants of NET. They work very well for most binary traces, but they also suffer a major problem: the formed traces may contain a large number of early exits that could be branched out during the execution. If this happens frequently, the program execution will spend more time in the slow binary interpreter or in the unoptimized code regions than in the optimized traces in code cache. The benefit of the trace optimization is thus lost. Traces/regions with frequently taken early-exits are called delinquent traces/regions. Our empirical study shows that at least 8 of the 12 SPEC CPU2006 integer benchmarks have delinquent traces.
In this paper, we propose a light-weight region formation technique called Early-Exit Guided Region Formation (EEG) to improve the quality of the formed traces/regions. It iteratively identifies and merges delinquent regions into larger code regions. We have implemented our EEG algorithm in two LLVM-based multithreaded DBTs targeting ARM and IA32 instruction set architecture (ISA), respectively. Using SPEC CPU2006 benchmark suite with reference inputs, our results show that compared to an NETvariant currently used in QEMU, a state-of-the-art retargetable DBT, EEG can achieve a significant performance improvement of up to 72% (27% on average), and to 49% (23% on average) for IA32 and ARM, respectively.
Current Research Results
"Validating Contradiction in Texts Using Online Co-Mention Pattern Checking," Transactions on Asian Language Information Processing (TALIP), December 2012.
Authors: Cheng-Wei Shih, Cheng-Wei Lee, Richard Tzong-Han Tsai, Wen-Lian Hsu

Wen-LianHsuAbstract:
Detecting contradictive statements is a foundational and challenging task for text understanding applica-
tions such as textual entailment. In this article, we aim to address the problem of the shortage of specific
background knowledge in contradiction detection. A novel contradiction detecting approach based on the dis-
tribution of the query composed of critical mismatch combinations on the Internet is proposed to tackle the
problem. By measuring the availability of mismatch conjunction phrases (MCPs), the background knowl-
edge about two target statements can be implicitly obtained for identifying contradictions. Experiments
on three different configurations show that the MCP-based approach achieves remarkable improvement on
contradiction detection and can significantly improve the performance of textual entailment recognition
"New ERA: New Efficient Reliability-Aware Wear Leveling for Endurance Enhancement of Flash Storage Devices," ACM/IEEE Design Automation Conference (DAC), June 2013.
Authors: Ming-Chang Yang, Yuan-Hao Chang, Che-Wei Tsao, and Po-Chun Huang

Yuan-HaoChangAbstract:
As the program/erase (P/E) cycles of flash memory keep decreasing, improving the lifetime/endurance of flash memory has become a fundamental issue in the design of flash storage devices. This work is motivated by the observation that flash blocks endured the same P/E cycles usually have different bit error rates. In contrast to the existing wear-leveling techniques that try to distribute erases to flash blocks as evenly as possible, we propose an efficient reliability-aware wear-leveling scheme to distribute block erases based on the bit error rates of blocks so as to even out the error rate among flash blocks, to maximize the number of good blocks, and thus to ultimately prolong the lifetime of flash storage devices. The experiments were conducted based on representative realistic workloads to evaluate the efficacy of the proposed scheme, for which the results are very encouraging.
"Gender Swapping and User Behaviors in Online Social Games," Proceedings of ACM WWW 2013, May 2013.
Authors: Jing-Kai Lou, Kunwoo Park, Meeyoung Cha, Juyong Park, Chin-Laung Lei, and Kuan-Ta Chen

Sheng-Wei(Kuan-Ta)ChenJing-Kai LouAbstract:
Modern Massively Multiplayer Online Role-Playing Games (MMORPGs) provide lifelike virtual environments in which players can conduct a variety of activities including combat, trade, and chat with other players. While the game world and the available actions therein are inspired by their offline counterparts, the games' popularity and dedicated fan base are testament to the allure of novel social interactions granted to people by granting them an alternative life as new characters and personae. In this paper we investigate the phenomenon of "gender swapping," which refers to players choosing avatars of genders opposite to their natural ones. We report the behavioral patterns observed in players of Fairyland Online, a globally serviced MMORPG, during social interactions when playing as in-game avatars of their own real gender or genderswapped, and discuss the effect of gender role and self-image in virtual social situations and the potential of our study for improving MMORPG qualities and detection of fake online identities.
Current Research Results
"Wavelet Bayesian Network Image Denoising," IEEE Transactions on Image Processing, April 2013.
Authors: Jinn Ho, Wen-Liang Hwang

Wen-LiangHwangJinn_HoAbstract:
From the perspective of the Bayesian approach, the denoising problem is essentially a prior probability modeling and estimation task. In this paper, we propose an approach that exploits a hidden Bayesian network, constructed from wavelet coefficients, to model the prior probability of the original image. Then, we use the belief propagation (BP) algorithm, which estimates a coefficient based on all the coefficients of an image, as the maximum-a-posterior (MAP) estimator to derive the denoised wavelet coefficients. We show that if the network is a spanning tree, the standard BP algorithm can perform MAP estimation efficiently. Our experiment results demonstrate that, in terms of the peak-signal-to-noise-ratio and perceptual quality, the proposed approach outperforms state-of-the-art algorithms on several images, particularly in the textured regions, with various amounts of white Gaussian noise.
Current Research Results
Authors: J. W. S. Liu, E. T.-H. Chu, and C. S. Shih

Jane Win ShihLiuAbstract:
This paper advocates the development and pervasive deployment of iGaDs (intelligent guards against disasters). The term collectively refers to smart devices, applications and services that are capable of receiving, authenticating, and processing standard-based disaster warning and response messages from authorized senders and taking appropriate actions to help us stay safe and reduce adverse economic impacts when disasters strike. The paper first describes scenarios based on recent calamities to illustrate how iGaDs can help to minimize loss of lives and damages to property. It then describes a general structure of embedded iGaDs and presents the challenges to make such devices and systems dependable and affordable enough to be used pervasively as parts of future smart living environments. 
Current Research Results
Authors: Chi-Jen Wu, Jan-Ming Ho, and Ming-Syan Chen

Ming-SyanChenJan-MingHoChi-JenWuAbstract:
Social network applications are becoming increasingly popular on mobile devices. A mobile presence service is an essential component of a social network application because it maintains each mobile user’s presence information, such as the current status (online/offline), GPS location and network address, and also updates the user’s online friends with the information continually. If presence updates occur frequently, the enormous number of messages distributed by presence servers may lead to a scalability problem in a large-scale mobile presence service. To address the problem, we propose an efficient and scalable server architecture, called PresenceCloud, which enables mobile presence services to support large-scale social network applications. When a mobile user joins a network, PresenceCloud searches for the presence of his/her friends and notifies them of his/her arrival. PresenceCloud organizes presence servers into a quorum-based server-to-server architecture for efficient presence searching. It also leverages a directed search algorithm and a one-hop caching strategy to achieve small constant search latency. We analyze the performance of PresenceCloud in terms of the search cost and search satisfaction level. The search cost is defined as the total number of messages generated by the presence server when a user arrives; and search satisfaction level is defined as the time it takes to search for the arriving user’s friend list. The results of simulations demonstrate
that PresenceCloud achieves performance gains in the search cost without compromising search satisfaction.
"Memorax: Fence Inference under the TSO Memory Model," The 19th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2013), March 2013.
Authors: Parosh Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Carl Leonardsson and Ahmed Rezine

Yu-FangChenAbstract:
We introduce MEMORAX, a tool for the verification of control state reachability (i.e., safety properties) of concurrent programs manipulating finite range and integer variables and running on top of weak memory models. The verification task is non-trivial as it involves exploring state spaces of arbitrary or even infinite sizes. Even for programs that only manipulate finite range variables, the sizes of the store buffers could grow unboundedly, and hence the state spaces that need to be explored could be of infinite size. In addition, MEMORAX in- corporates an interpolation based CEGAR loop to make possible the verification of control state reachability for concurrent programs involving integer variables. The reachability procedure is used to automatically compute possible memory fence placements that guarantee the unreachability of bad control states under TSO. In fact, for programs only involving finite range variables and running on TSO, the fence insertion functionality is complete, i.e., it will find all minimal sets of memory fence placements (minimal in the sense that removing any fence would result in the reachability of the bad control states). This makes MEMORAX the first freely available, open source, push-button verification and fence insertion tool for programs running under TSO with integer variables.
Current Research Results
Authors: Yao-Ming Chang, Chia-Lin Chang, Wen-Hsiung Li, Arthur Chun-Chieh Shih

Arthur Chun-ChiehShihAbstract:
C4 plants evolved from C3 plants through a series of complex evolutionary steps. On the basis of the evolution of key C4 enzyme genes, the evolution of C4 photosynthesis has been considered a story of gene/genome duplications and subsequent modifications of gene function. If whole-genome duplication has contributed to the evolution of C4 photosynthesis, other genes should have been duplicated together with these C4 genes. However, which genes were co-duplicated with C4 genes and whether they have also played a role in C4 evolution are largely unknown. In this study, we developed a simple method to characterize the historical profile of the paralogs of a gene by tracing back to the most recent common ancestor (MRCA) of the gene and its paralog(s) and then counting the number of paralogs at each MRCA. We clustered the genes into clusters with similar duplication profiles and inferred their functional enrichments. Applying our method to maize, a familiar C4 plant, we identified many genes that show similar duplication profiles with those of the key C4 enzyme genes and found that the functional preferences of the C4 gene clusters are not only similar to those identified by an experimental approach in a recent study but also highly consistent with the functions required for the C4 photosynthesis evolutionary model proposed by S.F. Sage. Some of these genes might have co-evolved with the key C4 enzyme genes to increase the strength of C4 photosynthesis. Moreover, our results suggested that most key C4 enzyme genes had different origins and have undergone a long evolutionary process before the emergence of C4 grasses (Andropogoneae), consistent with the conclusion proposed by previous authors.
Current Research Results
"SPAM: A Secure Password Authentication Mechanism for Seamless Handover in Proxy Mobile IPv6 Networks," IEEE System Journal, July 2013.
Authors: Ming-Chin Chuang, Jeng-Farn Lee and Meng Chang Chen

Meng ChangChenAbstract:
The Internet Engineering Task Force NETLMM Working Group recently proposed a network-based localized mobility management protocol called Proxy Mobile IPv6 (PMIPv6) to support mobility management without the participation of mobile nodes in any mobility-related signaling. Although PMIPv6 reduces the signaling overhead and the handover latency, it still suffers from packet loss problem and long authentication latency during handoff. In addition, there are many security threats to PMIPv6. In this paper, we perform a bicasting scheme for avoiding the packet loss problem, use the piggyback technique to reduce the signaling overhead, and provide a secure password authentication mechanism (SPAM) for protecting a valid user from attacks in PMIPv6 networks. SPAM provides high security properties, including anonymity, stolen-verified attack resistance, location privacy, mutual authentication, forgery attack resistance, no clock synchronization problem, modification attack resistance, replay attack resistance, fast error detection, choose and change password free, and session key agreement. Moreover, SPAM is an efficient authentication scheme that performs the authentication procedure locally and has low computational cost. From the analysis, we demonstrate that our scheme can resist various attacks and provides better performance than existing schemes.
"GamingAnywhere: An Open Cloud Gaming System," Proceedings of ACM Multimedia Systems 2013, February 2013.
Authors: Chun-Ying Huang, Cheng-Hsin Hsu, Yu-Chun Chang, and Kuan-Ta Chen

Sheng-Wei(Kuan-Ta)ChenAbstract:
Cloud gaming is a promising application of the rapidly expanding cloud computing infrastructure. Existing cloud gaming systems, however, are closed-source with proprietary protocols, which raises the bars to setting up testbeds for experiencing cloud games. In this paper, we present a complete cloud gaming system, called GamingAnywhere, which is to the best of our knowledge the first open cloud gaming system. In addition to its openness, we design GamingAnywhere for high extensibility, portability, and reconfigurability. We implement GamingAnywhere on Windows, Linux, and OS X, while its client can be readily ported to other OS’s, including iOS and Android. We conduct extensive experiments to quantify the performance of GamingAnywhere, and compare it against two well-known cloud gaming systems: OnLive and StreamMyGame. Our experimental results indicate that GamingAnywhere is efficient and achieves high responsiveness and video quality. For example, GamingAnywhere yields a per-frame processing delay of 41 ms, which is 4+ and 8+ times shorter than OnLive and StreamMyGame, respectively. Our experiments also reveal that all these performance gains are achieved without the expense of higher network loads; in fact, GamingAnywhere incurs less network traffic. The proposed GamingAnywhere can be employed by the researchers, game developers, service providers, and end users for setting up cloud gaming testbeds, which, we believe, will stimulate more research projects on cloud gaming systems.
Current Research Results
Authors: Lien-Chin Chen, Mei-Ying Liu, Yung-Chin Hsiao, Wai-Kok Chong, Hsin-Yi Wu, Wen-Lian Hsu, Pao-Chi Liao*, Ting-Yi Sung*, Shih-Feng Tsai*, Jau-Song Yu*, Yu-Ju Chen*

Ting-YiSungWen-LianHsuAbstract:
Chromosome 4 is the fourth largest chromosome, containing approximately 191 megabases (http://pubs.acs.org/appl/literatum/publisher/achs/journals/entities/223C.gif6.4% of the human genome) with 757 protein-coding genes. A number of marker genes for many diseases have been found in this chromosome, including genetic diseases (e.g., hepatocellular carcinoma) and biomedical research (cardiac system, aging, metabolic disorders, immune system, cancer and stem cell) related genes (e.g., oncogenes, growth factors). As a pilot study for the chromosome 4-centric human proteome project (Chr 4-HPP), we present here a systematic analysis of the disease association, protein isoforms, coding single nucleotide polymorphisms of these 757 protein-coding genes and their experimental evidence at the protein level. We also describe how the findings from the chromosome 4 project might be used to drive the biomarker discovery and validation study in disease-oriented projects, using the examples of secretomic and membrane proteomic approaches in cancer research. By integrating with cancer cell secretomes and several other existing databases in the public domain, we identified 141 chromosome 4-encoded proteins as cancer cell-secretable/shedable proteins. Additionally, we also identified 54 chromosome 4-encoded proteins that have been classified as cancer-associated proteins with successful selected or multiple reaction monitoring (SRM/MRM) assays developed. From literature annotation and topology analysis, 271 proteins were recognized as membrane proteins while 27.9% of the 757 proteins do not have any experimental evidence at the protein-level. In summary, the analysis revealed that the chromosome 4 is a rich resource for cancer-associated proteins for biomarker verification projects and for drug target discovery projects.
Current Research Results
Authors: Yu-Jung Chang, Chien-Chih Chen, Chuen-Liang Chen and Jan-Ming Ho

Jan-MingHoChien-Chih ChenYu-Jung ChangAbstract:

Background

State-of-the-art high-throughput sequencers, e.g., the Illumina HiSeq series, generate sequencing reads that are longer than 150 bp up to a total of 600 Gbp of data per run. The high-throughput sequencers generate lengthier reads with greater sequencing depth than those generated by previous technologies. Two major challenges exist in using the high-throughput technology for de novo assembly of genomes. First, the amount of physical memory may be insufficient to store the data structure of the assembly algorithm, even for high-end multicore processors. Moreover, the graph-theoretical model used to capture intersection relationships of the reads may contain structural defects that are not well managed by existing assembly algorithms.

Results

We developed a distributed genome assembler based on string graphs and MapReduce framework, known as the CloudBrush. The assembler includes a novel edge-adjustment algorithm to detect structural defects by examining the neighboring reads of a specific read for sequencing errors and adjusting the edges of the string graph, if necessary. CloudBrush is evaluated against GAGE benchmarks to compare its assembly quality with the other assemblers. The results show that our assemblies have a moderate N50, a low misassembly rate of misjoins, and indels of > 5 bp. In addition, we have introduced two measures, known as precision and recall, to address the issues of faithfully aligned contigs to target genomes. Compared with the assembly tools used in the GAGE benchmarks, CloudBrush is shown to produce contigs with high precision and recall. We also verified the effectiveness of the edge-adjustment algorithm using simulated datasets and ran CloudBrush on a nematode dataset using a commercial cloud. CloudBrush assembler is available at https://github.com/ice91/CloudBrush.
"Mining Publication Records in Personal Publication Web Pages Based on Conditional Random Fields," The 2012 IEEE/WIC/ACM International Conference on Web Intelligence, December 2012.
Authors: Jen-Ming Chung, Ya-Huei Lin, Hahn-Ming Lee, and Jan-Ming Ho

Jan-MingHoHahn-Ming LeeJen-Ming ChungAbstract:
A publication record is a list of semi-structured reference strings for publications of a research institute or an individual researcher. Publication records are integrated into a digital library which is an important knowledge base and also enables a variety of applications. A publication record is usually found among other information on a publication Web page (or ”publication page” for short). It is thus an interesting problem to extract publication record from such Web pages. The problem is difficult for several reasons including flexibility in formatting the metadata of a publication into a semi-structured reference string and flexibility in expressing the reference string visually presentation in HTML. Furthermore, two reference strings with a similar visual presentation on the same Web page may have different HTML constructs. In this paper, we present a content analysis approach, based on Conditional Random Fields and data region boundary analysis, the problem of automatically extracting publication records on a publication page. Experimental results show that our method performs well on a benchmark containing manually crafted publication pages. The precision rate and recall rate, and Fmeasure are 82.5%, 87.6%, and 85.0%, respectively. This is a significant improvement over previous results.
Current Research Results
Authors: Tsai, Z.T.Y., Tsai, H.K.*, Cheng, J.H., Lin, C.H., Tsai, Y.F. and Wang, D.*

Huai-KuangTsaiCheng, J.H.Tsai, Z.T.Y.Abstract:

Background

New genes that originate from non-coding DNA rather than being duplicated from parent genes are called de novo genes. Their short evolution time and lack of parent genes provide a chance to study the evolution of cis-regulatory elements in the initial stage of gene emergence. Although a few reports have discussed cis-regulatory elements in new genes, knowledge of the characteristics of these elements in de novo genes is lacking. Here, we conducted a comprehensive investigation to depict the emergence and establishment of cis-regulatory elements in de novo yeast genes.

Results

In a genome-wide investigation, we found that the number of transcription factor binding sites (TFBSs) in de novo genes of S. cerevisiae increased rapidly and quickly became comparable to the number of TFBSs in established genes. This phenomenon might have resulted from certain characteristics of de novo genes; namely, a relatively frequent gain of TFBSs, an unexpectedly high number of preexisting TFBSs, or lower selection pressure in the promoter regions of the de novo genes. Furthermore, we identified differences in the promoter architecture between de novo genes and duplicated new genes, suggesting that distinct regulatory strategies might be employed by genes of different origin. Finally, our functional analyses of the yeast de novo genes revealed that they might be related to reproduction.

Conclusions

Our observations showed that de novo genes and duplicated new genes possess mutually distinct regulatory characteristics, implying that these two types of genes might have different roles in evolution.

Current Research Results
"Coding-Aware Peer-to-Peer Data Repair in Multi-Rate Wireless Networks – A Game Theoretic Analysis," IEEE Journal on Selected Areas in Communications (JSAC), September 2013.
Authors: Hsiao-Chen Lu, Wanjiun Liao, Meng Chang Chen, and Musaed A. Alhussein

Meng ChangChenAbstract:
Recent research shows that in wireless wide area networks (WWANs), users who subscribe to multicast traffics from the WWAN can exchange network-coded packets with one another via their secondary radio interfaces such as Wi-Fi in order to efficiently recover lost packets from the WWAN. Different from existing works which assume users are cooperative, in this work, we model the users as selfish players in the network-coding based peer-to-peer packet repairing game. To stimulate the users’ cooperation, we introduce a payment-based incentive mechanism in the packet repairing game. The utility function of a user/player is also formulated to reflect both the number of useful packets and the available resource. Through analysis of the packet repairing game, we show that the optimal strategy for a user can be derived only with its local information. Moreover, the impact of the pricing rules and the convergence conditions of the packet repairing game have also been analyzed. We show theoretically as well as by simulation that under proper conditions, the packet repairing game can converge to the best case where each user can acquire all of its missing packets. Via computer simulations, we also show that with the proposed selection criteria, the packet repairing game is both effective and efficient; not only can the utilities of the players be greatly improved, but also the convergence time of the game and the utility gain of the players are comparable to those of the ideal case where each user is always willing to forward packets to others.
"Workload Characteristics-aware Virtual Machine Consolidation Algorithms," IEEE International Conference on Cloud Computing Technology and Science (IEEE CloudCom), December 2012.
Authors: Jyun-Shiung Yang, Pangfeng Liu, Jan-Jan Wu

Jan-JanWuAbstract:
An energy conservation strategy must address two issues – placement of virtual machine images and workload characteristics of virtual machines. For performance reason most cloud systems copy a prototype image into the local disk of a physical machine before starting a virtual machine. If the physical machine that stores the image of a virtual machine is off-line, then we cannot run this virtual machine. The workload characteristics of virtual machines determine whether it is data-intensive or CPU-intensive.We assume that the system has a distributed file system therefore a physical machine can run any virtual machine even if it does not have the image. However, we observe that the performance of a data-intensive virtual machine running on a physical machine without its image could result in 60% performance loss compared with running the same virtual machine on a physical machine that has the virtual machine image. On the other hand, the performance of a CPU-intensive virtual machine is almost independent of whether the physical machine has the image or not. As a result, an energy conservation algorithm must consider the workload characteristic of a virtual machine when finding a physical machine to run it, especially for data-intensive virtual machines.

This paper proposes a workload characteristics-aware virtual machine consolidation algorithms. We propose an approximation algorithm and two dynamic programmings to consolidate virtual machines and reduce the number of physical machines. We conduct experiments and compare the numbers of physical machines used by our approximation algorithm with the optimal number of physical machines found by our dynamic programming. The experiment results indicate that our approximation algorithm finds good solutions much faster than the dynamic programming.
Authors: Chin-Fu Ku, Kai-Hsiang Yang, and Jan-Ming Ho

Jan-MingHoAbstract:
In network applications with security constraints, it is usually desirable to disseminate a file from a server through trusted network channels to a set of peers. This is a problem seldom studied in the literature though the problems of data integrity and security have been studied by many. In this paper, we study the file dissemination problem with trust relation modeled as a rooted full binary tree. We present the OOFD algorithm to schedule dissemination of the file iteratively from each peer holding a replica of the file to one of its descendants on the binary tree. We show that, in a homogeneous network, OOFD algorithm is optimum in the sense that the time it takes to disseminate the file to all nodes is minimized.
Current Research Results
"U-Skyline: A New Skyline Query for Uncertain Databases," IEEE Trans. on Knowledge and Data Engineering, April 2013.
Authors: X. Liu, D.-N. Yang, M. Ye, and W.-C. Lee

De-NianYangAbstract:
The skyline query, aiming at identifying a set of skyline tuples that are not dominated by any other tuple, is particularly useful for multi-criteria data analysis and decision-making. For uncertain databases, a probabilistic skyline query, called P-Skyline, has been developed to return skyline tuples by specifying a probability threshold. However, the answer obtained via a P-Skyline query usually includes skyline tuples undesirably dominating each other when a small threshold is specified; or it may contain much fewer skyline tuples if a larger threshold is employed. To address this concern, we propose a new uncertain skyline query, called U-Skyline query, in this paper. Instead of setting a probabilistic threshold to qualify each skyline tuple independently, the U-Skyline query searches for a set of tuples that has the highest probability (aggregated from all possible scenarios) as the skyline answer. In order to answer U-Skyline queries efficiently, we propose a number of optimization techniques for query processing, including 1) computational simplification of U-Skyline probability, 2) pruning of unqualified candidate skylines and early termination of query processing, 3) reduction of the input dataset, and 4) partition and conquest of the reduced dataset. We perform a comprehensive performance evaluation on our algorithm and an alternative approach that formulates the U-Skyline processing problem by integer programming.
Current Research Results
"Power Domination in Circular-arc Graphs," Algorithmica, February 2013.
Authors: Chung-Shou Liao and D. T. Lee

Der- TsaiLeeAbstract:

A set S ⊆ V is a power dominating set (PDS) of a graph G =( V , E ) if every vertex and every edge in G can be observed based on the observation rules of power system monitoring. The power domination problem involves minimizing the cardinality of a PDS of a graph. We consider this combinatorial optimization problem and present a linear time algorithm for finding the minimum PDS of an interval graph if the interval ordering of the graph is provided. In addition, we show that the algorithm, which runs in Θ( n log n ) time, where n is the number of intervals, is asymptotically optimal if the interval ordering is not given. We also show that the results hold for the class of circular-arc graphs.


Current Research Results
"The Left and Right Context of a Word: Overlapping Chinese Syllable Word Segmentation with Minimal Context," Transactions on Asian Language Information Processing (TALIP), March 2013.
Authors: MIKE TIAN-JIAN JIANG, TSUNG-HSIEN LEE, WEN-LIAN HSU

Abstract:
Since a Chinese syllable can correspond to many characters (homophones), the syllable-to-character conversion task is quite challenging for Chinese phonetic input methods (CPIM). There are usually two stages in a CPIM: 1. segment the syllable sequence into syllable words; 2. select the most likely character words for each syllable word. A CPIM usually assumes that the input is a complete sentence, and evaluates the performance based on a well-formed corpus. However, in practice, most Pinyin users prefer progressive text entry in several short chunks, mainly in one or two words each (most Chinese words consist of two or more characters). Short chunks do not provide enough contexts to perform the best possible syllable-to-character conversion, especially when a chunk consists of overlapping syllable words. In such cases, a conversion system often selects the boundary of a word with the highest frequency. Short chunk input is even more popular on platforms with limited computing power, such as mobile phones. Based on the observation that the relative strength of a word can be quite different when calculated leftwards or rightwards, we propose a simple division of the word context into the left context and the right context. Furthermore, we design a double ranking strategy for each word to reduce the number of errors in Step 1. Our strategy is modeled as the minimum feedback arc set problem on bipartite tournament with approximate solutions derived from genetic algorithm. Experiments show that, compared to the frequency-based method (FBM) (low memory and fast) and the conditional random fields (CRF) model (larger memory and slower), our double ranking strategy has the benefits of less memory, low power requirement with competitive performance. We believe a similar strategy could also be adopted to disambiguate conflicting linguistic patterns effectively.

相關資訊

Academia Sinica 資訊科學研究所 Academia Sinica