Institute of Information Science
Recent Research Results
Current Research Results
"Byte-addressable Update Scheme to Minimize the Energy Consumption of PCM-based Storage Systems," ACM Transactions on Embedded Computing Systems (TECS), June 2016.
Authors: Ming-Chang Yang, Yuan-Hao Chang, and Che-Wei Tsao

In recent years, phase-change memory (PCM) has generated a great deal of interest because of its byte-addressability and non-volatility properties. It is regarded as a good alternative storage medium that can reduce the performance gap between the main memory and the secondary storage in computing systems. However its high energy consumption on writes is a challenging issue in the design of battery-powered mobile computing systems. To reduce the energy consumption, we exploit the byte-addressability and the asymmetric read-write energy/latency of PCM in an energy-efficient update scheme for journaling file systems. We also introduce a concept called the fifty-percent rule to determine/recommend the best update strategy for block updates. The proposed scheme only writes modified data, instead of the whole updated block, to PCM-based storage devices without extra hardware support. Moreover, it guarantees the sanity/integrity of file systems even if the computing system crashes or there is a power failure during the data update process. We implemented the proposed scheme on the Linux system and conducted a series of experiments to evaluate the scheme. The results are very encouraging.
Current Research Results
"A Computer-Assistance Learning System for Emotional Wording," IEEE Transactions on Knowledge and Data Engineering, May 2016.
Authors: Wei-Fan Chen, Mei-Hua Chen, Ming-Lung Chen, and Lun-Wei Ku

Lun-Wei KuWei-Fan ChenAbstract:
Language learners' limited lexical knowledge leads to imprecise wording. This is especially true when they attempt to express their emotions. Many learners rely heavily on the traditional thesaurus. Unfortunately, this fails to provide appropriate suggestions for lexical choices. To better aid English-as-a-second-language learners with word choices, we propose RESOLVE, which provides ranked synonyms of emotion words based on contextual information. RESOLVE suggests precise emotion words regarding the events in the relevant context. Patterns are learned to capture emotion events, and various factors are considered in the scoring function for ranking emotion words. We also describe an online writing system developed using RESOLVE and evaluate its effectiveness for learning assistance with a writing task. Experimental results showed that RESOLVE yielded a superior performance on NDCG@5 which significantly outperformed both PMI and SVM approaches, and offered better suggestions than Roget's Thesaurus and PIGAI (an online automated essay scoring system). Moreover, when applying it to the writing task, students' appropriateness with emotion words was 30 percent improved. Less-proficient learners benefited more from RESOLVE than highly-proficient learners. Post-tests also showed that after using RESOLVE, less-proficient learners' ability to use emotion words approached that of highly-proficient learners. RESOLVE thus enables learners to use precise emotion words.
Current Research Results
"SPIRIT: A Tree Kernel-based Method for Topic Person Interaction Detection," IEEE Transactions on Knowledge and Data Engineering (TKDE), May To Appear.
Authors: Yung-Chun Chang, Chien Chin Chen, and Wen-Lian Hsu

The development of a topic in a set of topic documents is constituted by a series of person interactions at a specific time and place. Knowing the interactions of the persons mentioned in these documents is helpful for readers to better comprehend the documents. In this paper, we propose a topic person interaction detection method called SPIRIT, which classifies the text segments in a set of topic documents that convey person interactions. We design the rich interactive tree structure to represent syntactic, context, and semantic information of text, and this structure is incorporated into a tree-based convolution kernel to identify interactive segments. Experiment results based on real world topics demonstrate that the proposed rich interactive tree structure effectively detects the topic person interactions and that our method outperforms many well-known relation extraction and protein-protein interaction methods.
"When Social Influence Meets Item Inference," ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM KDD), August 2016.
Authors: H.-J. Hung, H.-H. Shuai, D.-N. Yang, L.-H. Huang, W.-C. Lee, J. Pei, and M.-S. Chen

Research issues and data mining techniques for product recommendation and viral marketing have been widely studied. Existing works on seed selection in social networks do not take into account the effect of product recommendations in e-commerce stores. In this paper, we investigate the seed selection problem for viral marketing that considers both effects of item inference recommendation and social influence. We develop a new model, Social Item Graph (SIG), that captures both effects in form of hyperedges and learn the corresponding probability by EMS framework. Accordingly, we formulate a new optimization problem, called Social Item Maximization Problem (SIMP), to simultaneously exploit influence from friends and recommendations from stores for effective viral marketing. We prove that SIMP is inapproximable within $n^{c} $ for any $c<1$, and design an efficient $n$-approximation algorithm. Experimental results demonstrate the accuracy of the proposed model and show the effectiveness and efficiency of the proposed algorithm.
"Routing and Scheduling of Social Influence Diffusion in Online Social Networks," IEEE International Conference on Distributed Computing Systems (IEEE ICDCS), June 2016.
Authors: H.-J. Hung, D.-N. Yang, and W.-C. Lee

Owing to the rising popularity of online social networking services (OSNs), studies on social influence and its diffusion have received significant attention from the research community. Prior research has mostly studied the impact of single-hop influence diffusion and multi-hop influence broadcast in online social networks. Very little research explores the idea of guiding (routing) multi-hop social influence towards a specific target. In this paper, we motivate the needs of timely routing social influence to formulate a new optimization problem, namely, Routing And Scheduling of Target-Oriented Social Influence Diffusion (RAS-TOSID). Accordingly, we propose the Efficient Routing And Scheduling with Social and Temporal decOmposition (ERASSTO) algorithm, which finds the optimal solution to RAS-TOSID in polynomial time. We carry out a user study by implementing ERASSTO in Facebook and conduct a comprehensive evaluation on ERASSTO and alternative approaches by simulation. The result shows that ERASSTO significantly outperforms other algorithms regarding solution quality and computational efficiency.
Current Research Results
"Disturbance Relaxation for 3D Flash Memory," IEEE Transactions on Computers (TC), May 2016.
Authors: Yu-Ming Chang, Yuan-Hao Chang, Tei-Wei Kuo, Yung-Chun Li, and Hsiang-Pang Li

Even though 3D flash memory presents a grand opportunity to huge-capacity non-volatile memory, it suffers from serious program disturbance problems. In contrast to the past efforts in error correction codes and the work in trading the space utilization for reliability, we propose a disturbance-relaxation scheme that can alleviate the negative effects caused by program disturbance inside a physical block. This scheme does not introduce any extra overheads on encoding or storing of extra redundant data. In particular, a methodology is proposed to reduce the data error rate by distributing unavoidable disturbance errors to the flash-memory space of invalid data, with the considerations of the physical organization of 3D flash memory. A series of experiments was conducted based on real multi-layer 3D flash chips, and it showed that the proposed scheme could significantly enhance the reliability of 3D flash memory.
Current Research Results
"Reducing Data Migration Overheads of Flash Wear Leveling in a Progressive Way," IEEE Transactions on Very Large Scale Integration Systems (TVLSI), May 2016.
Authors: Ming-Chang Yang, Yuan-Hao Chang, Tei-Wei Kuo, and Fu-Hsin Chen

As the endurance of flash memory keeps deteriorating, exploiting wear leveling techniques to improve the lifetime/endurance of flash memory has become a critical issue in the design of flash storage devices. Nevertheless, the deteriorated access performance of high-density flash memory makes the performance overheads introduced by wear leveling non-negligible. In particular, the existing wear-leveling designs usually aggressively distribute the erases to all flash blocks evenly in a regular basis. As a result, a lot of non-negligible unnecessary data migrations would be imposed in the early stages of the device lifespan, and would be further exacerbated if a wear leveling design selects improper victim blocks for erases. In contrast to the existing wear-leveling approaches, we propose a progressive wear leveling design to perform wear leveling in a ``progressive'' way to prevent any block from being worn out prematurely with minimized performance overheads caused by the unnecessary data migration. The experiments were conducted based on representative realistic workloads to evaluate the efficacy of the proposed design. The results reveal that instead of sacrificing the device lifetime, performing wear leveling in such a progressive way can not only minimize the performance overheads but also have potentials to extend the device lifetime.
Current Research Results
"Optimal Time-Convex Hull for a Straight-Line Highway in Lp-Metrics," Computational Geometry: Theory and Applications, February 2016.
Authors: B.-S. Dai, Mong-Jen Kao and D.T. Lee

Der- TsaiLeeAbstract:
We consider the problem of computing the time-convex hull of a point set under the general Lp -metric in the presence of a straight-line highway in the plane. The traveling speed along the highway is assumed to be faster than that off the highway, and the shortest time-path between a distant pair may involve traveling along the highway. The time-convex hull TCH(P)of a point set P is the smallest set containing both P and all shortest time-paths between any two points in TCH(P). In this paper we give an algorithm that computes the time-convex hull under the Lp -metric in optimal O(nlogn) time for a given set of n points and a real number p with 1 ≤p ≤∞. 
Current Research Results
Authors: T. Mamie Lih, Wai-Kok Choong, Chen-Chun Chen, Cheng-Wei Cheng, Hsin-Nan Lin, Ching-Tai Chen, Hui-Yin Chang, Wen-Lian Hsu and Ting-Yi Sung

Ting-YiSungWen-LianHsuHui-YinChangChing-TaiCasterChenCheng-Wei ChengWai-Kok ChoongTung-ShingLihAbstract:
MAGIC-web is the first web server, to the best of our knowledge, that performs both untargeted and targeted analyses of mass spectrometry-based glycoproteomics data for site-specific N-linked glycoprotein identification. The first two modules, MAGIC and MAGIC+, are designed for untargeted and targeted analysis, respectively. MAGIC is implemented with our previously proposed novel Y1-ion pattern matching method, which adequately detects Y1- and Y0-ion without prior information of proteins and glycans, and then generates in silico MS 2 spectra that serve as input to a database search engine (e.g. Mascot) to search against a large-scale protein sequence database. On top of that, the newly implemented MAGIC+ allows users to determine glycopeptide sequences using their own protein sequence file. The third module, Reports Integrator, provides the service of combining protein identification results from Mascot and glycan-related information from MAGIC-web to generate a complete site-specific protein-glycan summary report. The last module, Glycan Search, is designed for the users who are interested in finding possible glycan structures with specific numbers and types of monosaccharides. The results from MAGIC, MAGIC+ and Reports Integrator can be downloaded via provided links whereas the annotated spectra and glycan structures can be visualized in the browser. MAGIC-web is accessible from
Current Research Results
Authors: Ya-Xuan Hung, Pei-Chen Huang, Kuan-Ta Chen, and Woeichyn Chu

Stroke is one of the most common causes of physical disability, and early, intensive, and repetitive rehabilitation exercises are crucial to the recovery of stroke survivors. Unfortunately, research shows that only one third of stroke patients actually perform recommended exercises at home, because of the repetitive and mundane nature of conventional rehabilitation exercises. Thus, to motivate stroke survivors to engage in monotonous rehabilitation is a significant issue in the therapy process. Game-based rehabilitation systems have the potential to encourage patients continuing rehabilitation exercises at home. However, these systems are still rarely adopted at patients' places. Discovering and eliminating the obstacles in promoting game-based rehabilitation at home is therefore essential. For this purpose, we conducted a study to collect and analyze the opinions and expectations of stroke patients and clinical therapists. The study is composed of two parts: (i) Rehab-preference survey: interviews to both patients and therapists to understand the current practices, challenges, and expectations on game-based rehabilitation systems, and (ii) Rehab-compatibility survey: a gaming experiment with therapists to elaborate what commercial games are compatible with rehabilitation. The study is conducted with 30 outpatients with stroke and 19 occupational therapists from two rehabilitation centers in Taiwan. Our surveys show that game-based rehabilitation systems can turn the rehabilitation exercises more appealing and provide personalized motivation for various stroke patients. Patients prefer to perform rehabilitation exercises with more diverse and fun games, and need cost-effective rehabilitation systems, which are often built on commodity hardware. Our study also sheds light on incorporating the existing design-for-fun games into rehabilitation system. We envision the results are helpful in developing a platform which enables rehab-compatible (i.e. existing, appropriately selected) games to be operated on commodity hardware and brings cost-effective rehabilitation systems to more and more patients' home for long-term recovery.
Current Research Results
"An Investigation of a Two-Tier Test Strategy in a University Calculus Course: Causes vs. Consequences," IEEE Transactions on Learning Technologies, To Appear.
Authors: Tzu Chi Yang, Sherry Y. Chen, Meng Chang Chen

Meng ChangChenAbstract:
Online tests have been identified as a core learning activity in higher education. Unlike conventional online tests, which cannot completely reflect students; learning status, two-tier tests not only consider students; answers, but also take into account reasons for their answers. Due to such significance, research into the two-tier test had mushroomed but few studies examined why the two-tier test approach was effective. To this end, we conducted an empirical study, where a lag sequential analysis was used to analyze behavior patterns while qualitative data from the questionnaire were applied to explain why these behavior patterns were happened. The results indicated students with a two-tier test tended to realize the rationale of a concept, instead of relying on their memories. In other words, the two-tier test can facilitate students to develop deep thinking skills. This may be because they considered the two-tier test as a learning tool, instead of an assessment tool only.
Current Research Results
"Capacity-independent Address Mapping for Flash Storage Devices with Explosively Growing Capacity," IEEE Transactions on Computers (TC), February 2016.
Authors: Ming-Chang Yang, Yuan-Hao Chang, Tei-Wei Kuo, and Po-Chun Huang

Address mapping for flash storage devices has been a challenging design issue for controllers because of rapidly growing device capacity. In contrast with existing mapping methods, this study proposes a capacity-independent address mapping method to decouple the required on-device RAM space from the capacity of a flash storage device. Especially, the required RAM size of the proposed method depends only on the user accessed data set, which is also referred to as the working set, while the page-level performance can be nearly achieved. In addition, a simple but practical wear-leveling design is proposed with the capability in lifetime estimation of flash storage devices. Experiments of the proposed scheme obtained encouraging results.
"Enabling Sub-blocks Erase Management to Boost the Performance of 3D NAND Flash Memory," ACM/IEEE Design Automation Conference (DAC), June 2016.
Authors: Tseng-Yi Chen, Yuan-Hao Chang, Chien-Chung Ho, and Shuo-Han Chen

3D NAND has been proposed to provide a large capacity storage with low-cost consideration due to its high density memory architecture. However, 3D NAND needs to consume enormous time for garbage collection because live-page copying overhead and long block erase time. To alleviate the impact of live-page copying on the performance of 3D NAND, a sub-block erase design has been designed.With sub-block erase design, this paper proposes a performance booster strategy to extremely boost the performance of garbage collection. As experimental results shows, the proposed strategy has a significant improvement on the average write response time.
Current Research Results
"Compressed Sensing-Based Clone Identification in Sensor Network," IEEE Trans. on Wireless Communications, April 2016.
Authors: Chia-Mu Yu, Chun-Shien Lu, and Sy-Yen Kuo

Clone detection, aimed at detecting illegal copies with all of the credentials of legitimate sensor nodes, is of great importance for sensor networks because of the severe impact of clones on network operations, like routing, data collection, and key distribution. Various detection methods have been proposed, but most of them are communication-inefficient due to the common use of the witness-finding strategy. In view of the sparse characteristic of replicated nodes, we propose a novel clone detection framework, called CSI, based on a state-of-the-art signal processing technology, compressed sensing. Specifically, CSI bases its detection effectiveness on the compressed aggregation of sensor readings. Due to its consideration of data aggregation, CSI not only achieves the asymptotically lowest communication cost but also makes the network traffic evenly distributed over sensor nodes. In particular, they are achieved by exploiting the sparse property of the clones within the sensor network caused by the clone attack. The performance and security of CSI will be demonstrated by numerical simulations, analyses, and prototype implementation.
"DigitSpace: Designing Thumb-To-Fingers Touch Input Interface for One-Handed and Eyes-Free Interactions," ACM SIGCHI Conference on Human Factors in Computing Systems (ACM CHI), March 2016.
Authors: D.-Y. Huang, S. Yang, F. Wang, R.-H. Liang, L. Chan, D.-N. Yang, Y.-P. Hung, B.-Y. Chen.

Thumb-to-fingers interfaces augment touch widgets on fingers, which are manipulated by the thumb. Such interfaces are ideal for one-handed eyes-free input since touch widgets on the fingers enable easy access by the stylus thumb. This study presents DigitSpace, a thumb-to-fingers interface that addresses two ergonomic problems: hand anatomy and touch precision. Hand anatomy restricts possible movements of a thumb, which further influences the physical comfort during the interactions. Touch precision is a human factor that determines how precisely users can manipulate touch widgets set on fingers, which determines effective layouts of the widgets. Buttons and touchpads were considered in our studies to enable discrete and continuous input in an eyes-free manner. The first study explores the regions of fingers where the interactions can be comfortably performed. According to the comfort regions, the second and third studies explore effective layouts for button and touchpad widgets. The experimental results indicate that participants could discriminate at least 16 buttons on their fingers. For touchpad, participants were asked to perform unistrokes. Our results revealed that since individual participant performed a coherent writing behavior, personalized $1 recognizers could offer 92% accuracy on a cross-finger touchpad. A series of design guidelines are proposed for designers, and a DigitSpace prototype that uses magnetic-tracking methods is demonstrated
"Precise Player Segmentation in Team Sports Videos Using Contrast-Aware Co-segmentation," IEEE International Conference on Acoustics, Speech, and Signal Processing, March 2016.
Authors: T.Y. Tsai, Y. Y. Lin, H.Y. Mark Liao, and S. K. Jeng

Player segmentation in team sports videos is challenging but crucial to video semantic understanding, such as player interaction identification and tactic analysis. We leverage the appearance similarity among players of the same team, and cast this task as a co-segmentation problem. In this way, the extra knowledge shared across players significantly reduces unfavorable uncertainty in segmenting individual players. We are also aware that the performance of co-segmentation highly depends on the used features, and further propose a contrastbased approach to estimate the discriminant power of each feature in an unsupervised manner. It turns out that our approach can properly fuse features by assigning higher weights to discriminant ones, and result in remarkable performance gains. The promising results on segmenting basketball players manifest the effectiveness of our approach.
Current Research Results
Authors: Hui-Yin Chang, Ching-Tai Chen, T. Mamie Lih, Ke-Shiuan Lynn, Chiun-Gung Juo, Wen-Lian Hsu, Ting-Yi Sung

Efficient and accurate quantitation of metabolites from LC-MS data has become an important topic. Here we present an automated tool, called iMet-Q (intelligent Metabolomic Quantitation), for label-free metabolomics quantitation from high-throughput MS1 data. By performing peak detection and peak alignment, iMet-Q provides a summary of quantitation results and reports ion abundance at both replicate level and sample level. Furthermore, it gives the charge states and isotope ratios of detected metabolite peaks to facilitate metabolite identification. An in-house standard mixture and a public Arabidopsis metabolome data set were analyzed by iMet-Q. Three public quantitation tools, including XCMS, MetAlign, and MZmine 2, were used for performance comparison. From the mixture data set, seven standard metabolites were detected by the four quantitation tools, for which iMet-Q had a smaller quantitation error of 12% in both profile and centroid data sets. Our tool also correctly determined the charge states of seven standard metabolites. By searching the mass values for those standard metabolites against Human Metabolome Database, we obtained a total of 183 metabolite candidates. With the isotope ratios calculated by iMet-Q, 49% (89 out of 183) metabolite candidates were filtered out. From the public Arabidopsis data set reported with two internal standards and 167 elucidated metabolites, iMet-Q detected all of the peaks corresponding to the internal standards and 167 metabolites. Meanwhile, our tool had small abundance variation (≤0.19) when quantifying the two internal standards and had higher abundance correlation (≥0.92) when quantifying the 167 metabolites. iMet-Q provides user-friendly interfaces and is publicly available for download at
Current Research Results
Authors: Chih-Hao Fang, Yu-Jung Chang, Wei-Chun Chung, Ping-Heng Hsieh, Chung-Yen Lin and Jan-Ming Ho

We developed two subset selection methods for single-end reads and a method for paired-end reads based on base quality scores and other read analytic tools using the MapReduce framework. We proposed two strategies to select reads: MinimalQ and ProductQ. MinimalQ selects reads with minimal base-quality above a threshold. ProductQ selects reads with probability of no incorrect base above a threshold. In the single-end experiments, we used Escherichia coli and Bacillus cereus datasets of MiSeq, Velvet assembler for genome assembly, and GAGE benchmark tools for result evaluation. In the paired-end experiments, we used the giant grouper (Epinephelus lanceolatus) dataset of HiSeq, ALLPATHS-LG genome assembler, and QUAST quality assessment tool for comparing genome assemblies of the original set and the subset. The results show that subset selection not only can speed up the genome assembly but also can produce substantially longer scaffolds. Availability: The software is freely available at
Current Research Results
"Efficient Victim Block Selection for Flash Storage Devices," IEEE Transactions on Computers (TC), December 2015.
Authors: Che-Wei Tsao, Yuan-Hao Chang, Ming-Chang Yang, and Po-Chun Huang

Motivated by the needs to enhance the performance of garbage collection in low-cost flash storage devices, we propose a victim block selection design to efficiently identify the blocks for erases and reclaim the space of invalid data without extensively scanning flash memory for the data status stored in the storage, so as to improve the garbage collection performance on reclaiming the space of invalid data. Moreover, this design could easily identify and reclaim the space released by file systems. Experiments based on benchmark traces show significant performance improvement of garbage collection with limited system overheads.
Current Research Results
"An Acoustic-Phonetic Model of F0 Likelihood for Vocal Melody Extraction," IEEE/ACM Transactions on Audio, Speech, and Language Processing, September 2015.
Authors: Yu-Ren Chien, Hsin-Min Wang, and Shyh-Kang Jeng

This paper presents a novel approach to extraction of vocal melodies from accompanied singing recordings. Central to our approach is a model of vocal fundamental frequency (F0) likelihood that integrates acoustic-phonetic knowledge and real-world data. This model consists of a timbral fitness score and a loudness measure of each F0 candidate. Timbral fitness is measured for the partial amplitudes of an F0 candidate, with respect to a small set of vocal timbre examples. This F0-specific measurement of timbral fitness depends on an acoustic-phonetic F0 modification of each timbre example. In the loudness part of the likelihood model, sinusoids are detected, tracked, and pruned to give loudness values that minimize interference from the accompaniment. A final F0 estimate is determined by a prior model of F0 sequence in addition to the likelihood model. Melody extraction is completed by detecting voiced time positions according to the singing voice loudness variations given by the estimated F0 sequence. The numerical parameters involved in our approach were optimized on three development sets from different sources before the system was evaluated on ten test sets separate from these development sets. Controlled experiments show that use of the timbral fitness score accounts for a 13% difference in overall accuracy.


Academia Sinica Institue of Information Science Academia Sinica