Institute of Information Science
"Enabling Write-Reduction Strategy for Journaling File Systems over Byte-addressable NVRAM," ACM/IEEE Design Automation Conference (DAC), June 2017.
Authors: Tseng-Yi Chen, Yuan-Hao Chang, Shuo-Han Chen, Chih-Ching Kuo, Ming-Chang Yang, Hsin-Wen Wei, and Wei-Kuan Shih

Non-volatile random-access memory (NVRAM) is becoming a mainstream storage device in embedded systems due to its favorable features, such as small size, low power consumption, and short read/write latency. Unlike dynamic random-access memory (DRAM), a write operation on NVRAM consumes more energy and time than a read operation. However, current mobile/embedded file systems, such as EXT2/3 and EXT4, are poorly suited to NVRAM devices, because their journaling mechanism writes the same data twice: once during data commitment and once at checkpoint. This observation motivates the design of a two-phase write-reduction journaling file system called wrJFS. In the first phase, wrJFS classifies data into two categories: metadata and user data. Metadata is handled by a partial byte-enabled journaling strategy, and user data is processed in the second phase. In the second phase, user data is compressed by a hardware encoder to reduce the write size and is managed by a compression-enabled journaling strategy to avoid write amplification. The experimental results show that the proposed wrJFS reduces the size of write requests by 89.7% on average, compared with the original EXT3.
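To make the double-write cost concrete, the following minimal sketch counts the bytes written for one journaled update. It assumes, purely for illustration, that only the journal copy is compressed and that software zlib stands in for the hardware encoder. The function name journaled_bytes is hypothetical, and this is not the wrJFS implementation.

```python
import zlib


def journaled_bytes(payload: bytes, compress_journal: bool = False) -> int:
    """Count bytes written when a payload is journaled and then checkpointed.

    Assumption: the journal copy may be compressed, while the checkpoint
    writes the full payload to its home location.
    """
    journal_copy = zlib.compress(payload) if compress_journal else payload
    # Commit writes the journal copy; checkpoint writes the data a second time.
    return len(journal_copy) + len(payload)


page = b"a" * 4096                      # a highly compressible 4 KiB page
plain = journaled_bytes(page)           # two full copies of the page
reduced = journaled_bytes(page, True)   # compressed journal copy plus one full copy
```

With an uncompressed journal, every byte of the page is written twice. How much the compressed variant saves depends entirely on how compressible the data is.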
"VirtualGC: Enabling Erase-free Garbage Collection to Upgrade the Performance of Rewritable SLC NAND Flash Memory," ACM/IEEE Design Automation Conference (DAC), June 2017.
Authors: Tseng-Yi Chen, Yuan-Hao Chang, Yuan-Hung Kuan, and Yu-Ming Chang

Since 3D NAND flash memory can provide more reliable storage than 2D planar flash memory by relaxing the design rule of a memory cell, a new programming technique, namely the erase-free scheme, has been proposed to further enhance the endurance of 3D SLC NAND flash memory. The erase-free scheme brings substantial benefits to flash memory performance and endurance. For example, it can reclaim invalid (page) space without physically erasing a flash block. However, current flash management designs cannot fully exploit these benefits. Considering the features of the erase-free scheme, this paper is the first work to propose a flash management design, namely the VirtualGC strategy, to handle the erase-free garbage collection process. By taking advantage of the erase-free scheme, the proposed strategy reduces the overhead of copying live pages so as to increase flash memory performance. The results show that the proposed strategy significantly improves the performance of rewritable 3D flash memory drives.
"A Pattern-aware Write Strategy to Enhance the Reliability of Flash-Memory Storage Systems," ACM Symposium on Applied Computing (SAC), April 2017.
Authors: Tseng-Yi Chen, Yuan-Hao Chang, Yuan-Hung Kuan, Ming-Chang Yang, Yu-Ming Chang, and Pi-Cheng Hsiu

Owing to the high cell density brought by advanced manufacturing processes, ensuring the reliability of flash drives has become a challenging issue in flash system design. To enhance reliability, error-correcting code (ECC) has been widely utilized in flash drives to correct error bits when data are programmed to or read from flash drives. Although ECC can effectively enhance the reliability of flash drives by correcting error bits, its capability degrades as the program/erase (P/E) cycles of flash blocks increase. Eventually, ECC cannot correct a flash page because the page contains too many error bits. As a result, reducing error bits is an effective way to further improve the reliability of flash drives when a specific ECC is adopted. This work focuses on how to reduce the probability of producing error bits in a flash page. We propose a pattern-aware write strategy that allocates young blocks (i.e., blocks with low P/E cycles) for storing hot data and executes bit-flip operations on the written data so as to reduce the number of error bits in a flash page. By considering both the P/E cycles of blocks and the pattern of written data, the proposed pattern-aware write strategy can effectively improve the reliability of flash drives. The experimental results show that the proposed strategy can reduce the number of error pages by up to 40%, compared with the well-known DFTL solution. Moreover, the proposed strategy is orthogonal to all ECC mechanisms, so the reliability of flash drives with ECC mechanisms can be further improved by the proposed strategy.
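The abstract does not say which data patterns trigger a bit flip, so the following is only a sketch of one common form of the idea: store the complement of a page, plus a one-bit flag, whenever that lowers the number of 0 bits. Treating 0 bits as the error-prone pattern is an assumption here, and the names encode_page and decode_page are hypothetical.

```python
from typing import Tuple


def encode_page(page: bytes) -> Tuple[bool, bytes]:
    """Return (flipped, stored), inverting the page when it has more 0 bits than 1 bits."""
    total_bits = len(page) * 8
    ones = sum(bin(byte).count("1") for byte in page)
    zeros = total_bits - ones
    if zeros > ones:
        # Store the bitwise complement so that fewer 0 bits are written.
        return True, bytes(byte ^ 0xFF for byte in page)
    return False, page


def decode_page(flipped: bool, stored: bytes) -> bytes:
    """Recover the original page from the stored bytes and the flip flag."""
    return bytes(byte ^ 0xFF for byte in stored) if flipped else stored
```

For example, encode_page(bytes([0x00, 0x01])) flips the page and stores bytes([0xFF, 0xFE]), and decode_page restores the original two bytes from the flag and the stored data.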
"Efficient Cache Update for In-Memory Cluster Computing with Spark," 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 2017.
Authors: Li-Yung Ho, Jan-Jan Wu, Pangfeng Liu, Chia-Chun Shih, Chi-Chang Huang and Chao-Wen Huang

This paper proposes a scalable and efficient billing system for Chunghwa Telecom (CHT), the largest telecom company in Taiwan. We use Spark, a popular in-memory cluster computing framework, to develop our system. Although the memory cache speeds up data processing in Spark, its data immutability assumption makes RDD replacement inefficient. To address this problem, we propose the partial-update RDD, which enables users to replace individual partitions of an RDD. We formulate the RDD partition problem, which addresses the efficiency of partition replacement, and develop two solutions to it: a dynamic programming algorithm and a nonlinear programming method. Experimental results suggest that the partial-update RDD achieves a 4.32x speedup compared with the original RDD in Spark. The proposed billing system outperforms the original billing system at CHT by a factor of 24 in throughput.
"Deep-Net Fusion to Classify Shots in Concert Videos," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017), March 2017.
Authors: Wen-Li Wei, Jen-Chun Lin, Tyng-Luh Liu, Yi-Hsuan Yang, Hsin-Min Wang, Hsiao-Rong Tyan, and Hong-Yuan Mark Liao

Varying the types of shots is a fundamental element in the language of film, commonly used by a visual storytelling director to convey emotion, ideas, and art. To classify such types of shots from images, we present a new framework that addresses two key issues. We first focus on learning more effective features by fusing the layer-wise outputs extracted from a deep convolutional neural network (CNN) pre-trained on a large-scale dataset for object recognition. We then introduce a probabilistic fusion model, termed the error-weighted deep cross-correlation model (EW-Deep-CCM), to boost the classification accuracy. Specifically, the deep neural network-based cross-correlation model (Deep-CCM) is constructed not only to model the extracted feature hierarchies of the CNN independently but also to relate the statistical dependencies of paired features from different layers. Then, a Bayesian error weighting scheme for classifier combination is adopted to explore the contributions of individual Deep-CCM classifiers and enhance the accuracy of shot classification. We provide extensive experimental results on a dataset of live concert videos to demonstrate the advantage of the proposed EW-Deep-CCM over existing popular fusion approaches.

The video demos can be found at
"General Randomness Amplification with Non-signaling Security," The 20th Annual Conference on Quantum Information Processing (QIP2017), January 2017.
Authors: Kai-Min Chung, Yaoyun Shi and Xiaodi Wu

Highly unpredictable events appear to be abundant in life. However, when modeled rigorously, their existence in nature is far from evident. In fact, the world can be deterministic while at the same time the predictions of quantum mechanics are consistent with observations. Assuming that randomness does exist but only in a weak form, could highly random events be possible? This fundamental question was first raised by Colbeck and Renner (Nature Physics, 8:450–453, 2012). In this work, we answer this question positively, without the various restrictions assumed in the previous works. More precisely, our protocol uses quantum devices, a single weak randomness source quantified by a general notion of non-signaling min-entropy, tolerates a constant amount of device imperfection, and the security is against an all-powerful non-signaling adversary. Unlike the previous works proving non-signaling security, our result does not rely on any structural restrictions or independence assumptions. Thus it implies a stronger interpretation of the dichotomy statement articulated by Gallego et al. (Nature Communications, 4:2654, 2013): “[e]ither our world is fully deterministic or there exist in nature events that are fully random.”

Note: This is a new work after our QIP 2014 paper, where the security proved is against a quantum, as opposed to non-signaling, adversary. 
Current Research Results
Authors: Anderson B. Mayfield, Yu-Bin Wang, Chii-Shiarng Chen, Shu-Hwa Chen, and Chung-Yen Lin*

As significant anthropogenic pressures are putting undue stress on the world's oceans, there has been a concerted effort to understand how marine organisms respond to environmental change. Transcriptomic approaches, in particular, have been readily employed to document the mRNA-level response of a plethora of marine invertebrates exposed to an array of simulated stress scenarios, with the tacit and untested assumption being that the respective proteins show a corresponding trend. To better understand the degree of congruency between mRNA and protein expression in an endosymbiotic marine invertebrate, mRNAs and proteins were sequenced from the same samples of the common, Indo-Pacific coral Seriatopora hystrix exposed to stable or upwelling-simulating conditions for 1 week. Of the 167 proteins downregulated at variable temperature, only two were associated with mRNAs that were also differentially expressed between treatments. Of the 378 differentially expressed genes, none were associated with a differentially expressed protein. Collectively, these results highlight the inherent risk of inferring cellular behaviour based on mRNA expression data alone and challenge the current, mRNA-focused approach taken by most marine and many molecular biologists.

Reference website: .
"Utilization-aware Self-tuning Design for TLC Flash Storage Devices," IEEE Transactions on Very Large Scale Integration Systems (TVLSI), October 2016.
Authors: Ming-Chang Yang, Yuan-Hao Chang, Che-Wei Tsao, and Chung-Yu Liu

The high-density, low-cost triple-level-cell (TLC) flash memory has gradually dominated the flash storage market because of the fast-growing demand for storage capacity. However, advances in manufacturing technologies also make TLC flash memory suffer serious performance degradation compared with the low-density, high-performance single-level-cell (SLC) flash memory. To address this issue, some vendors enable blocks of TLC flash memory to work as high-performance, low-density SLC blocks. In contrast to past research that allocates a fixed number of TLC blocks as SLC blocks to improve device performance to a certain degree, we propose a utilization-aware self-tuning design that trades unused storage capacity for better system performance. The design dynamically adjusts and maximizes the number of SLC blocks according to the amount of data stored in the storage device at runtime. With the self-tuning design, a flash storage device can not only achieve high access performance but also provide enough storage capacity. The performance and capability of the proposed design were evaluated by a series of experiments, and the results are very encouraging.
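The capacity trade-off can be illustrated with a little arithmetic. The sketch below assumes that a block in SLC mode holds one third of the pages it holds in TLC mode (one bit per cell instead of three) and that the page count per block is divisible by three. The function max_slc_blocks is a hypothetical helper, not the tuning policy evaluated in the paper.

```python
def max_slc_blocks(total_blocks: int, tlc_pages_per_block: int, stored_pages: int) -> int:
    """Largest number of blocks that can run in SLC mode while still holding the stored data.

    Assumption: an SLC-mode block offers tlc_pages_per_block / 3 pages.
    """
    capacity = total_blocks * tlc_pages_per_block
    if stored_pages > capacity:
        raise ValueError("stored data exceeds the full TLC capacity")
    # Each converted block gives up two thirds of its pages, so with s SLC blocks:
    # (total - s) * P + s * P / 3 >= stored  =>  s <= 3 * (capacity - stored) / (2 * P)
    free_pages = capacity - stored_pages
    return min(total_blocks, (3 * free_pages) // (2 * tlc_pages_per_block))
```

Under these assumptions, a half-full device with 100 blocks of 192 pages can run 75 blocks in SLC mode, an empty device can run all 100, and a full device can run none.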
"Graceful Space Degradation: An Uneven Space Management for Flash Storage Devices," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), September 2016.
Authors: Ming-Chang Yang, Yuan-Hao Chang, Yuan-Hung Kuan, and Che-Wei Tsao

The high cell density, multilevel-cell programming, and manufacturing process variance cause upcoming flash memory to have large bit-error-rate variance among blocks and pages, where a flash chip consists of multiple blocks and each block consists of a fixed number of pages. To avoid storing crucial user data in more fragile pages, conventional flash management software tends to aggressively discard high bit-error-rate areas in units of a block. However, with such aggressive discarding strategies and the growing page/block sizes of next-generation flash memory, the available space of flash devices might degrade very sharply, resulting in a rapidly shortened device lifespan. Thus, we advocate the concept of “graceful space degradation” to mitigate this problem by discarding high bit-error-rate (or worn-out) areas in units of pages (instead of blocks). To realize this concept, we are the first to put forward an “uneven space management” design to manage flash blocks containing different numbers of bad pages. Our design focuses on placing data with different access behaviors so as to make the best use of blocks with different available space and ultimately prolong the device lifespan with good access performance. The experiments were conducted on representative realistic workloads, and the results reveal that the proposed design can extend the device lifetime by at least 2.38 times over existing approaches, with very limited performance overhead.
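The difference between block-level and page-level discarding can be sketched with a simple count of usable pages. The example assumes, for illustration only, that block-level management retires a whole block as soon as it contains one bad page. The function usable_pages is hypothetical and ignores data placement, which is the main subject of the paper.

```python
from typing import Sequence


def usable_pages(bad_pages_per_block: Sequence[int], pages_per_block: int, per_page: bool) -> int:
    """Count pages that remain usable under page-level or block-level discarding."""
    if per_page:
        # Page-level: only the bad pages themselves are retired.
        return sum(pages_per_block - bad for bad in bad_pages_per_block)
    # Block-level (assumed policy): any bad page retires the whole block.
    return sum(pages_per_block for bad in bad_pages_per_block if bad == 0)


bad_counts = [0, 1, 3, 0]  # bad pages found in each of four 64-page blocks
block_level = usable_pages(bad_counts, 64, per_page=False)
page_level = usable_pages(bad_counts, 64, per_page=True)
```

In this toy case, block-level discarding keeps 128 pages while page-level discarding keeps 252, which is the gradual loss of space that the abstract calls graceful degradation.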
"Exploiting Longer SIMD Lanes in Dynamic Binary Translation," IEEE International Conference on Parallel and Distributed Systems (ICPADS), December 2016.
Authors: Ding-Yong Hong, Sheng-Yu Fu, Yu-Ping Liu, Jan-Jan Wu, and Wei-Chung Hsu

Recent trends in SIMD architecture have tended toward longer vector lengths, and more enhanced SIMD features have been introduced in newer vector instruction sets. However, legacy or proprietary applications compiled for a short-SIMD ISA cannot benefit from the long-SIMD architecture, which supports improved parallelism and enhanced vector primitives, and thus achieve only a small fraction of the potential peak performance. This paper presents a dynamic binary translation technique that enables short-SIMD binaries to exploit the benefits of the new SIMD architecture by rewriting short-SIMD loop code. We propose a general approach that translates loops consisting of short-SIMD instructions to machine-independent IR, conducts SIMD loop transformation/optimization at the IR level, and finally translates to long-SIMD instructions. Two solutions are presented to enforce SIMD load/store alignment: one for the problem caused by the binary translator's internal translation condition, and one general approach using loop peeling optimization. The benchmark results show that an average speedup of 1.45x is achieved for NEON-to-AVX2 loop transformation.
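Loop peeling for alignment can be sketched as plain index arithmetic: run a few scalar iterations until the address reaches a vector boundary, then run full-width vector iterations, then finish the remainder in scalar code. The helper split_loop below is a hypothetical illustration of that general technique, not the translator's actual code.

```python
from typing import Tuple


def split_loop(n: int, start_addr: int, elem_size: int, vector_bytes: int) -> Tuple[int, int, int]:
    """Split n iterations into (peeled scalar, aligned vector, scalar tail) counts."""
    misalign = start_addr % vector_bytes
    if misalign % elem_size != 0:
        raise ValueError("address can never reach a vector boundary in whole elements")
    # Scalar iterations needed before the address is vector-aligned.
    peel = 0 if misalign == 0 else (vector_bytes - misalign) // elem_size
    peel = min(peel, n)
    lanes = vector_bytes // elem_size
    vector_iters = (n - peel) // lanes
    tail = (n - peel) - vector_iters * lanes
    return peel, vector_iters, tail
```

For 100 four-byte elements starting at address 0x1008 and 32-byte vectors (the width of a 256-bit AVX2 register), split_loop returns (6, 11, 6): six peeled iterations, eleven aligned vector iterations of eight lanes each, and six tail iterations.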
"Framework Designs to Enhance Reliable and Timely Services of Disaster Management Systems," ACM/IEEE International Conference on Computer-Aided Design (ICCAD), November 2016.
Authors: Chi-Sheng Shih, Pi-Cheng Hsiu, Yuan-Hao Chang, Tei-Wei Kuo

Fault tolerance is a fundamental requirement in the design of many cyber-physical systems. Devices or sensors might have different requirements on their levels of reliability and/or timely services in the composition of a cyber-physical system. In this work, a system framework is explored to virtualize devices/sensors, and service migration is considered at run time, so that faults are masked and the timeliness of services is enhanced. In particular, a disaster messaging system supporting seamless service recovery in small- and large-scale networks is developed and evaluated, where an acceptable level of connectivity must be maintained in the face of numerous faults for responsive delivery of information critical to the success of emergency response and rescue operations. The framework also takes into account the energy consumption and reliability of sensing services while using different types of memory components. Last but not least, augmented sensing using smartphones allows users to receive the sensed information nearby; this is critical when communication infrastructures are damaged.
"Virtual Flash Chips: Reinforcing the Hardware Abstraction Layer to Improve Data Recoverability of Flash Devices," IEEE Transactions on Computers (TC), September 2016.
Authors: Ming-Chang Yang, Yuan-Hao Chang, and Tei-Wei Kuo

The market trend of flash memory chips has been toward high density but with low reliability. The rapidly increasing bit error rates and emerging reliability issues of the coming triple-level cell and even three-dimensional flash chips will expose users to extremely high risks for storing data in such low reliability storage media. With these concerns in mind, this paper rethinks the layer design of flash devices and proposes a complete paradigm shift to re-configure physical flash chips of potentially massive parallelism into better virtual chips, in order to improve the data recoverability in a modular and low-cost way. The concept of virtual chips is realized by reinforcing the hardware abstraction layer without continually complicating the conventional flash management software of the flash translation layer. The capability and compatibility of the proposed design were verified by both property analysis and a series of experiments with encouraging results.
"Alignment of Lyrics With Accompanied Singing Audio Based on Acoustic-Phonetic Vowel Likelihood Modeling," IEEE/ACM Transactions on Audio, Speech, and Language Processing, November 2016.
Authors: Yu-Ren Chien, Hsin-Min Wang, and Shyh-Kang Jeng

This study addresses the task of aligning lyrics with accompanied singing recordings. With a vowel-only representation of lyric syllables, our approach evaluates likelihood scores of vowel types with glottal pulse shapes and formant frequencies extracted from a small set of singing examples. The proposed vowel likelihood model is used in conjunction with a prior model of frame-wise syllable sequence in determining an optimal evolution of syllabic position. In lyrics alignment experiments, we optimized numerical parameters on two independent development sets and then tested the optimized system on two other datasets. New objective performance measures are introduced in the evaluation to provide further insight into the quality of alignment. Use of glottal pulse shapes and formant frequencies is shown by a controlled experiment to account for a 0.07 difference in average normalized alignment error. Another controlled experiment demonstrates that, with a difference of 0.03, F0-invariant glottal pulse shape gives a lower average normalized alignment error than does F0-invariant spectrum envelope, the latter being assumed by MFCC-based timbre models.
"Constrained Null Space Component Analysis for Semi-Blind Source Separation Problem," IEEE Transactions on Neural Networks and Learning Systems, November 2016.
Authors: Wen-Liang Hwang, Keng-Shih Lu, and Jinn Ho

The blind source separation (BSS) problem is to extract unknown sources from observations of their unknown mixtures. A current trend in BSS is the semi-blind approach, which incorporates prior information on the sources or on how the sources are mixed. The constrained ICA (c-ICA) approach has been studied to impose constraints on the well-known ICA framework. We introduce an alternative approach based on the null space component analysis (NCA) framework and refer to it as the c-NCA approach. We also present the c-NCA algorithm, which uses signal-dependent semi-definite operators, which are bilinear mappings, as signatures for operator design in the c-NCA approach. Theoretically, we show that the source estimation of the c-NCA algorithm converges at a rate that depends on the decay of the sequence obtained by applying the estimated operators to the corresponding sources. The c-NCA can be formulated as a deterministic constrained optimization method, and thus it can take advantage of solvers developed in the optimization community for solving the BSS problem. As examples, we demonstrate that electroencephalogram (EEG) interference rejection problems can be solved by the c-NCA with proximal splitting algorithms by incorporating a sparsity-enforcing separation model and considering the case when reference signals are available.
"Cross-batch Reference Learning for Deep Classification and Retrieval," ACM International Conference on Multimedia 2016, ACM MM 2016 (full paper), October 2016.
Authors: Huei-Fang Yang, Kevin Lin, and Chu-Song Chen

Learning feature representations for image retrieval is essential to multimedia search and mining applications. Recently, deep convolutional neural networks (CNNs) have gained much attention due to their impressive performance on object detection and image classification, and the feature representations learned from a large-scale generic dataset (e.g., ImageNet) can be transferred to or fine-tuned on the datasets of other domains. However, when the feature representations learned with a deep CNN are applied to image retrieval, the performance is still not as good as when they are used for classification, which restricts their applicability to relevant image search. To ensure the retrieval capability of the learned feature space, we introduce a new idea called cross-batch reference (CBR) to enhance the stochastic-gradient-descent (SGD) training of CNNs. In each iteration of our training process, the network adjustment relies not only on the training samples in a single batch, but also on the information passed by the samples in the other batches. This inter-batch communication mechanism is formulated as a cross-batch retrieval process based on the mean average precision (MAP) criterion, where the relevant and irrelevant samples are encouraged to be placed at the top and rear of the retrieval list, respectively. The learned feature space is not only discriminative among different classes; samples that are relevant to each other or of the same class are also forced to cluster together. To maximize the cross-batch MAP, we design a loss function that is an approximated lower bound of the MAP on the feature layer of the network, which is differentiable and easier to optimize. By combining the intra-batch classification and inter-batch cross-reference losses, the learned features are effective for both classification and retrieval tasks. Experimental results on various benchmarks demonstrate the effectiveness of our approach.
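For reference, the MAP criterion mentioned above can be computed from ranked relevance lists as follows. This is a minimal sketch of the standard, non-differentiable metric, not of the approximated lower-bound loss proposed in the paper. The function names are chosen here for illustration.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision of one ranked list; relevance[i] marks whether rank i+1 is relevant."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists: Sequence[Sequence[bool]]) -> float:
    """Mean of the average precision over all queries."""
    if not ranked_lists:
        return 0.0
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)
```

A list with all relevant items ranked first scores 1.0, and pushing a relevant item behind an irrelevant one lowers the score. This is why maximizing MAP encourages relevant samples toward the top of the retrieval list.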
"UTCNN: a Deep Learning Model of Stance Classification on Social Media Text," the 26th International Conference on Computational Linguistics (COLING 2016), 2016.
Authors: Wei-Fan Chen and Lun-Wei Ku

Most neural network models for document classification on social media focus on text information to the neglect of other information on these platforms. In this paper, we classify post stance on social media channels and develop UTCNN, a neural network model that incorporates user tastes, topic tastes, and user comments on posts. UTCNN not only works on social media texts, but also analyzes texts in forums and message boards. Experiments performed on Chinese Facebook data and English online debate forum data show that UTCNN achieves a 0.755 macro-average f-score for supportive, neutral, and unsupportive stance classes on Facebook data, which is significantly better than models in which either user, topic, or comment information is withheld. This model design greatly mitigates the lack of data for the minor class without the use of oversampling. In addition, UTCNN yields a 0.842 accuracy on English online debate forum data, which also significantly outperforms results from previous work as well as other deep learning models, showing that UTCNN performs well regardless of language or platform.
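The macro-average f-score reported above gives each stance class equal weight regardless of how many posts it contains. A minimal sketch of the standard computation follows. The label strings and the function name macro_f1 are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores) if scores else 0.0


gold = ["sup", "sup", "neu", "uns"]
pred = ["sup", "neu", "neu", "uns"]
score = macro_f1(gold, pred, ["sup", "neu", "uns"])
```

Because every class counts equally, a model that ignores the minority class is penalized heavily, which makes the measure suitable for imbalanced stance data.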
"Intelligent Indoor Emergency Evacuation Systems - Reference Architecture and Data Requirements," Proceedings of Future Technologies Conference (FTC 2016), December 2016, December 2016.
Authors: J. W. S. Liu, E. T.H. Chu, F. T Lin and Z. L. Zhong

An intelligent indoor emergency evacuation system (IES) can take appropriate risk reduction actions in response to alerts from building safety systems warning of emergencies originating within the building and from government agencies warning of natural disasters affecting surrounding areas. This paper presents a data model and a reference architecture of IES for large public buildings, along with case studies to capture their data requirements. Many technical and practical challenges can raise barriers to the wide deployment of IES. Examples of such challenges and solutions to overcome them are also presented.

"Learning to Distill: The Essence Vector Modeling Framework," International Conference on Computational Linguistics (COLING2016), December 2016.
Authors: Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen and Hsin-Min Wang

In the context of natural language processing, representation learning has emerged as a newly active research subject because of its excellent performance in many applications. Learning representations of words is a pioneering study in this school of research. However, paragraph (or sentence and document) embedding learning is more suitable for some tasks, such as sentiment classification and document summarization. Nevertheless, as far as we are aware, there is relatively little work focusing on the development of unsupervised paragraph embedding methods. Classic paragraph embedding methods infer the representation of a given paragraph by considering all of the words occurring in the paragraph. Consequently, stop or function words that occur frequently may mislead the embedding learning process into producing a vague paragraph representation. Motivated by these observations, our major contributions in this paper are twofold. First, we propose a novel unsupervised paragraph embedding method, named the essence vector (EV) model, which aims at not only distilling the most representative information from a paragraph but also excluding the general background information to produce a more informative low-dimensional vector representation for the paragraph. We evaluate the proposed EV model on benchmark sentiment classification and multi-document summarization tasks. The experimental results demonstrate the effectiveness and applicability of the proposed embedding method. Second, in view of the increasing importance of spoken content processing, an extension of the EV model, named the denoising essence vector (D-EV) model, is proposed. The D-EV model not only inherits the advantages of the EV model but also can infer a representation of a given spoken paragraph that is more robust against imperfect speech recognition. The utility of the D-EV model is evaluated on a spoken document summarization task, confirming the practical merits of the proposed embedding method in relation to several well-practiced and state-of-the-art summarization methods.
"Delegating RAM Computations with Adaptive Soundness and Privacy," Fourteenth IACR Theory of Cryptography Conference - TCC 2016-B, November 2016.
Authors: Prabhanjan Ananth, Yu-Chi Chen, Kai-Min Chung, Huijia Lin and Wei-Kai Lin

    We consider the problem of delegating RAM computations over persistent databases: A user wishes to delegate a sequence of computations over a database to a server, where each computation may read and modify the database and the modifications persist between computations. For the efficiency of the server, it is important that computations are modeled as RAM programs, for their runtime may be sub-linear in the size of the database.
    Two security needs arise in this context: ensuring integrity, by designing means for the server to compute short proofs that allow the user to efficiently verify the correctness of the server computation, and privacy, providing means for the user to hide his private databases and programs from a malicious server. In this work, we aim to address both security needs, especially in the stringent, adaptive, setting, where the sequence of RAM computations is (potentially) chosen adaptively by a malicious server depending on the messages from an honest user.
    To this end, we construct the first RAM delegation scheme achieving both adaptive integrity (a.k.a. soundness) and adaptive privacy, assuming the existence of indistinguishability obfuscation for circuits and a variant of the two-to-one somewhere perfectly binding hash [Okamoto et al. ASIACRYPT’15] (the latter can be based on the decisional Diffie-Hellman assumption). Prior works focused either only on adaptive soundness [Kalai and Paneth, ePrint’15] or on the weaker variant, selective soundness and privacy [Chen et al. ITCS’16, Canetti and Holmgren ITCS’16].
    At a high level, our result is obtained by applying a generic “security lifting technique” to the delegation scheme of Chen et al. and its proof of selective soundness and privacy. The security lifting technique formalizes an abstract framework of selective security proofs, and generically “lifts” such proofs into proofs of adaptive security. We believe that this technique can potentially be applied to other cryptographic schemes and is of independent interest.
"Space-Efficient Index Scheme for PCM-based Multiversion Databases in Cyber-Physical Systems," ACM Transactions on Embedded Computing Systems (TECS), October 2016.
Authors: Yuan-Hung Kuan, Yuan-Hao Chang, Tseng-Yi Chen, Po-Chun Huang, and Kam-Yiu Lam

In this article, we study the indexing problem of using PCM as the storage medium for embedded multiversion databases in cyber-physical systems (CPSs). Although the multiversion B+-tree (MVBT) index has been shown to be efficient in managing multiple versions of data items in a database, MVBT is designed for databases residing in traditional block-oriented storage devices. It can have serious performance problems when the databases are on phase-change memory (PCM). Since the embedded multiversion database in CPSs may have limited storage space and are update intensive, to resolve the problems of MVBT of lack of space efficiency and heavy update cost, we propose a new index scheme, called space-efficient multiversion index (SEMI), to enhance the space utilization and access performance in serving various types of queries. In SEMI, since the number of keys in the database may be small, instead of using a B-tree index, we propose to use a binary-search tree to organize the index keys. Furthermore, multiple versions of the same data item may be stored consecutively and indexed by a single entry to maximize the space utilization and at the same time to enhance the performance in serving version-range queries. Analytical studies have been conducted on SEMI, and a series of experiments have been performed to evaluate its performance as compared with MVBT under different workloads. The experimental results have demonstrated that SEMI can achieve very high space utilization and has better performance in serving update transactions and range queries as compared with MVBT.
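The idea of indexing all versions of a data item through a single entry, with the versions stored consecutively, can be sketched as follows. This is an in-memory illustration only: a sorted key list searched with bisect stands in for the binary-search tree, and the class VersionIndex and its methods are hypothetical names rather than the SEMI layout on PCM.

```python
import bisect
from typing import Any, Dict, List, Tuple


class VersionIndex:
    """One index entry per key; each entry holds that key's versions in timestamp order."""

    def __init__(self) -> None:
        self._keys: List[str] = []  # kept sorted, searched by binary search
        self._entries: Dict[str, Tuple[List[int], List[Any]]] = {}

    def put(self, key: str, timestamp: int, value: Any) -> None:
        if key not in self._entries:
            bisect.insort(self._keys, key)
            self._entries[key] = ([], [])
        stamps, values = self._entries[key]
        if stamps and timestamp <= stamps[-1]:
            raise ValueError("timestamps must increase for each key")
        # New versions are appended, so the versions of a key stay consecutive.
        stamps.append(timestamp)
        values.append(value)

    def version_range(self, key: str, start: int, end: int) -> List[Any]:
        """Return the values of key whose timestamps lie in [start, end]."""
        stamps, values = self._entries.get(key, ([], []))
        lo = bisect.bisect_left(stamps, start)
        hi = bisect.bisect_right(stamps, end)
        return values[lo:hi]

    def keys_between(self, low: str, high: str) -> List[str]:
        """Return the keys in [low, high], found by binary search over the sorted keys."""
        return self._keys[bisect.bisect_left(self._keys, low):bisect.bisect_right(self._keys, high)]
```

After put("t1", 1, 20.5), put("t1", 3, 21.0), and put("t1", 7, 22.4), the call version_range("t1", 2, 7) returns [21.0, 22.4] as one contiguous slice, which is the property that makes version-range queries cheap in this sketch.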
"SPIRIT: A Tree Kernel-based Method for Topic Person Interaction Detection," IEEE Transactions on Knowledge and Data Engineering (TKDE), August 2016.
Authors: Yung-Chun Chang, Chien Chin Chen, and Wen-Lian Hsu

The development of a topic in a set of topic documents is constituted by a series of person interactions at a specific time and place. Knowing the interactions of the persons mentioned in these documents helps readers better comprehend the documents. In this paper, we propose a topic person interaction detection method called SPIRIT, which classifies the text segments in a set of topic documents that convey person interactions. We design the rich interactive tree structure to represent the syntactic, context, and semantic information of text, and this structure is incorporated into a tree-based convolution kernel to identify interactive segments. Experimental results based on real-world topics demonstrate that the proposed rich interactive tree structure effectively detects the topic person interactions and that our method outperforms many well-known relation extraction and protein-protein interaction methods.
Authors: Yuxiang Jiang, ..., Caster Chen, Wen-Lian Hsu et al.

Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
Authors: Sheng-Yao Su, Shu-Hwa Chen, I-Hsuan Lu, Yih-Shien Chiang, Yu-Bin Wang, Pao-Yang Chen, Chung-Yen Lin*

Background: Bisulfite sequencing (BS-seq) has become a standard technology to profile genome-wide DNA methylation at single-base resolution. It allows researchers to conduct genome-wide cytosine methylation analyses on issues such as genomic imprinting, transcriptional regulation, cellular development and differentiation. A single dataset from a BS-Seq experiment is resolved into many features according to the sequence contexts, making methylome data analysis and data visualization a complex task. Results: We developed a streamlined platform, TEA, for analyzing and visualizing data from whole-genome BS-Seq (WGBS) experiments conducted in the model plant Arabidopsis thaliana. To capture the essence of the genome methylation level and to run efficiently online, we introduce a straightforward method for measuring genome methylation in each sequence context by gene. The method is scripted in Java to process BS-Seq mapping results. Through a simple data uploading process, the TEA server deploys a web-based platform for deep analysis by linking data to an updated Arabidopsis annotation database and toolkits. Conclusions: TEA is an intuitive and efficient online platform for analyzing the Arabidopsis genomic DNA methylation landscape. It provides several ways to help users exploit WGBS data.
TEA is freely accessible for academic users at:
Authors: Jen-Chieh Lee, Sheng-Yao Su, Chun A. Changou, Rong-Sen Yang, Keh-Sung Tsai, Michael T. Collins, Eric S. Orwoll, Chung-Yen Lin, Shu-Hwa Chen, Shyang-Rong Shih, Chen-Han Lee, Yoshinao Oda, Steven D. Billings, Chien-Feng Li, G. Petur Nielsen, Eiichi Konishi, Fredrik Petersson, Thomas O. Carpenter, Hsuan-Ying Huang, and Andrew L. Folpe

Phosphaturic mesenchymal tumors typically cause paraneoplastic osteomalacia, chiefly as a result of FGF23 secretion. In a prior study, we identified FN1-FGFR1 fusion in 9 of 15 phosphaturic mesenchymal tumors. In this study, a total of 66 phosphaturic mesenchymal tumors and 7 tumors resembling phosphaturic mesenchymal tumor but without known phosphaturia were studied. A novel FN1-FGF1 fusion gene was identified in two cases without FN1-FGFR1 fusion by RNA sequencing and cross-validated with direct sequencing and western blot. Fluorescence in situ hybridization analyses revealed FN1-FGFR1 fusion in 16 of 39 (41%) phosphaturic mesenchymal tumors and identified an additional case with FN1-FGF1 fusion. The two fusion genes were mutually exclusive. Combined with previous data, the overall prevalence of FN1-FGFR1 and FN1-FGF1 fusions was 42% (21/50) and 6% (3/50), respectively. FGFR1 immunohistochemistry was positive in 82% (45/55) of phosphaturic mesenchymal tumors regardless of fusion status. By contrast, 121 cases of potential morphologic mimics (belonging to 13 tumor types) rarely expressed FGFR1, the main exceptions being solitary fibrous tumors (positive in 40%), chondroblastomas (40%), and giant cell tumors of bone (38%), suggesting a possible role for FGFR1 immunohistochemistry in the diagnosis of phosphaturic mesenchymal tumor. With the exception of one case reported in our prior study, none of the remaining tumors resembling phosphaturic mesenchymal tumor had either fusion type or expressed significant FGFR1. Our findings provide insight into possible mechanisms underlying the pathogenesis of phosphaturic mesenchymal tumor and imply a central role of the FGF1-FGFR1 signaling pathway. The novel FN1-FGF1 protein is expected to be secreted and serves as a ligand that binds and activates FGFR1 to achieve an autocrine loop. 
Further study is required to determine the functions of these fusion proteins.
Modern Pathology advance online publication, 22 July 2016; doi:10.1038/modpathol.2016.137.
"Efficient Warranty-Aware Wear-Leveling for Embedded Systems with PCM Main Memory," IEEE Transactions on Very Large Scale Integration Systems (TVLSI), July 2016.
Authors: Sheng-Wei Cheng, Yuan-Hao Chang, Tseng-Yi Chen, Yu-Fen Chang, Hsin-Wen Wei, and Wei-Kuan Shih

Recently, Phase Change Memory (PCM) has become a promising candidate to replace DRAM as main memory due to its low power consumption, fast I/O performance, and byte addressability. Despite these merits, PCM suffers from limited write endurance, which is a physical characteristic of the technology. Wear leveling is a well-known approach to address this issue. For PCM main memory, the design of wear leveling should stress operation efficiency and overhead reduction. Nevertheless, conventional designs are usually dedicated to prolonging the lifetime of PCM as far as possible. In this paper, we propose a novel perspective: instead of treating the maximization of PCM lifetime as the first priority, we aim to satisfy the product warranty period. With such a paradigm shift, the management overhead of wear leveling mechanisms can be reduced, which further improves operation efficiency. To this end, we propose a warranty-aware page management design that introduces novel criteria for determining the state of a page by taking both the product warranty period and the write cycles of the page into consideration. Theoretical analysis is also conducted to investigate the properties and performance of the proposed management. To show the effectiveness of the proposed design, we collected real traces by running SPEC2006 benchmarks with workloads of different write intensity. The experimental results showed that our design reduced the overhead to one-third that of the state-of-the-art designs while still providing the same level of performance.
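The abstract does not spell out the criteria it uses, so the following is only a hypothetical illustration of how a page state could depend on both the warranty period and the write cycles of a page. It assumes a simple proration rule: a page may consume its endurance no faster than the warranty period elapses. The function names `write_budget` and `page_state`, the state labels, and the rule itself are assumptions, not the paper's design.

```python
def write_budget(endurance_cycles: int, warranty_seconds: float,
                 elapsed_seconds: float) -> float:
    """Writes a page may have absorbed so far under linear proration (assumed rule)."""
    fraction = min(max(elapsed_seconds / warranty_seconds, 0.0), 1.0)
    return endurance_cycles * fraction


def page_state(write_count: int, endurance_cycles: int,
               warranty_seconds: float, elapsed_seconds: float) -> str:
    """Classify a page by comparing its write count with its prorated budget."""
    budget = write_budget(endurance_cycles, warranty_seconds, elapsed_seconds)
    # A page ahead of its budget would wear out before the warranty ends
    # if it kept its current write rate, so it is the one worth managing.
    return "over_budget" if write_count > budget else "within_budget"


# Halfway through the warranty, a page with 1000-cycle endurance has a budget of 500.
print(page_state(600, 1000, 100.0, 50.0))
print(page_state(400, 1000, 100.0, 50.0))
```

Under a rule of this kind, pages that stay within budget need no wear-leveling action at all, which is one way management overhead could drop when the goal is the warranty period rather than maximum lifetime.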
"Improving PCM Endurance with a Constant-cost Wear Leveling Design," ACM Transactions on Design Automation of Electronic Systems (TODAES), June 2016.
Authors: Yu-Ming Chang, Pi-Cheng Hsiu, Yuan-Hao Chang, Chi-Hao Chen, Tei-Wei Kuo, and Cheng-Yuan Michael Wang

Improving PCM endurance is a fundamental issue when PCM is considered as a replacement for DRAM as main memory. Memory-based wear leveling is an effective way to improve PCM endurance, but its major challenge is how to efficiently determine the appropriate memory pages for allocation or swapping. In this paper, we present a constant-cost wear leveling design that is compatible with existing memory management. Two implementations, namely bucket-based and array-based wear leveling, with constant-time (or nearly zero) search cost are proposed to be integrated into the OS layer and the hardware layer, respectively, and to offer a trade-off between time and space complexity. Experiments based on an implementation in Android, together with simulations using popular benchmarks, gave very encouraging results for the effectiveness of the proposed design.
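To make the idea of a bucket-based search concrete, here is a minimal sketch of how grouping free pages by write count can avoid scanning all pages at allocation time. It is not the paper's implementation: the class `BucketWearLeveler`, its parameters `bucket_width` and `max_writes`, and the `allocate`/`release` interface are hypothetical names chosen for illustration.

```python
from collections import deque


class BucketWearLeveler:
    """Free pages grouped into buckets by write count (hypothetical sketch)."""

    def __init__(self, num_pages: int, bucket_width: int, max_writes: int) -> None:
        self.bucket_width = bucket_width
        self.writes = [0] * num_pages
        num_buckets = max_writes // bucket_width + 1
        self.buckets = [deque() for _ in range(num_buckets)]
        self.buckets[0].extend(range(num_pages))  # all pages start unworn and free
        self.lowest = 0  # index of the lowest bucket that may hold a free page

    def _bucket_of(self, page: int) -> int:
        # Pages beyond max_writes are clamped into the last bucket.
        return min(self.writes[page] // self.bucket_width, len(self.buckets) - 1)

    def allocate(self) -> int:
        """Return a free page from the least-worn non-empty bucket."""
        while self.lowest < len(self.buckets) and not self.buckets[self.lowest]:
            self.lowest += 1
        if self.lowest == len(self.buckets):
            raise MemoryError("no free pages")
        return self.buckets[self.lowest].popleft()

    def release(self, page: int, writes_done: int) -> None:
        """Free a page and file it under the bucket matching its new write count."""
        self.writes[page] += writes_done
        idx = self._bucket_of(page)
        self.buckets[idx].append(page)
        self.lowest = min(self.lowest, idx)


wl = BucketWearLeveler(num_pages=3, bucket_width=10, max_writes=100)
first = wl.allocate()          # page 0, least worn
wl.release(first, 25)          # page 0 moves to the bucket for 20..29 writes
second = wl.allocate()         # page 1 is now preferred over the worn page 0
print(first, second)
```

In this sketch the cost of `allocate` depends on the number of buckets skipped, not on the number of pages, so choosing `bucket_width` trades search time against how finely wear is distinguished.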
Authors: Tzu-Po Chuang, Jaw-Yuan Wang, Shu-Wen Jao, Chang-Chieh Wu, Jiann-Hwa Chen, Koung-Hung Hsiao, Chung-Yen Lin, Shu-Hwa Chen, Sheng-Yao Su, Ying-Ju Chen, Yuan-Tsong Chen, Deng-Chyang Wu, Ling-Hui Li

Development of colorectal cancer (CRC) involves sequential transformation of normal mucosal tissues into benign adenomas and then adenomas into malignant tumors. The identification of genes crucial for malignant transformation in colorectal adenomas (CRAs) has been based primarily on cross-sectional observations. In this study, we identified relevant genes using autologous samples. By performing genome-wide SNP genotyping and RNA sequencing analysis of adenocarcinomas, adenomatous polyps, and non-neoplastic colon tissues (referred to as tri-part samples) from individual patients, we identified 68 genes with differential copy number alterations and progressively dysregulated expression. Aurora A, SKA3, and DSN1 protein levels were sequentially up-regulated in the samples, and this overexpression was associated with chromosome instability (CIN). Knockdown of SKA3 in CRC cells dramatically reduced cell growth rates and increased apoptosis. Depletion of SKA3 or DSN1 induced G2/M arrest and decreased migration, invasion, and anchorage-independent growth. AURKA and DSN1 are thus critical for chromosome 20q amplification-associated malignant transformation in CRA. Moreover, SKA3 at chromosome 13q was identified as a novel gene involved in promoting malignant transformation. Evaluating the expression of these genes may help identify patients with progressive adenomas and thereby improve treatment.

