Institute of Information Science
Recent Research Results
Current Research Results
Authors: Lin CH, Chen SH, Wang YB, Hsiung CA, Lin CY*

Enteroviruses (EV) with different genotypes cause diverse infectious diseases in humans and mammals. A correct EV typing result is crucial for effective medical treatment and disease control; however, the emergence of novel viral strains has impaired the performance of available diagnostic tools. Here, we present a web-based tool, named EVIDENCE (EnteroVirus In DEep coNCEption), for EV genotyping and recombination detection. We introduce the idea of using mixed-ranking scores to evaluate the fitness of prototypes based on relatedness and on the genome regions of interest. Using phylogenetic methods, the most probable genotype is determined based on the closest neighbor among the selected references. To detect possible recombination events, EVIDENCE calculates the sequence distance and phylogenetic relationship among sequences of all sliding windows scanning over the whole genome. Detected recombination events are plotted in an interactive figure for viewing of fine details. In addition, all EV sequences available in GenBank were collected and revised using the latest classification and nomenclature of EV in EVIDENCE. These sequences are built into the database and can be retrieved from an indexed catalog or searched for by keywords or by sequence similarity. EVIDENCE is the first web-based tool containing pipelines for genotyping and recombination detection, with updated, built-in, and complete reference sequences to improve sensitivity and specificity. The use of EVIDENCE can accelerate genotype identification, aiding clinical diagnosis and enhancing our understanding of EV evolution.
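The sliding-window scan described above can be sketched roughly as follows. This is only an illustration and not the EVIDENCE pipeline: the use of a simple p-distance, the window and step sizes, the function names, and the toy sequences are all assumptions made for this example, and the sequences are assumed to be already aligned to the same length.

```python
def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences (gap columns skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)


def closest_reference_per_window(query, references, window=8, step=4):
    """Slide a window over the alignment and record the nearest reference in each window."""
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        distances = {name: p_distance(query[start:end], seq[start:end])
                     for name, seq in references.items()}
        best = min(distances, key=distances.get)
        results.append((start, end, best, distances[best]))
    return results


def breakpoints(results):
    """Start of each window whose closest reference differs from the previous window's."""
    return [(cur[0], prev[2], cur[2])
            for prev, cur in zip(results, results[1:]) if prev[2] != cur[2]]


# Hypothetical toy data: the query resembles reference A first, then reference B.
refs = {"A": "A" * 16, "B": "C" * 16}
query = "A" * 10 + "C" * 6
windows = closest_reference_per_window(query, refs)
print(breakpoints(windows))  # [(8, 'A', 'B')]
```

A change in the closest reference between neighbouring windows only marks a candidate region; a real analysis would follow up with phylogenetic evidence, as the abstract describes.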
"Access Pattern Reshaping for eMMC-enabled SSDs," ACM/IEEE International Conference on Computer-Aided Design (ICCAD), November 2015.
Authors: Chien-Chung Ho, Yuan-Hao Chang, and Tei-Wei Kuo

The growing popularity of embedded Multi-Media Controllers (eMMCs) presents a unique opportunity to design solid-state drives of commodity products. This work addresses the essential design issues of such drives and introduces a light-weighted FTL design. In particular, access patterns to an eMMC-enabled solid-state drive are reshaped to accommodate the characteristics of eMMCs. That is, accesses to such a drive are reshaped to create sequential access patterns and writes of the specific sizes preferred by eMMCs, without resorting to the ordinary address translation design of an FTL. Meanwhile, garbage collection overheads should be minimized with reliability in mind, because eMMCs usually lack a powerful controller with a sophisticated design. The capability of the proposed design is evaluated by a series of experiments, for which we have very encouraging results.
"A Light-Weighted Software-Controlled Cache for PCM-based Main Memory Systems," ACM/IEEE International Conference on Computer-Aided Design (ICCAD), November 2015.
Authors: Hung-Sheng Chang, Yuan-Hao Chang, Tei-Wei Kuo, and Hsiang-Pang Li

The replacement of DRAM with non-volatile memory relies on solutions to the wear leveling and slow write problems. Unlike past work on compiler-assisted optimization or joint DRAM-PCM management strategies, we explore a light-weighted software-controlled DRAM cache design for non-volatile-memory-based main memory. The run-time overheads in the management of the DRAM cache are minimized by utilizing the information from a miss of the translation lookaside buffer (TLB) or the cache. Experiments were conducted based on a series of well-known benchmarks to evaluate the effectiveness of the proposed design, for which the results are very encouraging.
"Spatio-Temporal Learning of Basketball Offensive Strategies," 2015 ACM Multimedia Conference, October 2015.
Authors: Ching-Hang Chen, Tyng-Luh Liu, Yu-Shuen Wang, Hung-Kuo Chu, Nick C. Tang, Hong-Yuan Mark Liao

Video-based group behavior analysis is drawing attention to its rich applications in sports, military, surveillance and biological observations. The recent advances in tracking techniques, based on either computer vision methodology or hardware sensors, further provide the opportunity of better solving this challenging task. Focusing specifically on the analysis of basketball offensive strategies, we introduce a systematic approach to establishing unsupervised modeling of group behaviors. In view that a possible group behavior (offensive strategy) could be of different duration and represented by dynamic player trajectories, the crux of our method is to automatically divide training data into meaningful clusters and learn their respective spatio-temporal model, which is established upon Gaussian mixture regression to account for intra-class spatio-temporal variations. The resulting strategy representation turns out to be flexible that can be used to not only establish the discriminant functions but also improve learning the models. We demonstrate the usefulness of our approach by exploring its effectiveness in analyzing a set of given basketball video clips.
"System Combination for Machine Translation through Paraphrasing," Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), September 2015.
Authors: Wei-Yun Ma and Kathleen McKeown

In this paper, we propose a paraphrasing model to address the task of system combination for machine translation. We dynamically learn hierarchical paraphrases from target hypotheses without any syntactic annotations and form a synchronous context-free grammar to guide a series of transformations of target hypotheses into fused translations. The model is able to exploit phrasal and structural system-weighted consensus and also to utilize existing information about word ordering present in the target hypotheses. In addition, to consider a diverse set of plausible fused translations, we develop a hybrid combination architecture, where we paraphrase every target hypothesis to obtain a fused translation for each target, and then make the final selection among all fused translations. Our experimental results show that our approach can achieve a significant improvement over combination baselines.
"Extractive broadcast news summarization leveraging recurrent neural network language modeling techniques," IEEE/ACM Transactions on Audio, Speech, and Language Processing, August 2015.
Authors: Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang, Ea-Ee Jan, Wen-Lian Hsu, Hsin-Hsi Chen

Extractive text or speech summarization manages to select a set of salient sentences from an original document and concatenate them to form a summary, enabling users to better browse through and understand the content of the document. A recent stream of research on extractive summarization is to employ the language modeling (LM) approach for important sentence selection, which has proven to be effective for performing speech summarization in an unsupervised fashion. However, one of the major challenges facing the LM approach is how to formulate the sentence models and accurately estimate their parameters for each sentence in the document to be summarized. In view of this, our work in this paper explores a novel use of recurrent neural network language modeling (RNNLM) framework for extractive broadcast news summarization. On top of such a framework, the deduced sentence models are able to render not only word usage cues but also long-span structural information of word co-occurrence relationships within broadcast news documents, getting around the need for the strict bag-of-words assumption. Furthermore, different model complexities and combinations are extensively analyzed and compared. Experimental results demonstrate the performance merits of our summarization methods when compared to several well-studied state-of-the-art unsupervised methods.
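As a rough illustration of the LM approach to sentence selection mentioned above, the sketch below ranks each sentence by how well a model estimated from that sentence explains the whole document. It deliberately uses a smoothed unigram model as a stand-in, not the RNNLM framework of the paper; the function name, the interpolation weight `lam`, and the whitespace tokenization are assumptions made for this example.

```python
import math
from collections import Counter


def rank_sentences(sentences, lam=0.5):
    """Return sentence indices, best first, by document log-likelihood under each sentence model."""
    tokenized = [s.lower().split() for s in sentences]
    doc_counts = Counter(word for tokens in tokenized for word in tokens)
    doc_total = sum(doc_counts.values())
    scores = []
    for idx, tokens in enumerate(tokenized):
        sent_counts = Counter(tokens)
        sent_total = len(tokens)
        log_likelihood = 0.0
        for word, count in doc_counts.items():
            p_sent = sent_counts[word] / sent_total if sent_total else 0.0
            p_doc = count / doc_total
            # Interpolate with the document model so unseen words keep non-zero probability.
            p = lam * p_sent + (1.0 - lam) * p_doc
            log_likelihood += count * math.log(p)
        scores.append((log_likelihood, idx))
    return [idx for _, idx in sorted(scores, reverse=True)]


# Hypothetical three-sentence "document".
doc = ["the storm hit the coast", "officials said", "storm damage along the coast was severe"]
print(rank_sentences(doc))
```

A unigram model of this kind is exactly the bag-of-words assumption the abstract aims to relax; the sketch is meant only to show the ranking criterion, with the top-ranked sentences concatenated to form the summary.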
"Linguistic Template Extraction for Recognizing Reader-Emotion and Emotional Resonance Writing Assistance," The 53rd Annual Meeting of the Association for Computational Linguistics and the 7rd International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2015), July 2015.
Authors: Yung-Chun Chang, Cen-Chieh Chen, Yu-Lun Hsieh, Chien Chin Chen, Wen-Lian Hsu

In this paper, we propose a flexible principle-based approach (PBA) for reader-emotion classification and writing assistance. PBA is a highly automated process that learns emotion templates from raw texts to characterize an emotion and is comprehensible for humans. These templates are adopted to predict reader-emotion, and may further assist in emotional resonance writing. Results demonstrate that PBA can effectively detect reader-emotions by exploiting the syntactic structures and semantic associations in the context, thus outperforming well-known statistical text classification methods and the state-of-the-art reader-emotion classification method. Moreover, writers are able to create more emotional resonance in articles under the assistance of the generated emotion templates. These templates have been proven to be highly interpretable, which is an attribute that is difficult to accomplish in traditional statistical methods.
Authors: Kuo, C. Y., Chen, C. H., Chen, S. H., Lu, I. H., Lu, Huang, L. C., Lin, C. Y., Lin, Chen, C. Y., Lo, H. F., Jeng, S. T., Chen, L. F. O.

Agarwood, a heartwood derived from Aquilaria trees, is a valuable commodity that has seen prevalent use among many cultures. In particular, it is widely used in herbal medicine and many compounds in agarwood are known to exhibit medicinal properties. Although there exists much research into medicinal herbs and extraction of high value compounds, few have focused on increasing the quantity of target compounds through stimulation of its related pathways in this species.
In this study, we observed that cucurbitacin yield can be increased through the use of different light conditions to stimulate related pathways, and we conducted three types of high-throughput sequencing experiments to study the effect of light conditions on secondary metabolism in agarwood. We constructed genome-wide profiles of RNA expression, small RNA, and DNA methylation under red light and far-red light conditions. With these profiles, we identified a set of small RNAs that potentially regulate gene expression via the RNA-directed DNA methylation pathway.
We demonstrate that light conditions can be used to stimulate pathways related to secondary metabolism, increasing the yield of cucurbitacins. The genome-wide expression and methylation profiles from our study provide insight into the effect of light on gene expression for secondary metabolism in agarwood and provide compelling new candidates towards the study of functional secondary metabolic components.
"Court Reconstruction for Camera Calibration in Broadcast Basketball Videos," IEEE Transactions on Visualization and Computer Graphics, To Appear.
Authors: P. C. Wen, W. C. Cheng, Y. S. Wang, H. K. Chu, Nick C. Tang, and H. Y. Mark Liao

We introduce a technique of calibrating camera motions in basketball videos. Our method particularly transforms player positions to standard basketball court coordinates and enables applications such as tactical analysis and semantic basketball video retrieval. To achieve a robust calibration, we reconstruct the panoramic basketball court from a video, followed by warping the panoramic court to a standard one. As opposed to previous approaches, which individually detect the court lines and corners of each video frame, our technique considers all video frames simultaneously to achieve calibration; hence, it is robust to illumination changes and player occlusions. To demonstrate the feasibility of our technique, we present a stroke-based system that allows users to retrieve basketball videos. Our system tracks player trajectories from broadcast basketball videos. It then rectifies the trajectories to a standard basketball court by using our camera calibration method. Consequently, users can apply stroke queries to indicate how the players move in gameplay during retrieval. The main advantage of this interface is an explicit query of basketball videos so that unwanted outcomes can be prevented. We show the results in Figures 1, 7, 9, 10 and our accompanying video to exhibit the feasibility of our technique.
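The rectification step, in which tracked player positions are mapped to standard court coordinates, can be illustrated with a planar homography. This is a sketch under the assumption that the calibration result for a frame is available as a 3x3 matrix `H`; the abstract does not give the representation, and the matrix values and function names below are hypothetical.

```python
def apply_homography(H, point):
    """Map an image point (x, y) to court coordinates using a 3x3 homography H."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to 2D.
    return (u / w, v / w)


def rectify_trajectory(H, trajectory):
    """Apply the same homography to every point of a player trajectory."""
    return [apply_homography(H, p) for p in trajectory]


# Hypothetical calibration: scale by 0.5 and shift by (1, 2).
H = [[0.5, 0.0, 1.0],
     [0.0, 0.5, 2.0],
     [0.0, 0.0, 1.0]]
print(rectify_trajectory(H, [(4.0, 6.0), (8.0, 10.0)]))  # [(3.0, 5.0), (5.0, 7.0)]
```

Once trajectories live in court coordinates, a user's stroke query can be compared with them directly, independent of camera motion.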
"On Relaxing Page Program Disturbance over 3D MLC Flash Memory," ACM/IEEE International Conference on Computer-Aided Design (ICCAD), November 2015.
Authors: Yu-Ming Chang, Yung-Chun Li, Yuan-Hao Chang, Tei-Wei Kuo, Chih-Chang Hsieh, and Hsiang-Pang Li

With the rapidly increasing capacity demand on flash memory, 3D NAND flash memory has drawn tremendous attention as a promising solution to further reduce the bit cost and to increase the bit density. However, such advanced 3D devices suffer more intensive program disturbance than 2D NAND flash memory. Especially when multi-level-cell (MLC) technology is adopted, the worsened disturbance caused by intra-page and inter-page program operations becomes even more critical for reliability. In contrast to past efforts that try to resolve the reliability issue with error correction codes or hardware designs, this work seeks to redesign the program operation. A disturb-aware programming scheme is proposed not only to relax the disturbance induced by slow cells as much as possible but also to reduce the likelihood of requiring a high voltage to program the slow cells. A series of experiments was conducted based on real 3D MLC flash chips, and the results demonstrate that the proposed scheme is extremely effective in reducing the disturbance as well as the bit error rate.
"How to Improve the Space Utilization of Dedup-based PCM Storage Devices," ACM/IEEE International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), October 2015.
Authors: Chun-Ta Lin, Yuan-Hao Chang, Tei-Wei Kuo, Hung-Sheng Chang, and Hsiang-Pang Li

There has been a growing demand in recent years to introduce more intelligence into storage devices, especially with the rapid increase in hardware computing power. This paper addresses essential design issues in space utilization for dedup-based non-volatile phase-change memory (PCM). We explore the adoption of data deduplication techniques to reduce potential data duplicates over PCM storage devices so as to provide more storage space than the physical storage space does. Among various data deduplication techniques, variable-sized chunking is considered in less cost-effective PCM-based storage devices because variable-sized chunking has better data deduplication capability than fixed-sized chunking. However, in a typical system architecture, data are written or updated in fixed management units (e.g., LBAs). Thus, to ultimately improve the space utilization of PCM-based storage devices, the technical problem falls on (1) how to map fixed-sized LBAs to variable-sized chunks and (2) how to efficiently manage (i.e., allocate and deallocate) free PCM storage space for variable-sized chunks. In this work, we propose a free space manager, called container-based space manager, to resolve the above two issues by exploiting the fact that (1) a storage system initially has more free space to relax the complexity on space management and (2) the space optimization of a storage system can grow with time as it contains more and more data. The proposed design is evaluated over popular benchmarks, for which we have very encouraging results.
"A Proximal Method for Dictionary Updating in Sparse Representations," IEEE Transactions on Signal Processing, August 2015.
Authors: Guan-Ju Peng and Wen-Liang Hwang

In this paper, we propose a new dictionary updating method for sparse dictionary learning. Our method imposes the $\ell_0$ norm constraint on coefficients as well as a proximity regularization on the distance of dictionary modifications in the dictionary updating process. We show that the derived dictionary updating rule is a generalization of the K-SVD method. We study the convergence and the complexity of the proposed method. We also compare its performance with that of other methods.
"Predicting Winning Price in Real Time Bidding with Censored Data," 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2015), August 2015.
Authors: Wush Chi-Hsuan Wu, Mi-Yen Yeh, and Ming-Syan Chen

From the perspective of a Demand-Side Platform (DSP), which is the agent of advertisers, we study how to predict the winning price such that the DSP can win the bid by placing a proper bidding value in the real-time bidding (RTB) auction. We propose to leverage machine learning and statistical methods to train the winning price model from the bidding history. A major challenge is that a DSP usually suffers from the censoring of the winning price, especially for those lost bids in the past. To solve it, we utilize the censored regression model, which is widely used in survival analysis and econometrics, to fit the censored bidding data. Note, however, that the assumption of censored regression does not hold on the real RTB data. As a result, we further propose a mixture model, which combines linear regression on bids with observable winning prices and censored regression on bids with the censored winning prices, weighted by the winning rate of the DSP. Experimental results show that the proposed mixture model in general prominently outperforms linear regression in terms of the prediction accuracy.
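The two ingredients named above, a censored-regression likelihood and a prediction weighted by the winning rate, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: it assumes Gaussian noise with a known scale `sigma`, treats a lost bid as a lower bound on the winning price, and takes the winning rate and the two model predictions as given inputs. All function and parameter names are hypothetical.

```python
import math


def normal_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_survival(z):
    """P(Z > z) for a standard normal Z, computed with the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def censored_log_likelihood(sigma, won, lost):
    """Log-likelihood of censored regression.

    won:  list of (predicted mean, observed winning price) for bids we won.
    lost: list of (predicted mean, our bid) for bids we lost, where the
          winning price is only known to be at least our bid.
    """
    total = 0.0
    for mu, price in won:
        total += normal_logpdf((price - mu) / sigma) - math.log(sigma)
    for mu, bid in lost:
        total += math.log(normal_survival((bid - mu) / sigma))
    return total


def mixture_prediction(win_rate, linear_pred, censored_pred):
    """Blend the two model predictions, weighting the linear model by the winning rate."""
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be a probability")
    return win_rate * linear_pred + (1.0 - win_rate) * censored_pred


print(mixture_prediction(0.25, 100.0, 200.0))  # 175.0
```

The blend leans on the linear model when a bid is likely to be won (the price would be observed) and on the censored model otherwise.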
"Constant-Round Concurrent Zero-knowledge from Indistinguishability Obfuscation," The 35th International Cryptology Conference (CRYPTO 2015), 2015.
Authors: Kai-Min Chung and Huijia Lin and Rafael Pass

We present a constant-round concurrent zero-knowledge protocol for NP. Our protocol relies on the existence of families of collision-resistant hash functions, one-way permutations, and indistinguishability obfuscators for P/poly (with slightly super-polynomial security).

"Energy Stealing - An Exploration into Unperceived Activities on Mobile Systems," ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), July 2015.
Authors: Chi-Hsuan Lin, Yu-Ming Chang, Pi-Cheng Hsiu, and Yuan-Hao Chang

Understanding the implications of smartphone usage and the power breakdown among hardware components has led to various energy-efficient designs for mobile systems. While energy consumption has been extensively explored, one critical dimension is often overlooked: unperceived activities that could potentially steal a significant amount of energy behind users' backs. In this paper, we conduct the first exploration of unperceived activities in mobile systems. Specifically, we design a series of experiments to reveal, characterize, and analyze unperceived activities invoked by popular resident applications when an Android smartphone is left unused. We draw possible solutions inspired by the exploration and demonstrate that even an immediate remedy can mitigate energy dissipation to some extent.
"Large-Scale Secure Computation: Multi-party Computation for (Parallel) RAM Programs," The 35th International Cryptology Conference (CRYPTO 2015), 2015.
Authors: Elette Boyle and Kai-Min Chung and Rafael Pass

We present the first efficient (i.e., polylogarithmic overhead) method for securely and privately processing large data sets over multiple parties with parallel, distributed algorithms. More specifically, we demonstrate load-balanced, statistically secure computation protocols for computing Parallel RAM (PRAM) programs, handling a $(1/3-\epsilon)$ fraction of malicious players, while preserving up to polylogarithmic factors the computation and memory complexities of the PRAM program, aside from a one-time execution of a broadcast protocol per party. Additionally, our protocol has polylog communication locality; that is, each of the n parties speaks only with polylog(n) other parties.
"Job Dispatching and Scheduling for Heterogeneous Clusters – a Case Study on the Billing Subsystem of XYZ Telecomunication," the 39th IEEE Annual International Computers, Software and Applications Conference (COMPSAC 2015), July 2015.
Authors: Ting-Chou Lin, Ching-Chi Lin, Ting-Weii Chang, Pangfeng Liu, Jan-Jan Wu, Chia-Chun Shih, Chao-Wen Huang

Many enterprises or institutes are building private clouds within their own data centers. Data centers may have different batches of physical machines due to annual upgrades, but the number of machines is fixed most of the time. Consequently it is crucial to schedule jobs with different resource requirements and characteristics to meet different job timing constraints, in such heterogeneous yet most of the time static environments. This paper describes a cloud resource management framework that dynamically allocates and reallocates computation resources for jobs that have different requirements, including deadline and priority. This framework makes decisions according to specified policies, and the framework provides four default policies for system administrators to choose to fit their specific needs. The framework is designed to be component-pluggable. The components of the framework can be hot-swapped, i.e., replaced without shutting down the services. In addition, the framework can work as an individual cloud computing system, or as an extension of an existing cloud system. Our experiment results demonstrate that our system is capable of dynamically adjusting the resource allocation plan according to run-time statistics collected. The system also tolerates hardware failures, and will dynamically reallocate workers to compensate for the downtime in order to finish the jobs before deadline. Our experiments also suggest a trade-off between priority and deadline.
Authors: Ke-Shiuan Lynn, Mei-Ling Cheng, Yet-Ran Chen, Chin Hsu, Ann Chen, T. Mamie Lih, Hui-Yin Chang, Ching-jang Huang, Ming-Shi Shiao, Wen-Harn Pan*, Ting-Yi Sung*, and Wen-Lian Hsu*

Metabolite identification remains a bottleneck in mass spectrometry (MS)-based metabolomics. Currently, this process relies heavily on tandem mass spectrometry (MS/MS) spectra generated separately for peaks of interest identified from previous MS runs. Such a delayed and labor-intensive procedure creates a barrier to automation. Further, information embedded in MS data has not been used to its full extent for metabolite identification. Multimers, adducts, multiply charged ions, and fragments of given metabolites occupy a substantial proportion (40-80%) of the peaks of a quantitation result. However, extensive information on these derivatives, especially fragments, may facilitate metabolite identification. We propose a procedure with automation capability to group and annotate peaks associated with the same metabolite in the quantitation results of opposite modes and to integrate this information for metabolite identification. In addition to the conventional mass and isotope ratio matches, we would match annotated fragments with low-energy MS/MS spectra in public databases. For identification of metabolites without accessible MS/MS spectra, we have developed characteristic fragment and common substructure matches. The accuracy and effectiveness of the procedure were evaluated using one public and two in-house liquid chromatography-mass spectrometry (LC-MS) data sets. The procedure accurately identified 89% of 28 standard metabolites with derivative ions in the data sets. With respect to effectiveness, the procedure confidently identified the correct chemical formula of at least 42% of metabolites with derivative ions via MS/MS spectrum, characteristic fragment, and common substructure matches. The confidence level was determined according to the fulfilled identification criteria of various matches and relative retention time.
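One small piece of the peak-grouping idea can be sketched as follows: flagging pairs of co-eluting peaks whose m/z difference matches the shift between a protonated ion and a sodium adduct of the same metabolite. This is an illustration only and not the authors' procedure, which also covers multimers, multiply charged ions, fragments, and both ionization modes. The tolerances, function name, and toy peak list are assumptions made for this example.

```python
# m/z shift between [M+H]+ and [M+Na]+ for singly charged ions
# (sodium cation mass minus proton mass).
NA_MINUS_H = 21.981944


def annotate_sodium_adducts(peaks, mz_tol=0.005, rt_tol=0.1):
    """Return index pairs (i, j) where peak j looks like the [M+Na]+ partner of peak i.

    peaks: list of (m/z, retention time in minutes) tuples.
    mz_tol and rt_tol are hypothetical tolerances chosen for the example.
    """
    pairs = []
    for i, (mz_i, rt_i) in enumerate(peaks):
        for j, (mz_j, rt_j) in enumerate(peaks):
            if i == j:
                continue
            same_elution = abs(rt_j - rt_i) <= rt_tol
            adduct_shift = abs((mz_j - mz_i) - NA_MINUS_H) <= mz_tol
            if same_elution and adduct_shift:
                pairs.append((i, j))
    return pairs


# Toy peak list: the first two peaks are consistent with [M+H]+ and [M+Na]+ of a hexose.
peaks = [(181.0707, 5.00), (203.0526, 5.02), (250.0000, 9.00)]
print(annotate_sodium_adducts(peaks))  # [(0, 1)]
```

Peaks grouped this way can then be treated as evidence for a single metabolite rather than as independent candidates.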
"Robust Action Recognition via Borrowing Information Across Video Modalities," IEEE Transactions on Image Processing, February 2015.
Authors: Nick C. Tang, Yen-Yu Lin, Ju-Hsuan Hua, Shih-En Wei, Ming-Fang Weng, and Hong-Yuan Mark Liao

The recent advances in imaging devices have opened the opportunity of better solving the tasks of video content analysis and understanding. Next-generation cameras, such as depth or binocular cameras, capture diverse information and complement conventional 2D RGB cameras. Thus, investigating the resulting multimodal videos generally facilitates the accomplishment of related applications. However, the limitations of the emerging cameras, such as short effective distances, high costs, or long response times, degrade their applicability and currently make these devices unavailable for online use in practice. In this paper, we provide an alternative scenario to address this problem, and illustrate it with the task of recognizing human actions. In particular, we aim at improving the accuracy of action recognition in RGB videos with the aid of one additional RGB-D camera. Since RGB-D cameras, such as Kinect, are typically not applicable in a surveillance system due to their short effective distances, we instead collect a database offline, in which not only the RGB videos but also the depth maps and the skeleton data of actions are available jointly. The proposed approach can adapt to the inter-database variations and activate the borrowing of visual knowledge across different video modalities. Each action to be recognized in RGB representation is then augmented with the borrowed depth and skeleton features. Our approach is comprehensively evaluated on five benchmark data sets of action recognition. The promising results show that the borrowed information leads to a remarkable boost in recognition accuracy.
"Cross-Camera Knowledge Transfer for Multiview People Counting," IEEE Transactions on Image Processing, January 2015.
Authors: Nick C. Tang, Yen-Yu Lin, Ming-Fang Weng, and Hong-Yuan Mark Liao

We present a novel two-pass framework for counting the number of people in an environment, where multiple cameras provide different views of the subjects. By exploiting the complementary information captured by the cameras, we can transfer knowledge between the cameras to address the difficulties of people counting and improve the performance. The contribution of this paper is threefold. First, normalizing the perspective of visual features and estimating the size of a crowd are highly correlated tasks. Hence, we treat them as a joint learning problem. The derived counting model is scalable and it provides more accurate results than existing approaches. Second, we introduce an algorithm that matches groups of pedestrians in images captured by different cameras. The results provide a common domain for knowledge transfer, so we can work with multiple cameras without worrying about their differences. Third, the proposed counting system is comprised of a pair of collaborative regressors. The first one determines the people count based on features extracted from intracamera visual information, whereas the second calculates the residual by considering the conflicts between intercamera predictions. The two regressors are elegantly coupled and provide an accurate people counting system. The results of experiments in various settings show that, overall, our approach outperforms comparable baseline methods. The significant performance improvement demonstrates the effectiveness of our two-pass regression framework.
"Per-Cluster Ensemble Kernel Learning for Multi-Modal Image Clustering With Group-Dependent Feature Selection," IEEE Transactions on Multimedia, December 2014.
Authors: Jeng-Tsung Tsai, Yen-Yu Lin, and Hong-Yuan Mark Liao

In this paper, we present a clustering approach, MK-SOM, that carries out cluster-dependent feature selection, and partitions images with multiple feature representations into clusters. This work is motivated by the observation that human visual systems (HVS) can receive various kinds of visual cues for interpreting the world. Images identified by HVS as the same category are typically coherent to each other in certain crucial visual cues, but the crucial cues vary from category to category. To account for this observation and bridge the semantic gap, the proposed MK-SOM integrates multiple kernel learning (MKL) into the training process of self-organizing map (SOM), and associates each cluster with a learnable, ensemble kernel. Hence, it can leverage information captured by various image descriptors, and discovers the cluster-specific characteristics via learning the per-cluster ensemble kernels. Through the optimization iterations, cluster structures are gradually revealed via the features specified by the learned ensemble kernels, while the quality of these ensemble kernels is progressively improved owing to the coherent clusters by enforcing SOM. Besides, MK-SOM allows the introduction of side information to improve performance, and it hence provides a new perspective of applying MKL to address both unsupervised and semi-supervised clustering tasks. Our approach is comprehensively evaluated in the two applications. The superior and promising results manifest its effectiveness.
"Block-based Multi-version B+-Tree for Flash-based Embedded Database Systems," IEEE Transactions on Computers, April 2015.
Authors: Jian-Tao Wang, Kam-Yiu Lam, Yuan-Hao Chang, Jen-Wei Hsieh, and Po-Chun Huang

In this paper, we propose a novel multi-version B+-tree index structure, called block-based multi-version B+-tree (BbMVBT), for indexing multi-versions of data items in an embedded multi-version database (EMVDB) on flash memory. An EMVDB needs to support streams of update transactions and version-range queries to access different versions of data items maintained in the database. In BbMVBT, the index is divided into two levels. At the higher level, a multi-version index is maintained for keeping successive versions of each data item. These versions are allocated consecutively in a version block. At the lower level, a version array is used to search for a specific data version within a version block. With the reduced index structure of BbMVBT, the overhead for managing the index in processing update operations can be greatly reduced. At the same time, BbMVBT can also greatly reduce the number of accesses to the index in processing version-range queries. To ensure sufficient free blocks for creating version blocks for efficient execution of BbMVBT, in this paper, we also discuss how to perform garbage collection using the purging-range queries for reclaiming “old” versions of data items and their associated entries in the index nodes. Analysis of the performance of BbMVBT is presented and verified with performance studies using both synthetic and real workloads. The performance results illustrate that BbMVBT can significantly improve the read and write performance to the multi-version index as compared with MVBT even though the sizes of the version blocks are not large.
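The two-level organization described above can be sketched in a few lines. This is a simplified in-memory illustration and not BbMVBT itself: a Python dict stands in for the higher-level multi-version index, flash block allocation and garbage collection are not modeled, and all class and method names are hypothetical.

```python
import bisect


class VersionBlock:
    """Lower level: successive versions of one data item, stored consecutively."""

    def __init__(self):
        self.versions = []  # version array, kept in ascending order
        self.values = []

    def append(self, version, value):
        if self.versions and version <= self.versions[-1]:
            raise ValueError("versions must be appended in increasing order")
        self.versions.append(version)
        self.values.append(value)

    def range_query(self, lo, hi):
        """Return (version, value) pairs with lo <= version <= hi via binary search."""
        i = bisect.bisect_left(self.versions, lo)
        j = bisect.bisect_right(self.versions, hi)
        return list(zip(self.versions[i:j], self.values[i:j]))


class TwoLevelIndex:
    """Higher level: maps each key to its version block (dict used in place of a B+-tree)."""

    def __init__(self):
        self.blocks = {}

    def update(self, key, version, value):
        self.blocks.setdefault(key, VersionBlock()).append(version, value)

    def version_range(self, key, lo, hi):
        block = self.blocks.get(key)
        return block.range_query(lo, hi) if block else []


index = TwoLevelIndex()
for version, value in [(1, "a"), (3, "b"), (5, "c")]:
    index.update("sensor-7", version, value)
print(index.version_range("sensor-7", 2, 5))  # [(3, 'b'), (5, 'c')]
```

Because all versions of an item sit in one block, a version-range query touches the higher-level index once and then scans a contiguous slice of the version array.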
"Modeling the Affective Content of Music with a Gaussian Mixture Model," IEEE Transactions on Affective Computing, March 2015.
Authors: Ju-Chiang Wang, Yi-Hsuan Yang, Hsin-Min Wang, and Shyh-Kang Jeng

Modeling the association between music and emotion has been considered important for music information retrieval and affective human computer interaction. This paper presents a novel generative model called acoustic emotion Gaussians (AEG) for computational modeling of emotion. Instead of assigning a music excerpt with a deterministic (hard) emotion label, AEG treats the affective content of music as a (soft) probability distribution in the valence-arousal space and parameterizes it with a Gaussian mixture model (GMM). In this way, the subjective nature of emotion perception is explicitly modeled. Specifically, AEG employs two GMMs to characterize the audio and emotion data. The fitting algorithm of the GMM parameters makes the model learning process transparent and interpretable. Based on AEG, a probabilistic graphical structure for predicting the emotion distribution from music audio data is also developed. A comprehensive performance study over two emotion-labeled datasets demonstrates that AEG offers new insights into the relationship between music and emotion (e.g., to assess the “affective diversity” of a corpus) and represents an effective means of emotion modeling. Readers can easily implement AEG via the publicly available codes. As the AEG model is generic, it holds the promise of analyzing any signal that carries affective or other highly subjective information.
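To make the idea of a soft emotion label concrete, the sketch below evaluates a two-dimensional Gaussian mixture over the valence-arousal plane. It is not the AEG model, which couples two GMMs for audio and emotion data; it only shows how a mixture assigns probability density to points in the emotion space. The mixture parameters and function names are invented for this example.

```python
import math


def gaussian_2d(point, mean, cov):
    """Density of a bivariate Gaussian with a full 2x2 covariance matrix."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def gmm_density(point, weights, means, covs):
    """Weighted sum of component densities at a (valence, arousal) point."""
    return sum(w * gaussian_2d(point, m, c) for w, m, c in zip(weights, means, covs))


def gmm_mean(weights, means):
    """Expected (valence, arousal) under the mixture."""
    return (sum(w * m[0] for w, m in zip(weights, means)),
            sum(w * m[1] for w, m in zip(weights, means)))


# Hypothetical clip perceived as mostly "happy" but sometimes "calm".
weights = [0.7, 0.3]
means = [(0.6, 0.5), (0.4, -0.4)]
covs = [((0.05, 0.0), (0.0, 0.05)), ((0.08, 0.0), (0.0, 0.08))]
print(gmm_density((0.6, 0.5), weights, means, covs))
print(gmm_mean(weights, means))
```

Keeping the full distribution, rather than a single point, is what lets disagreement among listeners be represented explicitly.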
Authors: Jen-Chieh Lee, Yung-Ming Jeng, Sheng-Yao Su, Chen-Tu Wu, Keh-Sung Tsai, Cheng-Han Lee, Chung-Yen Lin, Jodi M. Carter, Jeng-Wen Huang, Shu-Hwa Chen, Shyang-Rong Shih, Adrián Mariño-Enríquez, Chih-Chih Chen, Andrew L. Folpe, Yih-Leong Chang, and Cher-Wei Liang

Phosphaturic mesenchymal tumours (PMTs) are uncommon soft tissue and bone tumours that typically cause hypophosphataemia and tumour-induced osteomalacia (TIO) through secretion of phosphatonins including fibroblast growth factor 23 (FGF23). PMT has recently been accepted by the World Health Organization as a formal tumour entity. The genetic basis and oncogenic pathways underlying its tumourigenesis remain obscure. In this study, we identified a novel FN1–FGFR1 fusion gene in three out of four PMTs by next-generation RNA sequencing. The fusion transcripts and proteins were subsequently confirmed with RT-PCR and western blotting. Fluorescence in situ hybridization analysis showed six cases with FN1–FGFR1 fusion out of an additional 11 PMTs. Overall, nine out of 15 PMTs (60%) harboured this fusion. The FN1 gene possibly provides its constitutively active promoter and the encoded protein's oligomerization domains to overexpress and facilitate the activation of the FGFR1 kinase domain. Interestingly, unlike the prototypical leukaemia-inducing FGFR1 fusion genes, which are ligand-independent, the FN1–FGFR1 chimeric protein was predicted to preserve its ligand-binding domains, suggesting an advantage of the presence of its ligands (such as FGF23 secreted at high levels by the tumour) in the activation of the chimeric receptor tyrosine kinase, thus effecting an autocrine or a paracrine mechanism of tumourigenesis.
"Marching-based Wear Leveling for PCM-based Storage Systems," ACM Transactions on Design Automation of Electronic Systems, February 2015.
Authors: Hung-Sheng Chang, Yuan-Hao Chang, Pi-Cheng Hsiu, Tei-Wei Kuo, and Hsiang-Pang Li

Improving the performance of storage systems without sacrificing the reliability and sanity/integrity of file systems is a major issue in storage system design. In contrast to existing storage architectures, we consider a PCM-based storage architecture to enhance the reliability of storage systems. In PCM-based storage systems, the major challenge is how to prevent frequently updated (meta)data from wearing out the PCM cells in which they reside, without excessively searching for and moving metadata around the PCM space and without extensively updating the index structures of file systems. In this work, we propose an adaptive wear-leveling mechanism that prevents any PCM cell from being worn out prematurely by selecting appropriate data for swapping at constant search/sort cost. Meanwhile, the proposed mechanism uses indirect pointers to swap data without any modification to the file system’s indexes. Experiments were conducted with well-known benchmarks and realistic workloads to evaluate the effectiveness of the proposed design, and the results are encouraging.
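The role of indirect pointers can be sketched in a few lines of Python. This is a generic illustration of swapping through an indirection table, not the marching-based mechanism of the paper: the class `IndirectStore`, the `swap_threshold` parameter, and the swap policy are all hypothetical, and the coldest cell is found here with a plain linear scan rather than at the constant search cost the paper achieves.

```python
class IndirectStore:
    """Toy store: logical addresses reach physical cells through indirect pointers."""

    def __init__(self, n_cells, swap_threshold=4):
        self.cells = [None] * n_cells        # physical cell contents
        self.wear = [0] * n_cells            # writes endured by each cell
        self.pointer = list(range(n_cells))  # logical address -> physical cell
        self.owner = list(range(n_cells))    # physical cell -> logical address
        self.swap_threshold = swap_threshold # assumed policy knob

    def read(self, logical):
        return self.cells[self.pointer[logical]]

    def write(self, logical, data):
        phys = self.pointer[logical]
        self.cells[phys] = data
        self.wear[phys] += 1
        # Linear scan for the least-worn cell (illustration only).
        coldest = min(range(len(self.wear)), key=self.wear.__getitem__)
        if self.wear[phys] - self.wear[coldest] >= self.swap_threshold:
            self._swap(phys, coldest)

    def _swap(self, a, b):
        # Exchange the contents of two physical cells; each cell is written once.
        self.cells[a], self.cells[b] = self.cells[b], self.cells[a]
        self.wear[a] += 1
        self.wear[b] += 1
        # Only the indirect pointers change; logical addresses stay the same.
        la, lb = self.owner[a], self.owner[b]
        self.pointer[la], self.pointer[lb] = b, a
        self.owner[a], self.owner[b] = lb, la
```

Because callers only ever use logical addresses, a file system built on top of such a store would not need to update its own index structures when hot data is moved to a less-worn cell.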
"Virtual Flash Chips: Rethinking the Layer Design of Flash Devices to Improve the Data Recoverability by Trading Potentially Massive Parallelism," ACM/IEEE Design Automation Conference (DAC), June 2015.
Authors: Ming-Chang Yang, Yuan-Hao Chang, and Tei-Wei Kuo

The market trend for flash memory chips has been toward higher density but lower reliability. The rapidly increasing bit error rates and emerging reliability issues of the coming triple-level cell (TLC) and even three-dimensional (3D) flash chips expose users to an extremely high risk when storing data on such low-reliability storage media. With these observations in mind, this paper rethinks the layer design of flash devices and proposes a complete paradigm shift that re-configures physical flash chips of potentially massive parallelism into better “virtual chips” in order to improve data recoverability in a modular and low-cost way. The concept of virtual chips is realized at the hardware abstraction layer (HAL) without further complicating the conventional flash management software of the flash translation layer (FTL). The capability and compatibility of the proposed design are both verified by a series of experiments with encouraging results.
"Achieving SLC Performance with MLC Flash Memory," ACM/IEEE Design Automation Conference (DAC), June 2015.
Authors: Yu-Ming Chang, Yuan-Hao Chang, Tei-Wei Kuo, Yung-Chun Li, and Hsiang-Pang Li

Although the Multi-Level-Cell technique is widely adopted by flash-memory vendors to boost chip density and lower cost, it results in serious performance and reliability problems. Unlike past work, we propose a new cell programming method that not only significantly improves chip performance but also reduces the potential bit error rate. In particular, a Single-Level-Cell-like programming style is proposed to better exploit the threshold-voltage relationship used to denote different Multi-Level-Cell bit information, which in turn provides a much larger threshold-voltage window, similar to that found in Single-Level-Cell chips. This can result in fewer programming iterations and, at the same time, far fewer reliability problems in programming flash-memory cells. In the experiments, the new programming style accelerated the programming speed by up to 742% and reduced the bit error rate by up to 471% for Multi-Level-Cell pages.
"From Weak to Strong Zero-Knowledge and Applications," The 12th Theory of Cryptography Conference (TCC 2015), 2015.
Authors: Kai-Min Chung, Edward Lui, and Rafael Pass

The notion of zero-knowledge is formalized by requiring that for every malicious efficient verifier V*, there exists an efficient simulator S that can reconstruct the view of V* in its interaction with the prover, in a way that is indistinguishable to every polynomial-time distinguisher. Weak zero-knowledge weakens this notion by switching the order of the quantifiers and only requires that for every distinguisher D, there exists a (potentially different) simulator S_D.
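The change in quantifier order can be written schematically as follows. This is an informal rendering of the two definitions, not the paper's formal statement: the symbols View and negl are notational assumptions, and the quantification over statements x, auxiliary inputs, and efficiency bounds is omitted.

```latex
\begin{align*}
\text{(standard)}\quad & \forall V^{*}\ \exists S\ \forall D:\quad
  \bigl|\Pr[D(\mathrm{View}_{V^{*}}(x)) = 1] - \Pr[D(S(x)) = 1]\bigr|
  \le \mathrm{negl}(|x|) \\
\text{(weak)}\quad & \forall V^{*}\ \forall D\ \exists S_{D}:\quad
  \bigl|\Pr[D(\mathrm{View}_{V^{*}}(x)) = 1] - \Pr[D(S_{D}(x)) = 1]\bigr|
  \le \mathrm{negl}(|x|)
\end{align*}
```

In the weak form the simulator may be tailored to the one distinguisher it has to fool, whereas in the standard form a single simulator must fool all of them.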
In this paper, we consider various notions of zero-knowledge and investigate whether their weak variants are equivalent to their strong variants. Although we show (under complexity assumptions) that for the standard notion of zero-knowledge the weak and strong counterparts are not equivalent, for meaningful variants of the standard notion the weak and strong counterparts are indeed equivalent. Towards showing these equivalences, we introduce new non-black-box simulation techniques that permit us, for instance, to demonstrate that the classical 2-round graph non-isomorphism protocol of Goldreich-Micali-Wigderson satisfies a “distributional” variant of zero-knowledge.
Our equivalence theorem has other applications beyond the notion of zero-knowledge. For instance, it directly implies the dense model theorem of Reingold et al. (STOC ’08) and the leakage lemma of Gentry-Wichs (STOC ’11), and provides a modular and arguably simpler proof of these results (while at the same time recasting these results in the language of zero-knowledge).

