Current Research Results "Multi-Objective Optimization and Characterization of Pareto Points for Scalable Coding," IEEE Transactions on Circuits and Systems for Video Technology, July, to appear. Authors: Wen-Liang Hwang, Chia-Chen Lee, and Guan-Ju Peng Abstract: In this work, we formulate the optimal bit-allocation problem for a scalable image/video codec as a graph-based constrained vector-valued optimization problem with many optimal solutions, which are referred to as Pareto points. Pareto points are generally derived using weighted sum scalarization; however, it has yet to be determined whether all Pareto points can be derived using this approach. This paper addresses this issue. Our results, presented as a theorem, indicate that as long as the rate-distortion function of each resolution is strictly decreasing and convex and the Pareto points form a continuous curve, all Pareto points can be derived using scalarization. The theorem is verified using the state-of-the-art scalable coding method H.264/SVC and a scalability extension of HEVC (SHVC). We highlight a number of easily interpretable Pareto points that represent a good trade-off between candidate resolutions. The proximity point is defined as the Pareto point closest to the ideal performance for each resolution. We also model the Pareto points as a function of bit-rate and demonstrate that the Pareto points at other bit-rates can be predicted. Current Research Results "Hierarchical and programmable one-pot synthesis of oligosaccharides," Nature Communications, December 2018. Authors: Cheng-Wei Cheng, Yixuan Zhou, Wen-Harn Pan, Supriya Dey, Chung-Yi Wu, Wen-Lian Hsu and Chi-Huey Wong Abstract: The programmable one-pot oligosaccharide synthesis method was designed to enable the rapid synthesis of a large number of oligosaccharides, using the software Optimer to search for Building BLocks (BBLs) with defined relative reactivity values (RRVs) to be used sequentially in the one-pot reaction.
However, there were only about 50 BBLs with measured RRVs in the original library and the method could only synthesize small oligosaccharides due to the RRV ordering requirement. Here, we increase the library to include 154 validated BBLs and more than 50,000 virtual BBLs with predicted RRVs by machine learning. We also develop the software Auto-CHO to accommodate more data handling and support hierarchical one-pot synthesis using fragments as BBLs generated by the one-pot synthesis. This advanced programmable one-pot method provides potential synthetic solutions for complex glycans with four successful examples demonstrated in this work. Current Research Results "Processor-Tracing Guided Region Formation in Dynamic Binary Translation," ACM Transactions on Architecture and Code Optimization (TACO), November 2018. Authors: Ding-Yong Hong, Jan-Jan Wu, Yu-Ping Liu, Sheng-Yu Fu, Wei-Chung Hsu Abstract: Region formation is an important step in dynamic binary translation to select hot code regions for translation and optimization. The quality of the formed regions determines the extent of optimizations and thus determines the final execution performance. Moreover, the overall performance is very sensitive to the formation overhead, because region formation can have a non-trivial cost. For addressing the dual issues of region quality and region formation overhead, this article presents a lightweight region formation method guided by processor tracing, e.g., Intel PT. We leverage the branch history information stored in the processor to reconstruct the program execution profile and effectively form high-quality regions with low cost. Furthermore, we present the designs of lightweight hardware performance monitoring sampling and the branch instruction decode cache to minimize region formation overhead. 
Using ARM64 to x86-64 translations, the experiment results show that our method achieves a performance speedup of up to 1.53× (1.16× on average) for SPEC CPU2006 benchmarks with reference inputs, compared to the well-known software-based trace formation method, Next Executing Tail (NET). The performance results of x86-64 to ARM64 translations also show a speedup of up to 1.25× over NET for CINT2006 benchmarks with reference inputs. The comparison with a relaxed NETPlus region formation method further demonstrates that our method achieves the best performance and lowest compilation overhead. Current Research Results "Statistical Principle-based Approach for Gene and Protein Related Object Recognition," Journal of Cheminformatics, To Appear. Authors: Po-Ting Lai, Ming-Siang Huang, Ting-Hao Yang, Richard Tzong-Han Tsai, Wen-Lian Hsu Abstract: The large number of chemical and pharmaceutical patents has attracted researchers doing biomedical text mining to extract valuable information such as chemicals, genes and gene products. To facilitate gene and gene product annotations in patents, BioCreative V.5 organized a gene- and protein-related object (GPRO) recognition task, in which participants were assigned to identify GPRO mentions and determine whether they could be linked to their unique biological database records. In this paper, we describe the system constructed for this task. Our system is based on two different NER approaches: the statistical-principle-based approach (SPBA) and conditional random fields (CRF). Therefore, we call our system SPBA-CRF. SPBA is an interpretable machine-learning framework for gene mention recognition. The predictions of SPBA are used as features for our CRF-based GPRO recognizer. The recognizer was developed for identifying chemical mentions in patents, and we adapted it for GPRO recognition. 
In the BioCreative V.5 GPRO recognition task, SPBA-CRF obtained an F-score of 73.73% on the evaluation metric of GPRO type 1 and an F-score of 78.66% on the evaluation metric combining GPRO types 1 and 2. Our results show that SPBA trained on an external NER dataset can perform reasonably well on the partial match evaluation metric. Furthermore, SPBA can significantly improve the performance of the CRF-based recognizer trained on the GPRO dataset. Current Research Results "Unsupervised Meta-learning of Figure-Ground Segmentation via Imitating Visual Effects," Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), January 2019. Authors: Ding-Jie Chen, Jui-Ting Chien, Hwann-Tzong Chen, and Tyng-Luh Liu Abstract: This paper presents a 'learning to learn' approach to figure-ground image segmentation. By exploring webly-abundant images of specific visual effects, our method can effectively learn the visual-effect internal representations in an unsupervised manner and use this knowledge to differentiate the figure from the ground in an image. Specifically, we formulate the meta-learning process as a compositional image editing task that learns to imitate a certain visual effect and derive the corresponding internal representation. Such a generative process can help instantiate the underlying figure-ground notion and enables the system to accomplish the intended image segmentation. Whereas existing generative methods are mostly tailored to image synthesis or style transfer, our approach offers a flexible learning mechanism to model a general concept of figure-ground segmentation from unorganized images that have no explicit pixel-level annotations. We validate our approach via extensive experiments on six datasets to demonstrate that the proposed model can be trained end-to-end without ground-truth pixel labeling yet outperforms existing methods on unsupervised segmentation tasks.
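One way to picture the "compositional image editing" idea in the figure-ground abstract above is mask-based compositing: a soft figure mask decides, per pixel, how much of the original image is kept and how much is replaced by an effect-processed version. The following is only an illustrative sketch under stated assumptions, not the authors' model. The names `composite` and `black_background`, the grayscale nested-list image format, and the blending rule itself are hypothetical choices made for this example.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def composite(image: Image, mask: Image, effect: Callable[[float], float]) -> Image:
    """Blend the original figure with an effect-processed ground.

    Mask values lie in [0, 1]: 1 marks figure, 0 marks ground.
    Each output pixel is m * p + (1 - m) * effect(p).
    """
    return [
        [m * p + (1.0 - m) * effect(p) for p, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def black_background(pixel: float) -> float:
    """Hypothetical visual effect: suppress the ground entirely."""
    return 0.0


# Toy 2x2 image with a diagonal figure mask (assumed values).
image = [[10.0, 20.0], [30.0, 40.0]]
mask = [[1.0, 0.0], [0.0, 1.0]]
edited = composite(image, mask, black_background)
```

In this toy setting, a mask that yields a convincing edited image is also a plausible figure-ground segmentation, which is the intuition behind learning the mask without pixel-level labels.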
Current Research Results "Versatile Communication Optimization for Deep Learning by Modularized Parameter Server," 2018 IEEE International Conference on Big Data, December 2018. Authors: Po-Yen Wu, Pangfeng Liu, and Jan-Jan Wu Abstract: Deep learning has become one of the most promising approaches to solving artificial intelligence problems. Training large-scale deep learning models efficiently is challenging. A widely used approach to accelerating the training process is to distribute the computation across multiple nodes with a centralized parameter server. To overcome the communication overhead caused by exchanging information between workers and the parameter server, three types of optimization methods are adopted: data placement, consistency control, and compression. In this paper, we propose a modularized parameter server, an architecture composed of key components that can be overridden without much effort. This allows developers to easily incorporate optimization techniques in the training process instead of using ad-hoc ways in existing systems. With this platform, users can analyze different combinations of techniques and develop new optimization algorithms. The experiment results show that, compared with Google’s distributed TensorFlow, our distributed training system based on the proposed modularized parameter server can achieve near-linear speedup for computing and reduce the training time by half by combining multiple optimization techniques while maintaining the convergent accuracy. Current Research Results "A Progressive Performance Boosting Strategy for 3D Charge-trap NAND Flash," IEEE Transactions on Very Large Scale Integration Systems (TVLSI), November 2018. Authors: Shuo-Han Chen, Yen-Ting Chen, Yuan-Hao Chang, Hsin-Wen Wei, and Wei-Kuan Shih Abstract: The growing demand for large-capacity flash-based storage has facilitated the downscaling process of NAND flash memory.
However, the downscaling of traditional planar floating-gate flash memory faces several challenges. Therefore, new NAND flash technologies have been explored to provide larger capacity at low cost. Among these new technologies, the 3-D charge-trap flash is regarded as one of the most promising candidates. The 3-D charge-trap flash is composed of several gate-stack layers and vertical cylindrical channels to provide high density and low cell-to-cell interference. Owing to the cylindrical geometry of vertical channels, the access performance of each page in one block is distinctive, and this situation is exacerbated in the 3-D charge-trap flash with the fast-growing number of gate-stack layers. In this paper, a progressive performance boosting strategy is proposed to boost the performance of 3-D charge-trap flash by utilizing its asymmetric page access speed feature. A series of experiments was conducted to demonstrate the capability of the proposed strategy in improving the access performance of 3-D charge-trap flash. Current Research Results "Coherent Deep-Net Fusion To Classify Shots In Concert Videos," IEEE Transactions on Multimedia, November 2018. Authors: Jen-Chun Lin, Wen-Li Wei, Tyng-Luh Liu, Yi-Hsuan Yang, Hsin-Min Wang, Hsiao-Rong Tyan, and Hong-Yuan Mark Liao Abstract: Varying types of shots is a fundamental element in the language of film, commonly used by a visual storytelling director. The technique is often used in creating professional recordings of a live concert, but may not be appropriately applied in audience recordings of the same event. Such variations could make the task of classifying shots in concert videos, professional or amateur, very challenging. To achieve more reliable shot classification, we propose a novel probabilistic-based approach, named coherent classification net (CC-Net), by addressing three crucial issues.
First, we focus on learning more effective features by fusing the layer-wise outputs extracted from a deep convolutional neural network (CNN), pretrained on a large-scale data set for object recognition. Second, we introduce a frame-wise classification scheme, the error weighted deep cross-correlation model (EW-Deep-CCM), to boost the classification accuracy. Specifically, the deep neural network-based cross-correlation model (deep-CCM) is constructed to not only model the extracted feature hierarchies of CNN independently, but also relate the statistical dependencies of paired features from different layers. Then, a Bayesian error weighting scheme for a classifier combination is adopted to explore the contributions from individual Deep-CCM classifiers to enhance the accuracy of shot classification in each image frame. Third, we feed the frame-wise classification results to a linear-chain conditional random field module to refine the shot predictions by taking into account the global and temporal regularities. We provide extensive experimental results on a data set of live concert videos to demonstrate the advantage of the proposed CC-Net over existing popular fusion approaches for shot classification. Current Research Results "Play As You Like: Timbre-Enhanced Multi-modal Music Style Transfer," 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), 2019. Authors: Chien-Yu Lu, Min-Xin Xue, Chia-Che Chang, Che-Rung Lee, and Li Su Abstract: Style transfer of polyphonic music recordings is a challenging task when considering the generation of diverse, imaginative, and reasonable music pieces in the style different from their original one. To achieve this, learning stable multi-modal representations for the domain-variant (i.e., style) and domain-invariant (i.e., content) information of music in an unsupervised manner is critical. In this paper, we propose an unsupervised music style transfer method without the need of parallel data. 
In addition, to characterize the multi-modal distribution of music pieces, we employ the Multi-modal Unsupervised Image-to-Image Translation (MUNIT) framework in the proposed system. This allows one to generate diverse outputs from learned latent distributions representing contents and styles. Moreover, to better capture the granularity of sound, such as the perceptual dimensions of timbre and the nuance in instrument-specific performance, cognitively plausible features including mel-frequency cepstral coefficients (MFCC), spectral difference, and spectral envelope are combined with the widely used mel-scale spectrogram into a timbre-enhanced multi-channel input representation. The Relativistic average Generative Adversarial Network (RaGAN) is also utilized to achieve fast convergence and high stability. We conduct experiments on bilateral style transfer tasks among three different genres, namely piano solo, guitar solo, and string quartet. Results demonstrate the advantages of the proposed method in music style transfer with improved sound quality and in allowing users to manipulate the output. Current Research Results "Scrubbing-aware Secure Deletion for 3D NAND Flash," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), November 2018. Authors: Wei-Chen Wang, Chien-Chung Ho, Yuan-Hao Chang, Tei-Wei Kuo, and Ping-Hsien Lin Abstract: Due to increasing security concerns, conventional deletion operations in NAND flash memory can no longer meet the requirement of secure deletion. Although existing works exploit secure deletion and scrubbing operations to achieve the security requirement, they also result in performance and disturbance problems. The predicament becomes more severe as the number of pages grows with the aggressive use of 3-D NAND flash-memory chips, which stack flash cells into multiple layers in a chip.
Unlike existing works, this paper explores a scrubbing-aware secure deletion design that improves the efficiency of secure deletion by exploiting properties of disturbance. The proposed design can minimize secure deletion/scrubbing overheads by organizing sensitive data to create scrubbing-friendly patterns, and it further chooses a proper operation for each secure deletion command using the proposed evaluation equations. The capability of the proposed design is evaluated by a series of experiments with very encouraging results. In a 128-Gbit 3-D NAND flash-memory device, the simulation results show that the proposed design achieves an 82% reduction in the average response time of each secure deletion command. Current Research Results "Hot-Spot Suppression for Resource-Constrained Image Recognition Devices with Non-Volatile Memory," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), November 2018. Authors: Chun-Feng Wu, Ming-Chang Yang, Yuan-Hao Chang, and Tei-Wei Kuo Abstract: Resource-constrained devices with convolutional neural networks (CNNs) for image recognition are becoming popular in various Internet of Things and surveillance applications. They usually have a low-power CPU and limited CPU cache space. In such circumstances, nonvolatile memory (NVM) has great potential to replace DRAM as main memory to improve overall energy efficiency and provide larger main-memory space. However, due to the iterative access pattern, performing CNN-based image recognition may introduce some write hot-spots on the NVM main memory. These write hot-spots may lead to reliability issues due to limited write endurance of NVM. In order to improve the endurance of NVM main memory, this paper leverages the CPU cache pinning technique and exploits the iterative access pattern of CNN to resolve the write hot-spot effect.
In particular, we present a CNN-aware self-bouncing pinning strategy to minimize the maximal write cycles in NVM cells by proactively fastening CPU cache lines, so as to effectively suppress the write hot-spots to NVM main memory with limited performance degradation. The proposed strategy was evaluated by a series of intensive experiments and the results are encouraging. Current Research Results Authors: Tsung-Chieh Yao, Ren-Hua Chung, Chung-Yen Lin, et al. Abstract: Background: Total immunoglobulin E (IgE) is an intermediate phenotype and a potential therapeutic target for allergic diseases. Objective: We sought to identify single nucleotide polymorphisms (SNPs) associated with total IgE in an Asian pediatric population. Methods: We performed a genome-wide association study of total IgE in 397 schoolchildren from the Prediction of Allergies in Taiwanese CHildren (PATCH) cohort. Replication was conducted in three independent cohorts: 838 schoolchildren, 431 birth cohort samples and 1,120 Caucasian adults. Multimarker modeling was employed to determine a minimum set of SNPs capturing total IgE. In silico functional annotation, gene ontology, network and pathway analysis were performed to mine potentially functionally relevant genes. Results: We identified the association of rs660895 at the 6p21.32 region with total IgE in schoolchildren (p-value = $1.14 \times 10^{-6}$), replicated the association in three independent samples, and provided supportive functional evidence for rs660895. Increasing total IgE levels were found among subjects carrying greater numbers of risk alleles among the 40 SNPs determined from multimarker modeling. Fourteen IgE-related genes identified from gene-based analysis were suggested to be functionally relevant to immunological diseases. Conclusion: This study identifies rs660895 in the human leukocyte antigen, class II, DR beta 1 (HLA-DRB1) gene as associated with total IgE in newborns, schoolchildren and adults.
Our results from multimarker modeling implicate a set of 40 SNPs jointly capturing total IgE and 14 identified genes with potential relevance to immunological diseases. This study demonstrates that integrative approaches may increase the capacity to search for susceptibility genes for total IgE and related allergic diseases. Current Research Results "Speed Reading: Learning to Read ForBackward via Shuttle," International Conference on EMNLP, October 2018. Authors: Tsu-Hui Fu, Wei-Yun Ma Abstract: We present LSTM-Shuttle, which applies human speed reading techniques to natural language processing tasks for accurate and efficient comprehension. In contrast to previous work, LSTM-Shuttle not only reads shuttling forward but also goes back. Shuttling forward enables high efficiency, and going backward gives the model a chance to recover lost information, ensuring better prediction. We evaluate LSTM-Shuttle on sentiment analysis, news classification, and cloze on IMDB, Rotten Tomatoes, AG, and Children’s Book Test datasets. We show that LSTM-Shuttle predicts both better and more quickly. To demonstrate how LSTM-Shuttle actually behaves, we also analyze the shuttling operation and present a case study. Current Research Results "LPTK: A Linguistic Pattern-Aware Dependency Tree Kernel Approach for the BioCreative VI CHEMPROT Task," Database (Oxford), October 2018. Authors: Neha Warikoo, Yung-Chun Chang, and Wen-Lian Hsu Abstract: Identifying the interactions between chemical compounds and genes from biomedical literature is one of the frequently discussed topics of text mining in the life science field. In this paper, we propose LPTK, a linguistic interaction pattern learning method used in the CHEMPROT task of BioCreative VI, to capture chemical-protein interaction (CPI) patterns from biomedical literature. We also present a framework to integrate these linguistic patterns with the smooth partial tree kernel (SPTK) to extract the CPIs.
To evaluate our system, two associated identification datasets were used. The corresponding experimental results demonstrate that our method is effective and outperforms several compared systems. Current Research Results "Scrubbing-aware Secure Deletion for 3D NAND Flash," ACM/IEEE International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), September 2018. Authors: Wei-Chen Wang, Chien-Chung Ho, Yuan-Hao Chang, Tei-Wei Kuo, and Ping-Hsien Lin Abstract: Due to increasing security concerns, conventional deletion operations in NAND flash memory can no longer meet the requirement of secure deletion. Although existing works exploit secure deletion and scrubbing operations to achieve the security requirement, they also result in performance and disturbance problems. The predicament becomes more severe as the number of pages grows with the aggressive use of 3D NAND flash-memory chips, which stack flash cells into multiple layers in a chip. Unlike existing works, this work explores a scrubbing-aware secure deletion design that improves the efficiency of secure deletion by exploiting properties of disturbance. The proposed design can minimize secure deletion/scrubbing overheads by organizing sensitive data to create scrubbing-friendly patterns, and it further chooses a proper operation for each secure deletion command using the proposed evaluation equations. The capability of the proposed design is evaluated by a series of experiments with very encouraging results. In a 128-Gbit 3D NAND flash-memory device, the simulation results show that the proposed design achieves an 82% reduction in the average response time of each secure deletion command. Current Research Results "Hot-Spot Suppression for Resource-Constrained Image Recognition Devices with Non-Volatile Memory," ACM/IEEE International Conference on Embedded Software (EMSOFT), September 2018.
Authors: Chun-Feng Wu, Ming-Chang Yang, Yuan-Hao Chang, and Tei-Wei Kuo Abstract: Resource-constrained devices with Convolutional Neural Networks (CNNs) for image recognition are becoming popular in various IoT and surveillance applications. They usually have a low-power CPU and limited CPU cache space. In such circumstances, Non-Volatile Memory (NVM) has great potential to replace DRAM as main memory to improve overall energy efficiency and provide larger main-memory space. However, due to the iterative access pattern, performing CNN-based image recognition may introduce some write hot-spots on the NVM main memory. These write hot-spots may lead to reliability issues due to limited write endurance of NVM. In order to improve the endurance of NVM main memory, this work leverages the CPU cache pinning technique and exploits the iterative access pattern of CNN to resolve the write hot-spot effect. In particular, we present a CNN-aware self-bouncing pinning strategy to minimize the maximal write cycles in NVM cells by proactively fastening CPU cache lines, so as to effectively suppress the write hot-spots to NVM main memory with limited performance degradation. The proposed strategy was evaluated by a series of intensive experiments and the results are encouraging. Current Research Results "CATANA: Comprehensive Alternative Transcript Atlas based oN Annotation," Bioinformatics, September 2018. Authors: Shiau, C.K., Huang, J.H. and Tsai, H.K.* Abstract: Summary: In higher eukaryotes, the generation of transcript isoforms from a single gene through alternative splicing (AS) and alternative transcription (AT) mechanisms increases functional and regulatory diversities. Annotating these alternative transcript events is essential for genomic studies. However, there are no existing tools that generate comprehensive annotations of all these alternative transcript events including both AS and AT events. 
In the present study, we develop CATANA, with the encoded exon usage patterns based on the flattened gene model, to identify ten types of AS and AT events. We demonstrate the power and versatility of CATANA by showing greater depth of annotations of alternative transcript events according to either genome annotation or RNA-seq data. Availability and Implementation: CATANA is available at https://github.com/shiauck/CATANA Current Research Results "Enhancing Flash Memory Reliability by Jointly Considering Write-back Pattern and Block Endurance," ACM Transactions on Design Automation of Electronic Systems (TODAES), August 2018. Authors: Tseng-Yi Chen, Yuan-Hao Chang, Yuan-Hung Kuan, Ming-Chang Yang, Yu-Ming Chang, and Pi-Cheng Hsiu Abstract: Owing to the high cell density caused by advanced manufacturing processes, the reliability of flash drives has become rather challenging in flash system designs. To enhance the reliability of flash drives, error-correcting code (ECC) has been widely utilized to correct error bits when programming/reading data to/from flash drives. Although ECC can effectively enhance the reliability of flash drives by correcting error bits, the capability of ECC degrades as the number of program/erase (P/E) cycles of flash blocks increases. Eventually, ECC cannot correct a flash page because the page contains too many error bits. As a result, reducing error bits is an effective solution to further improve the reliability of flash drives when a specific ECC is adopted in the flash drive. This work focuses on how to reduce the probability of producing error bits in a flash page. Thus, we propose a pattern-aware write strategy for flash reliability enhancement. The proposed write strategy considers both the P/E cycles of blocks and the pattern of written data when a flash block is allocated to store the written data.
Since the proposed write strategy allocates young blocks (respectively, old blocks) for hot data (respectively, cold data) and flips the bit pattern of the written data to the appropriate bit pattern, the proposed strategy can effectively improve the reliability of flash drives. The experimental results show that the proposed strategy can reduce the number of error pages by up to 50%, compared with the well-known DFTL solution. Moreover, the proposed strategy is orthogonal with all ECC mechanisms so that the reliability of the flash drives with ECC mechanisms can be further improved by the proposed strategy. Current Research Results Authors: Shih-Wei Hu, Gang-Xuan Lin, Sung-Hsien Hsieh, and Chun-Shien Lu Abstract: In sparse signal recovery of compressive sensing, the phase transition determines the edge, which separates successful recovery and failed recovery. The phase transition can be seen as an indicator and an intuitive way to judge which recovery performance is better.   Traditionally, the Multiple Measurement Vectors (MMVs) problem is usually solved via $\ell_{2,1}$-norm minimization, which is our first investigation via conic geometry in this paper. Then, we are interested in the same problem but with two common constraints (or prior information): prior information relevant to the ground truth and the inherent low rank within the original signal. To figure out which constraint is most helpful, the MMVs problems are solved via $\ell_{2,1}$-$\ell_{2,1}$ minimization and $\ell_{2,1}$-low rank minimization, respectively. By theoretically presenting the necessary and sufficient condition of successful recovery from MMVs, we can have a precise prediction of phase transition to judge which constraint or prior information is better.   All our findings are verified via simulations and show that, under certain conditions, $\ell_{2,1}$-$\ell_{2,1}$ minimization outperforms $\ell_{2,1}$-low rank minimization. 
Surprisingly, $\ell_{2,1}$-low rank minimization performs even worse than $\ell_{2,1}$-norm minimization. To our knowledge, we are the first to study the MMVs problem under different prior information in the context of compressive sensing. Current Research Results "Dynamic Tuning of Applications using Restricted Transactional Memory," ACM Research in Adaptive and Convergent Systems, October 2018. Authors: Shih-Kai Lin, Ding-Yong Hong, Sheng-Yu Fu, Jan-Jan Wu, Wei-Chung Hsu Abstract: Transactional Synchronization Extensions (TSX) provide support for hardware Transactional Memory (TM) on 4th-generation Intel Core processors. Two programming interfaces, Hardware Lock Elision (HLE) and Restricted Transactional Memory (RTM), are provided to support software development using TSX. HLE is easy to use and maintains backward compatibility with processors without TSX support, while RTM is more flexible and scalable. Previous research has shown that critical sections protected by RTM with a well-designed retry mechanism as its fallback code path can often achieve better performance than HLE. More parallel programs may be written with HLE; however, using RTM may yield better performance. To embrace both the productivity and the high performance of parallel programs with TSX, we present a framework built on QEMU that can dynamically transform HLE instructions in an application binary into fragments of RTM code with adaptive tuning on the fly. Compared to HLE execution, our prototype achieves 1.15x speedup with 4 threads and 1.56x speedup with 8 threads on average. Due to the scalability of RTM, the speedup will be more significant as the number of threads increases. Current Research Results "Newsfeed Filtering and Dissemination for Behavioral Therapy on Social Network Addictions," ACM International Conference on Information and Knowledge Management (ACM CIKM), October 2018. Authors: H.-H. Shuai, Y.-C. Lien, D.-N. Yang, Y.-F. Lan, W.-C. Lee, and P. S.
Yu Abstract: While the popularity of online social network (OSN) apps continues to grow, little attention has been drawn to the increasing cases of Social Network Addictions (SNAs). In this paper, we argue that by mining OSN data in support of online intervention treatment, data scientists may assist mental healthcare professionals in alleviating the symptoms of users with SNA in early stages. Our idea, based on behavioral therapy, is to incrementally substitute highly addictive newsfeeds with safer, less addictive, and more supportive newsfeeds. To realize this idea, we propose a novel framework, called Newsfeed Substituting and Supporting System (N3S), for newsfeed filtering and dissemination in support of SNA interventions. New research challenges arise in 1) measuring the addictive degree of a newsfeed to an SNA patient, and 2) properly substituting addictive newsfeeds with safe ones based on psychological theories. To address these issues, we first propose the Additive Degree Model (ADM) to measure the addictive degrees of newsfeeds to different users. We then formulate a new optimization problem aiming to maximize the efficacy of behavioral therapy without sacrificing user preferences. Accordingly, we design a randomized algorithm with a theoretical bound. A user study with 716 Facebook users and 11 mental healthcare professionals around the world shows that the addictive scores can be reduced by more than 30%. Moreover, experiments show that the correlation between the SNA scores and the addictive degrees quantified by the proposed model is much greater than that of state-of-the-art preference-based models. Current Research Results "SeeTheVoice: Learning from Music to Visual Storytelling of Shots," IEEE International Conference on Multimedia and Expo (ICME 2018), July 2018.
Authors: Wen-Li Wei, Jen-Chun Lin, Tyng-Luh Liu, Yi-Hsuan Yang, Hsin-Min Wang, Hsiao-Rong Tyan, and Hong-Yuan Mark Liao Abstract: Types of shots in the language of film are considered the key elements used by a director for visual storytelling. In filming a musical performance, manipulating shots could stimulate desired effects such as manifesting the emotion or deepening the atmosphere. However, while the visual storytelling technique is often employed in creating professional recordings of a live concert, audience recordings of the same event often lack such sophisticated manipulations. Thus it would be useful to have a versatile system that can perform video mashup to create a refined video from such amateur clips. To this end, we propose to translate the music into a near-professional shot (type) sequence by learning the relation between music and visual storytelling of shots. The resulting shot sequence can then be used to better portray the visual storytelling of a song and guide the concert video mashup process. Our method introduces a novel probabilistic-based fusion approach, named multi-resolution fused recurrent neural networks (MF-RNNs) with film-language, which integrates multi-resolution fused RNNs and a film-language model to boost the translation performance. The results from objective and subjective experiments demonstrate that MF-RNNs with film-language can generate an appealing shot sequence with a better viewing experience. Current Research Results "A Collaborative CPU-GPU Approach for Principal Component Analysis on Mobile Heterogeneous Platform," Journal of Parallel and Distributed Computing (JPDC), October 2018. Authors: Olivier Valery, Pangfeng Liu, Jan-Jan Wu Abstract: The advent of the modern GPU architecture has enabled computers to use General Purpose GPU capabilities (GPGPU) to tackle large-scale problems at a low computational cost. 
This technological innovation is also available on mobile devices, addressing one of the primary problems with recent devices: the power envelope. Unfortunately, recent mobile GPUs suffer from a lack of accuracy that can prevent them from running large-scale data analysis tasks, such as principal component analysis (PCA) (Shlens, 0000). The goal of our work is to address this limitation by combining the high precision available on a CPU with the power efficiency of a mobile GPU. In this paper, we exploit the shared memory architecture of mobile devices in order to enhance the CPU–GPU collaboration and speed up PCA computation without sacrificing precision. Experimental results suggest that such an approach drastically reduces the power consumption of the mobile device while accelerating the overall workload. More generally, we claim that this approach can be extended to accelerate other vectorized computations on mobile devices while still maintaining numerical accuracy. Current Research Results "An Erase Efficiency Boosting Strategy for 3D Charge Trap NAND Flash," IEEE Transactions on Computers (TC), September 2018. Authors: Shuo-Han Chen, Yuan-Hao Chang, Yu-Pei Liang, Hsin-Wen Wei, and Wei-Kuan Shih Abstract: Owing to the fast-growing demand for larger and faster NAND flash devices, new manufacturing techniques have accelerated the down-scaling process of NAND flash memory. Among these new techniques, 3D charge trap flash is considered to be one of the most promising candidates for the next-generation NAND flash devices. However, the long erase latency of 3D charge trap flash becomes a critical issue. This issue is exacerbated because the distinct transient voltage shift phenomenon worsens as the number of program/erase cycles increases. In contrast to existing works that aim to tackle the erase latency issue by reducing the number of block erases, we tackle this issue by utilizing the “multi-block erase” feature. 
In this work, an erase efficiency boosting strategy is proposed to boost the garbage collection efficiency of 3D charge trap flash by enabling multi-block erase inside flash chips. A series of experiments was conducted to demonstrate the capability of the proposed strategy in improving the erase efficiency and access performance of 3D charge trap flash. The results show that the erase latency of 3D charge trap flash memory is improved by 75.76 percent on average even when the P/E cycle reaches 10^4. Current Research Results "Evaluating the possibility of detecting variants in shotgun proteomics via LeTE-fusion analysis pipeline," Journal of Proteome Research, September 2018. Authors: Tung-Shing Mamie Lih, Wai-Kok Choong, Yu-Ju Chen, Ting-Yi Sung Abstract: In proteogenomic studies, many genome-annotated events, for example, single amino acid variation (SAAV) and short INDEL, are often unobserved in shotgun proteomics. Therefore, we propose an analysis pipeline called LeTE-fusion (Le, peptide length; T, theoretical values; E, experimental data) to first investigate whether peptides with certain lengths are observed more often in mass spectrometry (MS)-based proteomics, which may hinder peptide identification and cause difficulty in detecting genome-annotated events. By applying LeTE-fusion to different MS-based proteome data sets, we found that peptides of 7–20 amino acids are more frequently identified, possibly attributable to MS-related factors rather than proteases. We then further extended the use of LeTE-fusion to four variant-containing-sequence data sets (SAAV-only) with varying sample complexity up to the whole human proteome scale, which shows that theoretically ∼70% of variants are observable in an ideal shotgun proteomics experiment. 
However, only ∼40% of variants might be detectable in real shotgun proteomic experiments when LeTE-fusion utilizes the experimentally observed variant-site-containing wild-type peptides in PeptideAtlas to estimate the expected observable coverage of variants. Finally, we conducted a case study on the HEK293 cell line with variants reported at the genomic level that were also identified in shotgun proteomics to demonstrate the efficacy of LeTE-fusion in estimating the expected observable coverage of variants. To the best of our knowledge, this is the first study to systematically investigate the detection limits of genome-annotated events via shotgun proteomics using such an analysis pipeline. Current Research Results "Unifying and Merging Well-trained Deep Neural Networks for Inference Stage," International Joint Conference on Artificial Intelligence, IJCAI 2018, July 2018. Authors: Yi-Min Chou, Yi-Ming Chan, Jia-Hong Lee, Chih-Yi Chiu, Chu-Song Chen Abstract: We propose a novel method to merge convolutional neural-nets for the inference stage. Given two well-trained networks that may have different architectures and handle different tasks, our method aligns the layers of the original networks and merges them into a unified model by sharing the representative codes of weights. The shared weights are further re-trained to fine-tune the performance of the merged model. The proposed method effectively produces a compact model that may run the original tasks simultaneously on resource-limited devices. As it preserves the general architectures and leverages the co-used weights of well-trained networks, a substantial training overhead can be avoided, shortening the system development time. Experimental results demonstrate a satisfactory performance and validate the effectiveness of the method. Current Research Results "SLC-Like Programming Scheme for MLC Flash Memory," ACM Transactions on Storage (TOS), March 2018. 
Authors: Chien-Chung Ho, Yu-Ming Chang, Yuan-Hao Chang, and Tei-Wei Kuo Abstract: Although the multilevel cell (MLC) technique is widely adopted by flash-memory vendors to boost the chip density and lower the cost, it results in serious performance and reliability problems. Different from past work, a new cell programming method is proposed to not only significantly improve chip performance but also reduce the potential bit error rate. In particular, a single-level cell (SLC)-like programming scheme is proposed to better explore the threshold-voltage relationship to denote different MLC bit information, which in turn provides a drastically larger window of threshold voltage similar to that found in SLC chips. It could result in fewer programming iterations and, at the same time, far fewer reliability problems in programming flash-memory cells. In the experiments, the new programming scheme could accelerate the programming speed by up to 742% and even reduce the bit error rate by up to 471% for MLC pages. Current Research Results "Boosting NVDIMM Performance with a Light-Weight Caching Algorithm," IEEE Transactions on Very Large Scale Integration Systems (TVLSI), August 2018. Authors: Che-Wei Tsao, Yuan-Hao Chang, and Tei-Wei Kuo Abstract: In the big data era, data-intensive applications have a growing demand for DRAM main memory capacity, but the frequent DRAM refresh, high leakage power, and high unit cost pose serious design issues for scaling up DRAM capacity. To address this issue, a nonvolatile dual inline memory module (NVDIMM), which is a hybrid memory module, becomes a possible alternative to DRAM as main memory in some data-intensive applications. The NVDIMM, which consists of a small-sized high-speed DRAM and a large-sized low-cost nonvolatile memory (i.e., flash memory), has a serious performance issue when accessing data stored in the flash memory because of the huge performance gap between the DRAM and the flash memory. 
However, there is limited room to adopt a complex caching algorithm for using the DRAM as the cache of flash memory in the NVDIMM main memory, because a complex caching algorithm would itself cause too much performance degradation when handling each request to access the NVDIMM main memory. In this paper, we present a lightweight caching algorithm to boost NVDIMM performance by minimizing the cache management overhead and reducing the frequency of flash memory accesses. A series of experiments was conducted based on popular benchmarks, and the results demonstrate that the proposed algorithm can effectively improve the performance of the NVDIMM main memory.
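
To illustrate what low cache management overhead can look like, the following is a minimal sketch of a direct-mapped, write-back DRAM cache placed in front of a slower backing store. It is not the algorithm proposed in the paper, whose details the abstract does not give; direct mapping is swapped in here only because its per-request bookkeeping is a single index computation and a tag comparison. The class name `DirectMappedCache`, the dictionary standing in for flash memory, and the access counters are all assumptions made for illustration.

```python
class DirectMappedCache:
    """Hypothetical direct-mapped, write-back cache (DRAM) over a backing store (flash)."""

    def __init__(self, num_slots, backing):
        self.num_slots = num_slots
        self.backing = backing            # dict: page number -> data (stands in for flash)
        self.tags = [None] * num_slots    # page number currently held by each slot
        self.data = [None] * num_slots
        self.dirty = [False] * num_slots
        self.flash_reads = 0              # counters for accesses to the slow store
        self.flash_writes = 0

    def _load(self, page):
        """Ensure `page` is cached and return its slot; O(1) bookkeeping per request."""
        slot = page % self.num_slots      # each page maps to exactly one slot
        if self.tags[slot] != page:       # miss: evict the current occupant
            if self.tags[slot] is not None and self.dirty[slot]:
                self.backing[self.tags[slot]] = self.data[slot]   # write back
                self.flash_writes += 1
            self.data[slot] = self.backing.get(page)              # fetch from flash
            self.flash_reads += 1
            self.tags[slot] = page
            self.dirty[slot] = False
        return slot

    def read(self, page):
        return self.data[self._load(page)]

    def write(self, page, value):
        slot = self._load(page)           # write-allocate: fetch on a write miss
        self.data[slot] = value
        self.dirty[slot] = True           # flash is updated only on eviction


flash = {i: f"page{i}" for i in range(8)}
cache = DirectMappedCache(4, flash)
cache.read(1)            # miss: one flash read
cache.read(1)            # hit: no flash access
cache.write(1, "new")    # hit: marks the slot dirty
cache.read(5)            # conflicts with page 1, which is written back first
```

The trade-off in this sketch is that pages mapping to the same slot evict each other even when other slots are idle, so the saving in management cost is paid for with extra conflict misses. A caching policy for real hardware would have to balance the two.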