Current Research Results
Authors: Ching-Tai Chen, Chu-Ling Ko, Wai-Kok Choong, Jen-Hung Wang, Wen-Lian Hsu, and Ting-Yi Sung

Abstract:
Protein and peptide identification and quantitation are essential tasks in proteomics research and involve a  series of steps in analyzing mass spectrometry data. Trans-Proteomic Pipeline (TPP) provides a wide range of useful tools through its web interfaces for analyses such as sequence  database search, statistical validation, and quantitation. To utilize the powerful functionality of TPP without the need for manual intervention to launch each step, we developed a software tool, called WinProphet, to create and automatically  execute a pipeline for proteomic analyses. It seamlessly integrates with TPP and other external command-line programs, supporting various functionalities, including database search for protein and peptide identification, spectral library construction and search, data-independent acquisition (DIA) data analysis, and isobaric labeling and label-free quantitation. WinProphet is a standalone, installation-free tool with graphical interfaces for users to configure, manage, and automatically execute pipelines. The constructed pipelines can be exported as XML files with all of the parameter settings for reusability and portability. The executable files, user manual, and sample data sets of WinProphet are freely available at http://ms.iis.sinica.edu.tw/COmics/Software_WinProphet.html.
"Identifying expressive semantics in orchestral conducting kinematics," International Society of Music Information Retrieval Conference (ISMIR), November 2019.
Authors: Yu-Fen Huang, Tsung-Ping Chen, Nikki Moran, Simon Coleman, and Li Su

Abstract:
Existing kinematic research on orchestral conducting movement contributes to beat-tracking and the delivery of performance dynamics. Methodologically, such movement cues have been treated as distinct, isolated events. Yet as practicing musicians and music pedagogues know, conductors’ expressive instructions are highly flexible and dependent on the musical context. We seek to demonstrate an approach to search for effective descriptors to express musical features in conducting movement in a valid music context, and to extract complex expressive semantics from elementary conducting kinematic variations. This study therefore proposes a multi-task learning model to jointly identify dynamic, articulation, and phrasing cues from conducting kinematics. A professional conducting movement dataset is compiled using a high-resolution motion capture system. The ReliefF algorithm is applied to select significant features from conducting movement, and recurrent neural network (RNN) is implemented to identify multiple movement cues. The experimental results disclose key elements in conducting movement which communicate musical expressiveness; the results also highlight the advantage of multi-task learning in the complete musical context over single-task learning. To the best of our knowledge, this is the first attempt to use recurrent neural network to explore multiple semantic expressive cuing in conducting movement kinematics.
"Tell Me Where It is Still Blurry: Adversarial Blurred Region Mining and Refining," 27th ACM Multimedia Conference (long paper), October 2019.
Authors: Jen-Chun Lin, Wen-Li Wei, Tyng-Luh Liu, C.-C. Jay Kuo, and H. Y. Mark Liao

Abstract:
Mobile devices such as smartphones are ubiquitously being used to take photo sand videos,thus increasing the importance of image deblurring.This study introduces a novel deep learning approach that can automatically and progressively achieve the task via adversarial blurred region mining and refining (adversarialBRMR). Starting with a collaborative mechanism of two coupled conditional generative adversarial networks(CGANs),our method first learns the image-scaleCGAN,denoted as iGAN, to globally generate adeblurred image and locally uncoverits still blurred regions through an adversarial mining process. Then, we construct the patch-scale CGAN, denoted as pGAN, to further improve sharpness of the most blurred region in each iteration.  Owing to such complementary designs, the adversarial BRMR indeed functions as a bridge between iGAN and pGAN,and yields the performance synergy in better solving blind image deblurring.The overall formulation is self-explanatory and effective to globally and locally restore an underlying sharp image.Experimental results on benchmark datasets demonstrate that the proposed method outperforms the current state-of-the-art technique for blind image deblurring both quantitatively and qualitatively.
"See-through-Text Grouping for Referring Image Segmentation," 2019 International Conference on Computer Vision(ICCV), October 2019.
Authors: Ding-Jie Chen, Songhao Jia, Yi-Chen Lo, Hwann-Tzong Chen, and Tyng-Luh Liu

Abstract:
Motivated by the conventional grouping techniques to image segmentation, we develop their DNN counterpart to tackle the referring variant. The proposed method is driven by a convolutional-recurrent neural network (ConvRNN) that iteratively carries out top-down processing of bottom-up segmentation cues. Given a natural language referring expression, our method learns to predict its relevance to each pixel and derives a See-through-Text Embedding Pixelwise (STEP) heatmap, which reveals segmentation cues of pixel level via the learned visual-textual co-embedding. The ConvRNN performs a top-down approximation by converting the STEP heatmap into a refined one, whereas the improvement is expected from training the network with a classification loss from the ground truth. With the refined heatmap, we update the textual representation of the referring expression by re-evaluating its attention distribution and then compute a new STEP heatmap as the next input to the ConvRNN. Boosting by such collaborative learning, the framework can progressively and simultaneously yield the desired referring segmentation and reasonable attention distribution over the referring sentence. Our method is general and does not rely on, say, the outcomes of object detection from other DNN models, while achieving state-of-the-art performance in all of the four datasets in the experiments.
"Achieving Lossless Accuracy with Lossy Programming for Efficient Neural-Network Training on NVM-Based Systems," ACM/IEEE International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), October 2019.
Authors: Wei-Chen Wang, Yuan-Hao Chang, Tei-Wei Kuo, Chien-Chung Ho, Yu-Ming Chang, and Hung-Sheng Chang

Abstract:
Neural networks over conventional computing platforms are heavily restricted by the data volume and performance concerns. While non-volatile memory offers potential solutions to data volume issues, challenges must be faced over performance issues, especially with asymmetric read and write performance. Beside that, critical concerns over endurance must also be resolved before non-volatile memory could be used in reality for neural networks. This work addresses the performance and endurance concerns altogether by proposing a data-aware programming scheme. We propose to consider neural network training jointly with respect to the data-flow and data-content points of view. In particular, methodologies with approximate results over Dual-SET operations were presented. Encouraging results were observed through a series of experiments, where great efficiency and lifetime enhancement is seen without sacrificing the result accuracy.
"Enabling Sequential-write-constrained B+-tree Index Scheme to Upgrade Shingled Magnetic Recording Storage Performance," ACM/IEEE International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), October 2019.
Authors: Yu-Pei Liang, Tseng-Yi Chen, Yuan-Hao Chang, Shuo-Han Chen, Kam-Yiu Lam, Wei-Hsin Li, and Wei-Kuan Shih

Abstract:
When a shingle magnetic recording (SMR) drive has been widely applied to modern computer systems (e.g., archive file systems, big data computing systems, and large-scale database systems), storage system developers should thoroughly review whether current designs (e.g., index schemes and data placements) are appropriate for an SMR drive because of its sequential write constraint. Through many prior works excellently manage data in an SMR drive by integrating their proposed solutions into the driver layer, an index scheme over an SMR drive has never been optimized by any previous works because managing index over the SMR drive needs to jointly consider the properties of B$^+$-tree and SMR natures (e.g., sequential write constraint and zone partitions) in a host storage system. Moreover, poor index management will result in terrible storage performance because an index manager is extensively used in file systems and database applications. For optimizing the B$^+$-tree index structure over an SMR storage, this work identifies performance overheads caused by the B$^+$-tree index structure in an SMR drive. By such observation, this study proposes a sequential-write-constrained B$^+$-tree index scheme, namely SW-B$^+$tree, which consists of an address redirection data structure, an SMR-aware node allocation mechanism, and a frequency-aware garbage collection strategy. According to our experiments, the SW-B$^+$tree can improve the SMR storage performance 55% on average.
"Exploiting Vector Processing in Dynamic Binary Translation," the International Conference on Parallel Processing (ICPP), August 2019.
Authors: Chih-Min Lin, Sheng-Yu Fu, Ding-Yong Hong, Yu-Ping Liu, Jan-Jan Wu, Wei-Chung Hsu

Abstract:
Auto vectorization techniques have been adopted by compilers to exploit data-level parallelism in parallel processing for decades. However, since processor architectures have kept enhancing with new features to improve vector/SIMD performance, legacy application binaries failed to fully exploit new vector/SIMD capabilities in modern architectures. For example, legacy ARMv7 binaries cannot benefit from ARMv8 SIMD double precision capability, and legacy x86 binaries cannot enjoy the power of AVX-512 extensions. In this paper, we study the fundamental issues involved in cross-ISA Dynamic Binary Translation (DBT) to convert non-vectorized loops to vector/SIMD forms to achieve greater computation throughput available in newer processor architectures. The key idea is to recover critical loop information from those application binaries in order to carry out vectorization at runtime. Experiment results show that our approach achieves an average speedup of 1.42x compared to ARMv7 native run across various benchmarks in an ARMv7-to-ARMv8 dynamic binary translation system.
Current Research Results
Authors: Sheng-Yao Su, I-Hsuan Lu, Wen-Chih Cheng, Wei-Chun Chung, Pao-Yang Chen, Jan-Ming Ho, Shu-Hwa Chen, Chung-Yen Lin

Abstract:
DNA methylation is a crucial epigenomic mechanism in the biological system. Using whole genome bisulfite sequencing (WGBS) technology, the methylation status of cytosine sites can be revealed. However, performing WGBS data analysis is often complicated and challenging. To alleviate such difficulties, we integrated the WGBS data processing and downstream analysis into a two-phase approach, namely, EpiMOLAS. First, we packed a Docker container DocMethyl to deal with raw data processing, mapping, and methylation calling/ scoring to give the summary, mtable, of the whole genome methylation status by the gene. Next, mtables are uploaded to the web server EpiMOLAS_web for linking with gene annotation databases that enable rapid data retrieval and analyses. This two-phase combination of DocMethyl and EpiMOLAS_web solve methylome data analysis from raw reads processing to downstream analysis.
"Rethinking Last-level-cache Write-back Strategy for MLC STT-RAM Main Memory with Asymmetric Write Energy," ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), July 2019.
Authors: Yu-Pei Liang, Tseng-Yi Chen, Yuan-Hao Chang, Shuo-Han Chen, Pei-Yu Chen, and Wei-Kuan Shih

Abstract:
To meet the requirement of low-power consumption, multi-level-cell STT-RAM (MLC STT-RAM) has been widely regarded as a potential candidate for replacing DRAM-based main memory in the next generation computer architectures because of its high memory cell density, fast read/write performance and zero refresh power consumption. However, MLC STT-RAM has higher power consumption than DRAM while a write operation is performed because MLC STT-RAM sometimes needs to perform a two-step transition to change the originally stored bits to another specifically written bit patterns. As a result, MLC STT-RAM has different power consumption while different bit patterns are written to a memory cell. To the best of our knowledge, a few or none of the previous studies rethink a cache replacement policy to overcome the asymmetric write energy issue of MLC STT-RAM-based main memory. Thus, this study proposes an energy-aware cache replacement policy, namely E-cache, which considers asymmetric write-back power consumption on MLC STT-RAM-based main memory to evict a proper cached data from the last-level cache, so as to minimize system power consumption. The experimental results show that the proposed solution reduces the energy consumption by 36\\% on average, compared with the LRU.
"The Best of Both Worlds: On Exploiting Bit-Alterable NAND Flash for Lifetime and Read Performance Optimization," ACM/IEEE Design Automation Conference (DAC), June 2019.
Authors: Shuo-Han Chen, Ming-Chang Yang, and Yuan-Hao Chang

Abstract:
With the emergence of bit-alterable 3D NAND flash, programming and erasing a flash cell at bit-level granularity have become a reality. Bit-level operations can benefit the high density, high bit-error-rate 3D NAND flash via realizing the bit-level rewrite operation,'' which can refresh error bits at bit-level granularity for reducing the error correction latency and improving the read performance with minimal lifetime expense. Different from existing refresh techniques, bit-level operations can lower the lifetime expense via removing error bits directly without page-based rewrites. However, since bit-level rewrites may induce a similar amount of latency as conventional page-based rewrites and thus lead to low rewrite throughput, the efficiency of bit-level rewrites should be carefully considered. Such observation motivates us to propose a bit-level error removal (BER) scheme to derive the most-efficient way of utilizing the bit-level operations for both lifetime and read performance optimization. % without exaggerating the uneven wear level issue. A series of experiments was conducted to demonstrate the capability of the BER scheme with encouraging results.
"Enabling File-Oriented Fast Secure Deletion on Shingled Magnetic Recording Drives," ACM/IEEE Design Automation Conference (DAC), June 2019.
Authors: Shuo-Han Chen, Ming-Chang Yang, Yuan-Hao Chang, and Chun-Feng Wu

Abstract:
Existing secure deletion approaches are inefficient in erasing data permanently because file systems have no knowledge of the data layout on the storage device, nor is the storage device aware of file information within the file systems. This inefficiency is exaggerated on the emerging shingled magnetic recording (SMR) drive due to its inherent sequential-write constraint. On SMR drives, secure deletion requests may lead to serious write amplification and performance degradation if the data layout is not properly configured. Such observation motivates us to propose a file-oriented fast secure deletion (FFSD) strategy to alleviate the negative impacts of SMR drives' sequential-write constraint and improve the efficiency of secure deletion operations on SMR drives. A series of experiments were conducted to demonstrate the capability of the proposed strategy on improving the efficiency of secure deletion on SMR drives.
Current Research Results
"Enhancing Transactional Memory Execution via Dynamic Binary Translation," ACM Applied Computing Review (ACR), April 2019.
Authors: Ding-Yong Hong, Shih-Kai Lin, Sheng-Yu Fu, Jan-Jan Wu, Wei-Chung Hsu

Abstract:
Transactional Synchronization Extensions (TSX) have been introduced for hardware transactional memory since the 4th generation Intel Core processors. TSX provides two software programming interfaces: Hardware Lock Elision (HLE) and Restricted Transactional Memory (RTM). HLE is easy to use and maintains backward compatibility with processors without TSX support, while RTM is more flexible and scalable. Previous researches have shown that critical sections protected by RTM with a well-designed retry mechanism as its fallback code path can often achieve better performance than HLE. More parallel programs may be programmed in HLE, however, using RTM may obtain greater performance. To embrace both productivity and high performance of parallel programs with TSX, we present a framework built on QEMU that can dynamically transform HLE instructions in an application binary to fragments of RTM codes with adaptive tuning on the fly. Compared to HLE execution, our prototype achieves 1.56 x speedup with 8 threads on average. Due to the scalability of RTM, the speedup will be more significant as the number of threads increases.
Current Research Results
"Frame-based Sparse Analysis and Synthesis Signal Representations and Parseval K-SVD," IEEE Transactions on Signal Processing, To Appear.
Authors: Wen-Liang Hwang, Ping-Tzan Huang, Bo-Chen Kung, Jinn Ho, and Tai-Lang Jong

Abstract:
Frames are the foundation of the linear operators used in the decomposition and reconstruction of signals, such as the discrete Fourier transform, Gabor, wavelets, and curvelet transforms. The emergence of sparse representation models has shifted the emphasis in frame theory toward sparse-norm minimization problems. In this paper, we apply frame theory to the sparse representation of signals, wherein a frame is used for synthesis dictionary and a dual frame is used for analysis dictionary. Our objective was to formulate a novel dual frame design in which the sparse vector obtained through the decomposition of any signal is also the sparse solution representing signals based on a reconstruction frame. Our findings demonstrate that this type of dual frame cannot be constructed for over-complete frames, thereby precluding the use of any linear analysis operator to derive the sparse synthesis coefficient for signal representation. We also developed a novel dictionary learning algorithm (called Parseval K-SVD) to learn a tight-frame dictionary with normalized atoms. We then dealt with the problem of signal representation using frames from the analysis and synthesis perspective to derive optimization formulations for problems pertaining to image recovery. Our preliminary results demonstrate that the images recovered using this approach are correlated with the frame bounds of dictionaries, thereby demonstrating the importance of using different dictionaries for different applications.
Current Research Results
Authors: Li-An Yang, Yu-Jung Chang, Shu-Hwa Chen, Chung-Yen Lin and Jan-Ming Ho

Abstract:

We present SQUAT, an efficient tool for both pre-assembly and post-assembly quality assessment of de novo genome assemblies. The pre-assembly module of SQUAT computes quality statistics of reads and presents the analysis in a well-designed interface to visualize the distribution of high- and poor-quality reads in a portable HTML report. The post-assembly module of SQUAT provides read mapping analytics in an HTML format. We categorized reads into several groups including uniquely mapped reads, multiply mapped, unmapped reads; for uniquely mapped reads, we further categorized them into perfectly matched, with substitutions, containing clips, and the others. We carefully defined the poorly mapped (PM) reads into several groups to prevent the underestimation of unmapped reads; indeed, a high PM% would be a sign of a poor assembly that requires researchers’ attention for further examination or improvements before using the assembly. Finally, we evaluate SQUAT with six datasets, including the genome assemblies for eel, worm, mushroom, and three bacteria. The results show that SQUAT reports provide useful information with details for assessing the quality of assemblies and reads.

### Availability

The SQUAT software with links to both its docker image and the on-line manual is freely available at https://github.com/luke831215/SQUAT

Current Research Results
"Exploiting SIMD Asymmetry in ARM-to-x86 Dynamic Binary Translation," ACM Transactions on Architecture and Code Optimization (TACO), February 2019.
Authors: Yu-Ping Liu, Ding-Yong Hong, Jan-Jan Wu, Sheng-Yu Fu, Wei-Chung Hsu

Abstract:
Single instruction multiple data (SIMD) has been adopted for decades because of its superior performance and power efficiency. The SIMD capability (i.e., width, number of registers, and advanced instructions) has diverged rapidly on different SIMD instruction-set architectures (ISAs). Therefore, migrating existing applications to another host ISA that has fewer but longer SIMD registers and more advanced instructions raises the issues of asymmetric SIMD capability. To date, this issue has been overlooked and the host SIMD capability is underutilized, resulting in suboptimal performance. In this article, we present a novel binary translation technique called spill-aware superword level parallelism (saSLP), which combines short ARMv8 instructions and registers in the guest binaries to exploit the x86 AVX2 host’s parallelism, register capacity, and gather instructions. Our experiment results show that saSLP improves the performance by 1.6× (2.3×) across a number of benchmarks and reduces spilling by 97% (99%) for ARMv8 to x86 AVX2 (AVX-512) translation. Furthermore, with AVX2 (AVX-512) gather instructions, saSLP speeds up several data-irregular applications that cannot be vectorized on ARMv8 NEON by up to 3.9× (4.2×).
Current Research Results
"On Improving the Write Responsiveness for Host-Aware SMR Drives," IEEE Transactions on Computers (TC), January 2019.
Authors: Ming-Chang Yang, Yuan-Hao Chang, Fenggang Wu, Tei-Wei Kuo, and David Hung-Chang Du

Abstract:
This paper presents a Virtual Persistent Cache design to remedy the long latency behavior and to ultimately improve the write responsiveness of the Host-Aware Shingled Magnetic Recording (HA-SMR) drives. Our design keeps the cost-effective model of the existing HA-SMR drives, but at the same time asks the great help from the host system for adaptively providing some computing and management resources to improve the drive performance when needed. The technical contribution is to trick the HA-SMR drives by smartly reshaping the access patterns to HA-SMR drives, so as to avoid the occurrences of long latencies in most cases and thus to ultimately improve the drive performance and responsiveness. We conduct experiments on real Seagate 8 TB HA-SMR drives to demonstrate the advantages of Virtual Persistent Cache over the real workloads from Microsoft Research Cambridge. The results show that the proposed design can remedy most of the long latencies and improve the drive performance by at least 58.11 percent, under the evaluated workloads.
Current Research Results
"Co-optimizing Storage Space Utilization and Performance for Key-Value Solid State Drives," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), January 2019.
Authors: Yen-Ting Chen, Ming-Chang Yang, Yuan-Hao Chang, Tseng-Yi Chen, Hsin-Wen Wei, and Wei-Kuan Shih

Abstract:
Growing demand for key-value store applications is building a strong momentum for the commercialization of key-value hard disk drives. To achieve better performance, flash-based solid state drive is the next ideal candidate for commercialization in the foreseeable future. However, the existing fixed-sized management strategies of flash-based devices would potentially result in low storage space utilization when managing variable-sized key-value data. In addition, the low storage space utilization would further lead to the degradation of device performance, due to low invalid data space reclamation efficiency. The space utilization issue motivates this paper to propose a key-value flash translation layer design to improve storage space utilization as well as the performance of the key-value solid state drives. A series of experiments was conducted to evaluate the proposed design, and the experiment results of space utilization and device performance are very encouraging.
"Enabling Transitivity for Lexical Inference on Chinese Verbs Using Probabilistic Soft Logic," Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP 2017), December 2017.
Authors: Wei-Chung Wang and Lun-Wei Ku

Abstract:
To learn more knowledge, enabling transitivity is a vital step for lexical inference. However, most of the lexical inference models with good performance are for nouns or noun phrases, which cannot be directly applied to the inference on events or states. In this paper, we construct the largest Chinese verb lexical inference dataset containing 18,029 verb pairs, where for each pair one of four inference relations are annotated. We further build a probabilistic soft logic (PSL) model to infer verb lexicons using the logic language. With PSL, we easily enable transitivity in two layers, the observed layer and the feature layer, which are included in the knowledge base. We further discuss the effect of transitives within and between these layers. Results show the performance of the proposed PSL model can be improved at least 3.5% (relative) when the transitivity is enabled. Furthermore, experiments show that enabling transitivity in the observed layer benefits the most.