Institute of Information Science
"Dynamic Tuning of Applications using Restricted Transactional Memory," ACM Research in Adaptive and Convergent Systems, October 2018.
Authors: Shih-Kai Lin, Ding-Yong Hong, Sheng-Yu Fu, Jan-Jan Wu, Wei-Chung Hsu

Transactional Synchronization Extensions (TSX) provide support for hardware Transactional Memory (TM) on Intel 4th-generation Core processors. Two programming interfaces, Hardware Lock Elision (HLE) and Restricted Transactional Memory (RTM), are provided to support software development using TSX. HLE is easy to use and maintains backward compatibility with processors without TSX support, while RTM is more flexible and scalable. Previous research has shown that critical sections protected by RTM with a well-designed retry mechanism as the fallback code path can often achieve better performance than HLE. Many parallel programs are thus written with HLE for productivity, even though using RTM could deliver greater performance. To embrace both the productivity and the high performance of parallel programming with TSX, we present a framework built on QEMU that dynamically transforms HLE instructions in an application binary into fragments of RTM code with adaptive tuning on the fly. Compared to HLE execution, our prototype achieves a 1.15x speedup with 4 threads and a 1.56x speedup with 8 threads on average. Owing to the scalability of RTM, the speedup becomes more significant as the number of threads increases.
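The retry-with-fallback pattern referred to above can be sketched as follows. This is a hypothetical Python simulation for illustration only: real RTM code uses the `_xbegin`/`_xend`/`_xabort` intrinsics and a fallback lock elided on the speculative path, and aborts are signaled by hardware, not exceptions.

```python
import threading

def execute_with_rtm_fallback(txn, fallback_lock, max_retries=3):
    # Retry the speculative path a few times; after repeated aborts,
    # serialize through the conventional fallback lock.
    for attempt in range(max_retries):
        try:
            return txn(), attempt, False       # committed speculatively
        except RuntimeError:                   # stands in for a hardware abort
            continue
    with fallback_lock:                        # non-speculative path
        return txn(), max_retries, True

state = {"aborts_left": 2}
def flaky_txn():
    # Aborts twice, then succeeds -- mimicking transient contention.
    if state["aborts_left"] > 0:
        state["aborts_left"] -= 1
        raise RuntimeError("abort")
    return "done"

result, attempts, used_lock = execute_with_rtm_fallback(flaky_txn, threading.Lock())
print(result, attempts, used_lock)   # done 2 False
```

Tuning `max_retries` per critical section is the kind of adaptive decision the proposed framework makes at run time.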
"Newsfeed Filtering and Dissemination for Behavioral Therapy on Social Network Addictions," ACM International Conference on Information and Knowledge Management (ACM CIKM), October 2018.
Authors: H.-H. Shuai, Y.-C. Lien, D.-N. Yang, Y.-F. Lan, W.-C. Lee, and P. S. Yu

While the popularity of online social network (OSN) apps continues to grow, little attention has been drawn to the increasing cases of Social Network Addictions (SNAs). In this paper, we argue that by mining OSN data in support of online intervention treatment, data scientists may assist mental healthcare professionals to alleviate the symptoms of users with SNA in early stages. Our idea, based on behavioral therapy, is to incrementally substitute highly addictive newsfeeds with safer, less addictive, and more supportive newsfeeds. To realize this idea, we propose a novel framework, called Newsfeed Substituting and Supporting System (N3S), for newsfeed filtering and dissemination in support of SNA interventions. New research challenges arise in 1) measuring the addictive degree of a newsfeed to an SNA patient, and 2) properly substituting addictive newsfeeds with safe ones based on psychological theories. To address these issues, we first propose the Addictive Degree Model (ADM) to measure the addictive degrees of newsfeeds to different users. We then formulate a new optimization problem aiming to maximize the efficacy of behavioral therapy without sacrificing user preferences. Accordingly, we design a randomized algorithm with a theoretical bound. A user study with 716 Facebook users and 11 mental healthcare professionals around the world shows that the addictive scores can be reduced by more than 30%. Moreover, experiments show that the correlation between the SNA scores and the addictive degrees quantified by the proposed model is much greater than that of state-of-the-art preference-based models.
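As a toy illustration of the substitution idea only — not the paper's randomized algorithm with its theoretical bound — one can greedily swap feeds whose addictive degree exceeds a threshold for the highest-preference safe feeds. All names and scores below are hypothetical.

```python
def substitute_newsfeeds(feeds, safe_pool, threshold):
    # Each feed is a (name, addictive_degree, preference) tuple; all values
    # here are hypothetical. Feeds above the addictive threshold are swapped
    # for the highest-preference safe feeds still available.
    pool = sorted(safe_pool, key=lambda f: -f[2])
    result = []
    for feed in feeds:
        if feed[1] > threshold and pool:
            result.append(pool.pop(0))   # substitute a safer newsfeed
        else:
            result.append(feed)          # keep: not addictive enough to swap
    return result

feeds = [("gaming", 0.9, 0.8), ("news", 0.2, 0.6)]
safe_pool = [("hiking", 0.1, 0.7), ("cooking", 0.1, 0.5)]
filtered = substitute_newsfeeds(feeds, safe_pool, 0.5)
print([name for name, _, _ in filtered])   # ['hiking', 'news']
```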
"SeeTheVoice; Learning from Music to Visual Storytelling of Shots," IEEE International Conference on Multimedia and Expo (ICME 2018), July 2018.
Authors: Wen-Li Wei, Jen-Chun Lin, Tyng-Luh Liu, Yi-Hsuan Yang, Hsin-Min Wang, Hsiao-Rong Tyan, and Hong-Yuan Mark Liao

Types of shots in the language of film are considered the key elements used by a director for visual storytelling. In filming a musical performance, manipulating shots can stimulate desired effects such as manifesting the emotion or deepening the atmosphere. However, while the visual storytelling technique is often employed in creating professional recordings of a live concert, audience recordings of the same event often lack such sophisticated manipulations. Thus, it would be useful to have a versatile system that can perform video mashup to create a refined video from such amateur clips. To this end, we propose to translate the music into a near-professional shot (type) sequence by learning the relation between music and the visual storytelling of shots. The resulting shot sequence can then be used to better portray the visual storytelling of a song and guide the concert video mashup process. Our method introduces a novel probabilistic fusion approach, named multi-resolution fused recurrent neural networks (MF-RNNs) with film-language, which integrates multi-resolution fused RNNs and a film-language model for boosting the translation performance. The results from objective and subjective experiments demonstrate that MF-RNNs with film-language can generate an appealing shot sequence with a better viewing experience.
Current Research Results
"A Collaborative CPU-GPU Approach for Principal Component Analysis on Mobile Heterogeneous Platform," Journal of Parallel and Distributed Computing (JPDC), October 2018.
Authors: Olivier Valery, Pangfeng Liu, Jan-Jan Wu

The advent of the modern GPU architecture has enabled computers to use General Purpose GPU (GPGPU) capabilities to tackle large-scale problems at a low computational cost. This technological innovation is also available on mobile devices, addressing one of the primary concerns with recent devices: the power envelope. Unfortunately, recent mobile GPUs suffer from a lack of accuracy that can prevent them from running large-scale data analysis tasks, such as principal component analysis (PCA). The goal of our work is to address this limitation by combining the high precision available on a CPU with the power efficiency of a mobile GPU. In this paper, we exploit the shared memory architecture of mobile devices in order to enhance CPU–GPU collaboration and speed up PCA computation without sacrificing precision. Experimental results suggest that such an approach drastically reduces the power consumption of the mobile device while accelerating the overall workload. More generally, we claim that this approach can be extended to accelerate other vectorized computations on mobile devices while still maintaining numerical accuracy.
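For reference, the core PCA computation being accelerated can be written in a few lines of NumPy. This sketch runs entirely on the CPU in double precision; the paper's CPU-GPU split and mixed-precision handling are not shown.

```python
import numpy as np

def pca_project(X, k):
    # Center the data, then project onto the top-k principal components
    # obtained from the singular value decomposition.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = pca_project(X, 2)
print(Y.shape)   # (100, 2)
```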
"An Erase Efficiency Boosting Strategy for 3D Charge Trap NAND Flash," IEEE Transactions on Computers (TC), September 2018.
Authors: Shuo-Han Chen, Yuan-Hao Chang, Yu-Pei Liang, Hsin-Wen Wei, and Wei-Kuan Shih

Owing to the fast-growing demand for larger and faster NAND flash devices, new manufacturing techniques have accelerated the down-scaling process of NAND flash memory. Among these new techniques, 3D charge trap flash is considered to be one of the most promising candidates for next-generation NAND flash devices. However, the long erase latency of 3D charge trap flash becomes a critical issue. This issue is exacerbated because the distinct transient voltage shift phenomenon worsens as the number of program/erase cycles increases. In contrast to existing works that aim to tackle the erase latency issue by reducing the number of block erases, we tackle this issue by utilizing the “multi-block erase” feature. In this work, an erase efficiency boosting strategy is proposed to boost the garbage collection efficiency of 3D charge trap flash by enabling multi-block erase inside flash chips. A series of experiments was conducted to demonstrate the capability of the proposed strategy in improving the erase efficiency and access performance of 3D charge trap flash. The results show that the erase latency of 3D charge trap flash memory is improved by 75.76 percent on average even when the P/E cycle count reaches 10^4.
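The benefit of batching erases can be seen with a back-of-the-envelope cost model. The numbers and the fixed-per-command latency assumption below are hypothetical, not measurements from the paper.

```python
import math

def gc_erase_time(n_blocks, t_erase_ms, batch=1):
    # Hypothetical cost model: every erase command takes t_erase_ms,
    # regardless of how many blocks (up to `batch`) it covers.
    return math.ceil(n_blocks / batch) * t_erase_ms

single = gc_erase_time(64, 10.0, batch=1)   # one block per erase command
multi = gc_erase_time(64, 10.0, batch=8)    # multi-block erase
print(single, multi)   # 640.0 80.0
```

Under this model, covering eight blocks per command cuts the total garbage-collection erase time eightfold, which is the intuition behind exploiting the multi-block erase feature.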
Current Research Results
Authors: Tung-Shing Mamie Lih, Wai-Kok Choong, Yu-Ju Chen, Ting-Yi Sung

In proteogenomic studies, many genome-annotated events, for example, single amino acid variations (SAAVs) and short INDELs, are often unobserved in shotgun proteomics. Therefore, we propose an analysis pipeline called LeTE-fusion (Le, peptide length; T, theoretical values; E, experimental data) to first investigate whether peptides with certain lengths are observed more often in mass spectrometry (MS)-based proteomics, which may hinder peptide identification and cause difficulty in detecting genome-annotated events. By applying LeTE-fusion to different MS-based proteome data sets, we found that peptides of 7–20 amino acids are more frequently identified, possibly attributable to MS-related factors rather than proteases. We then further extended the usage of LeTE-fusion to four variant-containing-sequence data sets (SAAV-only) with various sample complexities, up to the whole human proteome scale, which yields theoretically ∼70% of variants observable in ideal shotgun proteomics. However, only ∼40% of variants might be detectable in real shotgun proteomic experiments when LeTE-fusion utilizes the experimentally observed variant-site-containing wild-type peptides in PeptideAtlas to estimate the expected observable coverage of variants. Finally, we conducted a case study on the HEK293 cell line, with variants reported at the genomic level that were also identified in shotgun proteomics, to demonstrate the efficacy of LeTE-fusion in estimating the expected observable coverage of variants. To the best of our knowledge, this is the first study to systematically investigate the detection limits of genome-annotated events via shotgun proteomics using such an analysis pipeline.
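The length-distribution argument can be illustrated with a simple in-silico trypsin digest (cleave after K or R, but not before P). The example sequence below is arbitrary, and this one-rule digest is far simpler than LeTE-fusion's actual pipeline.

```python
import re

def tryptic_peptides(seq):
    # In-silico trypsin digest: cleave after K or R, but not before P.
    return [p for p in re.split(r'(?<=[KR])(?!P)', seq) if p]

def observable_fraction(seq, lo=7, hi=20):
    # Fraction of digest peptides in the length range most often
    # identified by MS (7-20 amino acids, per the study).
    peptides = tryptic_peptides(seq)
    in_range = [p for p in peptides if lo <= len(p) <= hi]
    return len(in_range) / len(peptides)

protein = "MKWVTFISLLLLFSSAYSRGVFRRDTHKSEIAHRFKDLGE"  # arbitrary example
print(sorted(len(p) for p in tryptic_peptides(protein)))
print(observable_fraction(protein))   # 0.125
```

Here only one of the eight tryptic peptides falls in the 7-20 residue window, so a variant landing on any of the other seven would be hard to observe.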
"Unifying and Merging Well-trained Deep Neural Networks for Inference Stage," International Joint Conference on Artificial Intelligence, IJCAI 2018, July 2018.
Authors: Yi-Min Chou, Yi-Ming Chan, Jia-Hong Lee, Chih-Yi Chiu, Chu-Song Chen

We propose a novel method to merge convolutional neural networks for the inference stage. Given two well-trained networks that may have different architectures and handle different tasks, our method aligns the layers of the original networks and merges them into a unified model by sharing the representative codes of weights. The shared weights are further re-trained to fine-tune the performance of the merged model. The proposed method effectively produces a compact model that can run the original tasks simultaneously on resource-limited devices. As it preserves the general architectures and leverages the co-used weights of well-trained networks, a substantial training overhead can be avoided, shortening the system development time. Experimental results demonstrate satisfactory performance and validate the effectiveness of the method.
"SLC-Like Programming Scheme for MLC Flash Memory," ACM Transactions on Storage (TOS), March 2018.
Authors: Chien-Chung Ho, Yu-Ming Chang, Yuan-Hao Chang, and Tei-Wei Kuo

Although the multilevel cell (MLC) technique is widely adopted by flash-memory vendors to boost chip density and lower cost, it results in serious performance and reliability problems. Different from past work, a new cell programming method is proposed to not only significantly improve chip performance but also reduce the potential bit error rate. In particular, a single-level cell (SLC)-like programming scheme is proposed to better exploit the threshold-voltage relationship used to denote different MLC bit information, which in turn provides a much larger threshold-voltage window similar to that found in SLC chips. This results in fewer programming iterations and, at the same time, far fewer reliability problems in programming flash-memory cells. In the experiments, the new programming scheme accelerates the programming speed by up to 742% and even reduces the bit error rate by up to 471% for MLC pages.
"Scrubbing-aware Secure Deletion for 3D NAND Flash," ACM/IEEE International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), September 2018.
Authors: Wei-Chen Wang, Chien-Chung Ho, Yuan-Hao Chang, Tei-Wei Kuo, and Ping-Hsien Lin

Due to increasing security concerns, the conventional deletion operations in NAND flash memory can no longer satisfy the requirement of secure deletion. Although existing works exploit secure deletion and scrubbing operations to achieve this security requirement, they also introduce performance and disturbance problems. The predicament becomes more severe as the number of pages grows with the aggressive use of 3D NAND flash-memory chips, which stack flash cells into multiple layers in a chip. Different from existing works, this work aims at exploring a scrubbing-aware secure deletion design that improves the efficiency of secure deletion by exploiting the properties of disturbance. The proposed design minimizes secure deletion/scrubbing overheads by organizing sensitive data to create scrubbing-friendly patterns, and further chooses a proper operation via the proposed evaluation equations for each secure deletion command. The capability of our proposed design is evaluated by a series of experiments, for which we have very encouraging results. On a 128 Gbit 3D NAND flash-memory device, the simulation results show that the proposed design achieves an 82% reduction in the average response time of each secure deletion command.
"Hot-Spot Suppression for Resource-Constrained Image Recognition Devices with Non-Volatile Memory," ACM/IEEE International Conference on Embedded Software (EMSOFT), September 2018.
Authors: Chun-Feng Wu, Ming-Chang Yang, Yuan-Hao Chang, and Tei-Wei Kuo

Resource-constrained devices with Convolutional Neural Networks (CNNs) for image recognition are becoming popular in various IoT and surveillance applications. They usually have a low-power CPU and limited CPU cache space. In such circumstances, Non-Volatile Memory (NVM) has great potential to replace DRAM as main memory to improve overall energy efficiency and provide larger main-memory space. However, due to the iterative access pattern, performing CNN-based image recognition may introduce write hot-spots on the NVM main memory. These write hot-spots may lead to reliability issues due to the limited write endurance of NVM. In order to improve the endurance of NVM main memory, this work leverages the CPU cache pinning technique and exploits the iterative access pattern of CNNs to resolve the write hot-spot effect. In particular, we present a CNN-aware self-bouncing pinning strategy to minimize the maximal write cycles in NVM cells by proactively pinning CPU cache lines, so as to effectively suppress the write hot-spots to NVM main memory with limited performance degradation. The proposed strategy was evaluated by a series of intensive experiments and the results are encouraging.
"Boosting NVDIMM Performance with a Light-Weight Caching Algorithm," IEEE Transactions on Very Large Scale Integration Systems (TVLSI), August 2018.
Authors: Che-Wei Tsao, Yuan-Hao Chang, and Tei-Wei Kuo

In the big data era, data-intensive applications have a growing demand for DRAM main-memory capacity, but frequent DRAM refresh, high leakage power, and high unit cost raise serious design issues for scaling up DRAM capacity. To address this issue, the nonvolatile dual inline memory module (NVDIMM), a hybrid memory module, has become a possible alternative to DRAM as main memory in some data-intensive applications. An NVDIMM, which consists of a small high-speed DRAM and a large low-cost nonvolatile memory (i.e., flash memory), has a serious performance issue when accessing data stored in the flash memory because of the huge performance gap between the DRAM and the flash memory. However, there is limited room to adopt a complex caching algorithm for using the DRAM as the cache of the flash memory in NVDIMM main memory, because a complex caching algorithm would itself cause too much performance degradation in handling each request to the NVDIMM main memory. In this paper, we present a lightweight caching algorithm that boosts NVDIMM performance by minimizing the cache management overhead and reducing the frequency of flash-memory accesses. A series of experiments was conducted based on popular benchmarks, and the results demonstrate that the proposed algorithm can effectively improve the performance of the NVDIMM main memory.
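The trade-off described above — per-access bookkeeping cost versus hit ratio — is why a light-weight policy is attractive here. A FIFO cache is one example of such a policy (illustrative only; this is not the paper's algorithm): hits do no bookkeeping at all, unlike LRU-style schemes.

```python
from collections import OrderedDict

class FifoCache:
    # Deliberately light-weight: O(1) lookup, FIFO eviction, and no
    # recency bookkeeping on hits (the costly part of LRU-style schemes).
    def __init__(self, capacity):
        self.capacity, self.store = capacity, OrderedDict()
        self.hits = self.misses = 0

    def access(self, key):
        if key in self.store:
            self.hits += 1            # fast path: nothing else to update
            return
        self.misses += 1              # would trigger a slow flash read
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)   # evict oldest entry
        self.store[key] = True

cache = FifoCache(2)
for page in [1, 2, 1, 3, 1]:
    cache.access(page)
print(cache.hits, cache.misses)   # 1 4
```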
"Coherent Deep-Net Fusion To Classify Shots In Concert Videos," IEEE Transactions on Multimedia, To Appear.
Authors: Jen-Chun Lin, Wen-Li Wei, Tyng-Luh Liu, Yi-Hsuan Yang, Hsin-Min Wang, Hsiao-Rong Tyan, and Hong-Yuan Mark Liao

Varying the types of shots is a fundamental element in the language of film, commonly used by a visual storytelling director. The technique is often used in creating professional recordings of a live concert, but is frequently not applied appropriately in audience recordings of the same event. Such variations can make the task of classifying shots in concert videos, professional or amateur, very challenging. To achieve more reliable shot classification, we propose a novel probabilistic approach, named Coherent Classification Net (CC-Net), that addresses three crucial issues. First, we focus on learning more effective features by fusing the layer-wise outputs extracted from a deep convolutional neural network (CNN) pre-trained on a large-scale dataset for object recognition. Second, we introduce a frame-wise classification scheme, the error-weighted deep cross-correlation model (EW-Deep-CCM), to boost the classification accuracy. Specifically, the deep neural network-based cross-correlation model (Deep-CCM) is constructed to not only model the extracted feature hierarchies of the CNN independently but also relate the statistical dependencies of paired features from different layers. Then, a Bayesian error weighting scheme for classifier combination is adopted to explore the contributions from individual Deep-CCM classifiers and enhance the accuracy of shot classification in each image frame. Third, we feed the frame-wise classification results to a linear-chain conditional random field (CRF) module to refine the shot predictions by taking into account the global and temporal regularities. We provide extensive experimental results on a dataset of live concert videos to demonstrate the advantage of the proposed CC-Net over existing popular fusion approaches for shot classification.
"Achieving Fast Sanitization with Zero Live Data Copy for MLC Flash Memory," ACM/IEEE International Conference on Computer-Aided Design (ICCAD), November 2018.
Authors: Ping-Hsien Lin, Yu-Ming Chang, Yung-Chun Li, Wei-Chen Wang, Chien-Chung Ho, and Yuan-Hao Chang

As data security has become a major concern in modern storage systems with low-cost multi-level-cell (MLC) flash memories, it is not trivial to realize data sanitization in such systems. Even though some existing works employ encryption or the built-in erase operation to achieve this requirement, they still suffer from the risk of being deciphered or from performance degradation. In contrast to existing work, a fast sanitization scheme is proposed to provide the highest degree of security for data sanitization; that is, every old version of data can be immediately sanitized with zero live-data-copy overhead once the new version of data is created/written. In particular, this scheme further considers the reliability issue of MLC flash memories; the proposed scheme includes a one-shot sanitization design to minimize the disturbance during data sanitization. The feasibility and the capability of the proposed scheme were evaluated through extensive experiments based on real flash chips. The results demonstrate that this scheme can achieve data sanitization with zero live-data copy, with a performance overhead of less than 1%.
"Learning Domain-adaptive Latent Representations of Music Signals Using Variational Autoencoders," International Society of Music Information Retrieval Conference (ISMIR), September 2018.
Authors: Yin-Jyun Luo and Li Su

In this paper, we tackle the problem of domain-adaptive representation learning for music processing. Domain adaptation is an approach aiming to eliminate the distributional discrepancy of the modeling data, so as to transfer learnable knowledge from one domain to another. With its great success in the fields of computer vision and natural language processing, domain adaptation also shows great potential in music processing, for music is essentially a highly structured semantic system with domain-dependent information. Our proposed model contains a Variational Autoencoder (VAE) that encodes the training data into a latent space, and the resulting latent representations, along with the model parameters, are then reused to regularize the representation learning of the downstream task where the data are in the other domain. Experiments on cross-domain music alignment, namely audio-to-MIDI alignment and monophonic-to-polyphonic alignment of the singing voice, show that the learned representations lead to higher alignment accuracy than conventional features. Furthermore, a preliminary experiment on singing voice source separation, by regarding the mixture and the voice as two distinct domains, also demonstrates the capability to solve music processing problems from the perspective of domain-adaptive representation learning.
"Functional Harmony Recognition with Multi-task Recurrent Neural Networks," International Society of Music Information Retrieval Conference (ISMIR), September 2018.
Authors: Tsung-Ping Chen and Li Su

Previous works on chord recognition mainly focus on chord symbols but overlook other essential features that matter in musical harmony. To tackle the functional harmony recognition problem, we compile a new professionally annotated dataset of symbolic music encompassing not only chord symbols but also various interrelated chord functions such as key modulation, chord inversion, secondary chords, and chord quality. We further present a novel holistic system for functional harmony recognition; a multi-task learning (MTL) architecture is implemented with a recurrent neural network (RNN) to jointly model chord functions in an end-to-end scenario. Experimental results highlight the capability of the proposed recognition system and a promising improvement from employing multi-task learning instead of single-task learning. This is one attempt to tackle the end-to-end chord recognition task from the perspective of functional harmony so as to uncover the grand structure ruling the flow of musical sound. The dataset and the source code of the proposed system are announced at .
"Monaural source separation using Ramanujan subspace dictionaries," IEEE Sig. Proc. Lett. (SPL), August 2018.
Authors: Hsueh-Wei Liao and Li Su

Most source separation algorithms are implemented as spectrogram decomposition. In contrast, time-domain source separation is less investigated, since there is a lack of an efficient signal representation that facilitates decomposing oscillatory components of a signal directly in the time domain. In this paper, we utilize the Ramanujan subspace (RS) and the nested periodic subspace (NPS) to address this issue, by constructing a parametric dictionary that emphasizes period information with less redundancy. Methods including iterative subspace projection and convolutional sparse coding (CSC) can decompose a mixture into signals with distinct oscillation periods according to the dictionary. Experiments on score-informed source separation show that the proposed method is competitive with state-of-the-art frequency-domain approaches when the provided pitch information and the signal parameters are the same.
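The Ramanujan sum c_q(n), the building block of an RS dictionary, is easy to compute: c_q(n) is the sum of cos(2πkn/q) over all k coprime to q, and its inner product with a signal peaks at the signal's true period. A minimal NumPy check (the signal here is a synthetic zero-mean period-3 sequence, not one of the paper's experiments):

```python
import numpy as np
from math import gcd

def ramanujan_sum(q, N):
    # c_q(n) = sum over k coprime to q of exp(2*pi*i*k*n/q); real-valued.
    n = np.arange(N)
    ks = [k for k in range(1, q + 1) if gcd(k, q) == 1]
    return sum(np.cos(2 * np.pi * k * n / q) for k in ks)

# Periodicity "energy" of a period-3 signal against candidate periods 1..4.
x = np.tile([2.0, -1.0, -1.0], 4)           # period-3, zero-mean
energies = {q: abs(ramanujan_sum(q, len(x)) @ x) for q in range(1, 5)}
best = max(energies, key=energies.get)
print(best)   # 3
```

Stacking such c_q vectors (and their shifts) as dictionary atoms is what lets subspace projection or CSC pull apart components with distinct oscillation periods directly in the time domain.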
"wrJFS: A Write-Reduction Journaling File System for Byte-addressable NVRAM," IEEE Transactions on Computers (TC), July 2018.
Authors: Tseng-Yi Chen, Yuan-Hao Chang, Shuo-Han Chen, Chih-Ching Kuo, Ming-Chang Yang, Hsin-Wen Wei, and Wei-Kuan Shih

Non-volatile random-access memory (NVRAM) is becoming a mainstream storage device in embedded systems due to its favorable features, such as small size, low power consumption, and short read/write latency. Unlike dynamic random access memory (DRAM), most NVRAM has asymmetric performance and energy consumption for read/write operations. Generally, on NVRAM, a write operation consumes more energy and time than a read operation. Unfortunately, current mobile/embedded file systems, such as EXT2/3 and EXT4, are very unfriendly to NVRAM devices. The reason is that, in order to increase reliability, current mobile/embedded file systems employ a journaling mechanism. Although a journaling mechanism raises the safety of data in a file system, it also writes the same data twice, during data commitment and checkpointing. Though several related works have been proposed to reduce the size of write operations, they still cannot effectively minimize the write amplification of a journaling mechanism. Such observations motivate this paper to design a two-phase write-reduction journaling file system called wrJFS. In the first phase, wrJFS classifies data into two categories: metadata and user data. Because the size of metadata is usually very small (a few bytes), metadata are handled by a partial byte-enabled journaling strategy. In contrast, the size of user data is very large relative to metadata; thus, user data are processed in the second phase, where they are compressed by a hardware encoder to reduce the write size and managed by a compression-enabled journaling strategy to avoid write amplification. Moreover, we analyze the overhead of wrJFS and show that it is negligible. According to the experimental results, the proposed wrJFS can reduce the size of write requests by 89.7% on average, compared with the original EXT3.
Current Research Results
Authors: Sung-Hsien Hsieh, Tsung-Hsuan Hung, Chun-Shien Lu, Yu-Chi Chen, and Soo-Chang Pei

Wireless sensors have been helpful and popular for gathering information, in particular in harsh environments. Due to limits on computation power and energy, compressive sensing (CS) has attracted considerable attention for achieving simultaneous sensing and compression of data on the sensor/encoder side at cheap cost. Nevertheless, as the data size increases, the computation overhead for decoding becomes unaffordable on the user/decoder side. To overcome this problem, it is helpful to offload this overhead by taking advantage of a resourceful cloud. In this paper, we propose a cloud-assisted compressive sensing-based data gathering system with security assurance. Our system, involving the three parties of sensor, cloud, and user, possesses several advantages. First, in terms of security, for any two data that are sparse in a certain transformed domain, their corresponding ciphertexts are indistinguishable on the cloud side. Second, to avoid a communication bottleneck between the user and the cloud, the sensor can encrypt data individually such that, once the cloud receives encrypted data from the sensor, it can immediately carry out its task without requesting any information from the user. Third, we show that, even though the cloud knows the permuted support information of the data, security is never sacrificed, while the compression rate can be reduced further. Theoretical and empirical results demonstrate that our system is cost-effective and privacy-guaranteed, and that it possesses acceptable reconstruction quality.
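The support-permutation idea can be illustrated at its simplest: a secret permutation hides which coefficients are nonzero from the cloud, and the user inverts it afterwards. This toy sketch omits the CS measurement and the indistinguishability machinery of the actual system; the sizes and values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 16
x = np.zeros(N)
x[[2, 9]] = [1.0, -2.0]                  # sparse data; support = {2, 9}

perm = rng.permutation(N)                # secret key shared by sensor and user
x_enc = x[perm]                          # sensor-side "encryption"

# The cloud sees only x_enc: the values are intact but the support
# positions are permuted, so it cannot tell where the nonzeros really lie.
x_dec = x_enc[np.argsort(perm)]          # user-side decryption
recovered = bool(np.array_equal(x_dec, x))
print(recovered)   # True
```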
"UnistorFS: A Union Storage File System Design for Resource Sharing between Memory and Storage on Persistent RAM based Systems," ACM Transactions on Storage (TOS), February 2018.
Authors: Shuo-Han Chen, Tseng-Yi Chen, Yuan-Hao Chang, Hsin-Wen Wei, and Wei-Kuan Shih

With the advanced technology of persistent random access memory (PRAM), PRAM such as three-dimensional XPoint memory and Phase Change Memory (PCM) is emerging as a promising candidate for the next-generation medium for both (main) memory and storage. Previous works mainly focus on how to overcome the possible endurance issues of PRAM while both main memory and storage own a partition on the same PRAM device. However, a holistic software-level system design should be proposed to fully exploit the benefits of PRAM. This article proposes a union storage file system (UnistorFS), which aims to jointly manage the PRAM resource for main memory and storage. The proposed UnistorFS realizes the concept of using the PRAM resource as memory and storage interchangeably to achieve resource sharing, while main memory and storage coexist on the same PRAM device with no partition or logical boundary. This approach not only enables PRAM resource sharing but also eliminates unnecessary data movements between main memory and storage, since they are already in the same address space and can be accessed directly. At the same time, the proposed UnistorFS ensures the persistence of file data and the sanity of the file system after power recycling. A series of experiments was conducted on a modified Linux kernel. The results show that the proposed UnistorFS can eliminate unnecessary memory accesses and outperform other PRAM-based file systems by 0.2–8.7 times in terms of read/write performance.
"Enhancing the Energy Efficiency of Journaling File System via Exploiting Multi-Write Modes on MLC NVRAM," ACM/IEEE ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), July 2018.
Authors: Shuo-Han Chen, Yuan-Hao Chang, Tseng-Yi Chen, Yu-Ming Chang, Pei-Wen Hsiao, Hsin-Wen Wei, and Wei-Kuan Shih

Non-volatile random-access memory (NVRAM) is regarded as a great alternative storage medium owing to its attractive features, including byte addressability, non-volatility, and short read/write latency. In addition, multi-level-cell (MLC) NVRAM has also been proposed to provide higher bit density. However, MLC NVRAM has lower endurance and longer write latency compared with single-level-cell (SLC) NVRAM. These drawbacks could degrade the performance of MLC NVRAM-based storage systems. The performance degradation is magnified by existing journaling file systems (JFS) on MLC NVRAM-based storage devices due to the JFS's fail-safe policy of writing the same data twice. Such observations motivate us to propose a multi-write-mode journaling file system (mwJFS) to alleviate the drawbacks of MLC NVRAM and boost the performance of JFS. The proposed mwJFS differentiates the data retention requirements of journaled data and applies different write modes to enhance access performance with lower energy consumption. A series of experiments was conducted to demonstrate the capability of mwJFS on an MLC NVRAM-based storage system.
"Efficient and Retargetable SIMD Translation in a Dynamic Binary Translator," Software: Practice and Experience, June 2018.
Authors: Sheng-Yu Fu, Ding-Yong Hong, Yu-Ping Liu, Jan-Jan Wu, Wei-Chung Hsu

The single-instruction multiple-data (SIMD) computing capability of modern processors is continually improved to deliver ever better performance and power efficiency. For example, Intel has increased SIMD register lengths from 128 bits in Streaming SIMD Extensions (SSE) to 512 bits in AVX-512. The ARM Scalable Vector Extension supports SIMD register lengths up to 2048 bits and includes predicated instructions. However, SIMD instruction translation in dynamic binary translation has not received similar attention. For example, the widely used QEMU emulates guest SIMD instructions with a sequence of scalar instructions, even when the host machine has relevant SIMD instructions. This leaves significant potential for performance enhancement. We propose a newly designed SIMD translation framework for dynamic binary translation that takes advantage of the host's SIMD capabilities. The proposed framework has been built in HQEMU, an enhanced QEMU with a separate thread for applying LLVM optimizations. The current prototype supports ARMv7, ARMv8, and IA32 guests on the x86-64 AVX2 host. Compared with the scalar-translation version of HQEMU, our framework runs up to 1.84 times faster on SPEC CPU2006 CFP benchmarks and up to 6.81 times faster on selected real applications.
Current Research Results
Authors: Sung-Hsien Hsieh, Chun-Shien Lu, and Soo-Chang Pei

Compressive sensing (CS) enables signal sampling below the Nyquist rate, based on the assumption that the signal is sparse in some transformed domain. Most sensing matrices in CS (e.g., the Gaussian random matrix), however, suffer from unfriendly hardware implementation, high computation cost, and huge memory storage. In this paper, we propose a deterministic sensing matrix for collecting measurements fed into the sparse Fast Fourier Transform (sFFT) as the decoder. Compared with the conventional paradigm of a Gaussian random matrix at the encoder and convex programming or greedy methods at the decoder, sFFT can reconstruct sparse signals at very low computation cost with a comparable number of measurements. The limitation, however, is that the signal must be sparse in the frequency domain. We further show how to relax this limitation to any domain whose transformation matrix or dictionary is circulant. Experimental and theoretical results validate that the proposed method achieves fast sensing, fast recovery, and low memory cost.
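One ingredient behind the "fast sensing" claim above can be shown in a few lines: multiplying by a circulant matrix (such as a circulant sensing matrix or dictionary) is a circular convolution, which costs O(n log n) via the FFT instead of O(n²) for a dense matrix-vector product. This sketch only demonstrates that identity, not the paper's sFFT decoder or measurement design.

```python
import numpy as np

def circulant_matvec(first_col, x):
    # C @ x for circulant C with first column `first_col`, computed by
    # pointwise multiplication in the frequency domain (circular convolution).
    return np.real(np.fft.ifft(np.fft.fft(first_col) * np.fft.fft(x)))

rng = np.random.default_rng(1)
n = 8
c = rng.normal(size=n)   # first column defines the whole circulant matrix
x = rng.normal(size=n)

# Dense reference: column k of a circulant matrix is c rolled by k.
C = np.array([np.roll(c, k) for k in range(n)]).T
y_fast = circulant_matvec(c, x)
y_dense = C @ x
```

Because only the first column must be stored, the circulant structure also gives the low memory cost mentioned in the abstract.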
Current Research Results
"Revealing the compositions of the intestinal microbiota of three Anguillid eel species by using 16S rDNA sequencing," Aquaculture Research, To Appear.
Authors: Hsiang-Yi Hsu, Fang-Chi Chang, Yu-Bin Wang, Shu-Hwa Chen, Ya-Bo Lin, Chung-Yen Lin, Yu-San Han*

Probiotics are beneficial microbes that improve the health of their host organisms, and most have a favorable ability to adhere to the intestinal mucus of the host. Previous studies indicated that probionts isolated from the intestine of a fish species of interest may be potential probiotics for that species. A. japonica, A. marmorata, and A. bicolor pacifica are three commercially valuable aquaculture eel species, yet little research has focused on the intestinal microbiota of these high-value species. In this study, the intestinal microbiota of the three eel species were investigated by 16S rDNA metagenomics, and sick Japanese eels were also analyzed to reveal the effect of a pathogen on the composition of the intestinal microbiota. The results showed that, although the composition of the intestinal microbiota can be modified by different environments, the genera Plesiomonas, Clostridium, Bradyrhizobium, Acinetobacter, Cetobacterium, Shewanella, and Serratia were generally dominant in the intestine of the Japanese eel. Infection with E. tarda significantly excluded the normal bacteria in the Japanese eel's intestine. The dominant bacterial genera among the intestinal microbiota of the three eel species were diverse, but some shared genera, Plesiomonas, Bradyrhizobium, Acinetobacter, Cetobacterium, and Shewanella, were identified. These bacteria are also common intestinal bacteria in other fish and could be potential probiotics for eel aquaculture.
"Adaptive Communication for Distributed Deep Learning on Commodity GPU Cluster," IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 2018.
Authors: Li-Yung Ho, Jan-Jan Wu and Pangfeng Liu

Deep learning is now the most promising approach to developing human-intelligent computer systems. To speed up the training of neural networks, researchers have designed many distributed learning algorithms to facilitate the training process. In these algorithms, users specify a constant communication period for model/gradient exchange. We find that this type of communication pattern can incur unnecessary and inefficient data transmission for some training methods, e.g., elastic SGD and gossiping SGD. In this paper, we propose an adaptive communication method to improve the performance of gossiping SGD. Instead of using a fixed period for model exchange, we exchange models with other machines according to the change in the local model. This makes the communication more efficient and thus improves performance. The experimental results show that our method reduces communication traffic by 92%, which results in a 52% reduction in training time while preserving prediction accuracy compared with gossiping SGD.
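The adaptive trigger described above can be sketched as follows: instead of gossiping every fixed number of steps, a worker exchanges its model only when the local model has drifted enough since the last exchange. The drift metric (L2 norm of the parameter change) and the threshold are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def should_exchange(model, last_sent, threshold=0.5):
    # Exchange only when the local model has moved far enough
    # from the last copy we shared with a peer.
    return np.linalg.norm(model - last_sent) > threshold

def train_step(model, grad, lr=0.1):
    return model - lr * grad

rng = np.random.default_rng(0)
model = np.zeros(4)
last_sent = model.copy()
exchanges = 0
for _ in range(100):
    model = train_step(model, rng.normal(size=4))  # stand-in gradient
    if should_exchange(model, last_sent):
        # Gossip step: send the model to a random peer and average
        # with the received one (peer communication omitted here).
        last_sent = model.copy()
        exchanges += 1
```

With a fixed period the loop would communicate on a preset schedule regardless of progress; here the number of exchanges adapts to how quickly the model actually changes, which is the source of the traffic reduction reported above.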
Current Research Results
Authors: Sachit Mahajan, Hao-Min Liu, Tzu-Chieh Tsai, and Ling-Jyh Chen

Information and communication technologies have been widely used to achieve the objectives of smart city development. A smart air quality sensing and forecasting system is an important part of a smart city. One of the major challenges in designing such a forecasting system is ensuring high accuracy with an acceptable computation time. In this paper, we show that it is possible to accurately forecast fine particulate matter (PM2.5) concentrations with low computation time by using different clustering techniques. An Internet of Things (IoT) framework comprising Airbox devices for PM2.5 monitoring is used to acquire the data. Our main focus is to achieve high forecasting accuracy with reduced computation time. We use a hybrid model to make the forecast and a grid-based system to cluster the monitoring stations by geographical distance. The experiments and evaluation are conducted using Airbox device data from 557 stations deployed all over Taiwan. We demonstrate that proper clustering based on geographical distance can reduce both the forecasting error rate and the computation time. To further evaluate our system, we also apply wavelet-based clustering to group the monitoring stations, and perform a comparative analysis of the different clustering schemes with respect to accuracy and computation time.
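The grid-based clustering step can be sketched in a few lines: stations are binned into fixed-size geographic cells by their coordinates, so a forecasting model can be trained per cell rather than per station. The cell size and the sample coordinates below are illustrative; the paper's actual grid parameters are not reproduced here.

```python
from collections import defaultdict

def grid_cluster(stations, cell_deg=0.5):
    """stations: list of (station_id, lat, lon).
    Returns a mapping from grid cell -> list of station ids."""
    clusters = defaultdict(list)
    for sid, lat, lon in stations:
        # Floor-divide coordinates by the cell size to get a cell index.
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        clusters[cell].append(sid)
    return clusters

# Hypothetical stations: two near Taipei, one near Kaohsiung.
stations = [("A", 25.03, 121.56), ("B", 25.09, 121.52), ("C", 22.63, 120.30)]
clusters = grid_cluster(stations)
```

Nearby stations land in the same cell and share one model, which is what trades a small amount of spatial resolution for the reduced computation time reported above.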


Academia Sinica, Institute of Information Science