Institute of Information Science, Academia Sinica
  
Current Research Results
"Coherent Deep-Net Fusion To Classify Shots In Concert Videos," IEEE Transactions on Multimedia, To Appear.
Authors: Jen-Chun Lin, Wen-Li Wei, Tyng-Luh Liu, Yi-Hsuan Yang, Hsin-Min Wang, Hsiao-Rong Tyan, and Hong-Yuan Mark Liao

Abstract:
Varying types of shots is a fundamental element in the language of film, commonly used by directors for visual storytelling. The technique is often used in professional recordings of a live concert, but may not be applied appropriately in audience recordings of the same event. Such variations make classifying shots in concert videos, professional or amateur, very challenging. To achieve more reliable shot classification, we propose a novel probabilistic approach, named Coherent Classification Net (CC-Net), which addresses three crucial issues. First, we focus on learning more effective features by fusing the layer-wise outputs extracted from a deep convolutional neural network (CNN) pre-trained on a large-scale dataset for object recognition. Second, we introduce a frame-wise classification scheme, the error weighted deep cross-correlation model (EW-Deep-CCM), to boost classification accuracy. Specifically, the deep neural network-based cross-correlation model (Deep-CCM) is constructed not only to model the extracted feature hierarchies of the CNN independently but also to relate the statistical dependencies of paired features from different layers. Then, a Bayesian error weighting scheme for classifier combination is adopted to explore the contributions of individual Deep-CCM classifiers and enhance the accuracy of shot classification in each image frame. Third, we feed the frame-wise classification results to a linear-chain conditional random field (CRF) module that refines the shot predictions by taking into account global and temporal regularities. We provide extensive experimental results on a dataset of live concert videos to demonstrate the advantage of the proposed CC-Net over existing popular fusion approaches for shot classification.
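The CRF refinement step can be illustrated, under our own simplifying assumptions, by Viterbi decoding over frame-wise class log-scores plus a label-transition matrix that rewards staying in the same shot type. This is a generic linear-chain decoder, not CC-Net's trained CRF; the two-class setup and the transition values in the usage example are hypothetical.

```python
import math


def viterbi(frame_scores, transition):
    """Return the highest-scoring label sequence for a linear chain.

    frame_scores[t][k]: log-score of label k at frame t (e.g. frame-wise
    classifier log-probabilities). transition[i][j]: log-score of moving
    from label i to label j between consecutive frames.
    """
    n_labels = len(frame_scores[0])
    best = list(frame_scores[0])  # best path score ending in each label
    backptr = []
    for scores in frame_scores[1:]:
        new_best, ptrs = [], []
        for j in range(n_labels):
            # Best previous label i for current label j.
            i_best = max(range(n_labels), key=lambda i: best[i] + transition[i][j])
            new_best.append(best[i_best] + transition[i_best][j] + scores[j])
            ptrs.append(i_best)
        best = new_best
        backptr.append(ptrs)
    # Trace back from the best final label.
    label = max(range(n_labels), key=lambda k: best[k])
    path = [label]
    for ptrs in reversed(backptr):
        label = ptrs[label]
        path.append(label)
    return path[::-1]


log = math.log
# Hypothetical: 2 shot classes; frame 1 is a noisy, isolated prediction.
frames = [[log(0.9), log(0.1)], [log(0.4), log(0.6)], [log(0.9), log(0.1)]]
transition = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
print(viterbi(frames, transition))  # frame-wise argmax would give [0, 1, 0]
```

With a strong self-transition score, the isolated switch at frame 1 is smoothed away, which is the kind of temporal regularity a chain model is meant to capture.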
"Enhancing the Energy Efficiency of Journaling File System via Exploiting Multi-Write Modes on MLC NVRAM," ACM/IEEE ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), July 2018.
Authors: Shuo-Han Chen, Yuan-Hao Chang, Tseng-Yi Chen, Yu-Ming Chang, Pei-Wen Hsiao, Hsin-Wen Wei, and Wei-Kuan Shih

Abstract:
Non-volatile random-access memory (NVRAM) is regarded as a promising alternative storage medium owing to its attractive features, including byte addressability, non-volatility, and short read/write latency. In addition, multi-level-cell (MLC) NVRAM has been proposed to provide higher bit density. However, MLC NVRAM has lower endurance and longer write latency than single-level-cell (SLC) NVRAM. These drawbacks could degrade the performance of MLC NVRAM-based storage systems. The performance degradation is magnified by existing journaling file systems (JFS) on MLC NVRAM-based storage devices, because the JFS fail-safe policy writes the same data twice. These observations motivate us to propose a multi-write-mode journaling file system (mwJFS) that alleviates the drawbacks of MLC NVRAM and boosts the performance of JFS. The proposed mwJFS differentiates the data retention requirements of journaled data and applies different write modes to enhance access performance with lower energy consumption. A series of experiments was conducted to demonstrate the capability of mwJFS on an MLC NVRAM-based storage system.
"Efficient and Retargetable SIMD Translation in a Dynamic Binary Translator," Software: Practice and Experience, June 2018.
Authors: Sheng-Yu Fu, Ding-Yong Hong, Yu-Ping Liu, Jan-Jan Wu, Wei-Chung Hsu

Abstract:
The single‐instruction multiple‐data (SIMD) computing capability of modern processors is continually improved to deliver ever better performance and power efficiency. For example, Intel has increased SIMD register lengths from 128 bits in streaming SIMD extension to 512 bits in AVX‐512. The ARM scalable vector extension supports SIMD register lengths of up to 2048 bits and includes predicated instructions. However, SIMD instruction translation in dynamic binary translation has not received similar attention. For example, the widely used QEMU emulates guest SIMD instructions with a sequence of scalar instructions, even when the host machine has relevant SIMD instructions. This leaves significant potential for performance enhancement. We propose a newly designed SIMD translation framework for dynamic binary translation, which takes advantage of the host's SIMD capabilities. The proposed framework has been built in HQEMU, an enhanced QEMU with a separate thread for applying LLVM optimizations. The current prototype supports ARMv7, ARMv8, and IA32 guests on an x86‐64 AVX2 host. Compared with the scalar‐translation version of HQEMU, our framework runs up to 1.84 times faster on Standard Performance Evaluation Corporation 2006 CFP benchmarks and up to 6.81 times faster on selected real applications.
Authors: Sung-Hsien Hsieh, Chun-Shien Lu, and Soo-Chang Pei

Abstract:
Compressive sensing (CS) was proposed for sampling signals below the Nyquist rate, based on the assumption that the signal is sparse in some transformed domain. Most sensing matrices in CS (e.g., Gaussian random matrices), however, suffer from unfriendly hardware implementation, high computation cost, and huge memory storage. In this paper, we propose a deterministic sensing matrix for collecting measurements that are fed into the sparse Fast Fourier Transform (sFFT) as the decoder. Compared with the conventional paradigm of a Gaussian random matrix at the encoder and convex programming or greedy methods at the decoder, sFFT can reconstruct sparse signals at very low computation cost with a comparable number of measurements. Its limitation, however, is that the signal must be sparse in the frequency domain. We further show how to relax this limitation to any domain whose transformation matrix or dictionary is circulant. Experimental and theoretical results validate that the proposed method achieves fast sensing, fast recovery, and low memory cost.
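A standard fact that makes circulant dictionaries tractable for Fourier-based decoders is that every circulant matrix is diagonalized by the DFT: multiplying by it is a circular convolution, i.e. a pointwise product in the frequency domain. The sketch below only checks that identity with NumPy; it is not the authors' sensing matrix or sFFT decoder, and the helper names are hypothetical.

```python
import numpy as np


def circulant(c):
    """Circulant matrix whose first column is c (column j is c rolled down by j)."""
    c = np.asarray(c)
    return np.column_stack([np.roll(c, j) for j in range(len(c))])


def circulant_apply_fft(c, x):
    """Compute circulant(c) @ x as a circular convolution in O(n log n) via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))


rng = np.random.default_rng(0)
c = rng.standard_normal(8)
x = rng.standard_normal(8)
# Dense O(n^2) product and FFT-based product agree up to round-off.
print(np.allclose(circulant(c) @ x, circulant_apply_fft(c, x)))
```

Because the eigenvalues of the circulant matrix are simply `np.fft.fft(c)`, a signal expressed through a circulant transform can be moved into the frequency domain with one FFT and one pointwise division or multiplication.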

"Revealing the compositions of the intestinal microbiota of three Anguillid eel species by using 16S rDNA sequencing," Aquaculture Research, To Appear.
Authors: Hsiang-Yi Hsu, Fang-Chi Chang, Yu-Bin Wang, Shu-Hwa Chen, Ya-Bo Lin, Chung-Yen Lin, Yu-San Han*

Abstract:
Probiotics are beneficial microbes that improve the health of organisms, and most of them adhere well to the intestinal mucus of the host. Previous studies indicated that probionts isolated from the intestine of a fish species of interest may be potential probiotics for that species. A. japonica, A. marmorata and A. bicolor pacifica are three commercially valuable aquaculture eel species. However, little research has analyzed the intestinal microbiota of these high-value eel species. In this study, the intestinal microbiota of the three eel species were investigated by 16S rDNA metagenomics, and sick Japanese eels were also analyzed to understand the effect of pathogens on the composition of the intestinal microbiota. The results showed that, although the composition of the intestinal microbiota could be modified by the environment, the genera Plesiomonas, Clostridium, Bradyrhizobium, Acinetobacter, Cetobacterium, Shewanella and Serratia were generally dominant in the intestine of the Japanese eel. Infection with E. tarda significantly excluded the normal bacteria in the Japanese eel's intestine. The dominant bacterial genera in the intestinal microbiota of the three eel species were diverse, but some shared genera (Plesiomonas, Bradyrhizobium, Acinetobacter, Cetobacterium and Shewanella) were identified. These bacteria are also common intestinal bacteria in other fish and could be potential probiotics for eel aquaculture.
"Adaptive Communication for Distributed Deep Learning on Commodity GPU Cluster," IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 2018.
Authors: Li-Yung Ho, Jan-Jan Wu and Pangfeng Liu

Abstract:
Deep learning is now the most promising approach to developing computer systems with human-like intelligence. To speed up the development of neural networks, researchers have designed many distributed learning algorithms to facilitate the training process. In these algorithms, users specify a constant communication period for model/gradient exchange. We find that this communication pattern can incur unnecessary and inefficient data transmission for some training methods, such as elastic SGD and gossiping SGD. In this paper, we propose an adaptive communication method to improve the performance of gossiping SGD. Instead of using a fixed period for model exchange, we exchange models with other machines according to the change in the local model. This makes communication more efficient and thus improves performance. Experimental results show that, compared with gossiping SGD, our method reduces communication traffic by 92%, which results in a 52% reduction in training time while preserving prediction accuracy.
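The idea of exchanging models "according to the change of the local model" rather than on a fixed period can be sketched as a drift trigger. In this minimal sketch we assume that the L2 distance from the last shipped copy is the change measure and that `threshold` is a tunable knob; both are our assumptions, and the class name is hypothetical, not the paper's implementation.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class AdaptiveExchanger:
    """Decide when a worker should share its local model with a gossip peer.

    Assumption: exchange only when the model has drifted more than
    `threshold` (hypothetical parameter) from the last copy that was sent.
    """

    def __init__(self, initial_model, threshold):
        self.last_sent = list(initial_model)
        self.threshold = threshold
        self.exchanges = 0

    def maybe_exchange(self, model):
        if l2_distance(model, self.last_sent) > self.threshold:
            # A real system would push `model` to a peer here.
            self.last_sent = list(model)
            self.exchanges += 1
            return True
        return False


ex = AdaptiveExchanger([0.0], threshold=0.5)
decisions = [ex.maybe_exchange([v]) for v in [0.1, 0.3, 0.6, 0.7, 1.2]]
print(decisions, ex.exchanges)
```

Small, slow-moving updates are never sent, so traffic naturally drops as training converges, whereas a fixed period keeps sending at the same rate.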
Authors: Sachit Mahajan, Hao-Min Liu, Tzu-Chieh Tsai, and Ling-Jyh Chen

Abstract:
Information and communication technologies have been widely used to achieve the objectives of smart city development. A smart air quality sensing and forecasting system is an important part of a smart city. One of the major challenges in designing such a forecasting system is ensuring high accuracy with an acceptable computation time. In this paper, we show that it is possible to accurately forecast fine particulate matter (PM2.5) concentrations with low computation time by using different clustering techniques. The data were acquired with an Internet of Things (IoT) framework comprising Airbox devices for PM2.5 monitoring. Our main focus is to achieve high forecasting accuracy with reduced computation time. We use a hybrid model for forecasting and a grid-based system to cluster the monitoring stations by geographical distance. The experiments and evaluation use Airbox data from 557 stations deployed all over Taiwan. We demonstrate that proper clustering based on geographical distance can reduce both the forecasting error rate and the computation time. To further evaluate our system, we also apply wavelet-based clustering to group the monitoring stations. Finally, we compare the different clustering schemes with respect to accuracy and computation time.
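A grid-based grouping of monitoring stations by location can be sketched as bucketing latitude/longitude into square cells. This is our illustration only: we assume plain degree-based cells, and the cell size, station IDs, and coordinates are hypothetical. The paper's actual grid and its hybrid forecasting model are not reproduced here.

```python
import math
from collections import defaultdict


def grid_clusters(stations, cell_size_deg):
    """Group stations into square lat/lon cells of side cell_size_deg degrees.

    stations: mapping of station id -> (latitude, longitude).
    Returns: mapping of (row, col) cell index -> sorted list of station ids.
    """
    cells = defaultdict(list)
    for sid, (lat, lon) in stations.items():
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        cells[cell].append(sid)
    return {cell: sorted(ids) for cell, ids in cells.items()}


# Hypothetical stations: two near each other in the north, one in the south.
stations = {"A": (25.03, 121.56), "B": (25.04, 121.52), "C": (22.63, 120.30)}
print(grid_clusters(stations, cell_size_deg=0.5))
```

One forecasting model per cell can then be trained on far fewer series than a single island-wide model, which is one plausible source of the computation-time savings the abstract reports.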
"We Like, We Post: A Joint User-Post Approach for Facebook Post Stance Labeling," IEEE Transactions on Knowledge and Data Engineering, To Appear.
Authors: Wei-Fan Chen and Lun-Wei Ku

Abstract:
Web post and user stance labeling is challenging not only because of the informality and variation of language on the Web but also because of the lack of labeled data on fast-emerging topics; even the labeled data we do have are usually heavily skewed. In this paper, we propose a joint user-post approach to stance labeling that mitigates the latter two difficulties. In labeling post stances, the proposed approach considers post content as well as the posting and liking behavior of users. Sentiment analysis is applied to posts to acquire their initial stance, and then post and user stances are updated iteratively through correlated posting-related actions. The whole process works with little labeled data, which solves the first problem. We use the real interactions between authors and readers for stance labeling. Experimental results show that the proposed approach not only substantially improves content-based post stance labeling but also achieves better performance for the minority stance class, which solves the second problem.
"Minimizing Write Amplification to Enhance Lifetime of Large-page Flash-Memory Storage Devices," ACM/IEEE Design Automation Conference (DAC), June 2018.
Authors: Wei-Lin Wang, Tseng-Yi Chen, Yuan-Hao Chang, Hsin-Wen Wei, and Wei-Kuan Shih

Abstract:
Due to the decreasing endurance of flash chips, the lifetime of flash drives has become a critical issue. To resolve this issue, various techniques such as wear-leveling and error correction code have been proposed to reduce the bit error rates of a flash drive. In contrast to these techniques, we observe that minimizing write amplification (or reducing the amount of extra writes to flash chips) is another promising direction to enhance the lifetime of a flash drive. In this work, we propose a partial update strategy to support partial updates to the data in flash pages. Thus, it can minimize write amplification by only updating the modified part of data in flash pages with the support of data reduction techniques. This strategy is orthogonal to wear-leveling and error correction techniques, and thus can cooperate with them to further enhance the lifetime of a flash drive. Based on a series of experiments, the results demonstrate that the proposed strategy can effectively improve the lifetime of a flash drive by reducing write amplification.
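As a rough illustration of why partial updates help, write amplification is commonly computed as the ratio of bytes physically written to flash to bytes the host asked to write. The 16 KiB page, 512-byte modification, and 64-byte metadata overhead in the example are hypothetical figures chosen for arithmetic, not numbers from the paper.

```python
def write_amplification(flash_bytes_written, host_bytes_written):
    """Bytes physically written to flash per byte the host requested."""
    return flash_bytes_written / host_bytes_written


PAGE = 16 * 1024      # hypothetical large flash page
MODIFIED = 512        # hypothetical bytes changed by the host
METADATA = 64         # hypothetical bookkeeping for a partial update

full_page = write_amplification(PAGE, MODIFIED)              # rewrite whole page
partial = write_amplification(MODIFIED + METADATA, MODIFIED)  # rewrite changed part
print(full_page, partial)
```

Since flash endurance is consumed by physical writes, cutting the factor from 32 toward 1 in this toy case would stretch the number of host writes a drive can absorb by roughly the same ratio.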
"Proactive Channel Adjustment to Improve Polar Code Capability for Flash Storage Devices," ACM/IEEE Design Automation Conference (DAC), June 2018.
Authors: Kun-Cheng Hsu, Che-Wei Tsao, Yuan-Hao Chang, Tei-Wei Kuo, and Yu-Ming Huang

Abstract:
Low-density parity-check (LDPC) codes have achieved great success in correcting errors in flash storage devices, but their hardware cost and error correction time keep increasing as the error rate of flash memory rises. To keep improving device lifetime, researchers are seeking alternative methods. Fortunately, with low encoding/decoding complexity and high error correction capability, polar codes with the support of list decoding and cyclic redundancy check can outperform LDPC codes in data communication. Thus, how to adopt and enable polar codes in storage applications has also drawn a lot of attention. However, the code construction and encoding length limitation issues obstruct the adoption of polar codes in flash storage devices. To enable polar codes in flash storage devices, we propose a proactive channel adjustment design that extends the effective time of a code construction to improve the error correction capability of polar codes. This design proactively tunes the quality of the critical flash cells to maintain the correctness of the code construction and relax the constraint of the encoding length limitation, so that polar codes can be enabled in flash storage devices. A series of experiments was conducted to evaluate the efficacy of the proposed design. The results show that the proposed design can effectively improve the error correction capability of polar codes in flash storage devices.
"Improving SIMD Parallelism via Dynamic Binary Translation," ACM Transactions on Embedded Computing Systems (TECS), February 2018.
Authors: Ding-Yong Hong, Yu-Ping Liu, Sheng-Yu Fu, Jan-Jan Wu, Wei-Chung Hsu

Abstract:
Recent trends in SIMD architecture have tended toward longer vector lengths, and more enhanced SIMD features have been introduced in newer vector instruction sets. However, legacy or proprietary applications compiled with short-SIMD ISA cannot benefit from the long-SIMD architecture that supports improved parallelism and enhanced vector primitives, resulting in only a small fraction of potential peak performance. This article presents a dynamic binary translation technique that enables short-SIMD binaries to exploit benefits of new SIMD architectures by rewriting short-SIMD loop code. We propose a general approach that translates loops consisting of short-SIMD instructions to machine-independent IR, conducts SIMD loop transformation/optimization at this IR level, and finally translates to long-SIMD instructions. Two solutions are presented to enforce SIMD load/store alignment, one for the problem caused by the binary translator’s internal translation condition and one general approach using dynamic loop peeling optimization. Benchmark results show that average speedups of 1.51× and 2.48× are achieved for an ARM NEON to x86 AVX2 and x86 AVX-512 loop transformation, respectively.
"MONPA: Multi-objective Named-entity and Part-of-speech Annotator for Chinese using Recurrent Neural Network," The 8th International Joint Conference on Natural Language Processing (IJCNLP 2017), November 2017.
Authors: Yu-Lun Hsieh, Yung-Chun Chang, Yi-Jie Huang, Shu-Hao Yeh, Chun-Hung Chen and Wen-Lian Hsu

Abstract:
Part-of-speech (POS) tagging and named entity recognition (NER) are crucial steps in natural language processing. In addition, the difficulty of word segmentation places an extra burden on those who deal with languages such as Chinese, and pipelined systems often suffer from error propagation. This work proposes an end-to-end model using a character-based recurrent neural network (RNN) to jointly accomplish segmentation, POS tagging and NER of a Chinese sentence. Experiments on previous word segmentation and NER competition data sets show that a single joint model using the proposed architecture is comparable to models trained specifically for each task and outperforms freely available software. Moreover, we provide a web-based interface for the public to easily access this resource.
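One common way to cast joint segmentation and tagging as character-level sequence labeling is to attach a word-position marker (B/I/E/S) to each character's label, so a single RNN output per character encodes boundaries, POS, and entity type at once. The abstract does not say MONPA uses exactly this scheme; the combined `POS/NER` tag strings below are hypothetical.

```python
def to_char_tags(tagged_words):
    """Convert (word, tag) pairs into per-character joint tags (BIES scheme).

    B/I/E mark the beginning/inside/end of a multi-character word;
    S marks a single-character word.
    """
    chars, tags = [], []
    for word, tag in tagged_words:
        if len(word) == 1:
            positions = ["S"]
        else:
            positions = ["B"] + ["I"] * (len(word) - 2) + ["E"]
        for ch, pos in zip(word, positions):
            chars.append(ch)
            tags.append(f"{pos}-{tag}")
    return chars, tags


# Hypothetical joint labels of the form POS/NER.
sentence = [("中央研究院", "Nc/ORG"), ("在", "P/O"), ("台北", "Nc/LOC")]
chars, tags = to_char_tags(sentence)
print(list(zip(chars, tags)))
```

Decoding reverses the mapping: a word ends at every E or S tag, and its POS and entity type are read from the tag suffix, so no separate segmenter is needed in front of the tagger.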
"Achieving Defect-Free Multilevel 3D Flash Memory with One-Shot Program Design," ACM/IEEE Design Automation Conference (DAC), June 2018.
Authors: Chien-Chung Ho, Yung-Chun Li, Yuan-Hao Chang, and Yu-Ming Chang

Abstract:
The rapid growth of data volume in various applications demands high memory capacity, and multi-level-cell technology, which stores multiple bits in a single cell, is a very popular way to satisfy this requirement, as in multi-level-cell (MLC) and triple-level-cell (TLC) flash memories. To store the desired data on MLC and TLC flash memories, conventional programming strategies need to divide a fixed threshold voltage ($V_{t}$) window into several parts. The narrowly partitioned $V_{t}$ window in turn limits the design of programming strategies and becomes the main cause of flash-memory defects, i.e., longer read/write latency and worse data reliability. This motivates us to explore an innovative programming design that solves these flash-memory defects. To achieve defect-free 3D NAND flash memory, this paper presents and realizes a one-shot program design that largely eliminates the negative impacts of conventional programming strategies. The proposed one-shot program design includes two strategies for MLC flash memories, i.e., prophetic and classification programming, and the idea is extended to TLC flash memories. The measurement results show that it can accelerate programming speed by 31x and reduce RBER by 1000x for the MLC flash memory, and it can broaden the available threshold-voltage window by up to 5.1x for the TLC flash memory.
"Improving Runtime Performance of Deduplication System with Host-Managed SMR Storage Drives," ACM/IEEE Design Automation Conference (DAC), June 2018.
Authors: Chun-Feng Wu, Ming-Chang Yang, and Yuan-Hao Chang

Abstract:
Due to cost considerations for data storage, high-areal-density shingled magnetic recording (SMR) drives and data deduplication techniques are becoming popular in many data storage services to improve profit per storage unit. However, naively applying deduplication techniques on SMR drives may dramatically degrade the runtime performance of data storage services because of the time-consuming SMR space reclamation processes. This work advocates a vertical integration solution that jointly manages host-managed SMR drives and the deduplication system, in order to fundamentally relieve the time-consuming SMR space reclamation issue. The proposed design was evaluated with a series of realistic deduplication workloads, with encouraging results.
"Enabling Union Page Cache to Boost File Access Performance of NVRAM-Based Storage Devices," ACM/IEEE Design Automation Conference (DAC), June 2018.
Authors: Shuo-Han Chen, Tseng-Yi Chen, Yuan-Hao Chang, Hsin-Wen Wei, and Wei-Kuan Shih

Abstract:
Owing to its fast access performance, byte addressability, and non-volatility, phase-change memory (PCM) is becoming a popular candidate for the memory/storage systems of embedded systems. When PCM serves as both main memory and storage in an embedded system, existing page cache mechanisms, which were designed to hide the performance gap between main memory and secondary storage, turn out to introduce many unnecessary data movements between main memory and storage. To resolve this issue, we propose the concept of a "union page cache," which jointly manages page-cache data in both main memory and storage. To realize this concept, a partial page cache strategy is designed that treats both main memory and storage as its management space. By exploiting the fact that main memory and storage residing in the same PCM device share the same address space, this strategy can minimize unnecessary data movement between main memory and storage without sacrificing the data consistency of file systems. A series of experiments was conducted on an embedded system evaluation board. The results show that the proposed strategy can outperform the file accessing performance of the conventional page cache mechanism by 77.68
Authors: Hsin-Nan Lin and Wen-Lian Hsu

Abstract:
Motivation
In recent years, massively parallel cDNA sequencing (RNA-Seq) technologies have become a powerful tool that provides high-resolution measurement of expression and high sensitivity in detecting low-abundance transcripts. However, RNA-Seq data require a huge amount of computational effort. A fundamental and critical step is to align each sequence fragment against the reference genome. Various de novo spliced RNA aligners have been developed in recent years. Though these aligners can handle spliced alignment and detect splice junctions, some challenges remain. With advances in sequencing technologies and the ongoing collection of sequencing data in the ENCODE project, more efficient alignment algorithms are in high demand. Most read mappers follow the conventional seed-and-extend strategy to deal with inexact matches in sequence alignment. However, the extension step is much more time-consuming than the seeding step.
Results
We propose a novel RNA-Seq de novo mapping algorithm, called DART, which adopts a partitioning strategy to avoid the extension step. Experimental results on synthetic datasets and real NGS datasets show that DART is a highly efficient aligner that yields the highest or comparable sensitivity and accuracy compared with most state-of-the-art aligners and, more importantly, spends the least amount of time among the selected aligners.
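A well-known way to use partitioning instead of seed extension is the pigeonhole principle: if a read aligns with at most k mismatches, splitting it into k + 1 non-overlapping pieces guarantees that at least one piece matches the reference exactly. The sketch below illustrates only that generic principle with plain string search; DART's actual partitioning and indexing may differ, and the function names and toy sequences are hypothetical.

```python
def partitions(read, k):
    """Split read into k + 1 contiguous pieces; return (offset, piece) pairs."""
    n = len(read)
    bounds = [i * n // (k + 1) for i in range(k + 2)]
    return [(bounds[i], read[bounds[i]:bounds[i + 1]]) for i in range(k + 1)]


def candidate_positions(read, reference, k):
    """Reference starts where read may align with <= k mismatches.

    At most k pieces can contain a mismatch, so at least one of the
    k + 1 pieces occurs exactly; each exact hit implies a candidate start.
    """
    candidates = set()
    for offset, piece in partitions(read, k):
        start = reference.find(piece)
        while start != -1:
            pos = start - offset
            if 0 <= pos <= len(reference) - len(read):
                candidates.add(pos)
            start = reference.find(piece, start + 1)
    return sorted(candidates)


def mismatches(read, reference, pos):
    """Hamming distance between read and the reference window at pos."""
    return sum(a != b for a, b in zip(read, reference[pos:pos + len(read)]))


reference = "ACGTACGTTTGACCA"
read = "GTTAGAC"  # matches reference[6:13] except one substitution
hits = candidate_positions(read, reference, k=1)
print(hits, [mismatches(read, reference, p) for p in hits])
```

Each candidate is then confirmed with a cheap position-by-position comparison instead of a dynamic-programming extension.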
Authors: Hsiao-Pei Lu, Po-Yu Liu, Yu-bin Wang, Ji-Fan Hsieh, Han-Chen Ho, Shiao-Wei Huang, Chung-Yen Lin, Chih-hao Hsieh, Hon-Tsen Yu

Abstract:
Mammalian herbivores rely on microbial activities in an expanded gut chamber to convert plant biomass into absorbable nutrients. Distinct from ruminants, small herbivores typically have a simple stomach but an enlarged cecum to harbor symbiotic microbes; however, knowledge of this specialized gut structure and characteristics of its microbial contents is limited. Here, we used leaf-eating flying squirrels as a model to explore functional characteristics of the cecal microbiota adapted to a high-fiber, toxin-rich diet. Specifically, environmental conditions across gut regions were evaluated by measuring mass, pH, feed particle size, and metabolomes. Then, parallel metagenomes and metatranscriptomes were used to detect microbial functions corresponding to the cecal environment. Based on metabolomic profiles, >600 phytochemical compounds were detected, although many were present only in the foregut and probably degraded or transformed by gut microbes in the hindgut. Based on metagenomic (DNA) and metatranscriptomic (RNA) profiles, taxonomic compositions of the cecal microbiota were dominated by bacteria of the Firmicutes taxa; they contained major gene functions related to degradation and fermentation of leaf-derived compounds. Based on functional compositions, genes related to multidrug exporters were rich in microbial genomes, whereas genes involved in nutrient importers were rich in microbial transcriptomes. In addition, genes encoding chemotaxis-associated components and glycoside hydrolases specific for plant beta-glycosidic linkages were abundant in both DNA and RNA. This exploratory study provides findings which may help to form molecular-based hypotheses regarding functional contributions of symbiotic gut microbiota in small herbivores with folivorous dietary habits.
"Singing voice correction using canonical time warping," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2018.
Authors: Yin-Jyun Luo, Ming-Tso Chen, Tai-Shih Chi, and Li Su

Abstract:
Expressive singing voice correction is an appealing but challenging problem. A robust time-warping algorithm that synchronizes two singing recordings can provide a promising solution. We therefore propose to address the problem with canonical time warping (CTW), which aligns amateur singing recordings to professional ones. A new pitch contour is generated from the alignment information, and the pitch-corrected singing voice is synthesized through a vocoder. The objective evaluation shows that CTW is robust against pitch-shifting and time-stretching effects, and the subjective test demonstrates that CTW outperforms the other methods, including DTW and commercial auto-tuning software. Finally, we demonstrate the applicability of the proposed method in a practical, real-world scenario.
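For reference, the DTW baseline mentioned above aligns two sequences by dynamic programming over a cumulative-cost matrix; CTW extends it by also learning canonical correlation analysis (CCA) projections of the two feature streams. The sketch below is plain DTW on 1-D pitch sequences with an absolute-difference cost, not CTW, and the MIDI pitch values are hypothetical.

```python
def dtw(a, b):
    """Return (total cost, warping path) aligning sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: match, insertion, deletion.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end to recover the alignment path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    return cost[n][m], path[::-1]


amateur = [60, 62, 64]                 # hypothetical pitch contour
professional = [60, 60, 62, 64, 64]    # same melody, different timing
total, path = dtw(amateur, professional)
print(total, path)
```

Once frames are paired by the path, a corrected pitch contour can be read off the reference frames aligned to each amateur frame.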
"Automatic music transcription leveraging generalized cepstral features and deep learning," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2018.
Authors: Yu-Te Wu, Berlin Chen, and Li Su

Abstract:
Spectral features are limited in modeling musical signals with multiple concurrent pitches because of the difficulty of suppressing the interference that the harmonic peaks of one pitch cause on another. In this paper, we show that using multiple features represented in both the frequency and time domains with deep learning models can reduce such interference. These features are derived systematically from conventional pitch detection functions that relate to one another through the Fourier transform and a nonlinear scaling function. Neural networks modeled with these features outperform state-of-the-art methods while using less training data.
"Vocal melody extraction using patch-based CNN," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2018.
Authors: Li Su

Abstract:
This paper presents a patch-based convolutional neural network (CNN) model for vocal melody extraction in polyphonic music, inspired by object detection in image processing. The input of the model is a novel time-frequency representation that enhances the pitch contours and suppresses the harmonic components of a signal. This succinct data representation and the patch-based CNN model enable an efficient training process with limited labeled data. Experiments on various datasets show excellent speed and competitive accuracy compared with other deep learning approaches.
"How sampling rate affects cross-domain transfer learning for video description," IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2018.
Authors: Y. S. Chou, P. H. Hsiao, S. D. Lin, and H. Y. Mark Liao

Abstract:
Translating video to language is very challenging because of diverse video content originating from multiple activities and the complicated integration of spatio-temporal information. Two urgent issues are associated with the video-to-language translation problem. First, how can knowledge learned from a more general dataset be transferred to a specific application-domain dataset? Second, how can stable video captioning (or description) results be generated under different sampling rates? In this paper, we propose a novel temporal embedding method to better retain the temporal representation under different video sampling rates. We present a transfer learning method that combines a stacked LSTM encoder-decoder structure with a temporal embedding learning with soft-attention (TELSA) mechanism. We evaluate the proposed approach on two public datasets, MSR-VTT and MSVD. The promising experimental results confirm the effectiveness of the proposed approach.
"Low precision deep learning training on mobile heterogeneous platform," 26th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2018), March 2018.
Authors: Olivier Valery, Pangfeng Liu, Jan-Jan Wu

Abstract:
Recent advances in System-on-Chip (SoC) architectures have made deep learning suitable for a number of applications on mobile devices. Unfortunately, due to the computational cost of neural network training, deep learning on mobile devices is often limited to inference tasks, e.g., prediction. In this paper, we propose a deep learning framework that enables both training and inference on mobile devices. While accommodating the heterogeneity of computing devices on mobile platforms, it also uses OpenCL to efficiently leverage modern SoC capabilities, e.g., multi-core CPUs, integrated GPUs and shared memory architecture, and to accelerate deep learning computation. In addition, our system encodes the arithmetic operations of deep networks down to 8-bit fixed-point on mobile devices. As a proof of concept, we trained three well-known neural networks on mobile devices and observed a significant performance gain, energy consumption reduction, and memory saving.
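To make "8-bit fixed-point" concrete, a signed 8-bit value with f fractional bits represents q / 2^f, and the product of two such values carries 2f fractional bits, so it must be shifted right by f. This is a generic Q-format sketch in Python, not the authors' OpenCL kernels; the choice of 4 fractional bits and the saturating behavior are our assumptions.

```python
def to_fixed8(x, frac_bits):
    """Encode x as a signed 8-bit fixed-point integer (saturating).

    Python's round() rounds halves to even.
    """
    q = round(x * (1 << frac_bits))
    return max(-128, min(127, q))


def from_fixed8(q, frac_bits):
    """Decode a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed8_mul(a, b, frac_bits):
    """Multiply two fixed-point values; rescale the 2*frac_bits product."""
    prod = a * b  # would be accumulated in a wider integer on real hardware
    return max(-128, min(127, prod >> frac_bits))


F = 4  # hypothetical: 4 fractional bits, range [-8.0, 7.9375]
a, b = to_fixed8(1.5, F), to_fixed8(0.75, F)
print(a, b, from_fixed8(fixed8_mul(a, b, F), F))
```

The narrow range is the main trade-off: values outside [-8, 7.9375] saturate at this setting, so practical schemes pick the fractional-bit count per layer or per tensor.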
"Workload Prediction and Balance for Distributed Reachability Processing for Attribute Graphs," Concurrency and Computation: Practice and Experience, To Appear.
Authors: Li-Yung Ho, Jan-Jan Wu, Pangfeng Liu

Abstract:
Reachability queries with label constraints in an attribute graph are among the most fundamental and important operations in semantic network analysis. However, ever-growing graph sizes have made reachability problems intractable on single machines. This work aims to devise efficient solutions for the label-constrained reachability problem in an attribute graph in a distributed environment. We focus on two issues in distributed processing, data locality and workload balancing: data locality reduces communication overhead, and workload balancing improves the efficiency of cluster use. We propose three novel techniques to address the two issues: (1) a partition replication method that improves data locality while conserving the community property, (2) a workload-prediction method that accurately predicts machine workloads for a given query, and (3) a workload balancing method that uses these predictions to shift partial workloads among machines to produce a balanced workload. Experimental results suggest that these techniques significantly improve performance and reduce total execution time by 40%.
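Technique (3) can be illustrated with a simple greedy rebalancer that repeatedly moves half of the gap from the most- to the least-loaded machine according to predicted workloads. This is a stand-in we chose for illustration, not the paper's balancing algorithm; the integer work units and machine names are hypothetical.

```python
def balance(predicted):
    """Greedily shift work units from the most- to the least-loaded machine.

    predicted: dict machine -> predicted workload in integer work units.
    Returns (moves, loads): moves is a list of (src, dst, amount).
    Terminates because each move strictly reduces the sum of squared loads.
    """
    loads = dict(predicted)
    moves = []
    while True:
        src = max(loads, key=loads.get)
        dst = min(loads, key=loads.get)
        gap = loads[src] - loads[dst]
        if gap <= 1:  # already as balanced as integer units allow
            break
        amount = gap // 2
        loads[src] -= amount
        loads[dst] += amount
        moves.append((src, dst, amount))
    return moves, loads


moves, loads = balance({"m1": 10, "m2": 2, "m3": 6})
print(moves, loads)
```

The quality of such a plan depends entirely on the predictions, which is why the abstract pairs the balancer with a workload-prediction method.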
"Automatic Image Cropping for Visual Aesthetic Enhancement Using Deep Neural Networks and Cascaded Regression," IEEE Transactions on Multimedia, To Appear.
Authors: Guanjun Guo, Hanzi Wang, Chunhua Shen, Yan Yan, and Hong-Yuan Mark Liao

Abstract:
Despite recent progress, computational visual aesthetics is still challenging. Image cropping, which refers to the removal of unwanted scene areas, is an important step to improve the aesthetic quality of an image. However, it is challenging to evaluate whether cropping leads to aesthetically pleasing results because the assessment is typically subjective. In this paper, we propose a novel cascaded cropping regression (CCR) method that performs image cropping by learning from professional photographers. The proposed CCR method improves the convergence speed of the cascaded method, which directly uses random-ferns regressors. In addition, a two-step learning strategy is proposed and used in the CCR method to address the lack of labeled cropping data. Specifically, a deep convolutional neural network (CNN) classifier is first trained on large-scale visual aesthetic datasets. The deep CNN model is then used to extract features from several image cropping datasets, upon which the cropping bounding boxes are predicted by the proposed CCR method. Experimental results on public image cropping datasets demonstrate that the proposed method significantly outperforms several state-of-the-art image cropping methods.
Authors: Peng-Hsuan Li, Ruo-Ping Dong, Yu-Siang Wang, Ju-Chieh Chou, Wei-Yun Ma

Abstract:
In this paper, we utilize the linguistic structures of texts to improve named entity recognition with BRNN-CNN, a special bidirectional recursive network attached to a convolutional network. Motivated by the observation that named entities are highly related to linguistic constituents, we propose a constituent-based BRNN-CNN for named entity recognition. In contrast to classical sequential labeling methods, the system first identifies which text chunks are possible named entities according to whether they are linguistic constituents. Then it classifies these chunks with a constituency tree structure by recursively propagating syntactic and semantic information to each constituent node. This method surpasses the current state of the art on OntoNotes 5.0 with automatically generated parses.