Institute of Information Science
"Oblivious Parallel RAM and Applications," The 13th IACR Theory of Cryptography Conference (TCC2016), January 2016.
Authors: Elette Boyle and Kai-Min Chung and Rafael Pass

We initiate the study of cryptography for parallel RAM (PRAM) programs. The PRAM model captures modern multi-core architectures and cluster computing models, where several processors execute in parallel and make accesses to shared memory, and provides the "best of both" circuit and RAM models, supporting both cheap random access and parallelism.

We propose and attain the notion of Oblivious PRAM. We present a compiler taking any PRAM into one whose distribution of memory accesses is statistically independent of the data (with negligible error), while only incurring a polylogarithmic slowdown (in both total and parallel complexity). We discuss applications of such a compiler, building upon recent advances relying on Oblivious (sequential) RAM (Goldreich Ostrovsky JACM’12). In particular, we demonstrate the construction of a garbled PRAM compiler based on an OPRAM compiler and secure identity-based encryption.
Current Research Results
"Energy-efficient Task Scheduling for Multi-core Platforms with per-core DVFS," Journal of Parallel and Distributed Computing, December 2015.
Authors: Ching-Chi Lin, You-Cheng Syu, Chao-Jui Chang, Jan-Jan Wu, Pangfeng Liu, Po-Wen Cheng, Wei-Te Hsu

Energy-efficient task scheduling is a fundamental issue in many application domains, such as energy conservation for mobile devices and the operation of green computing data centers. Modern processors support dynamic voltage and frequency scaling (DVFS) on a per-core basis, i.e., the CPU can adjust the voltage or frequency of each core individually. As a result, the cores in a processor may differ in computing power and energy consumption. To conserve energy in multi-core platforms, we propose task scheduling algorithms that leverage per-core DVFS and strike a balance between performance and energy consumption. We consider two task execution modes: the batch mode, which runs jobs in batches, and the online mode, in which jobs with different time constraints, arrival times, and computation workloads co-exist in the system. For tasks executed in the batch mode, we propose an algorithm that finds the optimal scheduling policy; for the online mode, we present a heuristic algorithm that determines the execution order and processing speed of tasks in an online fashion. The heuristic ensures that the total cost is minimal for every time interval during task execution. Furthermore, we analyze and derive algorithms with low time complexity for each mode.
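The performance/energy trade-off behind per-core DVFS can be illustrated with a toy cost model (an illustrative sketch with assumed names and a cubic power model, not the paper's algorithms): running a task of fixed work at frequency f takes time proportional to 1/f and consumes energy proportional to f^2, so each core can independently pick the frequency that minimizes a weighted sum of the two.

```python
def best_frequency(work, freqs, alpha=1.0, beta=1.0):
    """Return the frequency minimizing alpha*energy + beta*time for one task.

    Assumes a cubic power model: power ~ f^3, so energy ~ f^3 * (work/f)
    = work * f^2. This is a toy model, not the paper's formulation.
    """
    def cost(f):
        time = work / f          # execution time at frequency f
        energy = work * f ** 2   # energy under the f^3 power model
        return alpha * energy + beta * time
    return min(freqs, key=cost)

def schedule(tasks, freqs):
    """Assign each task its cost-minimizing frequency (independent per-core DVFS)."""
    return {name: best_frequency(work, freqs) for name, work in tasks.items()}
```

Setting alpha to 0 recovers a pure performance objective (fastest frequency wins), while beta = 0 recovers pure energy conservation (slowest frequency wins); intermediate weights land in between.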
Current Research Results
"A Necessary and Sufficient Condition for Generalized Demixing," IEEE Signal Processing Letters, November 2015.
Authors: Chun-Yen Kuo, Gang-Xuan Lin, and Chun-Shien Lu

Demixing is the problem of identifying multiple structured signals from a superimposed observation. This work analyzes a general framework, based on convex optimization, for solving demixing problems. We present a new solution for determining whether or not a specific convex optimization problem built for generalized demixing succeeds. This solution also yields a way to estimate the probability of success via the approximate kinematic formula.
Authors: Li-Yung Ho, Fei Shao, Jan-Jan Wu, and Pangfeng Liu

To reduce container management costs, ocean carrier companies rent containers from container leasing companies. Two carrier companies can exchange their empty containers between each other at various ports to eliminate the transportation cost of empty containers. To minimize costs, a container leasing company has to find the maximum number of pairs of carrier companies that can exchange containers. We formulate this problem as maximum matching in a large general graph, and propose a distributed matching algorithm to solve this problem. We also propose several optimization techniques to improve the efficiency of our algorithm. To facilitate the implementation of the proposed algorithm, we extend the traditional BSP model and develop a novel iterative multiple BSP model. We collect real world container shipment data from ocean carrier companies and conduct extensive experiments to demonstrate the efficiency of the proposed algorithm.
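As a minimal illustration of the matching objective (a sequential greedy stand-in for intuition only, not the distributed iterative-BSP algorithm described above), a maximal matching can be built by scanning edges and pairing companies whose endpoints are still unmatched:

```python
def greedy_matching(edges):
    """Greedy maximal matching: scan the edges and pair endpoints that are
    not yet matched. Each company ends up in at most one exchange pair."""
    matched = set()
    pairs = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matched.add(u)
            matched.add(v)
            pairs.append((u, v))
    return pairs
```

A greedy maximal matching is guaranteed to be at least half the size of a maximum matching, which is why it is a common building block for distributed matching schemes.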
Current Research Results
Authors: Reta Birhanu Kitata, Baby Rorielyn T. Dimayacyac-Esleta, Wai-Kok Choong, Chia-Feng Tsai, Tai-Du Lin, Chih-Chiang Tsou, Shao-Hsing Weng, Yi-Ju Chen, Pan-Chyr Yang, Susan D. Arco, Alexey I. Nesvizhskii, Ting-Yi Sung, and Yu-Ju Chen

Despite significant efforts in the past decade toward complete mapping of the human proteome, 3564 proteins (neXtProt, 09-2014) are still “missing proteins”. Over one-third of these missing proteins are annotated as membrane proteins, owing to their relatively challenging accessibility with standard shotgun proteomics. Using non-small cell lung cancer (NSCLC) as a model study, we aim to mine missing proteins from the disease-associated membrane proteome, which may still be largely under-represented. To increase identification coverage, we employed Hp-RP StageTip prefractionation of membrane-enriched samples from 11 NSCLC cell lines. Analysis of membrane samples from 20 pairs of tumor and adjacent normal lung tissue was incorporated to include physiologically expressed membrane proteins. Using multiple search engines (X!Tandem, Comet, and Mascot) and stringent evaluation of FDR (MAYU and PeptideShaker), we identified 7702 proteins (66% membrane proteins) and 178 missing proteins (74 membrane proteins) with PSM-, peptide-, and protein-level FDR of 1%. Through multiple reaction monitoring using synthetic peptides, we provided additional evidence for eight missing proteins, including seven with transmembrane helix domains. This study demonstrates that mining missing proteins in the cancer membrane subproteome can greatly contribute to mapping the whole human proteome. All data were deposited into ProteomeXchange with the identifier PXD002224.
Current Research Results
Authors: P. Horvatovich, EK Lunberg, Yu-Ju Chen, Ting-Yi Sung, F. He, et al.

This paper summarizes the recent activities of the Chromosome-Centric Human Proteome Project (C-HPP) consortium, which develops new technologies to identify yet-to-be-annotated proteins (termed “missing proteins”) in biological samples that lack sufficient experimental evidence at the protein level for confident protein identification. The C-HPP also aims to identify new protein forms that may be caused by genetic variability, post-translational modifications, and alternative splicing. Proteogenomic data integration forms the basis of the C-HPP’s activities; therefore, we have summarized some of the key approaches and their roles in the project. We present new analytical technologies that improve the chemical space and lower detection limits, coupled to bioinformatics tools and some publicly available resources that can be used to improve data analysis or support the development of analytical assays. Most of this paper’s content has been compiled from posters, slides, and discussions presented in the series of C-HPP workshops held during 2014. All data (posters, presentations) used are available at the C-HPP Wiki and in the Supporting Information.
Current Research Results
Authors: Hsu HY, Chen SH, Cha YR, Aoyama J, Miller M, Watanabe S, Tsukamoto K, Lin CY*, Han YS*.

Natural stocks of Japanese eel (Anguilla japonica) have decreased drastically because of overfishing, habitat destruction, and changes in the ocean environment over the past few decades. However, to date, artificial mass production of glass eels is far from reality because of the lack of appropriate feed for the eel larvae. In this study, wild glass eels, leptocephali, preleptocephali, and embryos were collected to conduct RNA-seq. Approximately 279 million reads were generated and assembled into 224,043 transcripts. The transcript levels of genes coding for digestive enzymes and nutrient transporters were investigated to estimate the capacities for nutrient digestion and absorption during early development. The results showed that the transcript levels of protein digestion enzymes were higher than those of carbohydrate and lipid digestion enzymes in the preleptocephali and leptocephali, and the transcript levels of amino acid transporters were also higher than those of glucose and fructose transporters and the cholesterol transporter. In addition, the transcript levels of glucose and fructose transporters were significantly increased in the leptocephali. Moreover, the transcript levels of protein, carbohydrate, and lipid digestion enzymes were balanced in the glass eel, but the transcript levels of amino acid transporters were higher than those of glucose and cholesterol transporters. These findings implied that preleptocephali and leptocephali prefer high-protein food, and that the nutritional requirements of monosaccharides and lipids for the eel larvae vary with growth. An online database that provides the sequences and the annotated results of the assembled transcripts was established for the eel research community.
Current Research Results
"Enabling Adaptive Cloud Gaming in an Open-Source Cloud Gaming Platform," IEEE Transactions on Circuits and Systems for Video Technology, To Appear.
Authors: Cheng-Hsin Hsu, Hua-Jun Hong, Chih-Fan Hsu, Tsung-Han Tsai, Chun-Ying Huang, and Kuan-Ta Chen

We study the problem of optimally adapting ongoing cloud gaming sessions to maximize the gamer experience in dynamic environments. The considered problem is quite challenging because: (i) gamer experience is subjective and hard to quantify, (ii) the existing open-source cloud gaming platform does not support dynamic reconfiguration of video codecs, and (iii) the resource allocation among concurrent gamers leaves considerable room for optimization. We rigorously address these three challenges by: (i) conducting a crowdsourced user study over the live Internet for an empirical gaming experience model, (ii) enhancing the cloud gaming platform to support frame rate and bitrate adaptation on-the-fly, and (iii) proposing optimal yet efficient algorithms to maximize the overall gaming experience or ensure fairness among gamers. We conduct extensive trace-driven simulations to demonstrate the merits of our algorithms and implementation. Our simulation results show that the proposed efficient algorithms: (i) outperform the baseline algorithms by up to 46% and 30%, (ii) run fast and scale to large (≥ 8000 gamers) problems, and (iii) achieve the user-specified optimization criteria, such as maximizing average gamer experience or maximizing the minimum gamer experience. The resulting cloud gaming platform can be leveraged by many researchers, developers, and gamers.
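To give a flavor of the allocation problem (a hypothetical toy model with an assumed concave QoE function, not the paper's algorithms), consider gamers choosing among discrete bitrate levels under a shared bandwidth budget; a greedy scheme repeatedly grants whichever single-gamer upgrade yields the largest marginal QoE gain:

```python
import math

def allocate(num_gamers, levels, budget, qoe=math.log1p):
    """Greedily allocate discrete bitrate levels to maximize total QoE
    under a bandwidth budget. `qoe` is any concave utility of bitrate;
    log1p is used here purely as an illustrative stand-in."""
    alloc = [0] * num_gamers           # index into `levels` per gamer
    used = num_gamers * levels[0]      # everyone starts at the lowest level
    while True:
        best, gain = None, 0.0
        for g in range(num_gamers):
            i = alloc[g]
            # Upgrade is feasible if the next level exists and fits the budget.
            if i + 1 < len(levels) and used - levels[i] + levels[i + 1] <= budget:
                delta = qoe(levels[i + 1]) - qoe(levels[i])
                if delta > gain:
                    best, gain = g, delta
        if best is None:               # no feasible upgrade remains
            return [levels[i] for i in alloc]
        used += levels[alloc[best] + 1] - levels[alloc[best]]
        alloc[best] += 1
```

With a concave utility, the marginal gain of each further upgrade shrinks, so the greedy order naturally spreads bandwidth across gamers before pushing any single session to the top level.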
Current Research Results
Authors: Tsai, Z.T.Y., Shiu, S.H.*, and Tsai, H.K.*

Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as a model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding than SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a TF-agnostic predictive model based on three intrinsic DNA properties (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic-property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic-property model allows prediction of binding regions across DNA-binding domain families, it is TF-agnostic and likely describes the general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF-agnostic model for identifying functional regulatory regions in potentially any sequenced genome.
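The classification setup can be sketched with plain logistic regression on toy feature vectors (the paper's actual classifiers and CS/DS feature definitions are not reproduced here; feature values below are made up for illustration): candidate regions are represented as numeric feature vectors and fed to a linear classifier trained by gradient descent.

```python
import math

def train_logreg(X, y, lr=0.5, epochs=200):
    """Plain logistic regression via stochastic gradient descent.
    A toy stand-in for classifiers combining CS- and DS-style features."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))     # predicted binding probability
            g = p - yi                         # gradient of the log loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, xi):
    """Label a region as bound (1) or unbound (0)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
```

Training the same classifier on CS features, DS features, and their concatenation, then comparing held-out accuracy, mirrors the kind of feature-set comparison described above.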
Current Research Results
Authors: Cheng, J.H., Pan D., Tsai, Z.T.Y., and Tsai, H.K.*

Enhancers play a crucial role in gene regulation, but the participation of enhancer transcripts (i.e., enhancer RNAs, eRNAs) in regulatory systems remains unclear. We provide a computational analysis of eRNAs using genome-wide data across 12 mouse tissues. The expression of genes targeted by a transcribing enhancer is positively correlated with eRNA expression and significantly higher than the expression of genes targeted by non-transcribing enhancers. This result implies that eRNA transcription indicates a state of the enhancer that further increases gene expression. This state is tissue-specific, as the same enhancer differentially transcribes eRNAs across tissues. Therefore, the presence of eRNAs describes a tissue-specific enhancer state that is generally associated with more highly expressed target genes, raising the question of whether eRNAs have gene-activation potential. We further found that a large number of eRNAs contain regions in which sequences and secondary structures are similar to microRNAs. Interestingly, an increasing number of recent studies hypothesize that microRNAs may switch from their general repressive role to an activating role when targeting promoter sequences. Collectively, our results raise the possibility that eRNAs may be associated with the selective activation of enhancer target genes.
"Smart Beholder: An Open-Source Smart Lens for Mobile Photography," Proceedings of ACM Multimedia 2015, 2015.
Authors: Chun-Ying Huang, Chih-Fan Hsu, Tsung-Han Tsai, Ching-Ling Fang, Cheng-Hsin Hsu, and Kuan-Ta Chen

Smart lenses are detachable lenses connected to mobile devices via wireless networks; they are not constrained by the small form factor of mobile devices and have the potential to deliver better photo (video) quality. However, the viewfinder previews of smart lenses on mobile devices are difficult to optimize, due to the strict resource constraints on smart lenses and fluctuating wireless network conditions. In this paper, we design, implement, and evaluate an open-source smart lens, called Smart Beholder. It achieves three design goals: (i) cost effectiveness, (ii) low interaction latency, and (iii) high preview quality by: (i) selecting an embedded system board that is just powerful enough, (ii) minimizing per-component latency, and (iii) dynamically adapting the video coding parameters to maximize Quality of Experience (QoE), respectively. Several optimization techniques, such as an anti-drifting mechanism for video frames and a QoE-driven resolution/frame-rate adaptation algorithm, are proposed in this paper. Our extensive measurement study indicates that Smart Beholder outperforms Altek Cubic and Sony QX100 in terms of lower bitrate, lower latency, slightly higher frame rate, and better preview quality. We also demonstrate that Smart Beholder adapts to network dynamics. Smart Beholder has been made public as an experimental platform for researchers and developers of optimized smart lenses and other embedded real-time video streaming systems.
Current Research Results
Authors: Lin CH, Chen SH, Wang YB, Hsiung CA, Lin CY*

Enteroviruses (EV) with different genotypes cause diverse infectious diseases in humans and mammals. A correct EV typing result is crucial for effective medical treatment and disease control; however, the emergence of novel viral strains has impaired the performance of available diagnostic tools. Here, we present a web-based tool, named EVIDENCE (EnteroVirus In DEep coNCEption), for EV genotyping and recombination detection. We introduce the idea of using mixed-ranking scores to evaluate the fitness of prototypes based on relatedness and on the genome regions of interest. Using phylogenetic methods, the most probable genotype is determined based on the closest neighbor among the selected references. To detect possible recombination events, EVIDENCE calculates the sequence distance and phylogenetic relationship among sequences of all sliding windows scanning over the whole genome. Detected recombination events are plotted in an interactive figure for viewing of fine details. In addition, all EV sequences available in GenBank were collected and revised using the latest classification and nomenclature of EV in EVIDENCE. These sequences are built into the database and are retrieved in an indexed catalog, or can be searched for by keywords or by sequence similarity. EVIDENCE is the first web-based tool containing pipelines for genotyping and recombination detection, with updated, built-in, and complete reference sequences to improve sensitivity and specificity. The use of EVIDENCE can accelerate genotype identification, aiding clinical diagnosis and enhancing our understanding of EV evolution.
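The sliding-window scan can be sketched as follows (a simplified illustration using plain p-distance; EVIDENCE's actual pipeline uses phylogenetic methods and mixed-ranking scores): each window of the query is compared against the same window of every reference, and a change in the closest reference along the genome hints at recombination.

```python
def p_distance(a, b):
    """Fraction of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def sliding_closest(query, refs, window, step):
    """For each window, report the reference genotype with the smallest
    p-distance. `refs` maps genotype names to aligned reference sequences."""
    hits = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        name = min(refs, key=lambda r: p_distance(q, refs[r][start:start + window]))
        hits.append((start, name))
    return hits
```

A query whose first half matches one genotype and whose second half matches another produces a switch in the reported closest reference, the signature a recombination plot visualizes.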
"Access Pattern Reshaping for eMMC-enabled SSDs," ACM/IEEE International Conference on Computer-Aided Design (ICCAD), November 2015.
Authors: Chien-Chung Ho, Yuan-Hao Chang, and Tei-Wei Kuo

The growing popularity of embedded Multi-Media Controllers (eMMCs) presents a unique opportunity to design solid-state drives from commodity products. This work addresses the essential design issues of such drives and introduces a lightweight FTL design. In particular, access patterns to an eMMC-enabled solid-state drive are reshaped to accommodate the characteristics of eMMCs. That is, accesses to such a drive are reshaped into sequential access patterns and writes of the specific sizes preferred by eMMCs, without resorting to an ordinary address-translation FTL design. In the meantime, garbage collection overheads should be minimized with reliability in mind, since eMMCs usually do not have a powerful controller or a sophisticated design. The capability of the proposed design is evaluated by a series of experiments, for which we have very encouraging results.
"A Light-Weighted Software-Controlled Cache for PCM-based Main Memory Systems," ACM/IEEE International Conference on Computer-Aided Design (ICCAD), November 2015.
Authors: Hung-Sheng Chang, Yuan-Hao Chang, Tei-Wei Kuo, and Hsiang-Pang Li

The replacement of DRAM with non-volatile memory relies on solutions to the wear-leveling and slow-write problems. Different from past work on compiler-assisted optimization or joint DRAM-PCM management strategies, we explore a lightweight software-controlled DRAM cache design for non-volatile-memory-based main memory. The run-time overheads in managing the DRAM cache are minimized by utilizing the information from a miss of the translation lookaside buffer (TLB) or the cache. Experiments were conducted on a series of well-known benchmarks to evaluate the effectiveness of the proposed design, for which the results are very encouraging.
"Spatio-Temporal Learning of Basketball Offensive Strategies," 2015 ACM Multimedia Conference, October 2015.
Authors: Ching-Hang Chen, Tyng-Luh Liu, Yu-Shuen Wang, Hung-Kuo Chu, Nick C. Tang, Hong-Yuan Mark Liao

Video-based group behavior analysis is drawing attention owing to its rich applications in sports, military, surveillance, and biological observations. Recent advances in tracking techniques, based on either computer vision methodology or hardware sensors, further provide the opportunity to better solve this challenging task. Focusing specifically on the analysis of basketball offensive strategies, we introduce a systematic approach to establishing unsupervised modeling of group behaviors. In view that a possible group behavior (offensive strategy) could be of different duration and represented by dynamic player trajectories, the crux of our method is to automatically divide training data into meaningful clusters and learn their respective spatio-temporal models, which are established upon Gaussian mixture regression to account for intra-class spatio-temporal variations. The resulting strategy representation turns out to be flexible in that it can be used not only to establish the discriminant functions but also to improve learning the models. We demonstrate the usefulness of our approach by exploring its effectiveness in analyzing a set of given basketball video clips.
"System Combination for Machine Translation through Paraphrasing," Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), September 2015.
Authors: Wei-Yun Ma and Kathleen McKeown

In this paper, we propose a paraphrasing model to address the task of system combination for machine translation. We dynamically learn hierarchical paraphrases from target hypotheses without any syntactic annotations and form a synchronous context-free grammar to guide a series of transformations of target hypotheses into fused translations. The model is able to exploit phrasal and structural system-weighted consensus and also to utilize existing information about word ordering present in the target hypotheses. In addition, to consider a diverse set of plausible fused translations, we develop a hybrid combination architecture, where we paraphrase every target hypothesis to obtain a fused translation for each target, and then make the final selection among all fused translations. Our experimental results show that our approach can achieve a significant improvement over combination baselines.
Current Research Results
"Extractive broadcast news summarization leveraging recurrent neural network language modeling techniques," IEEE/ACM Transactions on Audio, Speech, and Language Processing, August 2015.
Authors: Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang, Ea-Ee Jan, Wen-Lian Hsu, Hsin-Hsi Chen

Extractive text or speech summarization aims to select a set of salient sentences from an original document and concatenate them to form a summary, enabling users to better browse through and understand the content of the document. A recent stream of research on extractive summarization employs the language modeling (LM) approach for important sentence selection, which has proven effective for performing speech summarization in an unsupervised fashion. However, one of the major challenges facing the LM approach is how to formulate the sentence models and accurately estimate their parameters for each sentence in the document to be summarized. In view of this, our work in this paper explores a novel use of the recurrent neural network language modeling (RNNLM) framework for extractive broadcast news summarization. On top of such a framework, the deduced sentence models are able to render not only word usage cues but also long-span structural information of word co-occurrence relationships within broadcast news documents, getting around the need for the strict bag-of-words assumption. Furthermore, different model complexities and combinations are extensively analyzed and compared. Experimental results demonstrate the performance merits of our summarization methods when compared to several well-studied state-of-the-art unsupervised methods.
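The LM approach to sentence selection can be illustrated with a unigram stand-in (the paper's RNNLM sentence models are far richer, capturing long-span word co-occurrence; the smoothing and function names here are illustrative assumptions): each sentence is scored by how likely its language model is to generate the whole document, and the top-scoring sentences form the summary.

```python
from collections import Counter
import math

def lm_scores(document_sentences):
    """Score each sentence by the log-likelihood of generating all of the
    document's words under that sentence's add-one-smoothed unigram model."""
    doc_words = [w for s in document_sentences for w in s.split()]
    vocab = set(doc_words)
    scores = []
    for s in document_sentences:
        counts = Counter(s.split())
        total = sum(counts.values())
        ll = sum(math.log((counts[w] + 1) / (total + len(vocab)))
                 for w in doc_words)
        scores.append(ll)
    return scores

def summarize(sentences, k=1):
    """Select the k sentences whose models best explain the document,
    returned in document order."""
    scores = lm_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return [sentences[i] for i in sorted(ranked[:k])]
```

Sentences that share vocabulary with much of the document assign its words higher probability and therefore rank higher, which is the core intuition behind LM-based extractive selection.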
"Linguistic Template Extraction for Recognizing Reader-Emotion and Emotional Resonance Writing Assistance," The 53rd Annual Meeting of the Association for Computational Linguistics and the 7rd International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2015), July 2015.
Authors: Yung-Chun Chang, Cen-Chieh Chen, Yu-Lun Hsieh, Chien Chin Chen, Wen-Lian Hsu

In this paper, we propose a flexible principle-based approach (PBA) for reader-emotion classification and writing assistance. PBA is a highly automated process that learns emotion templates from raw texts to characterize an emotion and is comprehensible for humans. These templates are adopted to predict reader-emotion, and may further assist in emotional resonance writing. Results demonstrate that PBA can effectively detect reader-emotions by exploiting the syntactic structures and semantic associations in the context, thus outperforming well-known statistical text classification methods and the state-of-the-art reader-emotion classification method. Moreover, writers are able to create more emotional resonance in articles under the assistance of the generated emotion templates. These templates have been proven to be highly interpretable, which is an attribute that is difficult to accomplish in traditional statistical methods.
Current Research Results
Authors: Kuo, C. Y., Chen, C. H., Chen, S. H., Lu, I. H., Huang, L. C., Lin, C. Y., Chen, C. Y., Lo, H. F., Jeng, S. T., Chen, L. F. O.

Agarwood, a heartwood derived from Aquilaria trees, is a valuable commodity that has seen prevalent use among many cultures. In particular, it is widely used in herbal medicine, and many compounds in agarwood are known to exhibit medicinal properties. Although much research exists on medicinal herbs and the extraction of high-value compounds, few studies have focused on increasing the quantity of target compounds through stimulation of the related pathways in this species.
In this study, we observed that cucurbitacin yield can be increased through the use of different light conditions to stimulate related pathways and conducted three types of high-throughput sequencing experiments in order to study the effect of light conditions on secondary metabolism in agarwood. We constructed genome-wide profiles of RNA expression, small RNA, and DNA methylation under red light and far-red light conditions. With these profiles, we identified a set of small RNA which potentially regulates gene expression via the RNA-directed DNA methylation pathway.
We demonstrate that light conditions can be used to stimulate pathways related to secondary metabolism, increasing the yield of cucurbitacins. The genome-wide expression and methylation profiles from our study provide insight into the effect of light on gene expression for secondary metabolism in agarwood and provide compelling new candidates towards the study of functional secondary metabolic components.
Current Research Results
"Court Reconstruction for Camera Calibration in Broadcast Basketball Videos," IEEE Transactions on Visualization and Computer Graphics, To Appear.
Authors: P. C. Wen, W. C. Cheng, Y. S. Wang, H. K. Chu, Nick C. Tang, and H. Y. Mark Liao

We introduce a technique for calibrating camera motions in basketball videos. In particular, our method transforms player positions to standard basketball court coordinates and enables applications such as tactical analysis and semantic basketball video retrieval. To achieve a robust calibration, we reconstruct the panoramic basketball court from a video, followed by warping the panoramic court to a standard one. As opposed to previous approaches, which individually detect the court lines and corners of each video frame, our technique considers all video frames simultaneously to achieve calibration; hence, it is robust to illumination changes and player occlusions. To demonstrate the feasibility of our technique, we present a stroke-based system that allows users to retrieve basketball videos. Our system tracks player trajectories from broadcast basketball videos. It then rectifies the trajectories to a standard basketball court by using our camera calibration method. Consequently, users can apply stroke queries to indicate how the players move in gameplay during retrieval. The main advantage of this interface is an explicit query of basketball videos so that unwanted outcomes can be prevented. We show the results in Figures 1, 7, 9, 10 and our accompanying video to exhibit the feasibility of our technique.
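Once calibration yields a 3x3 homography from image to court coordinates, player positions can be rectified by a projective transform. A minimal sketch (the matrices below are hand-picked for illustration, not estimated from video as the paper's method does):

```python
def apply_homography(H, x, y):
    """Map an image point (x, y) to court coordinates via a 3x3 homography H
    (a list of three rows). The result is normalized by the projective
    coordinate w, which is what distinguishes a homography from an affine map."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w
```

Applying such a transform to every tracked player position per frame yields trajectories in a common court coordinate system, which is what makes stroke-based retrieval queries comparable across videos.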
"On Relaxing Page Program Disturbance over 3D MLC Flash Memory," ACM/IEEE International Conference on Computer-Aided Design (ICCAD), November 2015.
Authors: Yu-Ming Chang, Yung-Chun Li, Yuan-Hao Chang, Tei-Wei Kuo, Chih-Chang Hsieh, and Hsiang-Pang Li

With the rapidly increasing capacity demand for flash memory, 3D NAND flash memory has drawn tremendous attention as a promising solution to further reduce the bit cost and increase the bit density. However, such advanced 3D devices suffer more intensive program disturbance than 2D NAND flash memory. Especially when multi-level-cell (MLC) technology is adopted, the deteriorated disturbance due to intra- and inter-page program operations becomes even more critical for reliability. In contrast to past efforts that try to resolve the reliability issue with error correction codes or hardware designs, this work seeks to redesign the program operation. A disturb-aware programming scheme is proposed to not only relax the disturbance induced by slow cells as much as possible but also reduce the likelihood of requiring a high voltage to program the slow cells. A series of experiments was conducted on real 3D MLC flash chips, and the results demonstrate that the proposed scheme is extremely effective in reducing the disturbance as well as the bit error rate.
"How to Improve the Space Utilization of Dedup-based PCM Storage Devices," ACM/IEEE International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), October 2015.
Authors: Chun-Ta Lin, Yuan-Hao Chang, Tei-Wei Kuo, Hung-Sheng Chang, and Hsiang-Pang Li

There is a growing demand in recent years to introduce more and more intelligence into storage devices, especially with the rapid increase in hardware computing power. This paper exploits essential design issues in space utilization for dedup-based non-volatile phase-change memory (PCM). We explore the adoption of data deduplication techniques to remove potential data duplicates on PCM storage devices, so as to provide more storage space than the physical capacity would otherwise allow. Among the various data deduplication techniques, variable-sized chunking is considered for the relatively expensive PCM-based storage devices because it has better deduplication capability than fixed-sized chunking. However, in a typical system architecture, data are written or updated in fixed management units (e.g., LBAs). Thus, to ultimately improve the space utilization of PCM-based storage devices, the technical problem falls on (1) how to map fixed-sized LBAs to variable-sized chunks and (2) how to efficiently manage (i.e., allocate and deallocate) free PCM storage space for variable-sized chunks. In this work, we propose a free space manager, called the container-based space manager, to resolve these two issues by exploiting the facts that (1) a storage system initially has ample free space, which relaxes the complexity of space management, and (2) the space optimization of a storage system can grow over time as it accumulates more data. The proposed design is evaluated over popular benchmarks, for which we have very encouraging results.
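The LBA-to-chunk mapping problem can be sketched with a toy deduplicating store (fixed-size splitting stands in for variable-sized chunking, and all names are hypothetical; the paper's container-based manager is considerably more involved): each logical block is recorded as a list of content-hash references, and identical chunks are stored only once.

```python
import hashlib

class DedupStore:
    """Toy LBA -> chunk mapping with content-hash deduplication."""

    def __init__(self):
        self.chunks = {}     # content hash -> chunk bytes (stored once)
        self.lba_table = {}  # lba -> list of chunk hashes composing the block

    def write(self, lba, data, chunk_size=4):
        hashes = []
        # Fixed-size split as a stand-in for variable-sized chunking.
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            h = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(h, chunk)   # duplicate chunks stored once
            hashes.append(h)
        self.lba_table[lba] = hashes

    def read(self, lba):
        return b"".join(self.chunks[h] for h in self.lba_table[lba])
```

Because two LBAs containing the same chunk share one physical copy, the logical address space can exceed the physical one; the free-space manager's job is then to allocate and reclaim the variable-sized physical regions behind `chunks`.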
"Constant-Round Concurrent Zero-knowledge from Indistinguishability Obfuscation," The 35th International Cryptology Conference (CRYPTO), August 2015.
Authors: Kai-Min Chung and Huijia Lin and Rafael Pass

We present a constant-round concurrent zero-knowledge protocol for NP. Our protocol relies on the existence of families of collision-resistant hash functions, one-way permutations, and indistinguishability obfuscators for P/poly (with slightly super-polynomial security).

Current Research Results
"A Proximal Method for Dictionary Updating in Sparse Representations," IEEE Transactions on Signal Processing, August 2015.
Authors: Guan-Ju Peng and Wen-Liang Hwang

In this paper, we propose a new dictionary updating method for sparse dictionary learning. Our method imposes an $\ell_0$-norm constraint on the coefficients as well as a proximity regularization on the distance of dictionary modifications during the dictionary updating process. We show that the derived dictionary updating rule is a generalization of the K-SVD method. We study the convergence and the complexity of the proposed method, and compare its performance with that of other methods.
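The flavor of a proximity-regularized dictionary update can be sketched as follows: minimize the reconstruction error plus a penalty on how far the new dictionary moves from the old one. The closed form below is a simplified illustration under a quadratic proximity term, not the authors' exact rule; `lam` and all names are assumptions.

```python
import numpy as np

def proximal_dict_update(Y, D_old, X, lam=1.0):
    """One illustrative proximal-style update: argmin over D of
    ||Y - D X||_F^2 + lam * ||D - D_old||_F^2, then renormalize atoms."""
    k = D_old.shape[1]
    D = (Y @ X.T + lam * D_old) @ np.linalg.inv(X @ X.T + lam * np.eye(k))
    D /= np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm dictionary atoms
    return D
```

As `lam` grows, the update stays close to `D_old` (the proximity effect); as `lam` goes to zero, it reduces to a plain least-squares dictionary fit.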
"Predicting Winning Price in Real Time Bidding with Censored Data," 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2015), August 2015.
Authors: Wush Chi-Hsuan Wu, Mi-Yen Yeh, and Ming-Syan Chen

From the perspective of a Demand-Side Platform (DSP), which is the agent of advertisers, we study how to predict the winning price so that the DSP can win the bid by placing a proper bidding value in the real-time bidding (RTB) auction. We propose to leverage machine learning and statistical methods to train the winning price model from the bidding history. A major challenge is that a DSP usually suffers from censoring of the winning price, especially for those bids lost in the past. To solve this, we utilize the censored regression model, which is widely used in survival analysis and econometrics, to fit the censored bidding data. Note, however, that the assumption of censored regression does not hold on real RTB data. As a result, we further propose a mixture model, which combines linear regression on bids with observable winning prices and censored regression on bids with censored winning prices, weighted by the winning rate of the DSP. Experiment results show that the proposed mixture model in general significantly outperforms linear regression in terms of prediction accuracy.
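The censoring structure described above can be sketched as a Tobit-style likelihood: for won bids the winning price y is observed directly, while for lost bids we only know it exceeded our bid b (right-censoring). The pure-Python sketch below assumes Gaussian noise and a single feature for simplicity; all names are illustrative, not the paper's implementation.

```python
import math

def _norm_logpdf(z):
    """Log density of a standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def _norm_logsf(z):
    """log P(Z > z) for a standard normal, via the complementary error function."""
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

def neg_log_likelihood(beta, sigma, wins, losses):
    """wins: list of (x, y) pairs with observed winning price y;
    losses: list of (x, b) pairs where the winning price exceeded bid b."""
    ll = 0.0
    for x, y in wins:                    # density term for observed prices
        z = (y - beta * x) / sigma
        ll += _norm_logpdf(z) - math.log(sigma)
    for x, b in losses:                  # survival term for censored bids
        z = (b - beta * x) / sigma
        ll += _norm_logsf(z)
    return -ll
```

Minimizing this objective fits the censored model; the paper's mixture then weights such a censored fit against a plain linear regression on the observed wins, with the weights given by the DSP's winning rate.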
"Large-Scale Secure Computation: Multi-party Computation for (Parallel) RAM Programs," The 35th International Cryptology Conference (CRYPTO), August 2015.
Authors: Elette Boyle and Kai-Min Chung and Rafael Pass

We present the first efficient (i.e., polylogarithmic-overhead) method for securely and privately processing large data sets over multiple parties with parallel, distributed algorithms. More specifically, we demonstrate load-balanced, statistically secure computation protocols for computing Parallel RAM (PRAM) programs that tolerate a (1/3 − ε) fraction of malicious players, while preserving, up to polylogarithmic factors, the computation and memory complexities of the PRAM program, aside from a one-time execution of a broadcast protocol per party. Additionally, our protocol has polylog communication locality; that is, each of the n parties speaks only with polylog(n) other parties.
"Energy Stealing - An Exploration into Unperceived Activities on Mobile Systems," ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), July 2015.
Authors: Chi-Hsuan Lin, Yu-Ming Chang, Pi-Cheng Hsiu, and Yuan-Hao Chang

Understanding the implications of smartphone usage and the power breakdown among hardware components has led to various energy-efficient designs for mobile systems. While energy consumption has been extensively explored, one critical dimension is often overlooked: unperceived activities that could potentially steal a significant amount of energy behind users' backs. In this paper, we conduct the first exploration of unperceived activities in mobile systems. Specifically, we design a series of experiments to reveal, characterize, and analyze unperceived activities invoked by popular resident applications when an Android smartphone is left unused. We draw possible solutions inspired by the exploration and demonstrate that even an immediate remedy can mitigate energy dissipation to some extent.

