Institute of Information Science
Current Research Results
"SPIRIT: A Tree Kernel-based Method for Topic Person Interaction Detection," IEEE Transactions on Knowledge and Data Engineering (TKDE), May To Appear.
Authors: Yung-Chun Chang, Chien Chin Chen, and Wen-Lian Hsu

The development of a topic in a set of topic documents is constituted by a series of person interactions at a specific time and place. Knowing the interactions of the persons mentioned in these documents is helpful for readers to better comprehend the documents. In this paper, we propose a topic person interaction detection method called SPIRIT, which classifies the text segments in a set of topic documents that convey person interactions. We design the rich interactive tree structure to represent syntactic, context, and semantic information of text, and this structure is incorporated into a tree-based convolution kernel to identify interactive segments. Experiment results based on real world topics demonstrate that the proposed rich interactive tree structure effectively detects the topic person interactions and that our method outperforms many well-known relation extraction and protein-protein interaction methods.
"When Social Influence Meets Item Inference," ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM KDD), August 2016.
Authors: H.-J. Hung, H.-H. Shuai, D.-N. Yang, L.-H. Huang, W.-C. Lee, J. Pei, and M.-S. Chen

Research issues and data mining techniques for product recommendation and viral marketing have been widely studied. Existing works on seed selection in social networks do not take into account the effect of product recommendations in e-commerce stores. In this paper, we investigate the seed selection problem for viral marketing that considers both effects of item inference recommendation and social influence. We develop a new model, Social Item Graph (SIG), that captures both effects in form of hyperedges and learn the corresponding probability by EMS framework. Accordingly, we formulate a new optimization problem, called Social Item Maximization Problem (SIMP), to simultaneously exploit influence from friends and recommendations from stores for effective viral marketing. We prove that SIMP is inapproximable within $n^{c} $ for any $c<1$, and design an efficient $n$-approximation algorithm. Experimental results demonstrate the accuracy of the proposed model and show the effectiveness and efficiency of the proposed algorithm.
"Routing and Scheduling of Social Influence Diffusion in Online Social Networks," IEEE International Conference on Distributed Computing Systems (IEEE ICDCS), June 2016.
Authors: H.-J. Hung, D.-N. Yang, and W.-C. Lee

Owing to the rising popularity of online social networking services (OSNs), studies on social influence and its diffusion have received significant attention from the research community. Prior research has mostly studied the impact of single-hop influence diffusion and multi-hop influence broadcast in online social networks. Very little research explores the idea of guiding (routing) multi-hop social influence towards a specific target. In this paper, we motivate the needs of timely routing social influence to formulate a new optimization problem, namely, Routing And Scheduling of Target-Oriented Social Influence Diffusion (RAS-TOSID). Accordingly, we propose the Efficient Routing And Scheduling with Social and Temporal decOmposition (ERASSTO) algorithm, which finds the optimal solution to RAS-TOSID in polynomial time. We carry out a user study by implementing ERASSTO in Facebook and conduct a comprehensive evaluation on ERASSTO and alternative approaches by simulation. The result shows that ERASSTO significantly outperforms other algorithms regarding solution quality and computational efficiency.
Current Research Results
"Disturbance Relaxation for 3D Flash Memory," IEEE Transactions on Computers (TC), May 2016.
Authors: Yu-Ming Chang, Yuan-Hao Chang, Tei-Wei Kuo, Yung-Chun Li, and Hsiang-Pang Li

Even though 3D flash memory presents a grand opportunity to huge-capacity non-volatile memory, it suffers from serious program disturbance problems. In contrast to the past efforts in error correction codes and the work in trading the space utilization for reliability, we propose a disturbance-relaxation scheme that can alleviate the negative effects caused by program disturbance inside a physical block. This scheme does not introduce any extra overheads on encoding or storing of extra redundant data. In particular, a methodology is proposed to reduce the data error rate by distributing unavoidable disturbance errors to the flash-memory space of invalid data, with the considerations of the physical organization of 3D flash memory. A series of experiments was conducted based on real multi-layer 3D flash chips, and it showed that the proposed scheme could significantly enhance the reliability of 3D flash memory.
Current Research Results
"Reducing Data Migration Overheads of Flash Wear Leveling in a Progressive Way," IEEE Transactions on Very Large Scale Integration Systems (TVLSI), May 2016.
Authors: Ming-Chang Yang, Yuan-Hao Chang, Tei-Wei Kuo, and Fu-Hsin Chen

As the endurance of flash memory keeps deteriorating, exploiting wear leveling techniques to improve the lifetime/endurance of flash memory has become a critical issue in the design of flash storage devices. Nevertheless, the deteriorated access performance of high-density flash memory makes the performance overheads introduced by wear leveling non-negligible. In particular, the existing wear-leveling designs usually aggressively distribute the erases to all flash blocks evenly in a regular basis. As a result, a lot of non-negligible unnecessary data migrations would be imposed in the early stages of the device lifespan, and would be further exacerbated if a wear leveling design selects improper victim blocks for erases. In contrast to the existing wear-leveling approaches, we propose a progressive wear leveling design to perform wear leveling in a ``progressive'' way to prevent any block from being worn out prematurely with minimized performance overheads caused by the unnecessary data migration. The experiments were conducted based on representative realistic workloads to evaluate the efficacy of the proposed design. The results reveal that instead of sacrificing the device lifetime, performing wear leveling in such a progressive way can not only minimize the performance overheads but also have potentials to extend the device lifetime.
Current Research Results
"Optimal Time-Convex Hull for a Straight-Line Highway in Lp-Metrics," Computational Geometry: Theory and Applications, February 2016.
Authors: B.-S. Dai, Mong-Jen Kao and D.T. Lee

Der- TsaiLeeAbstract:
We consider the problem of computing the time-convex hull of a point set under the general Lp -metric in the presence of a straight-line highway in the plane. The traveling speed along the highway is assumed to be faster than that off the highway, and the shortest time-path between a distant pair may involve traveling along the highway. The time-convex hull TCH(P)of a point set P is the smallest set containing both P and all shortest time-paths between any two points in TCH(P). In this paper we give an algorithm that computes the time-convex hull under the Lp -metric in optimal O(nlogn) time for a given set of n points and a real number p with 1 ≤p ≤∞. 
Current Research Results
Authors: T. Mamie Lih, Wai-Kok Choong, Chen-Chun Chen, Cheng-Wei Cheng, Hsin-Nan Lin, Ching-Tai Chen, Hui-Yin Chang, Wen-Lian Hsu and Ting-Yi Sung

Ting-YiSungWen-LianHsuHui-YinChangChing-TaiCasterChenCheng-Wei ChengWai-Kok ChoongTung-ShingLihAbstract:
MAGIC-web is the first web server, to the best of our knowledge, that performs both untargeted and targeted analyses of mass spectrometry-based glycoproteomics data for site-specific N-linked glycoprotein identification. The first two modules, MAGIC and MAGIC+, are designed for untargeted and targeted analysis, respectively. MAGIC is implemented with our previously proposed novel Y1-ion pattern matching method, which adequately detects Y1- and Y0-ion without prior information of proteins and glycans, and then generates in silico MS 2 spectra that serve as input to a database search engine (e.g. Mascot) to search against a large-scale protein sequence database. On top of that, the newly implemented MAGIC+ allows users to determine glycopeptide sequences using their own protein sequence file. The third module, Reports Integrator, provides the service of combining protein identification results from Mascot and glycan-related information from MAGIC-web to generate a complete site-specific protein-glycan summary report. The last module, Glycan Search, is designed for the users who are interested in finding possible glycan structures with specific numbers and types of monosaccharides. The results from MAGIC, MAGIC+ and Reports Integrator can be downloaded via provided links whereas the annotated spectra and glycan structures can be visualized in the browser. MAGIC-web is accessible from
Current Research Results
Authors: Ya-Xuan Hung, Pei-Chen Huang, Kuan-Ta Chen, and Woeichyn Chu

Stroke is one of the most common causes of physical disability, and early, intensive, and repetitive rehabilitation exercises are crucial to the recovery of stroke survivors. Unfortunately, research shows that only one third of stroke patients actually perform recommended exercises at home, because of the repetitive and mundane nature of conventional rehabilitation exercises. Thus, to motivate stroke survivors to engage in monotonous rehabilitation is a significant issue in the therapy process. Game-based rehabilitation systems have the potential to encourage patients continuing rehabilitation exercises at home. However, these systems are still rarely adopted at patients' places. Discovering and eliminating the obstacles in promoting game-based rehabilitation at home is therefore essential. For this purpose, we conducted a study to collect and analyze the opinions and expectations of stroke patients and clinical therapists. The study is composed of two parts: (i) Rehab-preference survey: interviews to both patients and therapists to understand the current practices, challenges, and expectations on game-based rehabilitation systems, and (ii) Rehab-compatibility survey: a gaming experiment with therapists to elaborate what commercial games are compatible with rehabilitation. The study is conducted with 30 outpatients with stroke and 19 occupational therapists from two rehabilitation centers in Taiwan. Our surveys show that game-based rehabilitation systems can turn the rehabilitation exercises more appealing and provide personalized motivation for various stroke patients. Patients prefer to perform rehabilitation exercises with more diverse and fun games, and need cost-effective rehabilitation systems, which are often built on commodity hardware. Our study also sheds light on incorporating the existing design-for-fun games into rehabilitation system. We envision the results are helpful in developing a platform which enables rehab-compatible (i.e. existing, appropriately selected) games to be operated on commodity hardware and brings cost-effective rehabilitation systems to more and more patients' home for long-term recovery.
Current Research Results
"An Investigation of a Two-Tier Test Strategy in a University Calculus Course: Causes vs. Consequences," IEEE Transactions on Learning Technologies, To Appear.
Authors: Tzu Chi Yang, Sherry Y. Chen, Meng Chang Chen

Meng ChangChenAbstract:
Online tests have been identified as a core learning activity in higher education. Unlike conventional online tests, which cannot completely reflect students; learning status, two-tier tests not only consider students; answers, but also take into account reasons for their answers. Due to such significance, research into the two-tier test had mushroomed but few studies examined why the two-tier test approach was effective. To this end, we conducted an empirical study, where a lag sequential analysis was used to analyze behavior patterns while qualitative data from the questionnaire were applied to explain why these behavior patterns were happened. The results indicated students with a two-tier test tended to realize the rationale of a concept, instead of relying on their memories. In other words, the two-tier test can facilitate students to develop deep thinking skills. This may be because they considered the two-tier test as a learning tool, instead of an assessment tool only.
Current Research Results
"Capacity-independent Address Mapping for Flash Storage Devices with Explosively Growing Capacity," IEEE Transactions on Computers (TC), February 2016.
Authors: Ming-Chang Yang, Yuan-Hao Chang, Tei-Wei Kuo, and Po-Chun Huang

Address mapping for flash storage devices has been a challenging design issue for controllers because of rapidly growing device capacity. In contrast with existing mapping methods, this study proposes a capacity-independent address mapping method to decouple the required on-device RAM space from the capacity of a flash storage device. Especially, the required RAM size of the proposed method depends only on the user accessed data set, which is also referred to as the working set, while the page-level performance can be nearly achieved. In addition, a simple but practical wear-leveling design is proposed with the capability in lifetime estimation of flash storage devices. Experiments of the proposed scheme obtained encouraging results.
Current Research Results
"Compressed Sensing-Based Clone Identification in Sensor Network," IEEE Trans. on Wireless Communications, To Appear.
Authors: Chia-Mu Yu, Chun-Shien Lu, and Sy-Yen Kuo

Clone detection, aimed at detecting illegal copies with all of the credentials of legitimate sensor nodes, is of great importance for sensor networks because of the severe impact of clones on network operations, like routing, data collection, and key distribution. Various detection methods have been proposed, but most of them are communication-inefficient due to the common use of the witness-finding strategy. In view of the sparse characteristic of replicated nodes, we propose a novel clone detection framework, called CSI, based on a state-of-the-art signal processing technology, compressed sensing. Specifically, CSI bases its detection effectiveness on the compressed aggregation of sensor readings. Due to its consideration of data aggregation, CSI not only achieves the asymptotically lowest communication cost but also makes the network traffic evenly distributed over sensor nodes. In particular, they are achieved by exploiting the sparse property of the clones within the sensor network caused by the clone attack. The performance and security of CSI will be demonstrated by numerical simulations, analyses, and prototype implementation.
"Enabling Sub-blocks Erase Management to Boost the Performance of 3D NAND Flash Memory," ACM/IEEE Design Automation Conference (DAC), June 2016.
Authors: Tseng-Yi Chen, Yuan-Hao Chang, Chien-Chung Ho, and Shuo-Han Chen

3D NAND has been proposed to provide a large capacity storage with low-cost consideration due to its high density memory architecture. However, 3D NAND needs to consume enormous time for garbage collection because live-page copying overhead and long block erase time. To alleviate the impact of live-page copying on the performance of 3D NAND, a sub-block erase design has been designed.With sub-block erase design, this paper proposes a performance booster strategy to extremely boost the performance of garbage collection. As experimental results shows, the proposed strategy has a significant improvement on the average write response time.
"DigitSpace: Designing Thumb-To-Fingers Touch Input Interface for One-Handed and Eyes-Free Interactions," ACM SIGCHI Conference on Human Factors in Computing Systems (ACM CHI), March 2016.
Authors: D.-Y. Huang, S. Yang, F. Wang, R.-H. Liang, L. Chan, D.-N. Yang, Y.-P. Hung, B.-Y. Chen.

Thumb-to-fingers interfaces augment touch widgets on fingers, which are manipulated by the thumb. Such interfaces are ideal for one-handed eyes-free input since touch widgets on the fingers enable easy access by the stylus thumb. This study presents DigitSpace, a thumb-to-fingers interface that addresses two ergonomic problems: hand anatomy and touch precision. Hand anatomy restricts possible movements of a thumb, which further influences the physical comfort during the interactions. Touch precision is a human factor that determines how precisely users can manipulate touch widgets set on fingers, which determines effective layouts of the widgets. Buttons and touchpads were considered in our studies to enable discrete and continuous input in an eyes-free manner. The first study explores the regions of fingers where the interactions can be comfortably performed. According to the comfort regions, the second and third studies explore effective layouts for button and touchpad widgets. The experimental results indicate that participants could discriminate at least 16 buttons on their fingers. For touchpad, participants were asked to perform unistrokes. Our results revealed that since individual participant performed a coherent writing behavior, personalized $1 recognizers could offer 92% accuracy on a cross-finger touchpad. A series of design guidelines are proposed for designers, and a DigitSpace prototype that uses magnetic-tracking methods is demonstrated
"Precise Player Segmentation in Team Sports Videos Using Contrast-Aware Co-segmentation," IEEE International Conference on Acoustics, Speech, and Signal Processing, March 2016.
Authors: T.Y. Tsai, Y. Y. Lin, H.Y. Mark Liao, and S. K. Jeng

Player segmentation in team sports videos is challenging but crucial to video semantic understanding, such as player interaction identification and tactic analysis. We leverage the appearance similarity among players of the same team, and cast this task as a co-segmentation problem. In this way, the extra knowledge shared across players significantly reduces unfavorable uncertainty in segmenting individual players. We are also aware that the performance of co-segmentation highly depends on the used features, and further propose a contrastbased approach to estimate the discriminant power of each feature in an unsupervised manner. It turns out that our approach can properly fuse features by assigning higher weights to discriminant ones, and result in remarkable performance gains. The promising results on segmenting basketball players manifest the effectiveness of our approach.
Current Research Results
Authors: Hui-Yin Chang, Ching-Tai Chen, T. Mamie Lih, Ke-Shiuan Lynn, Chiun-Gung Juo, Wen-Lian Hsu, Ting-Yi Sung

Efficient and accurate quantitation of metabolites from LC-MS data has become an important topic. Here we present an automated tool, called iMet-Q (intelligent Metabolomic Quantitation), for label-free metabolomics quantitation from high-throughput MS1 data. By performing peak detection and peak alignment, iMet-Q provides a summary of quantitation results and reports ion abundance at both replicate level and sample level. Furthermore, it gives the charge states and isotope ratios of detected metabolite peaks to facilitate metabolite identification. An in-house standard mixture and a public Arabidopsis metabolome data set were analyzed by iMet-Q. Three public quantitation tools, including XCMS, MetAlign, and MZmine 2, were used for performance comparison. From the mixture data set, seven standard metabolites were detected by the four quantitation tools, for which iMet-Q had a smaller quantitation error of 12% in both profile and centroid data sets. Our tool also correctly determined the charge states of seven standard metabolites. By searching the mass values for those standard metabolites against Human Metabolome Database, we obtained a total of 183 metabolite candidates. With the isotope ratios calculated by iMet-Q, 49% (89 out of 183) metabolite candidates were filtered out. From the public Arabidopsis data set reported with two internal standards and 167 elucidated metabolites, iMet-Q detected all of the peaks corresponding to the internal standards and 167 metabolites. Meanwhile, our tool had small abundance variation (≤0.19) when quantifying the two internal standards and had higher abundance correlation (≥0.92) when quantifying the 167 metabolites. iMet-Q provides user-friendly interfaces and is publicly available for download at
Current Research Results
Authors: Chih-Hao Fang, Yu-Jung Chang, Wei-Chun Chung, Ping-Heng Hsieh, Chung-Yen Lin and Jan-Ming Ho

We developed two subset selection methods for single-end reads and a method for paired-end reads based on base quality scores and other read analytic tools using the MapReduce framework. We proposed two strategies to select reads: MinimalQ and ProductQ. MinimalQ selects reads with minimal base-quality above a threshold. ProductQ selects reads with probability of no incorrect base above a threshold. In the single-end experiments, we used Escherichia coli and Bacillus cereus datasets of MiSeq, Velvet assembler for genome assembly, and GAGE benchmark tools for result evaluation. In the paired-end experiments, we used the giant grouper (Epinephelus lanceolatus) dataset of HiSeq, ALLPATHS-LG genome assembler, and QUAST quality assessment tool for comparing genome assemblies of the original set and the subset. The results show that subset selection not only can speed up the genome assembly but also can produce substantially longer scaffolds. Availability: The software is freely available at
Current Research Results
"Efficient Victim Block Selection for Flash Storage Devices," IEEE Transactions on Computers (TC), December 2015.
Authors: Che-Wei Tsao, Yuan-Hao Chang, Ming-Chang Yang, and Po-Chun Huang

Motivated by the needs to enhance the performance of garbage collection in low-cost flash storage devices, we propose a victim block selection design to efficiently identify the blocks for erases and reclaim the space of invalid data without extensively scanning flash memory for the data status stored in the storage, so as to improve the garbage collection performance on reclaiming the space of invalid data. Moreover, this design could easily identify and reclaim the space released by file systems. Experiments based on benchmark traces show significant performance improvement of garbage collection with limited system overheads.
Current Research Results
"An Acoustic-Phonetic Model of F0 Likelihood for Vocal Melody Extraction," IEEE/ACM Transactions on Audio, Speech, and Language Processing, September 2015.
Authors: Yu-Ren Chien, Hsin-Min Wang, and Shyh-Kang Jeng

This paper presents a novel approach to extraction of vocal melodies from accompanied singing recordings. Central to our approach is a model of vocal fundamental frequency (F0) likelihood that integrates acoustic-phonetic knowledge and real-world data. This model consists of a timbral fitness score and a loudness measure of each F0 candidate. Timbral fitness is measured for the partial amplitudes of an F0 candidate, with respect to a small set of vocal timbre examples. This F0-specific measurement of timbral fitness depends on an acoustic-phonetic F0 modification of each timbre example. In the loudness part of the likelihood model, sinusoids are detected, tracked, and pruned to give loudness values that minimize interference from the accompaniment. A final F0 estimate is determined by a prior model of F0 sequence in addition to the likelihood model. Melody extraction is completed by detecting voiced time positions according to the singing voice loudness variations given by the estimated F0 sequence. The numerical parameters involved in our approach were optimized on three development sets from different sources before the system was evaluated on ten test sets separate from these development sets. Controlled experiments show that use of the timbral fitness score accounts for a 13% difference in overall accuracy.
Current Research Results
"Spatial-Proximity Optimization for Rapid Task Group Deployment," ACM Trans. on Knowledge Discovery from Data, To Appear.
Authors: C.-Y. Shen, D.-N. Yang, W.-C. Lee, and M.-S. Chen

Spatial proximity is one of the most important factors for the quick deployment of the task groups in various time-sensitive missions. This paper proposes a new spatial query, Spatio-Social Team Query (SSTQ), that forms a strong task group by considering 1) the group’s spatial distance (i.e., transportation time), 2) skills of the candidate group members, and 3) social rapport among the candidates. Efficient processing of SSTQ is very challenging, because the aforementioned spatial, skill, and social factors need to be carefully examined. In this paper, therefore, we first formulate two subproblems of SSTQ, namely Hop-Constrained Team Problem (HCTP) and Connection-Oriented Team Query (COTQ). HCTP is a decision problem that considers only social and skill dimensions. We prove that HCTP is NP-Complete. Moreover, based on the hardness of HCTP, we prove that SSTQ is NP-Hard and inapproximable within any factor. On the other hand, COTQ is a special case of SSTQ that relaxes the social constraint. We prove that COTQ is NP-Hard and propose an approximation algorithm for COTQ, COTprox, that achieves the best approximation ratio. Furthermore, based on the observations on COTprox, we devise an approximation algorithm, SSTprox, with a guaranteed error bound for SSTQ. Finally, to efficiently obtain the optimal solution to SSTQ for small instances, we design two efficient algorithms, SpatialFirst and SkillFirst, with different scenarios in mind. These two algorithms incorporate various effective ordering and pruning techniques to reduce the search space for answering SSTQ. Experimental results on real datasets indicate that the proposed algorithms can efficiently answer SSTQ under various parameter settings.
Current Research Results
"Cooperative Multicasting in Renewable Energy Enhanced Relay Networks - Expending More Power to Save Energy," IEEE Trans. on Wireless Communications, To Appear.
Authors: S.-Y. Lee, C.-Y. Liu, M.-K. Chang, D.-N. Yang, and Y.-W. Hong

Abstract—Power and on–off control problems are examined for renewable energy enabled base-stations (BSs) and relay nodes (RNs) in cooperative multicast networks. Renewable energy is utilized at BSs and RNs to reduce the overall grid energy cost. By considering a practical energy consumption model and the statistics of the renewable energy arrival, the optimal transmit powers are first determined by minimizing the expected grid energy consumption subject to an average outage probability constraint at MUs. The optimal solution is found via line search in the general case and is obtained in closed-form at high SNR. In addition, an on–off control policy is also proposed to further reduce the basic operational energy costs. The joint on–off and power control problems are solved approximately using two sequential deflation techniques, namely, the subset-search and the convex-relaxation-based approaches. The power control problem is also extended to the multicarrier scenario with unequal transmit powers and is solved using successive convex approximation. Simulations using the photovoltaic energy arrival model are provided to demonstrate the effectiveness of the proposed schemes. The results show that expending more power at RNs allows for more efficient use of renewable energy and, thus, increases energy-savings.
"Multicast Traffic Engineering for Software-Defined Networks," IEEE Conference on Computer Communications (IEEE INFOCOM), April 2016.
Authors: L.-H. Huang, H.-C. Hsu, S.-H. Shen, D.-N. Yang, and W. T. Chen,

Although Software-Defined Networking (SDN) enables flexible network resource allocations for traffic engineering, current literature mostly focuses on unicast communications. Compared to traffic engineering for multiple unicast flows, multicast traffic engineering for multiple trees is very challenging not only because minimizing the bandwidth consumption of a single multicast tree by solving the Steiner tree problem is already NP-Hard, but the Steiner tree problem does not consider the link capacity constraint for multicast flows and node capacity constraint to store the forwarding entries in Group Table of OpenFlow. In this paper, therefore, we first study the hardness results of scalable multicast traffic engineering in SDN. We prove that scalable multicast traffic engineering with only the node capacity constraint is NPHard and not approximable within delta, which is the number of destinations in the largest multicast group. We then prove that scalable multicast traffic engineering with both the node and link capacity constraints is NP-Hard and not approximable within any ratio. To solve the problem, we design a delta-approximation algorithm, named Multi-Tree Routing and State Assignment Algorithm (MTRSA), for the first case and extend to the general multicast traffic engineering problem. The simulation results demonstrate that the solutions obtained by the proposed algorithm are more bandwidth-efficient and scalable than the shortest-path trees and Steiner trees. Most importantly, MTRSA is computation-efficient and can be deployed in SDN since it can generate the solution on massive networks in a short time.
"Mining Online Social Data for Detecting Social Network Mental Disorders," International World Wide Web Conference (WWW), April 2016.
Authors: H.-H. Shuai, C.-Y. Shen, D.-N. Yang, Y.-F. Lan, W.-C. Lee, P. S. Yu, and M.-S. Chen

An increasing number of social network mental disorders (SNMDs), such as Cyber-Relationship Addiction, Information Overload, and Net Compulsion, have been recently noted. Symptoms of these mental disorders are usually observed passively today, resulting in delayed clinical intervention. In this paper, we argue that mining online social behavior provides an opportunity to actively identify SNMDs at an early stage. It is challenging to detect SNMDs because the mental factors considered in standard diagnostic criteria (questionnaire) cannot be observed from online social activity logs. Our approach, new and innovative to the practice of SNMD detection, does not rely on self-revealing of those mental factors via questionnaires. Instead, we propose a machine learning framework, namely, Social Network Mental Disorder Detection (SNMDD), that exploits features extracted from social network data to accurately identify potential cases of SNMDs. We also exploit multi-source learning in SNMDD and propose a new SNMDbased Tensor Model (STM) to improve the performance. Our framework is evaluated via a user study with 3126 online social network users. We conduct a feature analysis, and also apply SNMDD on large-scale datasets and analyze the characteristics of the three SNMD types. The results show that SNMDD is promising for identifying online social network users with potential SNMDs.
"Optimizing Control Transfer and Memory Virtualization in Full System Emulators," Eruopean Network of Excellence on High-Performance and Embedded Architecture and Compilation (HiPEAC), January 2016.
Authors: Chun-Chen Hsu, Ding-Yong Hong, Cheng-Yi Chou, Jan-Jan Wu, Wei-Chung Hsu, and Pangfeng Liu

Full system emulators provide virtual platforms for several important applications, such as kernel and system software development, co-verification with cycle accurate CPU simulators, or application development for hardware still in development. Full system emulators usually use dynamic binary translation to obtain reasonable performance. This paper focuses on optimizing the performance of full system emulators. First, we optimize performance by enabling classic control transfer optimizations of dynamic binary translation in full system emulation, such as indirect branch target caching and block chaining. Second, we improve the performance of memory virtualization of cross-ISA virtual machines by improving the efficiency of the software translation lookaside buffer (software TLB). We implement our optimizations on QEMU, an industrial-strength full system emulator, along with the Android emulator. Experimental results show that our optimizations achieve an average speedup of 1.98X for ARM-to-X86-64 QEMU running SPEC CINT2006 benchmarks with train inputs. Our optimizations also achieve an average speedup of 1.44X and 1.40X for IA32-to-X86-64 QEMU and AArch64-to-X86-64 QEMU on SPEC CINT2006. We use a set of real applications downloaded from Google Play as benchmarks for the Android emulator. Experimental results show that our optimizations achieve an average speedup of 1.43X for the Android emulator running these applications.
Current Research Results
Authors: Tong-Yen Tsai, Shih-Ying Wu, Cheng-Han Chung, Pang-Yen Yang, Shin-Tang Su, Yu-Hsuan Tseng, Tong-Cheng Wang, Wen-Hsiung Li, Arthur Chun-Chieh Shih*, and Kuo-I Lin*

Kuo-I LinArthur Chun-ChiehShihAbstract:
Using genome-wide approaches, we studied the microRNA (miRNA) expression profile during human plasma cell (PC) differentiation induced by stimulation of human blood B cells with T follicular helper cell–dependent signals. Combining the profiles of differentially expressed genes in PC differentiation with gene ontology (GO) analysis revealed that a significant group of genes involved in the transcription factor (TF) activity was preferentially changed. We thus focused on studying the effects of differentially expressed miRNAs on several key TFs in PC differentiation. Cohorts of differentially expressed miRNAs cooperating as miRNA hubs were predicted and validated to modulate key TFs, including a down-regulated miRNA hub containing miR-101-3p, -125b-5p, and -223-3p contributing to induction of PRDM1 as well as an up-regulated miRNA hub containing miR-34a-5p, -148a-3p, and -183-5p suppressing BCL6, BACH2, and FOXP1. Induced expression of NF-κB and PRDM1 during PC differentiation controlled the expression of up- and down-regulated miRNA hubs, respectively. Co-expression of miR-101-3p, -125b-5p, and -223-3p in stimulated B cells showed synergistic effects on inhibition of PC formation, which can be rescued by re-introduction of PRDM1. Together, we catalogue the complex roadmap of miRNAs and their functional interplay in collaboratively directing PC differentiation.
Current Research Results
Authors: Wai-Kok Choong, Hui-Yin Chang, Ching-Tai Chen, Chia-Feng Tsai, Wen-Lian Hsu, Yu-Ju Chen, and Ting-Yi Sung

Protein experiment evidence at protein level from mass spectrometry and antibody experiments are essential to characterize the human proteome. neXtProt (2014-09 release) reported 20 055 human proteins, including 16 491 proteins identified at protein level and 3564 proteins unidentified. Excluding 616 proteins at uncertain level, 2948 proteins were regarded as missing proteins. Missing proteins were unidentified partially due to MS limitations and intrinsic properties of proteins, for example, only appearing in specific diseases or tissues. Despite such reasons, it is desirable to explore issues affecting validation of missing proteins from an “ideal” shotgun analysis of human proteome. We thus performed in silico digestions on the human proteins to generate all in silico fully digested peptides. With these presumed peptides, we investigated the identification of proteins without any unique peptide, the effect of sequence variants on protein identification, difficulties in identifying olfactory receptors, and highly similar proteins. Among all proteins with evidence at transcript level, G protein-coupled receptors and olfactory receptors, based on InterPro classification, were the largest families of proteins and exhibited more frequent variants. To identify missing proteins, the above analyses suggested including sequence variants in protein FASTA for database searching. Furthermore, evidence of unique peptides identified from MS experiments would be crucial for experimentally validating missing proteins.


Academia Sinica 資訊科學研究所 Academia Sinica