Institute of Information Science, Academia Sinica



Press Ctrl+P to print from browser

Recent Research Results


REFROM: Responsive, Energy-efficient Frame Rendering for Mobile Devices

ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2023

Tsung-Yen Hsu, Yi-Shen Chen, Yun-Chih Chen, Yuan-Hao Chang, and Tei-Wei Kuo

Yuan-Hao Chang


The increasing demand for high-quality graphics on mobile devices necessitates a high frame rate for display refresh. However, current process scheduling and memory management policies fail to consider the computation demands of frame rendering because they are optimized for saving energy and resource utilization. This leads to unresponsive displays for mobile users due to rendering delays. Accurately estimating computation demands is challenging for the mobile operating system, particularly under memory pressure, without displayspecific semantics from user space. Moreover, the complexity of frame rendering makes it infeasible to schedule them with real-time policies. To address these issues, we propose a new framework called REFROM that utilizes a history-based frame time estimator to analyze frame time samples from UI threads and predict the computation requirements of upcoming frames. Experimental results demonstrate that REFROM reduces the number of delayed frames by up to 40% and improves up to 4% energy efficiency compared to the existing approaches.

Extreme Event Discovery with Self-Attention for PM2.5 Anomaly Prediction

IEEE Intelligent Systems, To Appear

Hsin-Chih Yang, Ming-Chuan Yang, Guo-Wei~Wong, Meng Chang Chen

Ming-Chuan Yang Meng-Chang Chen


Fine particulate matter (PM2.5) values of a particular location form a time series, whose prediction is challenging due to the complicated interactions between numerous factors from meteorological measurements, terrain conditions, and industry and human habitation activities, and their predictions have attracted considerable attention from the deep learning community. Although the deep learning approach for PM2.5 prediction generally has an acceptable accuracy, it has difficulty in PM2.5 anomaly prediction, while mispredictions prevent the authority from issuing proper instructions to reduce the impact on general health. We use extreme value theory (EVT) to formulate the PM2.5 prediction problem with a self-attention-based neural network implementation. EVT-based loss accounts for the rarity of anomalous data, and self-attention captures global information. Experiments demonstrate that the proposed model obtains an improved performance of 478% in F1 score and 286% in Matthews correlation coefficient (MCC) over the fully connected network, and 229% in F1 and 148% in MCC over the typical Transformer trained with the traditional loss function.

Attention Discriminant Sampling for Point Clouds

International Conference on Computer Vision (ICCV), October 2023

Cheng-Yao Hong, Yu-Ying Chou and Tyng-Luh Liu

Tyng-Luh Liu


This paper describes an attention-driven approach to 3-D point cloud sampling. Our method is established based on a structure-aware attention discriminant analysis that explores geometric and semantic relations embodied among points and their clusters. The proposed {\em attention discriminant sampling} (ADS) starts by efficiently decomposing a given point cloud into clusters to implicitly encode its structural and geometric relatedness among points. By treating each cluster as a structural component, ADS then draws on evaluating two levels of self attention: within-cluster and between-cluster. The former reflects the semantic complexity entailed by the learned features of points within each cluster, while the latter reveals the semantic similarity between clusters. Driven by structurally preserving the point distribution, these two aspects of self attention help avoid sampling redundancy and decide the number of sampled points in each cluster.   Extensive experiments demonstrate that ADS significantly improves classification performance to \textbf{95.1\%} on ModelNet40 and \textbf{87.5\%} on ScanObjectNN and achieves \textbf{86.9\%} mIoU on ShapeNet Part Segmentation. For scene segmentation, ADS yields \textbf{91.1\%}  accuracy on S3DIS with higher mIoU to the state-of-the-art and \textbf{75.6\%} mIoU on ScanNetV2. Furthermore, ADS surpasses the state-of-the-art with \textbf{55.0\%} mAP$_{50}$ on ScanNetV2 object detection. 

Shape-Guided Dual-Memory Learning for 3D Anomaly Detection

40th International Conference on Machine Learning (ICML), July 2023

Yu-Min Chu, Chieh Liu, Ting-I Hsieh, Hwann-Tzong Chen and Tyng-Luh Liu

Tyng-Luh Liu


We present a shape-guided expert-learning framework to tackle the problem of unsupervised 3D anomaly detection. Our method is established on the effectiveness of two specialized expert models and their synergy to localize anomalous regions from color and shape modalities. The first expert utilizes geometric information to probe 3D structural anomalies by modeling the implicit distance fields around local shapes. The second expert considers the 2D RGB features associated with the first expert to identify color appearance irregularities on the local shapes. We use the two experts to build the dual memory banks from the anomaly-free training samples and perform shape-guided inference to pinpoint the defects in the testing samples. Owing to the per-point 3D representation and the effective fusion scheme of complementary modalities, our method efficiently achieves state-of-the-art performance on the MVTec 3D-AD dataset with better recall and lower false positive rates, as preferred in real applications. 

ABC-Norm Regularization for Fine-Grained and Long-Tailed Image Classification

IEEE Transactions on Image Processing, July 2023

Yen-Chi Hsu, Cheng-Yao Hong, Ming-Sui Lee, Davi Geiger and Tyng-Luh Liu

Tyng-Luh Liu


Image classification for real-world applications often involves complicated data distributions such as fine-grained and long-tailed. To address the two challenging issues simultaneously, we propose a new regularization technique that yields an adversarial loss to strengthen the model learning. Specifically, for each training batch, we construct an {\em adaptive batch prediction} (ABP) matrix and establish its corresponding {\em adaptive batch confusion norm} (ABC-Norm). The ABP matrix is a composition of two parts, including an adaptive component to class-wise encode the imbalanced data distribution, and the other component to batch-wise assess the softmax predictions. The ABC-Norm leads to a norm-based regularization loss, which can be theoretically shown to be an upper bound for an objective function closely related to rank minimization. By coupling with the conventional cross-entropy loss, the ABC-Norm regularization could introduce adaptive classification confusion and thus trigger adversarial learning to improve the effectiveness of model learning. Different from most of state-of-the-art techniques in solving either fine-grained or long-tailed problems, our method is characterized with its simple and efficient design, and most distinctively, provides a unified solution. In the experiments, we compare ABC-Norm with relevant techniques and demonstrate its efficacy on several benchmark datasets, including (CUB-LT, iNaturalist2018); (CUB, CAR, AIR); and (ImageNet-LT), which respectively correspond to the real-world, fine-grained, and long-tailed scenarios. 

A feature extraction free approach for protein interactome inference from co-elution data

Briefings in Bioinformatics, June 2023

Chen, Y.H., Chao, K.H., Wong, J.Y., Liu, C.F., Leu, J.Y. and Tsai, H.K.

Yu-Hsin Chen Jin Yung Wong Chien-Fu Liu Huai-Kuang Tsai


Protein complexes are key functional units in cellular processes. High-throughput techniques, such as co-fractionation coupled with mass spectrometry (CF-MS), have advanced protein complex studies by enabling global interactome inference. However, dealing with complex fractionation characteristics to define true interactions is not a simple task, since CF-MS is prone to false positives due to the co-elution of non-interacting proteins by chance. Several computational methods have been designed to analyze CF-MS data and construct probabilistic PPI networks. Current methods usually first infer protein-protein interactions (PPIs) based on handcrafted CF-MS features, and then use clustering algorithms to form potential protein complexes. While powerful, these methods suffer from the potential bias of handcrafted features and severely imbalanced data distribution. However, the handcrafted features based on domain knowledge might introduce bias, and current methods also tend to overfit due to the severely imbalanced PPI data. To address these issues, we present a balanced end-to-end learning architecture, SPIFFED, to integrate feature representation from raw CF-MS data and interactome prediction by convolutional neural network. SPIFFED outperforms the state-of-the-art methods in predicting PPIs under the conventional imbalanced training. When trained with balanced data, SPIFFED had greatly improved sensitivity for true PPIs. Moreover, the ensemble SPIFFED model provides different voting schemes to integrate predicted PPIs from multiple CF-MS data. Using the clustering software (i.e. ClusterONE), SPIFFED allows users to infer high confidence protein complexes depending on the CF-MS experimental designs. The source code of SPIFFED is freely available at:

Decomposition and Reorganization of Phonetic Information for Speaker Embedding Learning

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023

Qian-Bei Hong, Chung-Hsien Wu, and Hsin-Min Wang

Chung-Hsien Wu Hsin-Min Wang


Speech content is closely related to the stability of speaker embeddings in speaker verification tasks. In this paper, we propose a novel architecture based on self-constraint learning (SCL) and reconstruction task (RT) to remove the influence of phonetic information on speaker embedding generation. First, SCL is used to reduce the divergence of frame-level features, which can avoid ambiguity between the resulting embeddings of the two utterances being compared. Second, RT is used to further remove phonetic information in frame-level layers, focusing on speaker-discriminative feature transformation. In our experiments, the speaker embedding models were trained on the VoxCeleb2 dataset and evaluated on the VoxCeleb1, Librispeech, SITW and VoxMovies datasets. Experimental results on VoxCeleb1 show that the proposed DROP-TDNN system reduced the EER by 7.5%, compared to the state-of-the-art ECAPA-TDNN system. Furthermore, the proposed DROP-TDNN system also outperformed the ECAPA-TDNN system in the experiments on SITW, Librispeech and VoxMovies under cross-dataset conditions. In the experiments on SITW, the proposed system reduced the EER by 3.4% compared to the ECAPA-TDNN system. In the experiments on Librispeech, the proposed system demonstrated the advantage of removing phonetic information under the clean speech condition, with a significant reduction of 25.5% in EER compared to the ECAPA-TDNN system. In the experiments on VoxMovies, the proposed system reduced the EER by up to 7.9% compared to the ECAPA-TDNN system under different pronunciation and background conditions.

Multi-target Extractor and Detector for Unknown-number Speaker Diarization

IEEE Signal Processing Letters, 2023

Chin-Yi Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

Yu Tsao Hsin-Min Wang


Strong representations of target speakers can help extract important information about speakers and detect corresponding temporal regions in multi-speaker conversations. In this study, we propose a neural architecture that simultaneously extracts speaker representations consistent with the speaker diarization objective and detects the presence of each speaker on a frame-by-frame basis regardless of the number of speakers in a conversation. A speaker representation (called z-vector) extractor and a time-speaker contextualizer, implemented by a residual network and processing data in both temporal and speaker dimensions, are integrated into a unified framework. Tests on the CALLHOME corpus show that our model outperforms most of the methods proposed so far. Evaluations in a more challenging case with simultaneous speakers ranging from 2 to 7 show that our model achieves 6.4% to 30.9% relative diarization error rate reductions over several typical baselines.

LLSM: A Lifetime-Aware Wear-Leveling for LSM-Tree on NAND Flash Memory

ACM/IEEE International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), October 2022

Dharamjeet, Yi-Shen Chen, Tseng-Yi Chen, Yuan-Hung Kuan, and Yuan-Hao Chang

Tseng-Yi Chen Yuan-Hao Chang


The advancement of nonvolatile memory (NVM) technology reduces the cost-per-unit of solid-state drives (SSDs). Flash memory-based SSDs have become ubiquitous because they provide better performance and energy efficiency than hard disk drives. However, it suffers from wear-out problems caused by the out-of-place updates that limit its lifetime. Log-structured merge tree (LSM-tree) is a level-based data structure that is widely used in many database systems because it eliminates the random write operations to the storage devices. By transferring the random write operations into sequential write operations, the write performance of hard disk drives can be improved. However, LSM-tree is not efficient for SSDs because it is not aware of the access characteristics of flash memory. Moreover, the level-based indexing strategy of the LSM-tree significantly shortens the lifetime of SSDs because the data must be frequently updated due to the compaction operations between different levels. In contrast to many previous works that focus on alleviating the write amplification on SSDs for the database systems implemented by LSM-tree, we propose LLSM, a lifetime-aware wear-leveling for LSM-tree on NAND flash memory with open-channel SSD. By considering the data access frequency of the LSM-tree between different levels, LLSM rethinks the block allocation strategy during the compaction to evenly erase all the blocks of SSD storage devices, prolonging the SSD lifetime. Moreover, a proactive swapping strategy is designed to reorganize the data blocks for resolving the potential wear-leveling issues caused by the behaviors of the LSM-tree. The extensive experiments show that the results of lifetime improvement are encouraging.

Designing Network Design Strategies Through Gradient Path Analysis

Journal of Information Science and Engineering, July 2023

C. Y. Wang, H. Y. Mark Liao, and I-Hau Yeh

Chien-Yao Wang Hong-Yuan Mark Liao


Designing a high-efficiency and high-quality expressive network architecture has always been the most important research topic in the field of deep learning. Most of today’s network design strategies focus on how to integrate features extracted from different layers, and how to design computing units to effectively extract these features, thereby enhancing the expressiveness of the network. This paper proposes a new network design strategy, i.e., to design the network architecture based on gradient path analysis. On the whole, most of today’s mainstream network design strategies are based on feed forward path, that is, the network architecture is designed based on the data path. In this paper, we hope to enhance the expressive ability of the trained model by improving the network learning ability. Due to the mechanism driving the network parameter learning is the backward propagation algorithm, we design network design strategies based on back propagation path. We propose the gradient path design strategies for the layer-level, the stage-level, and the network-level, and the design strategies are proved to be superior and feasible from theoretical analysis and experiments. The source code of this work is at:

Differential Hsp90-dependent gene expression is strain-specific and common among yeast strains

iScience, May 2023

Hung, P.H., Liao, C.W., Ko, F.H., Tsai, H.K.* and Leu, J.Y.*

Huai-Kuang Tsai


Enhanced phenotypic diversity increases a population’s likelihood of surviving catastrophic conditions. Hsp90, an essential molecular chaperone and a central network hub in eukaryotes, has been observed to suppress or enhance the effects of genetic variation on phenotypic diversity in response to environmental cues. Because many Hsp90-interacting genes are involved in signaling transduction pathways and transcriptional regulation, we tested how common Hsp90-dependent differential gene expression is in natural populations.Many genes exhibited Hsp90-dependent strain-specific differential expression in five diverse yeast strains. We further identified transcription factors (TFs) potentially contributing to variable expression. We found that on Hsp90 inhibition or environmental stress, activities or abundances of Hsp90-dependent TFs varied among strains, resulting in differential strain-specific expression of their target genes, which consequently led to phenotypic diversity. We provide evidence that individual strains can readily display specific Hsp90-dependent gene expression, suggesting that the evolutionary impacts of Hsp90 are widespread in nature.

Short human eccDNAs are predictable from sequences

Briefings in Bioinformatics, April 2023

Chang, K.L., Chen, J.H., Lin, T.C., Leu, J.Y., Kao, C.F., Wong, J.Y.*, Tsai, H.K.*

Kai-Li Chang Tzu-Chi Lin Jin Yung Wong Huai-Kuang Tsai


Ubiquitous presence of short extrachromosomal circular DNAs (eccDNAs) in eukaryotic cells has perplexed generations of biologists. Their widespread origins in the genome lacking apparent specificity led some studies to conclude their formation as random or near-random. Despite this, the search for specific formation of short eccDNA continues with a recent surge of interest in biomarker development.
To shed new light on the conflicting views on short eccDNAs’ randomness, here we present DeepCircle, a bioinformatics framework incorporating convolution- and attention-based neural networks to assess their predictability. Short human eccDNAs from different datasets indeed have low similarity in genomic locations, but DeepCircle successfully learned shared DNA sequence features to make accurate cross-datasets predictions (accuracy: convolution-based models: 79.65±4.7%, attention-based models: 83.31±4.18%).
The excellent performance of our models shows that the intrinsic predictability of eccDNAs is encoded in the sequences across tissue origins. Our work demonstrates how the perceived lack of specificity in genomics data can be re-assessed by deep learning models to uncover unexpected similarity.