Accelerating Convolutional Neural Networks via Inter-operator Scheduling
IEEE International Conference on Parallel and Distributed Systems (ICPADS), Best Paper Runner-up, December 2022
Yi You, Pangfeng Liu, Ding-Yong Hong, Jan-Jan Wu and Wei-Chung Hsu
Convolutional neural networks (CNNs) are essential in many machine learning tasks. Current deep learning frameworks and compilers usually treat the neural network as a DAG (directed acyclic graph) of tensor operations and execute them one at a time according to a topological order, which respects the dependencies in the DAG. There are two issues with this general approach. First, new CNNs have branch structures, and they form complex DAGs. These DAGs make it hard to find a good topological sort order that schedules operators within a GPU. Second, modern hardware has high computational power, so running operators sequentially under-utilizes its resources. These two issues open the possibility of exploiting inter-operator parallelism, i.e., parallelism among independent operators in the DAG, to utilize the hardware resources more efficiently. In this work, we formally define the DAG scheduling problem that addresses resource contention and propose an early-start-time-first algorithm with two heuristic rules for exploiting parallelism between independent operators. Experimental results show that our method improves performance by up to 3.76× on an RTX 3090 compared to sequential execution.
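The idea of starting ready operators as early as possible can be illustrated with a small greedy list scheduler. This is only a sketch under simplifying assumptions: the function name `earliest_start_schedule`, the fixed per-operator durations, and the model of a GPU as a fixed number of interchangeable streams are all hypothetical, and the paper's two heuristic rules and its resource-contention model are not reproduced here.

```python
from collections import defaultdict


def earliest_start_schedule(durations, edges, num_streams):
    """Greedy list scheduling: repeatedly place the ready operator that can start earliest.

    durations   -- dict mapping operator name to run time (assumed known and fixed)
    edges       -- iterable of (producer, consumer) dependency pairs
    num_streams -- number of operators that may run concurrently (hypothetical model)
    Returns (schedule, makespan), where schedule[op] = (stream, start, end).
    """
    preds = defaultdict(list)
    succs = defaultdict(list)
    for producer, consumer in edges:
        preds[consumer].append(producer)
        succs[producer].append(consumer)

    pending = {op: len(preds[op]) for op in durations}
    ready = [op for op in durations if pending[op] == 0]
    stream_free = [0.0] * num_streams  # time at which each stream becomes idle
    finish = {}
    schedule = {}

    while ready:
        best = None
        for op in ready:
            # An operator may start once all of its inputs are produced
            # and some stream is idle.
            deps_done = max((finish[p] for p in preds[op]), default=0.0)
            stream = min(range(num_streams), key=lambda i: stream_free[i])
            start = max(deps_done, stream_free[stream])
            candidate = (start, op, stream)
            if best is None or candidate < best:
                best = candidate  # ties are broken by operator name

        start, op, stream = best
        end = start + durations[op]
        schedule[op] = (stream, start, end)
        finish[op] = end
        stream_free[stream] = end
        ready.remove(op)

        # Release successors whose dependencies are now all scheduled.
        for nxt in succs[op]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)

    makespan = max(finish.values()) if finish else 0.0
    return schedule, makespan


# A diamond-shaped DAG: "a" feeds two independent branches that join in "d".
DURATIONS = {"a": 1.0, "b": 2.0, "c": 2.0, "d": 1.0}
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
```

With one stream the sketch degenerates to sequential execution in topological order; with two streams the independent branches `b` and `c` overlap, which is the kind of inter-operator parallelism the abstract refers to.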
Evolving Skyrmion Racetrack Memory as Energy-Efficient Last-Level Cache Devices
ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2022
Ya-Hui Yang, Shuo-Han Chen, and Yuan-Hao Chang
Skyrmion racetrack memory (SK-RM) has been regarded as a promising alternative to replace static random-access memory (SRAM) as a large-size on-chip cache device with high memory density. Different from other nonvolatile random-access memories (NVRAMs), data bits of SK-RM can only be altered or detected at access ports, and shift operations are required to move data bits across access ports along the racetrack. Owing to these special characteristics, word-based mapping and bit-interleaved mapping architectures have been proposed to facilitate reading and writing on SK-RM with different data layouts. Nevertheless, when SK-RM is used as an on-chip cache device, existing mapping architectures raise concerns about unpredictable access performance or excessive energy consumption during both data reads and writes. To resolve such concerns, this paper proposes extracting the merits of existing mapping architectures for allowing SK-RM to seamlessly switch its data update policy by considering the write latency requirement of cache accesses. Promising results have been demonstrated through a series of benchmark-driven experiments.
Drift-tolerant Coding to Enhance the Energy Efficiency of Multi-Level-Cell Phase-Change Memory
ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2022
Yi-Shen Chen, Yuan-Hao Chang, and Tei-Wei Kuo
Phase-Change Memory (PCM) has emerged as a promising memory and storage technology in recent years, and Multi-Level-Cell (MLC) PCM further reduces the per-bit cost to improve its competitiveness by storing multiple bits in each PCM cell. However, MLC PCM suffers from high energy consumption in its write operations. In contrast to existing works that try to enhance the energy efficiency of the physical program&verify strategy for MLC PCM, this work proposes a drift-tolerant coding scheme to enable fast write operations on MLC PCM without sacrificing any data accuracy. By exploiting the resistance drift and asymmetric write characteristics of PCM cells, the proposed scheme can reduce the write energy consumption of MLC PCM significantly. Meanwhile, a segmentation strategy is proposed to further improve the write performance with our coding scheme. A series of analyses and experiments was conducted to evaluate the capability of the proposed scheme. The results show that the proposed scheme can reduce energy consumption by 6.2–17.1% and write latency by 3.2–11.3% under six representative benchmarks, compared with the existing well-known schemes.
A Deconvolution Approach to Unveiling the Immune Microenvironment of Complex Tissues and Tumors in Transcriptomics
BMC Bioinformatics, To Appear
Shu-Hwa Chen, Bo-Yi Yu, Wen-Yu Kuo, Ya-Bo Lin, Sheng-Yao Su, Wei-Hsuan Chuang, I-Hsuan Lu, Chung-Yen Lin
Resolving the composition of tumor-infiltrating leukocytes is essential for expanding the cancer immunotherapy strategy, which has witnessed dramatic success in some clinical trials but remained elusive and limited in its application. In this study, we developed a two-step streamed workflow to manage the complex bioinformatic processes involved in immune cell composition analysis. We developed a dockerized toolkit (DOCexpress_fastqc, https://hub.docker.com/r/lsbnb/docexpress_fastqc) to perform gene expression profiling from RNA sequencing raw reads by integrating the hisat2-stringtie pipeline and our scripts with Galaxy/Docker images. Then the output of DOCexpress_fastqc fits the input format of mySORT web, a web application that employs the deconvolution algorithm to determine the immune content of 21 cell subclasses. The usage of mySORT was also demonstrated using a pseudo-bulk pool through single-cell datasets. Additionally, the consistency between the estimated values and the ground-truth immune-cell composition from the single-cell datasets confirmed the exceptional performance of mySORT. The mySORT demo website and Docker image can be accessed for free at https://mysort.iis.sinica.edu.tw and https://hub.docker.com/r/lsbnb/mysort_2022.
SACS: A Self-Adaptive Checkpointing Strategy for Microkernel-Based Intermittent Systems
ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2022
Yen-Ting Chen, Han-Xiang Liu, Yuan-Hao Chang, Yu-Pei Liang, and Wei-Kuan Shih
Intermittent systems are usually energy-harvesting embedded systems that harvest energy from the ambient environment and perform computation intermittently. Due to the unreliable power supply, these intermittent systems typically adopt different checkpointing strategies to ensure data consistency and execution progress after the systems resume from unpredictable power failures. Existing checkpointing strategies are usually suitable for bare-metal intermittent systems with short run times. Owing to improvements in energy-harvesting techniques, intermittent systems now have longer run times and more computation power, so more and more of them run a microkernel to handle multiple tasks at the same time. However, existing checkpointing strategies were not designed for (or aware of) such microkernel-based intermittent systems that support the running of multiple tasks, and thus perform poorly at preserving execution progress. To tackle this issue, we propose a design, called the self-adaptive checkpointing strategy (SACS), tailored for microkernel-based intermittent systems. By leveraging the time-slicing scheduler, the proposed design dynamically adjusts the checkpointing interval at both run time and reboot time, so as to improve system performance by achieving a good balance between the execution progress and the number of performed checkpoints. A series of experiments was conducted on a Texas Instruments (TI) development board with well-known benchmarks. Compared to the state-of-the-art designs, experimental results show that our design can reduce the execution time by at least 46.8% under different ambient conditions while keeping the number of performed checkpoints at an acceptable scale.
Rethinking the Interactivity of OS and Device Layers in Memory Management
ACM Transactions on Embedded Computing Systems (TECS), July 2022
Tse-Yuan Wang, Chun-Feng Wu, Che-Wei Tsao, Yuan-Hao Chang, Tei-Wei Kuo, and Xue Liu
Recently, the requirement of storing digital data has been growing rapidly; however, the conventional storage medium cannot satisfy these huge demands. Fortunately, thanks to biological technology development, storing digital data into deoxyribonucleic acid (DNA) has become possible in recent years. Furthermore, because of the attractive features (e.g., high storing density, long-term durability, and stability), DNA storage has been regarded as a potential alternative storage medium to store massive digital data in the future. Nevertheless, reading and writing digital data over DNA requires a series of extremely time-consuming processes (i.e., DNA sequencing and DNA synthesis). More specifically, among the two costs, the writing cost is the predominant cost of a DNA data storage system. Therefore, to enable efficient DNA storage, this article proposes an index management scheme for reducing the number of accesses to DNA storage. Additionally, this article introduces a new DNA data encoding format with VERA (Version Editing Recovery Approach) to reduce the total writing bits while inserting and deleting the data. To the best of our knowledge, this work is the first work to provide a total data management solution for DNA storage. According to the experimental results, the proposed design with VERA can reduce the cost by 77% and improve the performance by 71% compared to the append-only methods.
SEEN: Structured Event Enhancement Network for Explainable Need Detection of Information Recall Assistance
The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), December 2022
You-En Lin, An-Zi Yen, Hen-Hsen Huang and Hsin-Hsi Chen
When recalling life experiences, people often forget or confuse life events, which necessitates information recall services. Previous work on information recall focuses on providing such assistance reactively, i.e., by retrieving the life event of a given query. Proactively detecting the need for information recall services is rarely discussed. In this paper, we use a human-annotated life experience retelling dataset to detect the right time to trigger the information recall service. We propose a pilot model, the structured event enhancement network (SEEN), that detects life event inconsistency, additional information in life events, and forgotten events. A fusing mechanism is also proposed to incorporate event graphs of stories and enhance the textual representations. To explain the need detection results, SEEN simultaneously provides supporting evidence by selecting the related nodes from the event graph. Experimental results show that SEEN achieves promising performance in detecting information needs. In addition, the extracted evidence can serve as complementary information to remind users what events they may want to recall.
Enrichment of Prevotella intermedia in human colorectal cancer and its additive effects with Fusobacterium nucleatum on the malignant transformation of colorectal adenomas
Journal of Biomedical Science, October 2022
Chia-Hui Lo, Deng-Chyang Wu, Shu-Wen Jao, Chang-Chieh Wu, Chung-Yen Lin, Chia-Hsien Chuang, Ya-Bo Lin, Chien-Hsiun Chen, Ying-Ting Chen, Jiann-Hwa Chen, Koung-Hung Hsiao, Ying-Ju Chen, Yuan-Tsong Chen, Jaw-Yuan Wang, Ling-Hui Li
Owing to the heterogeneity of microbiota among individuals and populations, only Fusobacterium nucleatum and Bacteroides fragilis have been reported to be enriched in colorectal cancer (CRC) in multiple studies. Thus, the discovery of additional bacteria contributing to CRC development in various populations can be expected. We aimed to identify bacteria associated with the progression of colorectal adenoma to carcinoma and determine the contribution of these bacteria to malignant transformation in patients of Han Chinese origin.
Microbiota composition was determined through 16S rRNA V3–V4 amplicon sequencing of autologous adenocarcinomas, adenomatous polyps, and non-neoplastic colon tissue samples (referred to as “tri-part samples”) in patients with CRC. Enriched taxa in adenocarcinoma tissues were identified through pairwise comparison. The abundance of candidate bacteria was quantified through genomic quantitative polymerase chain reaction (qPCR) in tissue samples from 116 patients. Associations of candidate bacteria with clinicopathological features and genomic and genetic alterations were evaluated through odds ratio tests. Additionally, the effects of candidate bacteria on CRC cell proliferation, migration, and invasion were evaluated through the co-culture of CRC cells with bacterial cells or with conditioned media from bacteria.
Prevotella intermedia was overrepresented in adenocarcinomas compared with paired adenomatous polyps. Furthermore, co-abundance of P. intermedia and F. nucleatum was observed in tumor tissues. More notably, the coexistence of these two bacteria in adenocarcinomas was associated with lymph node involvement and distant metastasis. These two bacteria also exerted additive effects on the enhancement of the migration and invasion abilities of CRC cells. Finally, conditioned media from P. intermedia promoted the migration and invasion of CRC cells.
This report is the first to demonstrate that P. intermedia is enriched in colorectal adenocarcinoma tissues and enhances the migration and invasion abilities of CRC cells. Moreover, P. intermedia and F. nucleatum exert additive effects on the malignant transformation of colorectal adenomas into carcinomas. These findings can be used to identify patients at a high risk of malignant transformation of colorectal adenomas or metastasis of CRC, and they can accordingly be provided optimal clinical management.
AI4AVP: An Antiviral Peptides Predictor in Deep Learning Approach with Generative Adversarial Network Data Augmentation
Bioinformatics Advances, October 2022
Tzu-Tang Lin, Yih-Yun Sun, Ching-Tien Wang, Wen-Chih Cheng, I-Hsuan Lu, Chung-Yen Lin*, Shu-Hwa Chen*
Antiviral peptides from various sources suggest the possibility of developing peptide drugs for treating viral diseases. Because of the increasing number of identified antiviral peptides and the advances in deep-learning theory, it is reasonable to experiment with peptide drug design using in-silico methods.
We collected the most up-to-date antiviral peptides and used deep learning to construct a sequence-based binary classifier. A generative adversarial network was employed to augment the number of antiviral peptides in the positive training dataset and enable our deep-learning convolutional neural network model to learn from the negative dataset. Our classifier outperformed other state-of-the-art classifiers when using the testing dataset. We have placed the trained classifiers on a user-friendly web server, AI4AVP, for the research community.
Availability and implementation:
AI4AVP is freely accessible at http://axp.iis.sinica.edu.tw/AI4AVP/; codes and datasets for the peptide GAN and the AVP predictor CNN are available at https://github.com/lsbnb/amp_gan and https://github.com/LinTzuTang/AI4AVP_predictor.
Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023
Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, and Yu Tsao
Non-intrusive speech assessment metrics have garnered significant attention in recent years, and several deep learning-based models have been developed accordingly. Although these models are more flexible than conventional speech assessment metrics, most of them are designed to estimate a specific evaluation score, whereas speech assessment generally involves multiple facets. Herein, we propose a cross-domain multi-objective speech assessment model called MOSA-Net, which can estimate multiple speech assessment metrics simultaneously. More specifically, MOSA-Net is designed to estimate the speech quality, intelligibility, and distortion assessment scores of an input test speech signal. It comprises a convolutional neural network and bidirectional long short-term memory (CRNN) architecture for representation extraction, and a multiplicative attention layer and a fully connected layer for each assessment metric. In addition, cross-domain features (spectral and time-domain features) and latent representations from self-supervised learned (SSL) models are used as inputs to combine rich acoustic information from different speech representations to obtain more accurate assessments. Experimental results show that MOSA-Net can improve the linear correlation coefficient (LCC) by 0.026 (0.990 vs 0.964 in seen noise environments) and 0.012 (0.969 vs 0.957 in unseen noise environments) in perceptual evaluation of speech quality (PESQ) prediction, compared to Quality-Net, an existing single-task model for PESQ prediction, and improve LCC by 0.021 (0.985 vs 0.964 in seen noise environments) and 0.047 (0.836 vs 0.789 in unseen noise environments) in short-time objective intelligibility (STOI) prediction, compared to STOI-Net (based on CRNN), an existing single-task model for STOI prediction. 
Moreover, MOSA-Net, originally trained to assess objective scores, can be used as a pre-trained model to be effectively adapted to an assessment model for predicting subjective quality and intelligibility scores with a limited amount of training data. Experimental results show that MOSA-Net can improve LCC by 0.018 (0.805 vs 0.787) in mean opinion score (MOS) prediction, compared to MOS-SSL, a strong single-task model for MOS prediction. In light of the confirmed prediction capability, we further adopt the latent representations of MOSA-Net to guide the speech enhancement (SE) process and derive a quality-intelligibility (QI)-aware SE (QIA-SE) approach accordingly. Experimental results show that QIA-SE provides superior enhancement performance compared with the baseline SE system in terms of objective evaluation metrics and qualitative evaluation test. For example, QIA-SE can improve PESQ by 0.301 (2.953 vs 2.652 in seen noise environments) and 0.18 (2.658 vs 2.478 in unseen noise environments) over a CNN-based baseline SE model.
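The linear correlation coefficient (LCC) reported above is commonly computed as the Pearson correlation between predicted and reference scores. A minimal sketch follows; the function name `linear_correlation` and the toy score lists are illustrative assumptions and are not data or code from the paper.

```python
import math


def linear_correlation(predicted, reference):
    """Pearson correlation between two equal-length lists of scores."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two score lists of equal length (at least 2 items)")

    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n

    # Covariance and variances, without the common 1/n factor (it cancels out).
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0.0 or var_r == 0.0:
        raise ValueError("correlation is undefined for constant scores")

    return cov / math.sqrt(var_p * var_r)


# Toy example (made-up numbers): predicted vs. reference quality scores.
toy_predicted = [1.8, 2.4, 3.1, 3.9]
toy_reference = [1.7, 2.6, 3.0, 4.1]
toy_lcc = linear_correlation(toy_predicted, toy_reference)
```

An LCC close to 1 means the predicted scores track the reference scores almost linearly, which is why small absolute gains near 0.99 can still be meaningful.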
Rewriting Deep Learning Models for Maximizing Edge TPU Utilization
IEEE International Conference on Parallel and Distributed Systems (ICPADS), December 2022
Kung-Fu Chen and Ding-Yong Hong
The Google Edge TPU is an ASIC designed to accelerate inference of deep learning models on edge devices. Edge TPU only supports a limited set of operations. In those deep learning models containing unsupported operations, the Edge TPU compiler maps the unsupported operations and their succeeding operations to execute on the CPU, even if the succeeding operations can be executed on the Edge TPU. As a result, the Edge TPU is under-utilized and performance significantly degrades. To overcome this issue, we have developed a model rewriting tool, which leverages MLIR to replace unsupported operations in the model with supported ones while maintaining the same functionality. We also propose a general method to approximate an arbitrary continuous function to any precision using the ReLU operations. Experimental results show that our transformation achieves an average speedup of 1.66x and 4.44x over the models without rewriting on the server and edge platforms, respectively.
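One standard way to approximate a continuous function with ReLU operations is to build a piecewise-linear interpolant as a sum of shifted ReLUs. The sketch below shows that textbook construction on a closed interval; it is not necessarily the method used in the paper, and the names `fit_relu_sum`, `evaluate`, and `max_error` are hypothetical.

```python
import math


def relu(x):
    return x if x > 0.0 else 0.0


def fit_relu_sum(f, lo, hi, segments):
    """Return (bias, knots, coeffs) such that, for x in [lo, hi],
    bias + sum(c * relu(x - k)) linearly interpolates f at evenly spaced knots."""
    knots = [lo + (hi - lo) * i / segments for i in range(segments + 1)]
    values = [f(k) for k in knots]
    slopes = [
        (values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
        for i in range(segments)
    ]
    # The first ReLU carries the initial slope; each later ReLU adds the
    # change in slope at its knot.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, segments)]
    return values[0], knots[:-1], coeffs


def evaluate(model, x):
    bias, knots, coeffs = model
    return bias + sum(c * relu(x - k) for k, c in zip(knots, coeffs))


def max_error(f, model, lo, hi, samples=2001):
    """Largest absolute error over an evenly spaced grid on [lo, hi]."""
    worst = 0.0
    for i in range(samples):
        x = lo + (hi - lo) * i / (samples - 1)
        worst = max(worst, abs(f(x) - evaluate(model, x)))
    return worst


# Example: approximate tanh on [-3, 3] with a coarse and a fine model.
coarse = fit_relu_sum(math.tanh, -3.0, 3.0, 8)
fine = fit_relu_sum(math.tanh, -3.0, 3.0, 64)
```

Increasing the number of segments shrinks the interpolation error, which is the sense in which such a construction can reach any desired precision on a bounded interval; outside `[lo, hi]` the sketch simply extrapolates linearly.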
The gut microbiota regulates acute foreign body reaction and tissue repair after biomaterial implantation
Biomaterials, October 2022
Chen, SL, Lundy, DJ, Ruan, SC, Chen, HC, Chao, YK, Cheng, YY, Prajnamitra, PR, Liao, CC, Lin, CY, Lai, JJ, Hsieh, PC
We hypothesized that the host microbiome may influence foreign body responses following biomaterial implantation. To test this, we implanted a variety of clinically relevant biomaterials into germ-free or antibiotic-treated mice. Surprisingly, these mice displayed less fibrous tissue deposition, reduced host cell recruitment to the implant site, and differential expression of angiogenic and inflammatory markers. These observations were reversed upon fecal microbiome reconstitution, confirming a causal role of the host microbiome. In a clinically relevant disease model, microbiome-depleted mice cleared hyaluronic acid and bone marrow mononuclear cells from ischemic hind limb tissues more slowly, resulting in an improved therapeutic response. Findings were confirmed in pigs, which showed reduced fibrotic responses to a variety of implanted materials. Lastly, we profiled changes in the host microbiome following material implantation, implicating several key bacterial phyla.
Chain-based Discriminative Autoencoders for Speech Recognition
Interspeech 2022, September 2022
Hung-Shin Lee, Pin-Tuan Huang, Yao-Fei Cheng and Hsin-Min Wang
In our previous work, we proposed a discriminative autoencoder (DcAE) for speech recognition. DcAE combines two training schemes into one. First, since DcAE aims to learn encoder-decoder mappings, the squared error between the reconstructed speech and the input speech is minimized. Second, in the code layer, frame-based phonetic embeddings are obtained by minimizing the categorical cross-entropy between ground truth labels and predicted triphone-state scores. DcAE is developed based on the Kaldi toolkit by treating various TDNN models as encoders. In this paper, we further propose three new versions of DcAE. First, a new objective function that considers both categorical cross-entropy and mutual information between ground truth and predicted triphone-state sequences is used. The resulting DcAE is called a chain-based DcAE (c-DcAE). For application to robust speech recognition, we further extend c-DcAE to hierarchical and parallel structures, resulting in hc-DcAE and pc-DcAE. In these two models, both the error between the reconstructed noisy speech and the input noisy speech and the error between the enhanced speech and the reference clean speech are taken into the objective function. Experimental results on the WSJ and Aurora-4 corpora show that our DcAE models outperform baseline systems.
Learning to Rank Visual Stories From Human Ranking Data
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), May 2022
Chi-Yang Hsu, Yun-Wei Chu, Vincent Chen, Kuan-Chieh Lo, Chacha Chen, Ting-Hao (Kenneth) Huang, Lun-Wei Ku
Visual storytelling (VIST) is a typical vision and language task that has seen extensive development in the natural language generation research domain. However, it remains unclear whether conventional automatic evaluation metrics for text generation are applicable to VIST. In this paper, we present the VHED (VIST Human Evaluation Data) dataset, which first re-purposes human evaluation results for automatic evaluation; hence we develop Vrank (VIST ranker), a novel reference-free VIST metric for story evaluation. We first show that the results from commonly adopted automatic metrics for text generation have little correlation with those obtained from human evaluation, which motivates us to directly utilize human evaluation results to learn the automatic evaluation model. In the experiments, we evaluate the generated texts to predict story ranks using our model as well as other reference-based and reference-free metrics. Results show that Vrank prediction is significantly more aligned to human evaluation than other metrics with almost 30% higher accuracy when ranking story pairs. Moreover, we demonstrate that only Vrank shows human-like behavior in its strong ability to find better stories when the quality gap between two stories is high. Finally, we show the superiority of Vrank by its generalizability to pure textual stories, and conclude that this reuse of human evaluation results puts Vrank in a strong position for continued future advances.
A Multi-grained Dataset for News Event Triggered Knowledge Update
Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM 2022), October 2022
Yu-Ting Lee, Ying-Jhe Tang, Yu-Chung Cheng, Pai-Lin Chen, Tsai-Yen Li and Hen-Hsen Huang
Keeping knowledge facts up to date is laborious and costly as the world rapidly changes and new information emerges every second. In this work, we introduce a novel task, news event triggered knowledge update. Given an existing article about a topic and a news event about the same topic, the aim of our task is to generate an updated article according to the information from the news event. We create a multi-grained dataset for the investigation of our task. The articles from Wikipedia are collected and aligned with news events at multiple language units, including the citation text, the first paragraph, and the full content of the news article. Baseline models are also explored at three levels of knowledge update, including the first paragraph, the summary, and the full content of the knowledge facts.
Gut Microbiota Composition in Chemotherapy and Targeted Therapy of Patients with Metastatic Colorectal Cancer
Frontiers in Oncology, September 2022
Yen-Cheng Chen, Chia-Hsien Chuang, Zhi-Feng Miao, Kwan-Ling Yip, Chung-Jung Liu, Ling-Hui Li, Deng-Chyang Wu, Tian-Lu Cheng, Chung-Yen Lin* and Jaw-Yuan Wang*
Studies have reported the effects of the gut microbiota on colorectal cancer (CRC) chemotherapy, but few studies have investigated the association between gut microbiota and targeted therapy. This study investigated the role of the gut microbiota in the treatment outcomes of patients with metastatic CRC (mCRC). We enrolled 110 patients with mCRC and treated them with standard cancer therapy. Stool samples were collected before administering a combination of chemotherapy and targeted therapy. Patients who had a progressive disease (PD) or partial response (PR) for at least 12 cycles of therapy were included in the study. We further divided these patients into anti-epidermal growth factor receptor (cetuximab) and anti-vascular endothelial growth factor (bevacizumab) subgroups. The gut microbiota of the PR group and bevacizumab-PR subgroup exhibited significantly higher α-diversity. The β-diversity of bacterial species significantly differed between the bevacizumab-PR and bevacizumab-PD groups (P). Klebsiella quasipneumoniae exhibited the greatest fold change in abundance in the PD group relative to the PR group. Lactobacillus and Bifidobacterium species exhibited higher abundance in the PD group. The abundance of Fusobacterium nucleatum was approximately 32 times higher in the PD group than in the PR group. A higher gut microbiota diversity was associated with more favorable treatment outcomes in the patients with mCRC. Bacterial species analysis of stool samples yielded heterogeneous results. K. quasipneumoniae exhibited the greatest fold change in abundance among all bacterial species in the PD group. This result warrants further investigation, especially in a Taiwanese population.