Designing Network Design Strategies Through Gradient Path Analysis
Journal of Information Science and Engineering, July 2023
C. Y. Wang, H. Y. Mark Liao, and I-Hau Yeh


Differential Hsp90-dependent gene expression is strain-specific and common among yeast strains
iScience, May 2023
Hung, P.H., Liao, C.W., Ko, F.H., Tsai, H.K.* and Leu, J.Y.*

Abstract
Enhanced phenotypic diversity increases a population’s likelihood of surviving catastrophic conditions. Hsp90, an essential molecular chaperone and a central network hub in eukaryotes, has been observed to suppress or enhance the effects of genetic variation on phenotypic diversity in response to environmental cues. Because many Hsp90-interacting genes are involved in signal transduction pathways and transcriptional regulation, we tested how common Hsp90-dependent differential gene expression is in natural populations. Many genes exhibited Hsp90-dependent strain-specific differential expression in five diverse yeast strains. We further identified transcription factors (TFs) potentially contributing to variable expression. We found that upon Hsp90 inhibition or environmental stress, activities or abundances of Hsp90-dependent TFs varied among strains, resulting in differential strain-specific expression of their target genes, which consequently led to phenotypic diversity. We provide evidence that individual strains can readily display specific Hsp90-dependent gene expression, suggesting that the evolutionary impacts of Hsp90 are widespread in nature.
Short human eccDNAs are predictable from sequences
Briefings in Bioinformatics, April 2023
Chang, K.L., Chen, J.H., Lin, T.C., Leu, J.Y., Kao, C.F., Wong, J.Y.*, Tsai, H.K.*




Abstract
Background
The ubiquitous presence of short extrachromosomal circular DNAs (eccDNAs) in eukaryotic cells has perplexed generations of biologists. Their widespread origins across the genome, with no apparent specificity, led some studies to conclude that their formation is random or near-random. Despite this, the search for specific formation of short eccDNAs continues, with a recent surge of interest in biomarker development.
Results
To shed new light on the conflicting views on short eccDNAs’ randomness, here we present DeepCircle, a bioinformatics framework incorporating convolution- and attention-based neural networks to assess their predictability. Short human eccDNAs from different datasets indeed have low similarity in genomic locations, but DeepCircle successfully learned shared DNA sequence features to make accurate cross-dataset predictions (accuracy: convolution-based models: 79.65±4.7%, attention-based models: 83.31±4.18%).
Conclusions
The excellent performance of our models shows that the intrinsic predictability of eccDNAs is encoded in the sequences across tissue origins. Our work demonstrates how the perceived lack of specificity in genomics data can be re-assessed by deep learning models to uncover unexpected similarity.
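For readers unfamiliar with this kind of model, the sketch below shows how a convolution-based classifier over one-hot-encoded DNA windows can be set up in PyTorch. It is purely illustrative: the window length, layer sizes, and encoding are assumptions and do not reproduce the DeepCircle architecture.

```python
# Illustrative convolution-based DNA-sequence classifier (not the DeepCircle model).
# Assumptions: fixed-length one-hot encoded windows and a binary eccDNA / non-eccDNA label.
import torch
import torch.nn as nn

BASES = "ACGT"

def one_hot(seq: str) -> torch.Tensor:
    """Encode a DNA string as a 4 x L one-hot tensor (unknown bases stay all-zero)."""
    t = torch.zeros(4, len(seq))
    for i, b in enumerate(seq.upper()):
        j = BASES.find(b)
        if j >= 0:
            t[j, i] = 1.0
    return t

class ConvSeqClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(4, 64, kernel_size=15, padding=7), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(64, 128, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.head = nn.Linear(128, 1)

    def forward(self, x):          # x: (batch, 4, seq_len)
        h = self.features(x).squeeze(-1)
        return self.head(h)        # logits; apply sigmoid for probabilities

model = ConvSeqClassifier()
batch = torch.stack([one_hot("ACGT" * 250)])   # one toy 1000-bp window
print(torch.sigmoid(model(batch)))
```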
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023
C. Y. Wang, Alexey Bochkovskiy, H. Y. Mark Liao


Abstract
YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 120 FPS and has the highest accuracy of 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100. The YOLOv7-E6 object detector (54 FPS V100, 55.9% AP) outperforms both the transformer-based detector SWIN-L Cascade-Mask R-CNN (9.2 FPS A100, 53.9% AP) by 587% in speed and 2% in accuracy, and the convolutional-based detector ConvNeXt-XL Cascade-Mask R-CNN (8.6 FPS A100, 55.2% AP) by 628% in speed and 0.7% AP in accuracy. YOLOv7 also outperforms YOLOR, YOLOX, Scaled-YOLOv4, YOLOv5, DETR, Deformable DETR, DINO-5scale-R50, ViT-Adapter-B, and many other object detectors in speed and accuracy. Moreover, we train YOLOv7 only on the MS COCO dataset from scratch without using any other datasets or pre-trained weights.
You Only Learn One Representation: Unified Network for Multiple Tasks
Journal of Information Science and Engineering, May 2023
C. Y. Wang, H. Y. Mark Liao, and I-Hau Yeh


Abstract
People “understand” the world via vision, hearing, touch, and also past experience. Human experience can be learned through normal learning (we call it explicit knowledge) or subconsciously (we call it implicit knowledge). These experiences learned through normal learning or subconsciously are encoded and stored in the brain. Using this abundant experience as a huge database, human beings can effectively process data, even data they have not seen before. In this paper, we propose a unified network to encode implicit knowledge and explicit knowledge together, just as the human brain can learn knowledge from normal learning as well as subconscious learning. The unified network can generate a unified representation to simultaneously serve various tasks. We can perform kernel space alignment, prediction refinement, and multi-task learning in a convolutional neural network. The results demonstrate that when implicit knowledge is introduced into the neural network, it benefits the performance of all tasks. We further analyze the implicit representation learned by the proposed unified network, and it shows great capability in capturing the physical meaning of different tasks. The source code of this work is at: https://github.com/WongKinYiu/yolor.
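The core idea of pairing explicit features with a learned, input-independent implicit representation can be illustrated with the toy sketch below; the additive fusion and all dimensions are assumptions for illustration and do not reproduce the YOLOR implementation.

```python
# Toy illustration of fusing a learned "implicit" vector with explicit features
# (simplified concept sketch; not the YOLOR implementation).
import torch
import torch.nn as nn

class ImplicitAdd(nn.Module):
    """A free, input-independent vector that is learned jointly and added to features."""
    def __init__(self, channels: int):
        super().__init__()
        self.implicit = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):          # x: (batch, channels, H, W) explicit features
        return x + self.implicit   # implicit knowledge refines the explicit representation

backbone_out = torch.randn(2, 256, 20, 20)     # pretend backbone features
refined = ImplicitAdd(256)(backbone_out)
print(refined.shape)                           # torch.Size([2, 256, 20, 20])
```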
Intelligent De Novo Design of Novel Antimicrobial Peptides Against Antibiotic-Resistant Bacteria Strains
International Journal of Molecular Sciences, April 2023
Tzu-Tang Lin, Li-Yen Yang, Chung-Yen Lin, Ching-Tien Wang, Chia-Wen Lai, Chi-Fong Ko, Yang-Hsin Shih, Shu-Hwa Chen




Abstract
Because of the growing number of clinical antibiotic resistance cases in recent years, novel antimicrobial peptides (AMPs) may be ideal for next-generation antibiotics. This study trained a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) based on known AMPs to generate novel AMP candidates. The quality of the GAN-designed peptides was evaluated in silico, and eight of them, named GAN-pep 1–8, were selected by an AMP Artificial Intelligence (A.I.) classifier and synthesized for further experiments. Disc diffusion testing and minimum inhibitory concentration (MIC) determinations were used to identify the antibacterial effects of the synthesized GAN-designed peptides. Seven of the eight synthesized GAN-designed peptides displayed antibacterial activity. Additionally, GAN-pep 3 and GAN-pep 8 presented a broad spectrum of antibacterial effects and were effective against antibiotic-resistant bacteria strains, such as methicillin-resistant Staphylococcus aureus and carbapenem-resistant Pseudomonas aeruginosa. GAN-pep 3, the most promising GAN-designed peptide candidate, had low MICs against all the tested bacteria. In brief, our approach shows an efficient way to discover effective AMPs against general and antibiotic-resistant bacteria strains. Such a strategy also allows other novel functional peptides to be quickly designed, identified, and synthesized for validation on the wet bench.
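The WGAN-GP objective referenced above augments the critic loss with a gradient penalty on interpolated samples; a minimal sketch of that penalty term is shown below. The peptide representation and penalty weight are assumptions for illustration, not the paper's training code.

```python
# Minimal WGAN-GP gradient-penalty sketch (illustrative; not the paper's training code).
# Assumptions: peptides are represented as fixed-size real-valued tensors.
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """Penalize deviation of the critic's gradient norm from 1 on interpolated samples."""
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads, = torch.autograd.grad(outputs=scores.sum(), inputs=interp, create_graph=True)
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return lam * ((grad_norm - 1) ** 2).mean()

# Usage inside a critic update (critic and generator are ordinary nn.Modules):
#   loss_D = critic(fake).mean() - critic(real).mean() + gradient_penalty(critic, real, fake)
```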
Identification of Sexually Dimorphic Genes in Pectoral Fin as Molecular Markers for Assessing the Sex of Japanese Silver Eels (Anguilla japonica)
Zoological Studies, March 2023
Hsiang-Yi Hsu, Chia-Hsien Chuang, I-Hsuan Lu, Chung-Yen Lin, and Yu-San Han



Abstract
The Japanese eel (Anguilla japonica) is an important species in East Asian aquaculture. However, the production of seedlings for this purpose still depends on natural resources, as the commercial production of glass eels is not yet possible. Confusion about the sex of silver eels is one of the factors affecting the success rate of artificial maturation. This study sought to devise a harmless method to precisely assess the sex of silver eels. Partial pectoral fins were collected from females and males, and total RNA was extracted for transcriptomic analysis to identify sexually dimorphic genes as molecular markers for sex typing. An online database was constructed to integrate the annotations of transcripts and perform comparative transcriptome analysis. This analysis identified a total of 29 candidate sexually dimorphic genes. Ten were selected for real-time quantitative polymerase chain reaction (RT-qPCR) to validate the transcriptomic data and evaluate their feasibility as markers. The transcriptomic analysis and RT-qPCR data implicated three potential markers (LOC111853410, kera, and dcn) in sex typing. The expression of LOC111853410 was higher in females than in males. In contrast, the expression of kera and dcn was higher in males than in females. The ΔCT values of the three markers were analyzed to determine their inferred thresholds, which can be used to determine the sex of Japanese eels. The results suggested that a silver eel could be assessed as female if its pectoral fin had a ΔCT of LOC111853410 < 11.3, a ΔCT of kera > 11.4, or a ΔCT of dcn > 6.5. Males could be assessed by a ΔCT of LOC111853410 > 11.3, a ΔCT of kera < 11.4, or a ΔCT of dcn < 6.5 in their pectoral fins. The molecular functions of these markers and the biological significance of their differential expression require further exploration.
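Taken at face value, the reported thresholds define a simple per-marker decision rule, sketched below. How to combine disagreeing markers and how to treat values exactly at a threshold are not specified in the abstract, so the sketch only reports per-marker calls.

```python
# Sketch of the per-marker sex-typing rule implied by the reported dCT thresholds
# (LOC111853410: 11.3, kera: 11.4, dcn: 6.5); an illustration, not a validated assay.

def marker_calls(dct_loc111853410: float, dct_kera: float, dct_dcn: float) -> dict:
    """Return a female/male call for each marker from its dCT threshold."""
    return {
        "LOC111853410": "female" if dct_loc111853410 < 11.3 else "male",  # higher expression in females
        "kera":         "female" if dct_kera > 11.4 else "male",          # higher expression in males
        "dcn":          "female" if dct_dcn > 6.5 else "male",            # higher expression in males
    }

print(marker_calls(10.8, 12.0, 7.1))   # all three markers call 'female' for this toy sample
```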
Exploring Synchronous Page Fault Handling
ACM/IEEE International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), October 2022
Yin-Chiuan Chen, Chun-Feng Wu, Yuan-Hao Chang, and Tei-Wei Kuo


Abstract
The advance of nonvolatile memory in storage technology has presented challenges in redefining the ways of handling main memory and storage. This work is motivated by the strong demand for effective handling of page faults over ultralow-latency storage devices. In particular, we propose synchronous and asynchronous prefetching strategies to satisfy process executions with different memory demands in support of synchronous page fault handling. An adaptive CPU scheduling strategy is also proposed to cope with the needs of processes in maintaining their working sets in the main memory. Six representative benchmarks and applications were evaluated. It was shown that our strategy can effectively save 12.33% of the total execution time and reduce page faults by 13.33%, compared to the conventional demand paging strategy, with nearly no sacrifice of process fairness.
GraphRC: Accelerating Graph Processing on Dual-addressing Memory with Vertex Merging
ACM/IEEE International Conference on Computer-Aided Design (ICCAD), October 2022
Wei Cheng, Chun-Feng Wu, Yuan-Hao Chang, and Ing-Chao Lin


Abstract
Architectural innovation in graph accelerators attracts research attention due to foreseeable inflation in data sizes and the irregular memory access patterns of graph algorithms. Conventional graph accelerators ignore the potential of the Non-Volatile Memory (NVM) crossbar as a dual-addressing memory and treat it as a traditional single-addressing memory with higher density and better energy efficiency. In this work, we present GraphRC, a graph accelerator that leverages the power of dual-addressing memory by mapping in-edge/out-edge requests to column/row-oriented memory accesses. Although the capability of dual-addressing memory greatly improves the performance of graph processing, some memory accesses still suffer from low utilization. Therefore, we propose a vertex merging (VM) method that improves the cache block utilization rate by merging memory requests from consecutive vertices. VM reduces the execution time of all 6 graph algorithms on all 4 datasets by 24.24% on average. We then identify that the data dependency inherent in a graph limits the usage of VM, and its effectiveness is bounded by the percentage of mergeable vertices. To overcome this limitation, we propose an aggressive vertex merging (AVM) method that outperforms VM by ignoring the data dependency inherent in a graph. AVM significantly reduces the execution time of ranking-based algorithms on all 4 datasets while preserving the correct ranking of the top 20 vertices.
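A toy sketch of the vertex-merging idea, coalescing requests from consecutive vertices that fall into the same cache block, is given below; the block size, per-vertex entry size, and addressing are assumptions, and the sketch does not model the accelerator itself.

```python
# Toy sketch of vertex merging (VM): coalesce requests from consecutive vertices
# whose per-vertex data fall into the same cache block. Sizes are assumed values.

BLOCK_BYTES = 64
ENTRY_BYTES = 8   # assumed size of one per-vertex entry

def merge_requests(vertex_ids):
    """Map per-vertex requests to the list of cache-block requests actually issued."""
    blocks = []
    for v in sorted(vertex_ids):
        blk = (v * ENTRY_BYTES) // BLOCK_BYTES
        if not blocks or blocks[-1] != blk:
            blocks.append(blk)
        # consecutive vertices sharing a block are merged into one request
    return blocks

reqs = merge_requests(range(0, 20))                  # 20 consecutive vertices
print(len(reqs), "block requests instead of", 20)    # -> 3 block requests instead of 20
```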
RankMix: Data Augmentation for Weakly Supervised Learning of Classifying Whole Slide Images with Diverse Sizes and Imbalanced Categories
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023
Yuan-Chih Chen and Chun-Shien Lu


Abstract
Whole Slide Images (WSIs) are usually gigapixel in size and lack pixel-level annotations. WSI datasets are also imbalanced across categories. These unique characteristics, significantly different from those of natural images, pose the challenge of classifying WSIs as a kind of weakly supervised learning problem. In this study, we propose RankMix, a data augmentation method that mixes ranked features from a pair of WSIs. RankMix introduces the concepts of pseudo labeling and ranking in order to extract key WSI regions that contribute to the WSI classification task. A two-stage training scheme is further proposed to stabilize training and boost model performance. To our knowledge, we are the first to investigate weakly supervised learning from the perspective of data augmentation to deal with the WSI classification problem, which suffers from a lack of training data and category imbalance.
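A loose sketch of the ranked-feature mixing idea is given below: patch features of two slides are ranked by an importance score, and the top-ranked features are mixed. The scoring, the number of kept features, and the mixing rule are assumptions; this is not the RankMix implementation.

```python
# Loose sketch of mixing ranked patch features from two WSIs (illustration only;
# not the RankMix implementation). Scores, k, and the mixing rule are assumptions.
import torch

def rank_and_mix(feats_a, scores_a, feats_b, scores_b, k=128, lam=0.5):
    """Keep the k highest-scoring patch features of each slide, then mix them."""
    top_a = feats_a[scores_a.argsort(descending=True)[:k]]
    top_b = feats_b[scores_b.argsort(descending=True)[:k]]
    return lam * top_a + (1.0 - lam) * top_b      # mixed bag of k features

a = torch.randn(500, 256); sa = torch.rand(500)   # slide A: 500 patch features + scores
b = torch.randn(900, 256); sb = torch.rand(900)   # slide B: 900 patch features + scores
print(rank_and_mix(a, sa, b, sb).shape)           # torch.Size([128, 256])
```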
Self-Adapted Utterance Selection for Suicidal Ideation Detection in Lifeline Conversations
The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), January 2023
Zhong-Ling Wang, Po-Hsien Huang, Wen-Yau Hsu and Hen-Hsen Huang

Abstract
This paper investigates a crucial aspect of mental health by exploring the detection of suicidal ideation in spoken phone conversations between callers and counselors at a suicide prevention hotline. These conversations can be lengthy, noisy, and cover a broad range of topics, making it challenging for NLP models to accurately identify the caller's suicidal ideation. To address these difficulties, we introduce a novel, self-adaptive approach that identifies the most critical utterances, which the NLP model can more easily distinguish. The experiments use expertly labeled, real-world Lifeline transcriptions and show that our approach outperforms the baseline models in overall performance, with an F-score of 66.01%. In detecting the most dangerous cases, our approach achieves a significantly higher F-score of 65.94% compared to the baseline models, an improvement of 8.9%. The selected utterances can also provide valuable insights for suicide prevention research. Furthermore, our approach demonstrates its versatility by showing its effectiveness in sentiment analysis, making it a valuable tool for NLP applications beyond the healthcare domain.
Planting Fast-growing Forest by Leveraging the Asymmetric Read/Write Latency of NVRAM-based Systems
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), October 2022
Yu-Pei Liang, Tseng-Yi Chen, Yuan-Hao Chang, Yi-Da Huang, and Wei-Kuan Shih


Abstract
Owing to the considerations of cell density and low static power consumption, nonvolatile random-access memory (NVRAM) has been a promising candidate for collaborating with dynamic random-access memory (DRAM) as the main memory in modern computer systems. As NVRAM also brings technical challenges (e.g., limited endurance and high write cost) to computer system developers, the concept of write reduction has become a guiding doctrine in NVRAM-based system design. Unfortunately, a well-known machine learning algorithm, random forest, generates a massive amount of write traffic to the main memory space during its construction phase. In other words, a random forest hits the Achilles’ heel of NVRAM-based systems. To remedy this pain, our work proposes an NVRAM-friendly random forest algorithm, namely Amine, for NVRAM-based systems. The design principle of Amine is to replace write operations with read accesses without raising the read complexity of the random forest algorithm. According to experimental results, Amine can effectively decrease the latency of random forest construction by 64%, compared with the original random forest algorithm.
On Minimizing the Read Latency of Flash Memory to Preserve Inter-tree Locality in Random Forest
ACM/IEEE International Conference on Computer-Aided Design (ICCAD), October 2022
Yu-Cheng Lin, Yu-Pei Liang, Tseng-Yi Chen, Yuan-Hao Chang, Shuo-Han Chen, and Wei-Kuan Shih



Abstract
Many prior research works have discussed how to bring machine learning algorithms to embedded systems. Because of resource constraints, embedded platforms for machine learning applications play the role of a predictor. That is, an inference model will be constructed on a personal computer or a server platform and then integrated into embedded systems for just-in-time inference. With the consideration of the limited main memory space in embedded systems, an important problem for embedded machine learning systems is how to efficiently move an inference model between the main memory and secondary storage (e.g., flash memory). To tackle this problem, we need to consider how to preserve the locality inside the inference model during model construction. Therefore, we propose a solution, namely locality-aware random forest (LaRF), to preserve the inter-tree locality of all decision trees within a random forest model during the model construction process. Owing to the locality preservation, LaRF can improve the read latency by at least 81.5%, compared to the original random forest library.
SGIRR: Sparse Graph Index Remapping for ReRAM Crossbar Operation Unit and Power Optimization
ACM/IEEE International Conference on Computer-Aided Design (ICCAD), October 2022
Cheng-Yuan Wang, Yao-Wen Chang, and Yuan-Hao Chang

Abstract
Resistive Random Access Memory (ReRAM) crossbars are a promising process-in-memory technology to reduce the enormous data movement overheads of large-scale graph processing between computation and memory units. ReRAM cells can combine with crossbar arrays to effectively accelerate graph processing, and partitioning ReRAM crossbar arrays into Operation Units (OUs) can further improve the computation accuracy of ReRAM crossbars. The operation unit utilization was not optimized in previous work, incurring extra cost. This paper proposes a two-stage algorithm with a crossbar OU-aware scheme for sparse graph index remapping for ReRAM (SGIRR) crossbars, mitigating the influence of graph sparsity. In particular, this paper is the first to consider the given operation unit size with the remapping index algorithm, optimizing the operation unit and power dissipation. Experimental results show that our proposed algorithm reduces the utilization of crossbar OUs by 31.4%, improves the total OU block usage by 10.6%, and saves energy consumption by 17.2%, on average.
D4AM: A General Denoising Framework for Downstream Acoustic Models
The Eleventh International Conference on Learning Representations, ICLR 2023, May 2023
Chi-Chang Lee, Yu Tsao, Hsin-Min Wang, Chu-Song Chen


Abstract
The performance of acoustic models degrades notably in noisy environments. Speech enhancement (SE) can be used as a front-end strategy to serve automatic speech recognition (ASR) systems. However, the training objectives of existing SE approaches do not consider the generalization ability to unseen ASR systems. In this study, we propose a general denoising framework for various downstream acoustic models, called D4AM. Our framework fine-tunes the SE model with the backward gradient according to a specific acoustic model and the corresponding classification objective. At the same time, our method aims to take the regression objective as an auxiliary loss to make the SE model generalize to other unseen acoustic models. To jointly train an SE unit with regression and classification objectives, D4AM uses an adjustment scheme to directly estimate suitable weighting coefficients instead of going through a grid search process with additional training costs. The adjustment scheme consists of two parts: gradient calibration and regression objective weighting. Experimental results show that D4AM can consistently and effectively provide improvements to various unseen acoustic models and outperforms other combination setups. To the best of our knowledge, this is the first work that deploys an effective combination scheme of regression (denoising) and classification (ASR) objectives to derive a general pre-processor applicable to various unseen ASR systems.
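The joint objective described above, a classification loss from a downstream acoustic model plus an auxiliary regression (denoising) loss, can be sketched as follows. The fixed weight alpha is a placeholder for illustration; D4AM itself estimates suitable coefficients with its adjustment scheme rather than using a fixed value, and the specific loss functions here are assumptions.

```python
# Sketch of a joint SE training objective: classification loss from a downstream
# acoustic model plus an auxiliary regression (denoising) loss. The fixed weight
# `alpha` and the specific losses are placeholders, not the D4AM adjustment scheme.
import torch
import torch.nn.functional as F

def joint_loss(se_model, acoustic_model, noisy, clean, targets, alpha=0.1):
    enhanced = se_model(noisy)                       # front-end speech enhancement
    loss_reg = F.l1_loss(enhanced, clean)            # regression / denoising objective
    logits = acoustic_model(enhanced)                # downstream acoustic model
    loss_cls = F.cross_entropy(logits, targets)      # classification objective
    return loss_cls + alpha * loss_reg               # combined objective for fine-tuning SE
```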
Generalization Ability Improvement of Speaker Representation and Anti-Interference for Speaker Verification
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023
Qian-Bei Hong, Chung-Hsien Wu, and Hsin-Min Wang

Abstract
The ability to generalize to mismatches between training and testing conditions and to resist interference from other speakers is crucial for the performance of speaker verification. In this paper, we propose two novel approaches to improve the generalization ability to deal with mismatched recording scenarios and languages in test conditions and to reduce the influence of interference from other speakers on the similarity measurement of two speaker embeddings. First, parent embedding learning (PEL) is used for model training, which exploits the generalization ability of the shared structure to improve the representation of speaker embeddings. Second, partial adaptive score normalization (PAS-Norm) is used to reduce the influence of interference from other speakers on embedding-based similarity measures. In the experiments, the speaker embedding models are trained using the VoxCeleb2 dataset, and the performance is evaluated on four other datasets under different conditions, including the VoxCeleb1, Librispeech, SITW, and CN-Celeb datasets. In the experiments on VoxCeleb1, evaluation results considering a large number of verification speakers and identity restrictions show that the proposed PEL-based system reduces the EER by 6.0% and 4.9% in these two cases, respectively, compared to the state-of-the-art (SOTA) system. Furthermore, in the experiments evaluating speaker verification in mismatched conditions on SITW and CN-Celeb, the proposed PEL-based system also outperforms the SOTA system. In the language-mismatched conditions, the EER is reduced by 8.3%. For the evaluation of the influence of interference from other speakers, the EER is significantly reduced by 24.4% when PAS-Norm is used instead of the baseline AS-Norm score normalization method.
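For context, the baseline adaptive score normalization (AS-Norm) mentioned above is commonly computed against the top-scoring cohort on each side of a trial; a hedged sketch of that baseline is below. The cohort size and the symmetric form are assumptions, and the proposed PAS-Norm is not reproduced here.

```python
# Sketch of the baseline adaptive score normalization (AS-Norm) referenced above,
# not the proposed PAS-Norm. Cohort size and the symmetric form are assumptions.
import numpy as np

def as_norm(score, enroll_cohort_scores, test_cohort_scores, top_k=200):
    """Normalize a trial score against the top-k cohort scores of each side."""
    def stats(cohort):
        top = np.sort(np.asarray(cohort))[-top_k:]   # top-scoring cohort entries
        return top.mean(), top.std()
    mu_e, sd_e = stats(enroll_cohort_scores)
    mu_t, sd_t = stats(test_cohort_scores)
    return 0.5 * ((score - mu_e) / sd_e + (score - mu_t) / sd_t)
```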
Performance Enhancement of SMR-based Deduplication Systems
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), September 2022
Chun-Feng Wu, Martin Kuo, Ming-Chang Yang, and Yuan-Hao Chang



Abstract
Due to the fast-growing amount of data and cost considerations, shingled-magnetic-recording (SMR) drives have been developed to provide low-cost and high-capacity data storage by enhancing the areal density of hard disk drives. Meanwhile, (data) deduplication techniques are becoming popular in data-centric applications to reduce the amount of data that needs to be stored in storage devices by eliminating duplicate data chunks. However, directly applying deduplication techniques on SMR drives could significantly decrease the runtime performance of the deduplication system because of the time-consuming SMR space reclamation caused by the sequential write constraint of SMR drives. In this article, an SMR-aware deduplication scheme is proposed to improve the runtime performance of SMR-based deduplication systems with consideration of the sequential write constraint of SMR drives. Moreover, to bridge the information gap between the deduplication system and the SMR drive, the lifetime information of data chunks is extracted to separate data chunks of different lifetimes in different places on SMR drives, so as to further reduce the SMR space reclamation overhead. A series of experiments was conducted with a set of realistic deduplication workloads. The results show that the proposed scheme can significantly improve the runtime performance of the SMR-based deduplication system with limited system overheads.
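At its core, chunk-level deduplication stores only chunks whose fingerprint has not been seen before; a minimal sketch is shown below. Fixed-size chunking and SHA-1 fingerprints are assumptions, and nothing SMR-specific (space reclamation, lifetime separation) is modeled.

```python
# Minimal chunk-level deduplication sketch (fixed-size chunks, SHA-1 fingerprints).
# It only illustrates duplicate elimination; SMR-specific placement is not modeled.
import hashlib

CHUNK_SIZE = 4096

def dedup_write(data: bytes, store: dict) -> list:
    """Split data into chunks, store only unseen chunks, return the chunk recipe."""
    recipe = []
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        fp = hashlib.sha1(chunk).hexdigest()
        if fp not in store:          # new chunk: would have to be written to the drive
            store[fp] = chunk
        recipe.append(fp)            # duplicate chunk: only a reference is kept
    return recipe

store = {}
recipe = dedup_write(b"A" * 8192 + b"B" * 4096 + b"A" * 4096, store)
print(len(recipe), "chunks referenced,", len(store), "unique chunks stored")  # 4, 2
```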
Accelerating Convolutional Neural Networks via Inter-operator Scheduling
IEEE International Conference on Parallel and Distributed Systems (ICPADS), Best Paper Runner-up, December 2022
Yi You, Pangfeng Liu, Ding-Yong Hong, Jan-Jan Wu and Wei-Chung Hsu


Abstract
Convolutional neural networks (CNNs) are essential in many machine learning tasks. Current deep learning frameworks and compilers usually treat the neural network as a DAG (directed acyclic graph) of tensor operations and execute them one at a time according to a topological order, which respects the dependencies in the DAG. There are two issues with this general approach. First, new CNNs have branch structures, and they form complex DAGs. These DAGs make it hard to find a good topological order for scheduling operators within a GPU. Second, modern hardware has high computational power, so running operators sequentially on modern hardware under-utilizes resources. These two issues open the possibility of exploiting inter-operator parallelism, i.e., parallelism among independent operators in the DAG, to utilize the hardware resources more efficiently. In this work, we formally define the DAG scheduling problem that addresses resource contention and propose an early-start-time-first algorithm with two heuristic rules for exploiting parallelism between independent operators. Experimental results show that our method improves the performance by up to 3.76× on an RTX 3090 compared to sequential execution.
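A simplified sketch of list scheduling that dispatches ready operators by earliest start time onto a fixed number of parallel streams is given below; it illustrates the general flavor of inter-operator scheduling rather than the paper's exact algorithm or its two heuristic rules.

```python
# Simplified earliest-start-time-first list scheduling of a DAG of operators onto
# a fixed number of parallel streams (illustration; not the paper's exact algorithm).

def schedule(ops, deps, cost, num_streams=2):
    """ops: list of names; deps: {op: [predecessors]}; cost: {op: runtime}."""
    indeg = {o: len(deps.get(o, [])) for o in ops}
    succ = {o: [] for o in ops}
    for o, ps in deps.items():
        for p in ps:
            succ[p].append(o)
    finish = {}                                   # finish time of each scheduled op
    streams = [0.0] * num_streams                 # next free time of each stream
    ready = [o for o in ops if indeg[o] == 0]
    order = []
    while ready:
        # earliest start time = max finish time of an op's predecessors
        est = {o: max((finish[p] for p in deps.get(o, [])), default=0.0) for o in ready}
        o = min(ready, key=lambda x: est[x])      # earliest-start-time-first choice
        ready.remove(o)
        s = min(range(num_streams), key=lambda i: streams[i])   # least-loaded stream
        start = max(streams[s], est[o])
        finish[o] = start + cost[o]
        streams[s] = finish[o]
        order.append((o, s, start))
        for q in succ[o]:                         # release newly ready successors
            indeg[q] -= 1
            if indeg[q] == 0:
                ready.append(q)
    return order

dag = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(schedule(["a", "b", "c", "d"], dag, {"a": 1, "b": 2, "c": 2, "d": 1}))
```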
Evolving Skyrmion Racetrack Memory as Energy-Efficient Last-Level Cache Devices
ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2022
Ya-Hui Yang, Shuo-Han Chen, and Yuan-Hao Chang


Abstract
Skyrmion racetrack memory (SK-RM) has been regarded as a promising alternative to replace static random-access memory (SRAM) as a large-size on-chip cache device with high memory density. Different from other nonvolatile random-access memories (NVRAMs), data bits of SK-RM can only be altered or detected at access ports, and shift operations are required to move data bits across access ports along the racetrack. Owing to these special characteristics, word-based mapping and bit-interleaved mapping architectures have been proposed to facilitate reading and writing on SK-RM with different data layouts. Nevertheless, when SK-RM is used as an on-chip cache device, existing mapping architectures lead to concerns of unpredictable access performance or excessive energy consumption during both data reads and writes. To resolve such concerns, this paper proposes extracting the merits of existing mapping architectures to allow SK-RM to seamlessly switch its data update policy by considering the write latency requirement of cache accesses. Promising results have been demonstrated through a series of benchmark-driven experiments.
Drift-tolerant Coding to Enhance the Energy Efficiency of Multi-Level-Cell Phase-Change Memory
ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2022
Yi-Shen Chen, Yuan-Hao Chang, and Tei-Wei Kuo

Abstract
Phase-Change Memory (PCM) has emerged as a promising memory and storage technology in recent years, and Multi-Level-Cell (MLC) PCM further reduces the per-bit cost to improve its competitiveness by storing multiple bits in each PCM cell. However, MLC PCM suffers from high energy consumption in its write operations. In contrast to existing works that try to enhance the energy efficiency of the physical program-and-verify strategy for MLC PCM, this work proposes a drift-tolerant coding scheme to enable fast write operations on MLC PCM without sacrificing any data accuracy. By exploiting the resistance drift and asymmetric write characteristics of PCM cells, the proposed scheme can reduce the write energy consumption of MLC PCM significantly. Meanwhile, a segmentation strategy is proposed to further improve the write performance with our coding scheme. A series of analyses and experiments was conducted to evaluate the capability of the proposed scheme. The results show that the proposed scheme can reduce energy consumption by 6.2–17.1% and write latency by 3.2–11.3% under six representative benchmarks, compared with the existing well-known schemes.