# 中央研究院 資訊科學研究所

## 研究概況

### 近期研究成果

:::

#### Composite Neural Network: Theory and Application to PM2.5 Prediction

IEEE Transactions on Knowledge and Data Engineering, To Appear

Ming-Chuan Yang and Meng Chang Chen

##### Abstract

This work investigates the framework and statistical performance guarantee of the composite neural network, which is composed of a collection of pre-trained and non-instantiated neural network models connected as a rooted directed acyclic graph, for solving complicated applications. A pre-trained neural network model is generally well trained, targeted to approximate a specific function.  The advantages of adopting a pre-trained model as a component in composing a complicated neural network are two-fold. One is benefiting from the intelligence and diligence of domain experts, and the other is saving effort in data acquisition as well as computing resources and time for model training.
Despite a general belief that a composite neural network may perform better than any a single component, the overall performance characteristics are not clear. In this work, we propose the framework of a composite network, and prove that a composite neural network performs better than any of its pre-trained components with a high probability.
In the study, we explore a complicated application---PM2.5 prediction---to support the correctness of the proposed composite network theory. In the empirical evaluations of PM2.5 prediction, the constructed composite neural network models perform better than other machine learning models.

#### SurpriseNet: Melody Harmonization Conditioning on User-controlled Surprise Contours

ISMIR2021, November 2021

Yi-Wei Chen, Hung-Shin Lee, Yen-Hsing Chen, and Hsin-Min Wang

##### Abstract

The surprisingness of a song is an essential and seemingly subjective factor in determining whether the listener likes it. With the help of information theory, it can be described as the transition probability of a music sequence modeled as a Markov chain. In this study, we introduce the concept of deriving entropy variations over time, so that the surprise contour of each chord sequence can be extracted. Based on this, we propose a user-controllable framework that uses a conditional variational autoencoder (CVAE) to harmonize the melody based on the given chord surprise indication. Through explicit conditions, the model can randomly generate various and harmonic chord progressions for a melody, and the Spearman’s correlation and p-value significance show that the resulting chord progressions match the given surprise contour quite well. The vanilla CVAE model was evaluated in a basic melody harmonization task (no surprise control) in terms of six objective metrics. The results of experiments on the Hooktheory Lead Sheet Dataset show that our model achieves performance comparable to the state-of-the-art melody harmonization model.

#### Learning Unsupervised Metaformer for Anomaly Detection

International Conference on Computer Vision (ICCV), October 2021

Jhih-Ciang Wu, Ding-Jie Chen, Chiou-Shann Fuh and Tyng-Luh Liu

#### Happy Dance, Slow Clap: Using Reaction GIFs to Predict Induced Affect on Twitter

Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), August 2021

Boaz Shmueli, Soumya Ray and Lun-Wei Ku

##### Abstract

Datasets with induced emotion labels are scarce but of utmost importance for many NLP tasks. We present a new, automated method for collecting texts along with their induced reaction labels. The method exploits the online use of reaction GIFs, which capture complex affective states. We show how to augment the data with induced emotion and induced sentiment labels. We use our method to create and publish ReactionGIF, a first-of-its-kind affective dataset of 30K tweets. We provide baselines for three new tasks, including induced sentiment prediction and multilabel classification of induced emotions. Our method and dataset open new research opportunities in emotion detection and affective computing.

#### Plot and Rework: Modeling Storylines for Visual Storytelling

Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), ACL Findings, August 2021

Chi-yang Hsu, Yun-Wei Chu, Ting-Hao Huang and Lun-Wei Ku

##### Abstract

Writing a coherent and engaging story is not easy. Creative writers use their knowledge and worldview to put disjointed elements together to form a coherent storyline, and work and rework iteratively toward perfection. Automated visual storytelling (VIST) models, however, make poor use of external knowledge and iterative generation when attempting to create stories. This paper introduces PR-VIST, a framework that represents the input image sequence as a story graph in which it finds the best path to form a storyline. PR-VIST then takes this path and learns to generate the final story via an iterative training process. This framework produces stories that are superior in terms of diversity, coherence, and humanness, per both automatic and human evaluations. An ablation study shows that both plotting and reworking contribute to the model's superiority.

#### H-FND: Hierarchical False-Negative Denoising for Distant Supervision Relation Extraction

Annual Meeting of Association for Computational Linguistics 2021, August 2021

Jhih-Wei Chen, Tsu-Jui Fu, Chen-Kang Lee, Wei-Yun Ma

##### Abstract

Although distant supervision automatically generates training data for relation extraction, it also introduces false-positive (FP) and false-negative (FN) training instances to the generated datasets. Whereas both types of errors degrade the final model performance, previous work on distant supervision denoising focuses more on suppressing FP noise and less on resolving the FN problem. We here propose H-FND, a hierarchical false-negative denoising framework for robust distant supervision relation extraction, as an FN denoising solution. H-FND uses a hierarchical policy which first determines whether non-relation (NA) instances should be kept, discarded, or revised during the training process. For those learning instances which are to be revised, the policy further reassigns them appropriate relations, making them better training inputs. Experiments on SemEval-2010 and TACRED were conducted with controlled FN ratios that randomly turn the relations of training and validation instances into negatives to generate FN instances. In this setting, H-FND can revise FN instances correctly and maintains high F1 scores even when 50% of the instances have been turned into negatives. Experiment on NYT10 is further conducted to shows that H-FND is applicable in a realistic setting.

#### Space-efficient Graph Data Placement to Save Energy of ReRAM Crossbar

ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), July 2021

Ting-Hsuan Lo, Chun-Feng Wu, Yuan-Hao Chang, Tei-Wei Kuo, and Wei-Chen Wang

##### Abstract

While Processing-In-Memory (PIM) offers a promising approach in running graph applications, crossbar accelerators with Resistive Random-Access Memory (ReRAM) receive attention from the academics. However, in order to match the property of bitline current summation, before being processed, graph data are mapped to adjacency matrices which incur severe sparsity and random access issues. This work provides an offline adjacency matrix index remapping scheme. The strategy targets at sparsity and spatial locality improvement with rational computation overhead and better energy consumption for any given graph partition configuration in adjacency matrix format.

#### Androgenic sensitivities and ovarian gene expression profiles prior to treatment in Japanese eel (Anguilla japonica)

Marine Biotechnology, June 2021

Yung-Sen Huang, Wen-Chih Cheng, Chung-Yen Lin

##### Abstract

Androgens stimulate ovarian development in eels. Our previous report indicated a correlation between the initial (debut) ovarian status (determined by kernel density estimation (KDE), presented as a probability density of oocyte size) and the consequence of 17MT treatment (change in ovary). The initial ovarian status appeared to be an important factor influencing ovarian androgenic sensitivity. We postulated that the sensitivities of initial ovaries are correlated with their gene expression profiles. Japanese eels underwent operation to sample the initial ovarian tissues, and the samples were stored in liquid nitrogen. Using high-throughput next-generation sequencing (NGS) technology, ovarian transcriptomic data were mined and analyzed based on functional gene classification with cutoff-based differentially expressed genes (DEGs); the ovarian status was transformed into gene expression profiles globally or was represented by a set of gene list. Our results also implied that the initial ovary might be an important factor influencing the outcomes of 17MT treatments, and the genes related with neuronal activities or neurogenesis seemed to play an essential role in the positive effect.

#### Sequence to General Tree: Knowledge-Guided Geometry Word Problem Solving

ACL-IJCNLP2021, August 2021

Shih-hung Tsai, Chao-Chun Liang, Hsin-Min Wang, and Keh-Yih Su

##### Abstract

With the recent advancements in deep learning, neural solvers have gained promising results in solving math word problems. However, these SOTA solvers only generate binary expression trees that contain basic arithmetic operators and do not explicitly use the math formulas. As a result, the expression trees they produce are lengthy and uninterpretable because they need to use multiple operators and constants to represent one single formula. In this paper, we propose sequence-to-general tree (S2G) that learns to generate interpretable and executable operation trees where the nodes can be formulas with an arbitrary number of arguments. With nodes now allowed to be formulas, S2G can learn to incorporate mathematical domain knowledge into problem-solving, making the results more interpretable. Experiments show that S2G can achieve a better performance against strong baselines on problems that require domain knowledge

#### AlloST: Low-resource Speech Translation without Source Transcription

Interspeech2021, August 2021

Yao-Fei Cheng, Hung-Shin Lee, and Hsin-Min Wang

##### Abstract

The end-to-end architecture has made promising progress in speech translation (ST). However, the ST task is still challenging under low-resource conditions. Most ST models have shown unsatisfactory results, especially in the absence of word information from the source speech utterance. In this study, we survey methods to improve ST performance without using source transcription, and propose a learning framework that utilizes a language-independent universal phone recognizer. The framework is based on an attention-based sequence-to-sequence model, where the encoder generates the phonetic embeddings and phone-aware acoustic representations, and the decoder controls the fusion of the two embedding streams to produce the target token sequence. In addition to investigating different fusion strategies, we explore the specific usage of byte pair encoding (BPE), which compresses a phone sequence into a syllablelike segmented sequence. Due to the conversion of symbols, a segmented sequence represents not only pronunciation but also language-dependent information lacking in phones. Experiments conducted on the Fisher Spanish-English and TaigiMandarin drama corpora show that our method outperforms the conformer-based baseline, and the performance is close to that of the existing best method using source transcription.

#### iTARGEX analysis of yeast deletome reveals novel regulators of transcriptional buffering in S phase and protein turnover

Nucleic Acids Research, July 2021

Huang J.H., Liao, Y.R., Lin, T.C., Tsai, C.H., Lai, W.Y., Chou, Y.K., Leu, J.Y., Tsai, H.K.*, and Kao, C.F.*

##### Abstract

Integrating omics data with quantification of biological traits provides unparalleled opportunities for discovery of genetic regulators by in silico inference. However, current approaches to analyze genetic-perturbation screens are limited by their reliance on annotation libraries for prioritization of hits and subsequent targeted experimentation. Here, we present iTARGEX (identification of Trait-Associated Regulatory Genes via mixture regression using EXpectation maximization), an association framework with no requirement ofa priori knowledge of gene function. After creating this tool, we used it to test associations between gene expression profiles and two biological traits in single-gene deletion budding yeast mutants, including transcription homeostasis during S phase and global protein turnover. For each trait, we discovered novel regulators without prior functional annotations. The functional effects of the novel candidates were then validated experimentally, providing solid evidence for their roles in the respective traits. Hence, we conclude that iTARGEX can reliably identify novel factors involved in given biological traits. As such, it is capable of converting genome-wide observations into causal gene function predictions. Further application of iTARGEX in other contexts is expected to facilitate the discovery of new regulators and provide observations for novel mechanistic hypotheses regarding different biological traits and phenotypes.

#### Scaled-YOLOv4: Scaling Cross Stage Partial Network

Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, June 2021

C. Y. Wang, Alexey Bochkovskiy, H. Y. Mark Liao

##### Abstract

We show that the YOLOv4 object detection neural network based on the CSP approach, scales both up and down and is applicable to small and large networks while maintaining optimal speed and accuracy. We propose a network scaling approach that modifies not only the depth, width, resolution, but also structure of the network. YOLOv4- large model achieves state-of-the-art results: 55.4% AP (73.3% AP50) for the MS COCO dataset at a speed of 15 FPS on Tesla V100, while with the test time augmentation, YOLOv4-large achieves 55.8% AP (73.2 AP50). To the best of our knowledge, this is currently the highest accuracy on the COCO dataset among any published work. The YOLOv4-tiny model achieves 22.0% AP (42.0% AP50) at a speed of 443 FPS on RTX 2080Ti, while by using TensorRT, batch size = 4 and FP16-precision the YOLOv4-tiny achieves 1774 FPS.

#### Enabling Write-reduction Multiversion Scheme with Efficient Dual-range Query over NVRAM

IEEE Transactions on Very Large Scale Integration Systems (TVLSI), June 2021

I-Ju Wang, Yu-Pei Liang, Tseng-Yi Chen, Yuan-Hao Chang, Bo-Jun Chen, Hsin-Wen Wei, and Wei-Kuan Shih

##### Abstract

Due to cyber-physical systems, a large-scale multiversion indexing scheme has garnered significant attention in recent years. However, modern multiversion indexing schemes have significant drawbacks (e.g., heavy write traffic and weak key- or version-range-query performance) while being applied to a computer system with a nonvolatile random access memory (NVRAM) as its main memory. Unfortunately, with the considerations of high memory cell density and zero-static power consumption, NVRAM has been regarded as a promising candidate to substitute for dynamic random access memory (DRAM) in future computer systems. Therefore, it is critical to make a multiversion indexing scheme friendly for an NVRAM-based system. For tackling this issue with modern multiversion indexing schemes, this article proposes a write-reduction multiversion indexing scheme with efficient dual-range queries. According to the experiments, our scheme effectively reduces the amount of write traffic generated by the multiversion indexing scheme to NVRAM. It offers efficient dual-range queries by consolidating the proposed version forest and the multiversion tree.

#### On Minimizing Internal Data Migrations of Flash Devices via Lifetime-Retention Harmonization

IEEE Transactions on Computers (TC), March 2021

Ming-Chang Yang, Chun-Feng Wu, Shuo-Han Chen, Yi-Ling Lin, Che-Wei Chang, and Yuan-Hao Chang

##### Abstract

With the emerge of high-density triple-level-cell (TLC) and 3D NAND flash, the access performance and endurance of flash devices are degraded due to the downscaling of flash cells. In addition, we observe that the mismatch between data lifetime requirement and flash block retention capability could further worsen the access performance and endurance. This is because the “lifetime-retention mismatch” could result in massive internal data migrations during garbage collection and data refreshing, and further aggravate the already-worsened access performance and endurance of high-density NAND flash devices. Such an observation motivates us to resolve the lifetime-retention mismatch problem by proposing a “time harmonization strategy”, which coordinates the flash block retention capability with the data lifetime requirement to enhance the performance of flash devices with very limited endurance degradation. Specifically, this study aims to lower the amount of internal data migrations caused by garbage collection and data refreshing via storing data of different lifetime requirement in flash blocks with suitable retention capability. The trace-driven evaluation results reveal that the proposed design can effectively reduce the average response time by about 99 percent on average without sacrificing the overall endurance, as compared with the state-of-the-art designs.

#### Optimizing Lifetime Capacity and Read Performance of Bit-Alterable 3D NAND Flash

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), February 2021

Shuo-Han Chen, Ming-Chang Yang, and Yuan-Hao Chang

##### Abstract

With the technology advance of bit-alterable 3-D NAND flash, bit-level program and erase operations have been realized and provide the possibility of “bit-level rewrite.” Bit-level rewrite is predicted to be highly beneficial to the performance of the densely packed, bit-error-prone 3-D NAND flash because bit-level rewrites can remove error bits at bit-level granularity, shorten the error correction latency, and boost the read performance. Distinctly, bit-level rewrite can curtail the lifetime expense of refresh operations via correcting the error bit stored in the individual flash cell directly without a full-page rewrite, which is employed by previous refresh techniques. However, because bit-level rewrite is predicted to have similar latency and wearing as conventional full-page rewrites, the throughput of bit-level rewrites needs to be examined to avoid low rewrite efficiency. This observation inspires us to investigate and propose the bit-level error removal (BER) scheme to utilize the bit-level rewrites for optimizing both the read performance and lifetime capacity in a most-efficient way. The experimental results are encouraging and showed that the read performance can be improved by an average of 25.22% with 40.39% reduction of lifetime expense.

#### A data-independent acquisition-based global phosphoproteomics system enables deep profiling

Nature Communications, May 2021

Reta Birhanu Kitata, Wai-Kok Choong, Chia-Feng Tsai, Pei-Yi Lin, Bo-Shiun Chen, Yun-Chien Chang, Alexey I. Nesvizhskii, Ting-Yi Sung and Yu-Ju Chen

##### Abstract

Phosphoproteomics can provide insights into cellular signaling dynamics. To achieve deep and robust quantitative phosphoproteomics profiling for minute amounts of sample, we here develop a global phosphoproteomics strategy based on data-independent acquisition (DIA) mass spectrometry and hybrid spectral libraries derived from data-dependent acquisition (DDA) and DIA data. Benchmarking the method using 166 synthetic phosphopeptides shows high sensitivity (<0.1 ng), accurate site localization and reproducible quantification (~5% median coefficient of variation). As a proof-of-concept, we use lung cancer cell lines and patient-derived tissue to construct a hybrid phosphoproteome spectral library covering 159,524 phosphopeptides (88,107 phosphosites). Based on this library, our single-shot streamlined DIA workflow quantifies 36,350 phosphosites (19,755 class 1) in cell line samples within two hours. Application to drug-resistant cells and patient-derived lung cancer tissues delineates site-specific phosphorylation events associated with resistance and tumor progression, showing that our workflow enables the characterization of phosphorylation signaling with deep coverage, high sensitivity and low between-run missing values.

#### Null Space Component Analysis of One-Shot Single-Channel Source Separation Problem

IEEE Transactions on Signal Processing, To Appear

Wen-Liang Hwang and Jinn Ho

##### Abstract

Extracting multiple unknown sources from a single observation of a single-channel is an ill-posed problem encountered in a variety of applications. This paper characterizes the ambiguity of solutions to the source separation problem, and then proposes a novel adaptive-operator-based approach to deriving solutions based on a combination of separation operators and domain-specific knowledge related to sources. The proposed scheme involves transforming the original problem into a new problem, in which data-dependent operators and the unknown sources are variables to be optimized. We demonstrate that a solution to the proposed optimization problem must reside in the null spaces of the operators, and any such solution also provides an optimal value to the original problem. We then demonstrate the applicability of the proposed method to the separation of sparse sources as well as AM-FM sources. Note that the proposed scheme outperformed corresponding state-of-the-art methods in noiseless as well as noisy environments. Finally, we demonstrate the efficacy of the proposed scheme in separation tasks based on real-world ECG data (i.e., extracting fetal ECG signals from noisy observations in which maternal and fetal ECGs recordings are superimposed) and electrical data (i.e.,separating singularities from harmonic components in an observation of noisy data related to surges in electrical current).

#### Knowledge Based Hyperbolic Propagation

in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), July 2021

Chang-You Tai, Chienkun Huang, Liangying Huang, Lun-Wei Ku

##### Abstract

There has been significant progress in utilizing heterogeneous knowledge graphs (KGs) as auxiliary information in recommendation systems. However, existing KG-aware recommendation models rely solely on Euclidean space, neglecting hyperbolic space, which has already been shown to possess a superior ability to separate embeddings by providing more room''. We propose a knowledge based hyperbolic propagation framework (KBHP) which includes hyperbolic components for calculating the importance of KG attributes' relatives to achieve better knowledge propagation. In addition to the original relations in the knowledge graph, we propose a user purchase relation to better represent logical patterns in hyperbolic space, which bridges users and items for modeling user preference. Experiments on four real-world benchmarks show that KBHP is significantly more accurate than state-of-the-art models. We further visualize the generated embeddings to demonstrate that the proposed model successfully clusters attributes that are relevant to items and highlights those that contain useful information for recommendation.

#### User-Centric Path Reasoning towards Explainable Recommendation

in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), July 2021

Chang-You Tai, Liangying Huang, Chienkun Huang, Lun-Wei Ku

##### Abstract

There has been significant progress in the utilization of heterogeneous knowledge graphs (KGs) as auxiliary information in recommendation systems. Reasoning over KG paths sheds light on the user's decision making process. Previous methods focus on formulating this process as a multi-hop reasoning problem. However, without some form of guidance in the  reasoning process, such a huge search space results in poor accuracy and little explanation diversity. In this paper, we propose UCPR, a user-centric path reasoning network that constantly guides the search from the aspect of user demand and enables explainable recommendation. In this network, a multi-view structure leverages not only local sequence reasoning information but also a panoramic view of the user's demand portfolio while inferring subsequent user decision-making steps. Experiments on five real-world benchmarks show UCPR is significantly more accurate than state-of-the-art methods. Besides, we show that the proposed model successfully identifies users' concerns and increases reasoning diversity to enhance explainability.

#### Beyond Fair Pay: Ethical Implications of Crowdsourcing NLP Task

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2021), June 2021

Boaz Shmueli, Jan Fell, Soumya Ray, Lun-Wei Ku

##### Abstract

The use of crowdworkers in NLP research is growing rapidly, in tandem with the expo-nential increase in research production in ma-chine learning and AI. Ethical discussion re-garding the use of crowdworkers within the NLP research community is typically confined in scope to issues related to labor conditions, such as fair pay. We draw attention to the lack of risk mitigation related to the various tasks performed by workers, including data label-ing, text evaluation, and text production. We find that the Final Rule, the common ethical framework used by researchers, did not antici-pate the use of online crowdsourcing platforms for data collection, and this results in potential gaps between the spirit and practice of human-subjects ethics in NLP research. We enu-merate common scenarios where crowdwork-ers performing NLP tasks are at risk of harm. We thus recommend that researchers evaluate these risks by considering the three ethical principles set up by the Belmont Report. We also clarify some common misconceptions re-garding the Institutional Review Board review process. We hope this paper will serve to re-open the discussion within our community re-garding the ethical use of crowdworkers.

#### End-to-end Recurrent Cross-Modality Attention for Video Dialogue

IEEE/ACM Transactions on Audio, Speech and Language Processing, March 2021

Yun-Wei Chu, Kuan-Yen Lin, Chao-Chun Hsu, Lun-Wei Ku

##### Abstract

Visual dialogue systems need to understand dynamic visual scenes and comprehend semantics in order to converse with users. Constructing video dialogue systems is more challenging than traditional image dialogue systems because the large feature space of videos makes it difficult to capture semantic information. Furthermore, the dialogue system also needs to precisely answer users’ question based on comprehensive understanding of the videos and the previous dialogue. In order to improve the performance of video dialogue system, we proposed an end-to-end recurrent cross-modality attention (ReCMA) model to answer a series of questions about a video from both visual and textual modality. The answer representation of the question is updated based on both visual representation and textual representation in each step of the reasoning process to have a better understanding of both modalities’ information. We evaluate our method on the challenging DSTC7 video scene-aware dialog dataset and the proposed ReCMA achieves a relative 20.8% improvement over the baseline on CIDEr.

#### Perceptual Indistinguishability-Net (PI-Net): Facial Image Obfuscation with Manipulable Semantics

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2021

Jia-Wei Chen, Li-Ju Chen, Chia-Mu Yu, and Chun-Shien Lu

##### Abstract

With the growing use of camera devices, the industry has many image datasets that provide more opportunities for collaboration between the machine learning community and industry. However, the sensitive information in the datasets discourages data owners from releasing these datasets. Despite recent research devoted to removing sensitive information from images, they provide neither meaningful privacy-utility trade-off nor provable privacy guarantees. In this study, with the consideration of the perceptual similarity, we propose perceptual indistinguishability (PI) as a formal privacy notion particularly for images. We also propose PI-Net, a privacy-preserving mechanism that achieves image obfuscation with PI guarantee. Our study shows that PI-Net achieves significantly better privacy utility trade-off through public image data.

#### Adaptive Image Transformer for One-Shot Object Detection

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021

Ding-Jie Chen, He-Yen Hsieh and Tyng-Luh Liu

##### Abstract

One-shot object detection tackles a challenging task that aims at identifying within a target image all object instances of the same class, implied by a query image patch. The main difficulty lies in the situation that the class label of the query patch and its respective examples are not available in the training data. Our main idea leverages the concept of language translation to boost metric-learning-based detection methods. Specifically, we emulate the language translation process to adaptively translate the feature of each object proposal to better correlate the given query feature for discriminating the class-similarity among the proposal-query pairs. To this end, we propose the Adaptive Image Transformer (AIT) module that deploys an attention-based encoder-decoder architecture to simultaneously explore intra-coder and inter-coder (\\ie, each proposal-query pair) attention. The adaptive nature of our design turns out to be flexible and effective in addressing the one-shot learning scenario. With the informative attention cues, the proposed model excels in predicting the class-similarity between the target image proposals and the query image patch. Though conceptually simple, our model significantly outperforms a state-of-the-art technique, improving the unseen-class object classification from 63.8 mAP and 22.0 AP50 to 72.2 mAP and 24.3 AP50 on the PASCAL-VOC and MS-COCO benchmark datasets, respectively.

#### Referring Image Segmentation via Language-Driven Attention

International Conference on Robotics and Automation (ICRA), May 2021

Ding-Jie Chen, He-Yen Hsieh and Tyng-Luh Liu

##### Abstract

This paper aims to tackle the problem of referring image segmentation, which is targeted at reasoning the region of interest referred by a query natural language sentence. One key issue to address the referring image segmentation is how to establish the cross-modal representation for encoding the two modalities, namely, the query sentence and the input image. Most existing methods are designed to concatenate the features from each modality or to gradually encode the cross-modal representation concerning each word's effect. In contrast, our approach leverages the correlation between the two modalities for constructing the cross-modal representation. To make the resulting cross-modal representation more discriminative for the segmentation task, we propose a novel mechanism of language-driven attention to encode the cross-modal representation for reflecting the attention between every single visual element and the entire query sentence. The proposed mechanism, named as Language-Driven Attention (LDA), first decouples the cross-modal correlation to channel-attention and spatial-attention and then integrates the two attentions for obtaining the cross-modal representation. The channel attention and the spatial attention respectively reveal how sensitive each channel, or each pixel of a particular feature map is with respect to the query sentence. With a proper fusion of the two kinds of feature attention, the proposed LDA model can effectively guide the generation of the final cross-modal representation. The resulting representation is further strengthened for capturing the multi-receptive-field and multi-level-semantic for the intended segmentation. We assess our referring image segmentation model on four public benchmark datasets, and the experimental results show that our model achieves state-of-the-art performance.

#### ATACgraph: profiling genome wide chromatin accessibility from ATAC-seq

Frontiers in Genetics, January 2021

Rita Jui-Hsein Lu, Yen-Ting Liu, Chih Wei Huang, Ming-Ren Yen, Chung-Yen Lin and Pao-Yang Chen

##### Abstract

Assay for transposase-accessible chromatin using sequencing data (ATAC-seq) is an efficient and precise method for revealing chromatin accessibility across the genome. Most of the current ATAC-seq tools follow chromatin immunoprecipitation sequencing (ChIP-seq) strategies that do not consider ATAC-seq-specific properties. To incorporate specific ATAC-seq quality control and the underlying biology of chromatin accessibility, we developed a bioinformatics software named ATACgraph for analyzing and visualizing ATAC-seq data. ATACgraph profiles accessible chromatin regions and provides ATAC-seq-specific information including definitions of nucleosome-free regions (NFRs) and nucleosome-occupied regions. ATACgraph also allows identification of differentially accessible regions between two ATAC-seq datasets. ATACgraph incorporates the docker image with the Galaxy platform to provide an intuitive user experience via the graphical interface. Without tedious installation processes on a local machine or cloud, users can analyze data through activated websites using pre-designed workflows or customized pipelines composed of ATACgraph modules. Overall, ATACgraph is an effective tool designed for ATAC-seq for biologists with minimal bioinformatics knowledge to analyze chromatin accessibility. ATACgraph can be run on any ATAC-seq data with no limit to specific genomes. As validation, we demonstrated ATACgraph on human genome to showcase its functions for ATAC-seq interpretation. This software is publicly accessible and can be downloaded at https://github.com/RitataLU/ATACgraph