# Academia Sinica, Institute of Information Science

## Research Overview

### Recent Research Results

:::

#### Null Space Component Analysis of One-Shot Single-Channel Source Separation Problem

IEEE Transactions on Signal Processing, To Appear

Wen-Liang Hwang and Jinn Ho

##### Abstract

Extracting multiple unknown sources from a single observation of a single channel is an ill-posed problem encountered in a variety of applications. This paper characterizes the ambiguity of solutions to the source separation problem and then proposes a novel adaptive-operator-based approach that derives solutions by combining separation operators with domain-specific knowledge about the sources. The proposed scheme transforms the original problem into a new problem in which data-dependent operators and the unknown sources are the variables to be optimized. We demonstrate that a solution to the proposed optimization problem must reside in the null spaces of the operators, and that any such solution also provides an optimal value for the original problem. We then demonstrate the applicability of the proposed method to the separation of sparse sources as well as AM-FM sources. The proposed scheme outperformed corresponding state-of-the-art methods in both noiseless and noisy environments. Finally, we demonstrate the efficacy of the proposed scheme in separation tasks based on real-world ECG data (i.e., extracting fetal ECG signals from noisy observations in which maternal and fetal ECG recordings are superimposed) and electrical data (i.e., separating singularities from harmonic components in noisy observations of surges in electrical current).
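As a toy illustration of the null-space idea only (not the authors' adaptive-operator algorithm), assume each source is a pure sinusoid with a known frequency. The second-order filter x[k] - 2cos(ω)x[k+1] + x[k+2] annihilates such a sinusoid, so each source lies in the null space of its operator, and a noiseless mixture can be split by least squares over the union of the two null-space bases. In the paper the operators are data-dependent and optimized jointly with the sources; here they are fixed. The helper names `annihilator`, `null_basis`, and `separate` are hypothetical.

```python
import numpy as np

def annihilator(omega, n):
    """(n-2) x n operator whose rows compute x[k] - 2cos(omega) x[k+1] + x[k+2].

    Any sinusoid of angular frequency omega lies in its null space.
    """
    T = np.zeros((n - 2, n))
    for k in range(n - 2):
        T[k, k] = 1.0
        T[k, k + 1] = -2.0 * np.cos(omega)
        T[k, k + 2] = 1.0
    return T

def null_basis(T, tol=1e-10):
    """Orthonormal basis (as columns) of the null space of T, computed via SVD."""
    _, s, vt = np.linalg.svd(T)
    rank = int((s > tol * s.max()).sum())
    return vt[rank:].T

def separate(y, omegas):
    """Split y into components, each constrained to the null space of one operator."""
    bases = [null_basis(annihilator(w, len(y))) for w in omegas]
    B = np.hstack(bases)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    parts, start = [], 0
    for Bi in bases:
        k = Bi.shape[1]
        parts.append(Bi @ coef[start:start + k])
        start += k
    return parts
```

With two distinct frequencies the combined basis has full column rank, so a noiseless mixture is separated exactly; with noise or unknown operators the problem becomes the harder one the abstract addresses.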

#### End-to-end Recurrent Cross-Modality Attention for Video Dialogue

IEEE/ACM Transactions on Audio, Speech and Language Processing, To Appear

Yun-Wei Chu, Kuan-Yen Lin, Chao-Chun Hsu, Lun-Wei Ku

##### Abstract

Visual dialogue systems need to understand dynamic visual scenes and comprehend semantics in order to converse with users. Constructing video dialogue systems is more challenging than constructing traditional image dialogue systems because the large feature space of videos makes it difficult to capture semantic information. Furthermore, a dialogue system must answer users' questions precisely, based on a comprehensive understanding of both the video and the preceding dialogue. To improve the performance of video dialogue systems, we propose an end-to-end recurrent cross-modality attention (ReCMA) model that answers a series of questions about a video using both the visual and textual modalities. At each step of the reasoning process, the answer representation is updated based on both the visual and the textual representations, yielding a better understanding of the information in both modalities. We evaluate our method on the challenging DSTC7 video scene-aware dialog dataset, where ReCMA achieves a 20.8% relative improvement in CIDEr over the baseline.

#### Knowledge Based Hyperbolic Propagation

The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), July 2021

Chang-You Tai, Chienkun Huang, Liangying Huang, Lun-Wei Ku

##### Abstract

There has been significant progress in utilizing heterogeneous knowledge graphs (KGs) as auxiliary information in recommendation systems. However, existing KG-aware recommendation models rely solely on Euclidean space and neglect hyperbolic space, which has been shown to separate embeddings better by providing "more room". We propose a knowledge-based hyperbolic propagation framework (KBHP) that includes hyperbolic components for calculating the importance of the relatives of KG attributes, to achieve better knowledge propagation. In addition to the original relations in the knowledge graph, we propose a user purchase relation that bridges users and items for modeling user preference, to better represent logical patterns in hyperbolic space. Experiments on four real-world benchmarks show that KBHP is significantly more accurate than state-of-the-art models. We further visualize the generated embeddings to demonstrate that the proposed model successfully clusters attributes that are relevant to items and highlights those that contain useful information for recommendation.
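To make the "more room" intuition concrete, here is a minimal sketch of the standard geodesic distance in the Poincaré ball model of hyperbolic space. The abstract does not say which hyperbolic model or operations KBHP uses, so this is a generic illustration under that assumption; `poincare_distance` is a hypothetical helper name.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    d(u, v) = arccosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    """
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_diff / max(denom, eps)))

# Two pairs with the same Euclidean gap (0.1), one near the origin, one near the boundary.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])
```

The same Euclidean gap corresponds to a much larger hyperbolic distance near the boundary of the ball, which is why hierarchical or tree-like neighborhoods (such as KG attributes fanning out from an item) can be embedded with less crowding than in Euclidean space.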

#### User-Centric Path Reasoning towards Explainable Recommendation

The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), July 2021

Chang-You Tai, Liangying Huang, Chienkun Huang, Lun-Wei Ku

##### Abstract

There has been significant progress in the utilization of heterogeneous knowledge graphs (KGs) as auxiliary information in recommendation systems. Reasoning over KG paths sheds light on the user's decision-making process. Previous methods formulate this process as a multi-hop reasoning problem. However, without some form of guidance in the reasoning process, the huge search space results in poor accuracy and little explanation diversity. In this paper, we propose UCPR, a user-centric path reasoning network that continually guides the search from the perspective of user demand and enables explainable recommendation. In this network, a multi-view structure leverages not only local sequence reasoning information but also a panoramic view of the user's demand portfolio while inferring subsequent user decision-making steps. Experiments on five real-world benchmarks show that UCPR is significantly more accurate than state-of-the-art methods. In addition, we show that the proposed model successfully identifies users' concerns and increases reasoning diversity to enhance explainability.

#### Beyond Fair Pay: Ethical Implications of Crowdsourcing NLP Task

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2021), June 2021

Boaz Shmueli, Jan Fell, Soumya Ray, Lun-Wei Ku

##### Abstract

The use of crowdworkers in NLP research is growing rapidly, in tandem with the exponential increase in research production in machine learning and AI. Ethical discussion regarding the use of crowdworkers within the NLP research community is typically confined in scope to issues related to labor conditions, such as fair pay. We draw attention to the lack of risk mitigation related to the various tasks performed by workers, including data labeling, text evaluation, and text production. We find that the Final Rule, the common ethical framework used by researchers, did not anticipate the use of online crowdsourcing platforms for data collection, and this results in potential gaps between the spirit and practice of human-subjects ethics in NLP research. We enumerate common scenarios where crowdworkers performing NLP tasks are at risk of harm. We thus recommend that researchers evaluate these risks by considering the three ethical principles set up by the Belmont Report. We also clarify some common misconceptions regarding the Institutional Review Board review process. We hope this paper will serve to reopen the discussion within our community regarding the ethical use of crowdworkers.

#### Scaled-YOLOv4: Scaling Cross Stage Partial Network

IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2021

Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao

##### Abstract

We show that the YOLOv4 object detection neural network, which is based on the CSP approach, scales both up and down and is applicable to small and large networks while maintaining optimal speed and accuracy. We propose a network scaling approach that modifies not only the depth, width, and resolution, but also the structure of the network. The YOLOv4-large model achieves state-of-the-art results: 55.4% AP (73.3% AP50) on the MS COCO dataset at a speed of 15 FPS on a Tesla V100, and with test-time augmentation, YOLOv4-large achieves 55.8% AP (73.2% AP50). To the best of our knowledge, this is currently the highest accuracy on the COCO dataset among all published work. The YOLOv4-tiny model achieves 22.0% AP (42.0% AP50) at a speed of 443 FPS on an RTX 2080Ti, and with TensorRT, a batch size of 4, and FP16 precision, YOLOv4-tiny achieves 1774 FPS.

#### Perceptual Indistinguishability-Net (PI-Net): Facial Image Obfuscation with Manipulable Semantics

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Jia-Wei Chen, Li-Ju Chen, Chia-Mu Yu, and Chun-Shien Lu

##### Abstract

With the growing use of camera devices, industry holds many image datasets that create opportunities for collaboration between the machine learning community and industry. However, sensitive information in these datasets discourages data owners from releasing them. Although recent research has been devoted to removing sensitive information from images, existing methods provide neither a meaningful privacy-utility trade-off nor provable privacy guarantees. In this study, taking perceptual similarity into account, we propose perceptual indistinguishability (PI) as a formal privacy notion specifically for images. We also propose PI-Net, a privacy-preserving mechanism that achieves image obfuscation with a PI guarantee. Our study shows that PI-Net achieves a significantly better privacy-utility trade-off on public image data.

#### Adaptive Image Transformer for One-Shot Object Detection

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021

Ding-Jie Chen, He-Yen Hsieh and Tyng-Luh Liu

##### Abstract

One-shot object detection tackles the challenging task of identifying, within a target image, all object instances of the same class as a query image patch. The main difficulty is that the class label of the query patch and its respective examples are not available in the training data. Our main idea leverages the concept of language translation to boost metric-learning-based detection methods. Specifically, we emulate the language translation process to adaptively translate the feature of each object proposal so that it better correlates with the given query feature, which helps discriminate the class similarity of each proposal-query pair. To this end, we propose the Adaptive Image Transformer (AIT) module, which deploys an attention-based encoder-decoder architecture to simultaneously explore intra-coder and inter-coder (i.e., each proposal-query pair) attention. The adaptive nature of our design turns out to be flexible and effective in addressing the one-shot learning scenario. With these informative attention cues, the proposed model excels at predicting the class similarity between the target image proposals and the query image patch. Though conceptually simple, our model significantly outperforms a state-of-the-art technique, improving unseen-class object classification from 63.8 mAP and 22.0 AP50 to 72.2 mAP and 24.3 AP50 on the PASCAL-VOC and MS-COCO benchmark datasets, respectively.

#### Referring Image Segmentation via Language-Driven Attention

International Conference on Robotics and Automation (ICRA), May 2021

Ding-Jie Chen, He-Yen Hsieh and Tyng-Luh Liu

##### Abstract

This paper tackles the problem of referring image segmentation, which aims to reason about the region of interest referred to by a natural language query sentence. One key issue in referring image segmentation is how to establish a cross-modal representation that encodes the two modalities, namely, the query sentence and the input image. Most existing methods either concatenate the features from each modality or gradually encode the cross-modal representation according to each word's effect. In contrast, our approach leverages the correlation between the two modalities to construct the cross-modal representation. To make the resulting cross-modal representation more discriminative for the segmentation task, we propose a novel language-driven attention mechanism that encodes the cross-modal representation to reflect the attention between every single visual element and the entire query sentence. The proposed mechanism, named Language-Driven Attention (LDA), first decouples the cross-modal correlation into channel attention and spatial attention and then integrates the two attentions to obtain the cross-modal representation. The channel attention and the spatial attention reveal how sensitive each channel and each pixel of a particular feature map, respectively, is to the query sentence. With a proper fusion of the two kinds of feature attention, the proposed LDA model can effectively guide the generation of the final cross-modal representation. The resulting representation is further strengthened to capture multi-receptive-field and multi-level semantics for the intended segmentation. We assess our referring image segmentation model on four public benchmark datasets, and the experimental results show that our model achieves state-of-the-art performance.

#### ATACgraph: profiling genome wide chromatin accessibility from ATAC-seq

Frontiers in Genetics, January 2021

Rita Jui-Hsein Lu, Yen-Ting Liu, Chih Wei Huang, Ming-Ren Yen, Chung-Yen Lin and Pao-Yang Chen

##### Abstract

Assay for transposase-accessible chromatin using sequencing (ATAC-seq) is an efficient and precise method for revealing chromatin accessibility across the genome. Most current ATAC-seq tools follow chromatin immunoprecipitation sequencing (ChIP-seq) strategies that do not consider ATAC-seq-specific properties. To incorporate ATAC-seq-specific quality control and the underlying biology of chromatin accessibility, we developed ATACgraph, a bioinformatics software package for analyzing and visualizing ATAC-seq data. ATACgraph profiles accessible chromatin regions and provides ATAC-seq-specific information, including definitions of nucleosome-free regions (NFRs) and nucleosome-occupied regions. ATACgraph also allows the identification of differentially accessible regions between two ATAC-seq datasets. ATACgraph integrates its Docker image with the Galaxy platform to provide an intuitive user experience through a graphical interface. Without tedious installation on a local machine or in the cloud, users can analyze data through activated websites using pre-designed workflows or customized pipelines composed of ATACgraph modules. Overall, ATACgraph is an effective ATAC-seq tool that enables biologists with minimal bioinformatics knowledge to analyze chromatin accessibility. ATACgraph can be run on any ATAC-seq data and is not limited to specific genomes. As validation, we applied ATACgraph to the human genome to showcase its functions for ATAC-seq interpretation. The software is publicly accessible and can be downloaded at https://github.com/RitataLU/ATACgraph
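One ATAC-seq-specific signal mentioned above is the split between nucleosome-free and nucleosome-occupied regions, which is commonly inferred from paired-end fragment lengths. The sketch below bins fragment lengths into classes; the bp cutoffs are illustrative assumptions, not ATACgraph's documented settings, and `classify_fragment` / `fragment_profile` are hypothetical names.

```python
from collections import Counter

# Illustrative fragment-length bins in bp (assumed; ATACgraph's own definitions may differ).
BINS = [
    ("nucleosome_free", 0, 99),
    ("mono_nucleosome", 180, 247),
    ("di_nucleosome", 315, 473),
]

def classify_fragment(length):
    """Return the bin label for one fragment length, or 'other' if it falls between bins."""
    for label, lo, hi in BINS:
        if lo <= length <= hi:
            return label
    return "other"

def fragment_profile(lengths):
    """Count fragments per class.

    Template lengths taken from paired-end alignments are signed (negative for
    the reverse mate), so the absolute value is used.
    """
    return Counter(classify_fragment(abs(x)) for x in lengths)
```

Regions enriched in short fragments are candidate NFRs, while regions dominated by roughly nucleosome-sized fragments suggest nucleosome occupancy.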

#### Adaptive and Generative Zero-Shot Learning

Ninth International Conference on Learning Representations (ICLR), May 2021

Yu-Ying Chou, Hsuan-Tien Lin and Tyng-Luh Liu

##### Abstract

We address the problem of generalized zero-shot learning (GZSL), where the task is to predict the class label of a target image whether its label belongs to a seen or an unseen category. As in ZSL, the learning setting assumes that all class-level semantic features are given, while only images of seen classes are available for training. By exploring the correlation between image features and the corresponding semantic features, the proposed approach enriches the semantic-to-visual (S2V) embeddings via a seamless fusion of adaptive and generative learning. To this end, we extend the semantic features of each class by supplementing them with image-adaptive attention, so that the learned S2V embedding can account for not only inter-class but also intra-class variations. In addition, to overcome the limitation of training only with images from seen classes, we design a generative scheme that simultaneously generates virtual class labels and their visual features by sampling and interpolating over their seen counterparts. At inference, a test image gives rise to two different S2V embeddings, seen and virtual. The former is used to decide whether the underlying label belongs to the unseen category or is otherwise a specific seen class; the latter is used to predict an unseen class label. To demonstrate the effectiveness of our method, we report state-of-the-art results on four standard GZSL datasets, including an ablation study of the proposed modules.

#### Efficient Video Captioning on Heterogeneous System Architectures

35th IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021

Horng-Ruey Huang, Ding-Yong Hong, Jan-Jan Wu, Pangfeng Liu, Wei-Chung Hsu

##### Abstract

Video captioning is a core technology driving many important multidisciplinary applications, such as AI-assisted medical diagnosis, storytelling through videos, video question answering, and lip-reading. Video captioning employs a hybrid CNN+RNN neural network model to translate video scenes into natural language descriptions. For deep learning inference, a typical approach is to run both the CNN and the RNN on a GPU. Such a GPU-only approach often suffers from long inference times because it underutilizes the computing power offered by the CPU+GPU heterogeneous system architecture, which is common in modern computers. This work is an early effort to tackle the performance of deep learning inference with a hybrid CNN+RNN model on a heterogeneous system with a CPU and a GPU. The task is challenging for two reasons. (1) The CNN and the RNN exhibit very different computing behaviors, which raises the question of how to split the two models into computing tasks and assign the tasks to the CPU and the GPU so as to minimize the inference time for a video frame. (2) Data dependencies exist between the CNN and the RNN within a video frame, as well as between the RNNs of adjacent video frames; these dependencies prohibit full parallelization of the hybrid model. To solve these two problems, we propose two optimizations: a fine-grained scheduling scheme for mapping computations to devices within a video frame, and a pipeline scheduling scheme that exploits maximum parallelism between the executions of successive video frames. To facilitate these optimizations, we also develop an accurate regression-based cost model to predict the computation time of CNN/RNN operations and the communication time for moving data between the CPU and the GPU. Experimental results show that our optimizations improve the performance of video captioning by up to 3.24× on the CPU+GPU system compared with GPU-only execution.
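The benefit of pipeline scheduling across frames can be seen with a simple timing model. This sketch is an assumption-laden toy, not the paper's fine-grained scheduler or regression-based cost model: it assumes the CNN for every frame runs on the GPU, the RNN runs on the CPU, each has a constant per-frame cost, and there is an optional fixed CPU-GPU transfer cost. The RNN for frame t must wait for both the CNN of frame t and the RNN of frame t-1, matching the two dependencies described above. Function names are hypothetical.

```python
def serial_time(n_frames, cnn, rnn):
    """Baseline: every frame runs CNN then RNN back to back with no overlap."""
    return n_frames * (cnn + rnn)

def pipelined_time(n_frames, cnn, rnn, transfer=0.0):
    """Makespan when CNN (GPU) of frame t+1 overlaps with RNN (CPU) of frame t."""
    gpu_free = 0.0   # time at which the GPU finishes its previous CNN
    rnn_done = 0.0   # time at which the RNN for the previous frame finishes
    for _ in range(n_frames):
        cnn_done = gpu_free + cnn          # CNN_t starts as soon as the GPU is free
        gpu_free = cnn_done
        # RNN_t needs CNN_t's features (after transfer) and RNN_{t-1}'s hidden state.
        start = max(cnn_done + transfer, rnn_done)
        rnn_done = start + rnn
    return rnn_done

# Example with assumed per-frame costs (arbitrary time units).
print(serial_time(4, 3, 2), pipelined_time(4, 3, 2))  # 20 14
```

In steady state the pipelined cost per frame approaches max(cnn, rnn) instead of cnn + rnn, which is why balancing work between the CPU and GPU matters as much as overlapping it.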

#### Not by Equations Alone: Reasoning with Extensible Effects

Journal of Functional Programming, January 2021

Oleg Kiselyov, Shin-Cheng Mu and Amr Sabry

##### Abstract

The challenge of reasoning about programs with (multiple) effects such as mutation, jumps, or IO dates back to the inception of program semantics in the work of Strachey and Landin. Using monads to represent individual effects, and the associated equational laws to reason about them, proved exceptionally effective. Even then, it is not always clear which laws are to be associated with a monad, and for a good reason, as we show for non-determinism. Combining expressions that use different effects brings challenges not just for monads, which do not compose, but also for equational reasoning: the interaction of effects may invalidate their individual laws, as well as induce emergent properties that are not apparent in the semantics of the individual effects. Overall, the problems are judging the adequacy of a law; determining whether, or when, a law continues to hold upon the addition of new effects; and obtaining and easily verifying emergent laws. We present a solution relying on the framework of (algebraic, extensible) effects, which has already proved itself for writing programs with multiple effects. Equipped with a fairly conventional denotational semantics, this framework turns out to be useful, as we demonstrate, also for reasoning about and optimizing programs with multiple interacting effects. Unlike in the conventional approach, equational laws are not imposed on programs or effect handlers but induced from them: our starting point is hence a program (model) whose denotational semantics, besides being used directly, suggests and justifies equational laws and clarifies side conditions. The main technical result is the introduction of the notion of equivalence modulo handlers ("modulo observation") or a particular combination of handlers, and the proof that it is a congruence. It is hence usable for reasoning in any context, not just evaluation contexts, provided particular conditions are met.

Concretely, we describe several realistic handlers for non-determinism and elucidate their laws (some of which hold in the presence of any other effect). We demonstrate appropriate equational laws of non-determinism in the presence of global state, which have been a challenge to state and prove before.

#### Multi-Q 2 software facilitates isobaric labeling quantitation analysis with improved accuracy and coverage

Scientific Reports, January 2021

Ching-Tai Chen, Jen-Hung Wang, Cheng-Wei Cheng, Wei-Che Hsu, Chu-Ling Ko, Wai-Kok Choong, and Ting-Yi Sung

##### Abstract

Mass spectrometry-based proteomics using isobaric labeling for multiplex quantitation has become a popular approach for proteomic studies. We present Multi-Q 2, an isobaric-labeling quantitation tool that can yield the largest quantitation coverage and improved quantitation accuracy compared with three state-of-the-art methods. Multi-Q 2 accepts identification results from several popular proteomic data analysis platforms for quantitation; by accepting identification results from multiple search engines, it offers up to 12% improvement in quantitation coverage compared with MaxQuant and PatternLab. It is equipped with various quantitation algorithms, including a ratio compression correction algorithm, yielding up to 336 algorithmic combinations. Systematic evaluation shows that different algorithmic combinations have different strengths and suit different situations. We also demonstrate that the flexibility of Multi-Q 2 in customizing algorithmic combinations can improve quantitation accuracy over existing tools. Moreover, using complementary algorithmic combinations can be an effective strategy to enhance sensitivity when searching for biomarkers among differentially expressed proteins in proteomic experiments. Multi-Q 2 provides interactive graphical interfaces to process quantitation and to display ratios at the protein, peptide, and spectrum levels. It also includes a heatmap module that enables users to cluster proteins based on their abundance ratios and to visualize the clustering results. Multi-Q 2 executable files, sample data sets, and the user manual are freely available at http://ms.iis.sinica.edu.tw/COmics/Software_Multi-Q2.html.

#### Parallel Asynchronous Stochastic Dual Coordinate Descent Algorithms for Efficiency and Convergence

29th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2021), March 2021

Yung-Chen Chen, Pangfeng Liu, Jan-Jan Wu

##### Abstract

Parallel asynchronous stochastic dual coordinate descent (PASSCoDe) is an efficient method for training linear models on multi-core shared-memory systems. PASSCoDe achieves a good speedup when the number of threads is less than 8 on sparse datasets, i.e., datasets in which the percentage of nonzero elements in the training data is relatively small. However, due to memory conflicts and delayed parameter access in parallel execution, it often diverges or fails to converge to the best accuracy that a serial dual coordinate descent algorithm attains. In this paper, we propose two algorithms, the *Adaptive Hybrid* algorithm and the *Lazy-Sync* algorithm, to overcome the convergence issues in parallel execution. Experimental results indicate that both algorithms converge to the *same* high accuracy as a sequential program on *all* datasets we tested, except for one extremely small dataset. In contrast, PASSCoDe sometimes converges to a less accurate value, or does not converge at all, on some datasets. Our methods also outperform PASSCoDe-Fix, an improved version of PASSCoDe, in convergence stability, execution speed, and scalability.
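For reference, the serial dual coordinate descent algorithm that the parallel variants are compared against can be sketched for an L1-loss (hinge) linear SVM without a bias term. This is a generic textbook version, not the authors' Adaptive Hybrid or Lazy-Sync code; `dual_cd_svm` is a hypothetical name. Each step updates a single dual variable and the shared weight vector `w`; PASSCoDe-style methods run such updates concurrently from several threads on the same `w`, which is where the memory conflicts and stale reads described above come from.

```python
import numpy as np

def dual_cd_svm(X, y, C=1.0, epochs=50, seed=0):
    """Serial dual coordinate descent for min_w 0.5||w||^2 + C * sum(max(0, 1 - y_i w.x_i)).

    Maintains the invariant w = sum_i alpha_i * y_i * x_i, with 0 <= alpha_i <= C.
    """
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    q_diag = np.einsum("ij,ij->i", X, X)   # Q_ii = x_i . x_i
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(n):
            if q_diag[i] == 0.0:
                continue
            grad = y[i] * X[i].dot(w) - 1.0                       # partial derivative of the dual
            new_alpha = min(max(alpha[i] - grad / q_diag[i], 0.0), C)  # projected Newton step
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                w += delta * y[i] * X[i]   # in PASSCoDe, this shared write is done without locks
                alpha[i] = new_alpha
    return w, alpha
```

Because each update reads `w` and then writes it, a thread that reads a stale `w` computes a slightly wrong gradient; the serial loop above never has that problem, which is why it serves as the accuracy reference.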

#### PASSLEAF: A Pool-bAsed Semi-Supervised LEArning Framework for Uncertain Knowledge Graph Embedding

The 35th AAAI Conference on Artificial Intelligence (AAAI 2021), February 2021

Zhu-Mu Chen, Mi-Yen Yeh, and Tei-Wei Kuo

##### Abstract

In this paper, we study the problem of embedding uncertain knowledge graphs, where each relation between entities is associated with a confidence score. Observing that existing embedding methods may discard the uncertainty information, only incorporate a specific type of score function, or generate many false-negative samples during training, we propose the PASSLEAF framework to solve these issues. PASSLEAF consists of two parts: a model that can incorporate different types of scoring functions to predict relation confidence scores, and a semi-supervised learning model that exploits both positive and negative samples associated with the estimated confidence scores. Furthermore, PASSLEAF leverages a sample pool as a relay of generated samples to further augment the semi-supervised learning. Experimental results show that our proposed framework learns better embeddings, achieving higher accuracy in both confidence score prediction and tail entity prediction.

#### Accelerating Continuous Normalizing Flow with Trajectory Polynomial Regularization

The 35th AAAI Conference on Artificial Intelligence (AAAI 2021), February 2021

Han-Hsien Huang and Mi-Yen Yeh

##### Abstract

In this paper, we propose an approach to effectively accelerate the computation of continuous normalizing flow (CNF), which has been proven to be a powerful tool for tasks such as variational inference and density estimation. The training time cost of CNF can be extremely high because the required number of function evaluations (NFE) for solving the corresponding ordinary differential equations (ODEs) is very large. We hypothesize that the high NFE results from large truncation errors in solving the ODEs. To address this problem, we propose adding a regularization term that penalizes the difference between the trajectory of the ODE and its fitted polynomial regression. The ODE trajectory then approximates a polynomial function, and thus the truncation error is smaller. Furthermore, we provide two proofs supporting our claim that the additional regularization does not harm training quality. Experimental results show that our proposed method reduces NFE by 42.3% to 71.3% on density estimation and by 19.3% to 32.1% on variational autoencoders, while the test losses are not affected at all.
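The regularization described above can be sketched as a standalone quantity: sample the ODE state at a few times, fit a low-degree polynomial to each state dimension by least squares, and penalize the residual. This NumPy version only computes the penalty value; in CNF training it would be computed with an autodiff framework and added to the loss. The degree, sampling times, and the name `polynomial_trajectory_penalty` are assumptions for illustration.

```python
import numpy as np

def polynomial_trajectory_penalty(times, states, degree=2):
    """Mean squared residual of a least-squares polynomial fit to an ODE trajectory.

    times:  (K,) time points at which the trajectory was sampled
    states: (K, D) ODE state at those times (one column per state dimension)
    """
    V = np.vander(times, degree + 1)                      # (K, degree+1) design matrix
    coef, *_ = np.linalg.lstsq(V, states, rcond=None)     # one polynomial per column
    residual = states - V @ coef
    return float(np.mean(residual ** 2))

t = np.linspace(0.0, 1.0, 9)
quadratic = np.stack([1 + 2 * t + 3 * t ** 2, -t ** 2], axis=1)  # exactly polynomial
curvy = np.stack([np.sin(6 * t), np.cos(6 * t)], axis=1)         # far from linear
```

A trajectory that is already a polynomial of the chosen degree incurs (numerically) zero penalty, while a strongly curved one is penalized; pushing trajectories toward the former is what lets an adaptive ODE solver take larger steps.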

#### Positions, Channels, and Layers: Fully Generalized Non-Local Network for Singer Identification

Thirty-Fifth AAAI Conference on Artificial Intelligence, February 2021

I-Yuan Kuo, Wen-Li Wei, and Jen-Chun Lin

##### Abstract

Recently, a non-local (NL) operation has been designed as the central building block for deep-net models to capture long-range dependencies (Wang et al. 2018). Despite its excellent performance, it does not consider the interaction between positions across channels and layers, which is crucial in fine-grained classification tasks. To address this limitation, we target the singer identification (SID) task and present a fully generalized non-local (FGNL) module to help identify fine-grained vocals. Specifically, we first propose an FGNL operation, which extends the NL operation to explore the correlations between positions across channels and layers. Second, we apply a depth-wise convolution with a Gaussian kernel in the FGNL operation to smooth feature maps for better generalization. Moreover, we incorporate a modified squeeze-and-excitation (SE) scheme into the FGNL module to adaptively emphasize correlated feature channels, helping to uncover relevant feature responses and, eventually, the target singer. Evaluation results on the artist20 benchmark dataset show that the FGNL module significantly improves the accuracy of deep-net models in SID. Code is available at https://github.com/ian-k-1217/Fully-Generalized-Non-Local-Network.
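For context, the original NL operation that FGNL extends can be sketched in its embedded-Gaussian form: every position attends to every other position through a softmax over pairwise dot products, and the result is added back residually. This is the baseline NL block only, not FGNL (it does not model cross-channel or cross-layer positions, Gaussian smoothing, or the SE scheme). Features are flattened to (positions, channels), and the random projection matrices stand in for learned weights; `non_local` is a hypothetical name.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def non_local(x, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local block on x of shape (N positions, C channels)."""
    theta = x @ w_theta                        # (N, C') query embedding
    phi = x @ w_phi                            # (N, C') key embedding
    g = x @ w_g                                # (N, C') value embedding
    attn = softmax(theta @ phi.T, axis=-1)     # (N, N) pairwise affinities, rows sum to 1
    y = attn @ g                               # aggregate values from all positions
    return x + y @ w_out, attn                 # project back to C channels, residual add

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 8))
w_theta, w_phi, w_g = (rng.standard_normal((8, 4)) for _ in range(3))
w_out = rng.standard_normal((4, 8))
out, attn = non_local(x, w_theta, w_phi, w_g, w_out)
```

The (N, N) affinity matrix relates positions only within one feature map; FGNL's stated extension is to also correlate positions across channels and layers.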

#### A Flexible Template Generation and Matching Method with Applications for Publication Reference Metadata Extraction

Journal of the Association for Information Science and Technology, To Appear

Ting-Hao Yang, Yu-Lun Hsieh, Shih-Hung Liu, Yung-Chun Chang, and Wen-Lian Hsu

##### Abstract

Conventional rule‐based approaches use exact template matching to capture linguistic information and must enumerate all variations. We propose a novel flexible template generation and matching scheme based on sequence alignment, called the principle‐based approach (PBA), and employ it for reference metadata extraction (RME) to demonstrate its effectiveness. The main contributions of this research are threefold. First, we propose an automatic template generation method that captures prominent patterns using the dominating set algorithm. Second, we devise an alignment‐based template‐matching technique that uses a logistic regression model, which makes it more general and flexible than pure rule‐based approaches. Finally, we apply PBA to RME on extensive cross‐domain corpora and demonstrate its robustness and generality. Experiments reveal that the same set of templates produced by the PBA framework not only delivers consistent performance on various unseen domains but also surpasses hand‐crafted knowledge (templates). We use four independent journal‐style test sets and one conference‐style test set in the experiments. Compared with well-known machine learning methods such as conditional random fields (CRF), as well as recent deep learning methods (i.e., bi‐directional long short‐term memory with a CRF layer, Bi‐LSTM‐CRF), PBA achieves the best performance on all datasets.
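To show why alignment makes template matching more tolerant than exact matching, here is a plain global alignment (Needleman-Wunsch) score between a template and a tokenized reference string. This is a generic stand-in, not PBA itself: PBA's dominating-set template generation and logistic-regression matcher are not reproduced, and the scores, token labels, and name `align_score` are assumptions.

```python
def align_score(template, tokens, match=2, mismatch=-1, gap=-1):
    """Global alignment score between a template and a token sequence (Needleman-Wunsch)."""
    n, m = len(template), len(tokens)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap            # template slots aligned to nothing
    for j in range(1, m + 1):
        dp[0][j] = j * gap            # extra tokens aligned to nothing
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if template[i - 1] == tokens[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s,   # align slot with token
                           dp[i - 1][j] + gap,     # skip a template slot
                           dp[i][j - 1] + gap)     # skip an input token
    return dp[n][m]

# Hypothetical field-level tokens for a journal-style reference.
template = ["AUTHOR", "YEAR", "TITLE", "JOURNAL"]
exact = ["AUTHOR", "YEAR", "TITLE", "JOURNAL"]
with_extra_field = ["AUTHOR", "YEAR", "TITLE", "JOURNAL", "VOLUME"]
```

An exact matcher would reject the second input outright, whereas the alignment score only drops by one gap penalty, so a downstream classifier can still accept it as a near match.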

#### LBERT: Lexically-aware Transformers based Bidirectional Encoder Representation model for learning Universal Bio-Entity Relations

Bioinformatics, To Appear

Neha Warikoo, Yung-Chun Chang, and Wen-Lian Hsu

##### Abstract

Natural language processing techniques are constantly being advanced to accommodate the influx of data as well as to provide exhaustive and structured knowledge dissemination. Within the biomedical domain, detecting relations between bio-entities, known as the biomedical relation extraction (BRE) task, plays a critical role in knowledge structuring. Although recent advances in deep learning-based biomedical domain embedding have improved BRE predictive analytics, these works are often task selective or employ external knowledge-based pre- or post-processing. In addition, deep learning-based models do not account for local syntactic contexts, which have improved data representation in many kernel classifier-based models. In this study, we propose a universal BRE model, LBERT, a Lexically-aware Transformer-based Bidirectional Encoder Representation model that explores both local and global context representations for sentence-level classification tasks. This paper presents one of the most exhaustive BRE studies ever conducted, covering five different bio-entity relation types. Our model outperforms state-of-the-art deep learning models in protein-protein (PPI), drug-drug (DDI), and protein-bio-entity (REL) relation classification tasks by 0.02%, 11.2%, and 41.4%, respectively. LBERT representations show a statistically significant improvement over BioBERT in detecting true bio-entity relations for large corpora such as PPI. Our ablation studies clearly indicate the contribution of the lexical features and distance-adjusted attention to improving prediction performance by learning additional local semantic context along with bi-directionally learned global context.

#### Text-guided Graph Neural Networks for Referring 3D Instance Segmentation

Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI), February 2021

Pin-Hao Huang, Han-Hung Lee, Hwann-Tzong Chen and Tyng-Luh Liu

##### Abstract

This paper addresses a new task called referring 3D instance segmentation, which aims to segment the target instance in a 3D scene given a query sentence. Previous work on scene understanding has explored visual grounding with natural language guidance, yet the emphasis has mostly been confined to images and videos. We propose a Text-guided Graph Neural Network for referring 3D instance segmentation on point clouds. Given a query sentence and the point cloud of a 3D scene, our method learns to extract per-point features and predicts an offset that shifts each point toward its object center. Based on the point features and the offsets, we cluster the points to produce fused features and coordinates for the candidate objects. The resulting clusters are modeled as nodes in a Graph Neural Network (GNN) to learn representations that encode the relation structure of each candidate object. The GNN layers leverage each object's features and its relations with its neighbors to generate an attention heatmap for the input sentence expression. Finally, the attention heatmap is used to "guide" the aggregation of information from neighborhood nodes. Our method achieves state-of-the-art performance on referring 3D instance segmentation and 3D localization on the ScanRefer, Nr3D, and Sr3D benchmarks.
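The shift-then-cluster step can be illustrated with a small sketch. The abstract does not specify the clustering algorithm or the fusion rule, so this assumes a simple greedy radius clustering of the shifted points and mean pooling of member features; the function names and the `radius` value are hypothetical. Each returned (center, feature) pair plays the role of one candidate-object node for the GNN.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def _mean(rows: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of equally sized rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def shift_and_cluster(points: Sequence[Vec3], offsets: Sequence[Vec3],
                      features: Sequence[Sequence[float]], radius: float = 0.3):
    """Shift each point by its predicted offset, then greedily group nearby shifted points.

    Returns one (center, fused_feature) pair per candidate object.
    """
    shifted = [tuple(p[k] + o[k] for k in range(3)) for p, o in zip(points, offsets)]
    clusters: List[List[int]] = []
    for i, s in enumerate(shifted):
        for members in clusters:
            center = _mean([shifted[j] for j in members])
            if math.dist(s, center) <= radius:
                members.append(i)
                break
        else:
            clusters.append([i])  # no nearby cluster: start a new candidate object
    results = []
    for members in clusters:
        center = tuple(_mean([shifted[j] for j in members]))
        fused = _mean([features[j] for j in members])  # average-pool member features
        results.append((center, fused))
    return results
```

Points belonging to the same object should collapse near its center after shifting, which is why even this crude radius rule can separate objects once the offsets are accurate.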

#### Comparison of different variant sequence types coupled with decoy generation methods used in concatenated target-decoy database searches for proteogenomic research

Journal of Proteomics, January 2021

Wai-Kok Choong and Ting-Yi Sung

##### Abstract

Concatenated target-decoy database searches are commonly used in proteogenomic research for variant peptide identification. Currently, protein-based and peptide-based sequence databases are used to store variant sequences for database searches. The protein-based database records full-length wild-type protein sequences in which the original amino acids are replaced according to the given variant events, whereas the peptide-based database retains only the in silico digested peptides containing the variants. However, compared with the protein-based database, how various decoy generation methods perform on the peptide-based variant sequence database remains unclear. In this paper, we conduct a thorough comparison of target-decoy databases constructed from these two types of databases coupled with various decoy generation methods for proteogenomic analyses. The results show that for the protein-based variant sequence database, the reverse and pseudo-reverse methods achieve similar performance in variant peptide identification. Furthermore, for the peptide-based database, the pseudo-reverse method is more suitable than the widely used reverse method, identifying 6% more variant PSMs in a HEK293 cell line data set.
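As a rough illustration of the two decoy generation methods compared above, here is a minimal sketch. It assumes one common definition of pseudo-reverse: each tryptic peptide is reversed while its C-terminal K/R stays in place. It also uses a simplified trypsin rule that ignores proline and the simple decoy/target FDR estimate often used with concatenated searches (other estimators, such as 2 × decoy / (target + decoy), also exist). All function names are hypothetical.

```python
import re
from typing import List, Sequence, Tuple

def tryptic_peptides(protein: str) -> List[str]:
    """Cleave after every K or R (the proline rule is ignored for simplicity)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+$", protein)

def reverse_decoy(sequence: str) -> str:
    """Reverse method: reverse the whole sequence."""
    return sequence[::-1]

def pseudo_reverse_decoy(sequence: str) -> str:
    """Pseudo-reverse method: reverse each tryptic peptide but keep its C-terminal K/R fixed."""
    decoy = []
    for pep in tryptic_peptides(sequence):
        if pep[-1] in "KR":
            decoy.append(pep[-2::-1] + pep[-1])
        else:
            decoy.append(pep[::-1])  # C-terminal fragment without a cleavage site
    return "".join(decoy)

def estimate_fdr(psms: Sequence[Tuple[float, bool]], threshold: float) -> float:
    """Concatenated target-decoy FDR estimate: #decoy / #target among PSMs scoring >= threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

print(reverse_decoy("PEPTIDEKSAMPLER"))         # → RELPMASKEDITPEP
print(pseudo_reverse_decoy("PEPTIDEKSAMPLER"))  # → EDITPEPKELPMASR
```

The pseudo-reverse decoy keeps each target peptide's amino-acid composition, and therefore its precursor mass, along with its tryptic C-terminus. Whole-sequence reversal does not guarantee this for individual peptides, which is one plausible reason the choice matters more for peptide-based databases.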

#### Assessing the Helpfulness of Learning Materials with Inference-Based Learner-Like Agent

Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 2020

Yun-Hsuan Jen, Chieh-Yang Huang, MeiHua Chen, Ting-Hao Huang and Lun-Wei Ku

##### Abstract

Many language learners have trouble using near-synonyms (e.g., small vs. little; briefly vs. shortly) correctly and often look for example sentences to learn how two nearly synonymous terms differ. Prior work uses hand-crafted scores to recommend sentences but has difficulty adapting such scores to all near-synonyms, because near-synonyms differ in various ways. We observe that the helpfulness of learning material is reflected in learners' performance. We therefore propose an inference-based learner-like agent that mimics learner behavior and identifies good learning materials by examining the agent's performance. To enable the agent to behave like a learner, we leverage entailment modeling's capability of inferring answers from the provided materials. Experimental results show that the proposed agent exhibits good learner-like behavior and achieves the best performance in both fill-in-the-blank (FITB) and good example sentence selection tasks. We further conducted a classroom user study with college ESL learners. The results of the user study show that the proposed agent finds example sentences that help students learn more easily and efficiently. Compared with other models, the proposed agent improves the scores of more than 17% of students after learning.

#### Reactive Supervision: A New Method for Collecting Sarcasm Data

Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 2020

Boaz Shmueli, Lun-Wei Ku and Soumya Ray

##### Abstract

Sarcasm detection is an important task in affective computing, requiring large amounts of labeled data. We introduce reactive supervision, a novel data collection method that utilizes the dynamics of online conversations to overcome the limitations of existing data collection techniques. We use the new method to create and release a first-of-its-kind large dataset of tweets with sarcasm perspective labels and new contextual features. The dataset is expected to advance sarcasm detection research. Our method can be adapted to other affective computing domains, thus opening up new research opportunities.

#### Subspace-based Representation and Learning for Phonotactic Spoken Language Recognition

IEEE/ACM Transactions on Audio, Speech, and Language Processing, November 2020

Hung-Shin Lee, Yu Tsao, Shyh-Kang Jeng, and Hsin-Min Wang

##### Abstract

Phonotactic constraints can be employed to distinguish languages by representing a speech utterance as a multinomial distribution of phone events. In the present study, we propose a new learning mechanism based on subspace-based representation, which can extract concealed phonotactic structures from utterances, for language verification and dialect/accent identification. The framework involves two successive parts. The first part is subspace construction: each utterance is decoded into a sequence of vectors filled with phone posteriors, and the vector sequence is transformed into a linear orthogonal subspace based on low-rank matrix factorization or dynamic linear modeling. The second part is subspace learning based on kernel machines, such as support vector machines, and the newly developed subspace-based neural networks (SNNs). The input layer of SNNs is specifically designed for samples represented by subspaces. This topology ensures that identical subspaces yield the same output, by modifying the conventional feed-forward pass to fit the mathematical definition of subspace similarity. Evaluated on the "General LR" test of NIST LRE 2007, the proposed method achieved relative reductions in equal error rate of up to 52%, 46%, 56%, and 27% over the sequence-based PPR-LM, PPR-VSM, and PPR-IVEC methods and the lattice-based PPR-LM method, respectively. Furthermore, on the dialect/accent identification task of NIST LRE 2009, the SNN-based system performed better than the four aforementioned baseline methods.
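The requirement that identical subspaces produce the same output can be illustrated with a basis-invariant similarity. The abstract does not give the exact definition used in SNNs, so this sketch assumes the common projection similarity ||UᵀV||²_F, which equals the sum of squared cosines of the principal angles between the two subspaces. Gram-Schmidt orthonormalization stands in for the low-rank factorization step here (a truncated SVD would normally be used to keep only the dominant directions), and the function names are hypothetical.

```python
import math
from typing import List, Sequence

def orthonormalize(vectors: Sequence[Sequence[float]], tol: float = 1e-10) -> List[List[float]]:
    """Gram-Schmidt: return an orthonormal basis of span(vectors)."""
    basis: List[List[float]] = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            proj = sum(x * y for x, y in zip(w, b))
            w = [x - proj * y for x, y in zip(w, b)]  # remove the component along b
        norm = math.sqrt(sum(x * x for x in w))
        if norm > tol:  # skip vectors that already lie in the span
            basis.append([x / norm for x in w])
    return basis

def projection_similarity(U: Sequence[Sequence[float]], V: Sequence[Sequence[float]]) -> float:
    """||U^T V||_F^2 = sum of cos^2 of the principal angles between span(U) and span(V)."""
    return sum(sum(a * b for a, b in zip(u, v)) ** 2 for u in U for v in V)

U = orthonormalize([[1, 0, 0], [1, 1, 0]])   # the xy-plane
W = orthonormalize([[1, 1, 0], [1, -1, 0]])  # the same plane, different spanning vectors
print(round(projection_similarity(U, W), 6))  # → 2.0
```

Two different spanning sets of the same plane give the same score. This is the invariance property the SNN input layer is described as guaranteeing, although the actual SNN formulation may differ from this stand-in.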

#### Learning From Music to Visual Storytelling of Shots: A Deep Interactive Learning Mechanism

ACM Multimedia Conference, October 2020

Jen-Chun Lin, Wen-Li Wei, Yen-Yu Lin, Tyng-Luh Liu, and Hong-Yuan Mark Liao

##### Abstract

Learning from music to visual storytelling of shots is an interesting and emerging task. It produces a coherent visual story in the form of a shot type sequence, which not only expands the storytelling potential of a song but also facilitates automatic concert video mashup and storyboard generation. In this study, we present a deep interactive learning (DIL) mechanism for building a compact yet accurate sequence-to-sequence model to accomplish the task. Unlike the one-way transfer from a pre-trained teacher network (or ensemble network) to a student network in knowledge distillation (KD), the proposed method enables collaborative learning between an ensemble teacher network and a student network; that is, the student network also teaches. Specifically, our method first learns a teacher network composed of several assistant networks to generate a shot type sequence and produce the corresponding soft target (shot type) distribution through KD. It then constructs a student network that learns from both the ground-truth label (hard target) and the soft target distribution to ease optimization and improve generalization. As the student network gradually advances, it feeds knowledge back to the assistant networks, thereby improving the teacher network in each iteration. Owing to this interactive design, the DIL mechanism bridges the gap between the teacher and student networks and yields superior capability in both. Objective and subjective experimental results demonstrate that both the teacher and student networks can generate more accurate shot type sequences.
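The student's objective of learning from both hard and soft targets resembles the standard knowledge-distillation loss. This is a minimal sketch that assumes a Hinton-style combination: cross-entropy on the ground-truth shot type, plus a temperature-softened cross-entropy against the teacher's soft target scaled by T². The teacher's soft target is taken as the average of the assistant networks' tempered distributions. The temperature, the weight `alpha`, the averaging rule, and all function names are assumptions. The interactive step in which the student feeds knowledge back to the assistants is not modeled.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float], T: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_target(assistant_logits: Sequence[Sequence[float]], T: float) -> List[float]:
    """Teacher soft target: average of the assistant networks' tempered distributions."""
    dists = [softmax(logits, T) for logits in assistant_logits]
    return [sum(col) / len(dists) for col in zip(*dists)]

def student_loss(student_logits: Sequence[float], soft_target: Sequence[float],
                 hard_label: int, T: float = 2.0, alpha: float = 0.5) -> float:
    """alpha * CE(hard label) + (1 - alpha) * T^2 * CE(soft target, tempered student)."""
    hard = -math.log(softmax(student_logits)[hard_label] + 1e-12)
    p_soft = softmax(student_logits, T)
    soft = -sum(t * math.log(p + 1e-12) for t, p in zip(soft_target, p_soft))
    return alpha * hard + (1 - alpha) * (T * T) * soft

# Three hypothetical shot types; two assistants agree that shot type 0 is most likely.
target = ensemble_soft_target([[4.0, 0.0, 0.0], [3.0, 1.0, 0.0]], T=2.0)
print(student_loss([5.0, 0.0, 0.0], target, hard_label=0) <
      student_loss([0.0, 5.0, 0.0], target, hard_label=0))  # → True
```

The soft target carries the assistants' relative preferences among non-target shot types. That extra signal is what KD relies on to ease optimization and improve generalization in the student.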

#### Self-similarity Student for Partial Label Histopathology Image Segmentation

16th European Conference on Computer Vision (ECCV), August 2020

Hsien-Tzu Cheng, Chun-Fu Yeh, Po-Chen Kuo, Andy Wei, Keng-Chi Liu, Mong-Chi Ko, Kuan-Hua Chao, Yu-Ching Peng, and Tyng-Luh Liu

##### Abstract

Delineation of cancerous regions in gigapixel whole slide images (WSIs) is a crucial diagnostic procedure in digital pathology. This process is time-consuming because of the large search space in gigapixel WSIs, which increases the chance of omitting or misinterpreting indistinct tumor lesions. To tackle this, developing an automated cancerous region segmentation method is imperative. We frame this issue as a modeling problem with partially labeled WSIs, where some cancerous regions may be misclassified as benign and vice versa, producing patches with noisy labels. To learn from these patches, we propose Self-similarity Student, which combines the teacher-student model paradigm with similarity learning. Specifically, for each patch, we first sample its similar and dissimilar patches according to spatial distance. A teacher-student model is then introduced, featuring an exponential moving average on both the student model weights and the teacher predictions ensemble. While the student model takes individual patches, the teacher model takes all of their corresponding similar and dissimilar patches to learn representations that are robust to noisy-label patches. Following this similarity learning, our similarity ensemble merges the ensembled predictions of similar patches into the pseudo-label of a given patch to counteract its noisy label. On the CAMELYON16 dataset, our method substantially outperforms state-of-the-art noise-aware learning methods by 5% and the supervised-trained baseline by 10% under various degrees of noise. Moreover, our method outperforms the baseline on our TVGH TURP dataset by 2%, demonstrating its generalizability to other clinical histopathology segmentation tasks.
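Three of the ingredients named above can be sketched in a few lines: the exponential moving average (EMA), sampling similar and dissimilar patches by spatial distance, and merging similar patches' predictions into a pseudo-label. This sketch assumes scalar per-patch tumor probabilities, 2D patch-grid coordinates, hypothetical `near`/`far` distance thresholds, and a pseudo-label that averages the patch with its similar neighbors. None of these details are given in the abstract, and the teacher's processing of the sampled patches is not modeled.

```python
import math
from typing import List, Sequence, Tuple

def ema(old: Sequence[float], new: Sequence[float], decay: float = 0.99) -> List[float]:
    """Exponential moving average, usable for teacher weights or per-patch prediction ensembles."""
    return [decay * o + (1 - decay) * n for o, n in zip(old, new)]

def sample_by_distance(coords: Sequence[Tuple[float, float]], idx: int,
                       near: float = 1.5, far: float = 5.0):
    """Split the other patches into similar (close) and dissimilar (far) sets by spatial distance."""
    x0, y0 = coords[idx]
    similar, dissimilar = [], []
    for j, (x, y) in enumerate(coords):
        if j == idx:
            continue
        d = math.hypot(x - x0, y - y0)
        if d <= near:
            similar.append(j)
        elif d >= far:
            dissimilar.append(j)
    return similar, dissimilar

def similarity_pseudo_label(ensembled_preds: Sequence[float], idx: int,
                            similar: Sequence[int]) -> float:
    """Average the ensembled predictions of a patch and its similar patches (assumed merge rule)."""
    group = [idx, *similar]
    return sum(ensembled_preds[j] for j in group) / len(group)

sim, dis = sample_by_distance([(0, 0), (1, 0), (10, 0), (3, 0)], idx=0)
print(sim, dis)  # → [1] [2]
```

Averaging over spatially close patches assumes that neighboring tissue tends to share a label. Under that assumption, an isolated mislabeled patch is pulled toward its neighbors' consensus, which is the intuition behind using the similarity ensemble to counteract noisy labels.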