Page 14 - 2017 Brochure
Research Laboratories

Bioinformatics Lab

Our current research is focused on bioinformatics for “omics” studies, classified into two main areas: (i) genomics and transcriptomics, and (ii) proteomics and metabolomics. These areas are described below.

1. Genomics and Transcriptomics

Developing Methods for Sequencing Data. With the rise of next-generation sequencing (NGS) as the predominant technology for genomics and transcriptomics studies, we have devoted ourselves to developing new methodologies and tools for analyzing NGS data. For NGS read mapping, we have developed an ultra-efficient divide-and-conquer algorithm, called Kart, which divides a read into small fragments that can be aligned independently. Our experiments show that Kart is 3 to 10 times faster than other aligners and still produces reliable alignments, even when the error rate is as high as 15%. The same strategy has also been applied to our RNA-seq mapper, and we obtained superior results to comparable technologies. For de novo genome assembly, we have proposed an extension-based assembler, called JR-Assembler. This tool can assemble giga-base-pair genomes from Illumina short reads, while achieving better overall assembly quality with faster execution times. Moreover, JR-Assembler uses memory more efficiently, and its execution time increases only slowly as the read length increases.

Integrated Tool and Platform Development for Sequencing Data. We are developing pre-assembly and post-assembly analytics for NGS and third-generation sequencing (3GS), based on a MapReduce framework, which will predict the repetitiveness and sequencing errors of a read and optimize the efficacy and efficiency of de novo genome assembly. Meanwhile, we are also developing a cloud-based architecture to further speed up the execution of de novo genome assemblers. For assembled genomes, we have implemented pipelines that can decipher genome structure, annotate genes and rapidly estimate expression profiles. Using our own web platform (http://molas.iis.sinica.edu.tw), and our integrated approach toward genomics, transcriptomics, proteomics, methylomics and other omics, we and our collaborators are tackling projects related to fusion gene discovery from clinical biopsies, functional annotation for non-model organisms (e.g., Giant grouper, http://molas.iis.sinica.edu.tw/grouper2016 and Japanese eel, http://molas.iis.sinica.edu.tw/jpeel2016), precision phenotyping of pathogens and identification of mechanisms that restrict virus replication for vaccine development.

Transcriptional Regulation. Transcription factor binding is determined by the presence of specific sequence motifs and chromatin accessibility. We have developed a random forest model, which considers the chromatin state and DNA structure properties, and significantly improves the accuracy of transcription factor binding site prediction. In addition, we have made two important discoveries related to nucleic acid structure/function: (a) The occurrence of non-B DNA structure motifs is significantly correlated with exon skipping events, demonstrating that structural blockage plays a role in transcription-coupled splicing. (b) Enhancers are strongly associated with DNA looping, and their transcripts (eRNAs) may be associated with the selective activation of enhancer target genes in mouse.

Regulatory Networks. To explore important nodes/hubs and fragile motifs in a complex biological network, we have implemented eleven topological algorithms as cytohubba (http://apps.cytoscape.org/apps/cytohubba). Since 2011, it has been downloaded over 9,000 times and cited over 180 times. The new version of cytohubba was released in January 2017 and within three months was downloaded more than 400 times. All the algorithms will be interfaced into the Galaxy framework and distributed as an image via Docker and virtual machine (VM). In this way, these algorithms can be easily integrated into our analytic pipelines and support broad applications for the biomedical research community.

2. Proteomics and Metabolomics

Bioinformatics for Mass Spectrometry-based Proteomics. Mass Spectrometry (MS) has become the predominant technology for proteomics research. There are two complementary approaches for MS experiments: a bottom-up approach (also called shotgun) and a top-down approach. Currently, most proteomics research uses the shotgun approach. Thus, we have developed applicable computational methods and tools for protein identification and quantitation. Though many sequence database search tools are available for protein identification, they cannot be used to identify intact glycopeptides. We have proposed algorithms and implemented an automated tool, called MAGIC, for intact glycoprotein identification. Furthermore, we have developed a web server, called MAGIC-web, to tackle large-scale and targeted glycoprotein identification. For quantification of individual proteins, we
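
As a rough illustration of the divide-and-conquer read mapping idea described above for Kart (a read is divided into small fragments that can be aligned independently), the following toy sketch splits a read into fixed-length fragments, looks each one up in a k-mer index of the reference, and lets the fragments vote on where the whole read starts. This is not Kart's actual algorithm or code; the function names, the fragment length and the example sequences are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, index, k):
    """Split the read into non-overlapping fragments of length k, look each
    one up independently, and vote on the implied start of the whole read.

    Returns the start position with the most votes, or None if no fragment
    matches the reference exactly.
    """
    votes = Counter()
    for offset in range(0, len(read) - k + 1, k):
        fragment = read[offset:offset + k]
        for pos in index.get(fragment, ()):
            # A fragment found at `pos` implies the read starts at pos - offset.
            votes[pos - offset] += 1
    if not votes:
        return None
    start, _ = votes.most_common(1)[0]
    return start


# Hypothetical toy data: the read is copied from position 8 of the reference.
reference = "ACGTTGCAAGCTTAGGCTAACGTAGCTAGGATCCA"
index = build_index(reference, k=4)
read = "AGCTTAGGCTAA"
print(map_read(read, index, k=4))
```

Because each fragment is looked up on its own, a sequencing error only spoils the fragment that contains it, and the remaining fragments can still agree on the correct location. A real aligner would additionally have to handle insertions, deletions and the gaps between matched fragments, which this sketch ignores.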
