Research Description

Our research focuses on bioinformatics and the study of biological problems. I develop bioinformatics tools and novel methodologies, both as a service to a wide range of biologists and for our own use in studying important biological problems and answering interesting biological questions. Although I am a computer scientist by training, I firmly believe that studying biological problems in collaboration with biologists enhances our understanding of which problems and questions matter to the biology community and which new bioinformatics tools biologists urgently need.

Currently, our lab uses next-generation sequencing (NGS) technology to study topics in genomics and transcriptomics, including: (a) gene regulation and molecular evolution of Kranz anatomy formation during C4 plant development, (b) microRNAs in diseases and B cell differentiation, and (c) development of a short read sequence assembler and related analysis tools. In what follows, I introduce these topics and our collaborators in detail.

(a) Gene regulation and molecular evolution of Kranz anatomy formation during C4 plant development

Most plants can be divided into two types: C3 and C4. C4 plants, such as maize, photosynthesize more efficiently than C3 plants and survive better in extreme environments, such as high temperatures and arid lands. However, several major food plants, such as rice and wheat, are C3 plants, so the environments and areas in which they can grow are limited. If we could increase the photosynthetic efficiency of these C3 food plants, or even engineer the C4 photosynthesis pathway into them, their productivity could be increased and they could be planted in more areas. This would greatly help to address the food crisis in Third World countries. However, the regulatory network of photosynthesis is very complex. In particular, C4 plants require the coordinated arrangement of mesophyll and bundle sheath cells, known as Kranz leaf anatomy, to achieve high rates of photosynthesis. Moreover, how many genes are involved in Kranz anatomy formation, and how they regulate this formation during early leaf development, remain largely unknown. In 2010, Academician Dr. Wen-Hsiung Li formed a group to study this topic. The participants come from different institutes of Academia Sinica and from National Chiayi University, Taiwan, and their specialties include plant photosynthesis, plant physiology, gene transformation, molecular evolution, bioinformatics, and computational biology. We want to study the gene regulatory differences between the C3 and C4 photosynthesis pathways and Kranz anatomy formation during early leaf development. In this project, our lab is responsible for processing and analyzing transcriptomic deep sequencing data, predicting key regulators of Kranz anatomy formation and the C4 photosynthesis pathway, and reconstructing the gene regulatory networks of the C3 and C4 photosynthesis pathways.

(b) MicroRNA co-targeting in diseases and B cell differentiation

MicroRNAs (miRNAs) play an important role in development, cell differentiation, and diseases. Most related studies have focused only on the regulatory mechanisms of individual miRNAs and single target genes. However, gene regulation is complex, and regulators may compensate for each other in vivo. That is, an individual miRNA can regulate many target genes, and several miRNAs may co-target a common target gene. Nevertheless, only a few studies on miRNA co-targeting have been reported in the literature. In this study, we are developing a systematic approach that integrates experimental data, acquired from NGS and other platforms, with information from databases and the literature to predict miRNA co-targeting networks in diseases and cell differentiation. The topics we want to study include breast cancer metastasis (with Dr. Yu, Alice Lin-Tsing, GRC, Academia Sinica, Taiwan), B cell differentiation (with Dr. Lin, Kuo-I, GRC, Academia Sinica, Taiwan), and cardiac hypertrophy (with Dr. Chen, Chien-Chang, IBMS, Academia Sinica, Taiwan). We have sequenced small RNA samples collected from these collaborators' laboratories. We have started to analyze the raw sequencing reads; we will then predict potential co-targeting pathways and finally validate them experimentally. We expect our results to increase the knowledge of miRNA regulation in different diseases and in cell differentiation.
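To make the notion of co-targeting concrete, the following is a minimal Python sketch, not our prediction pipeline. It assumes that a list of (miRNA, target gene) interactions is already available, for example from a target prediction database, and reports every pair of miRNAs that share at least one target. The function name `co_targeting_pairs`, the `min_shared` parameter, and the miRNA and gene labels are all hypothetical and used for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def co_targeting_pairs(interactions, min_shared=1):
    """Return {(mirna_a, mirna_b): shared_targets} for miRNA pairs
    that share at least `min_shared` target genes."""
    # Collect the set of target genes for each miRNA.
    targets_by_mirna = defaultdict(set)
    for mirna, gene in interactions:
        targets_by_mirna[mirna].add(gene)

    # Compare every unordered pair of miRNAs and keep those with enough overlap.
    pairs = {}
    for a, b in combinations(sorted(targets_by_mirna), 2):
        shared = targets_by_mirna[a] & targets_by_mirna[b]
        if len(shared) >= min_shared:
            pairs[(a, b)] = shared
    return pairs


# Hypothetical interactions (placeholder names, not real data).
example_interactions = [
    ("miR-A", "GENE1"),
    ("miR-A", "GENE2"),
    ("miR-B", "GENE2"),
    ("miR-B", "GENE3"),
    ("miR-C", "GENE2"),
    ("miR-C", "GENE3"),
]

example_pairs = co_targeting_pairs(example_interactions)
for (a, b), shared in sorted(example_pairs.items()):
    print(a, b, sorted(shared))
```

Raising `min_shared` keeps only the miRNA pairs with stronger target overlap. A realistic analysis would also weight each interaction by expression evidence from the sequencing data, which this sketch leaves out.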

(c) Development of short read sequence assembler and related analysis tools

Short read sequencing (SRS) platforms, including the Illumina Genome Analyzer II and HiSeq 2000 and the Applied Biosystems SOLiD, offer high throughput at relatively low cost and have been widely used for genomic and transcriptomic studies in recent years. However, in contrast to the Sanger technique, which produces reads of ~1000 bp, short read sequencers produce shorter reads in much greater numbers, posing new challenges for many computational tasks, such as de novo genome and transcriptome assembly. Many state-of-the-art approaches for processing SRS data were developed when SRS read lengths ranged from 40 bp to 76 bp. Since then, however, both read length and throughput have improved greatly. The read length of the Illumina Genome Analyzer IIx (GAIIx), for example, has increased to ~150 bp, and its throughput has increased from 12 Gb to 95 Gb per run. Although longer reads and larger numbers of reads provide more information for data processing, computer memory requirements and execution times usually also increase with read length and total read count. In this topic, we have focused on developing a new de novo assembly algorithm that assembles a genome from Illumina short reads alone. The prototype is almost complete and achieves lower memory usage, higher speed, and better contig quality than many current methods. Next, we will focus on developing effective and efficient algorithms to identify genomic structural variations and their breakpoints directly from Illumina short read data.
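To illustrate why memory becomes a bottleneck in short read assembly, here is a minimal Python sketch of a de Bruijn graph, a data structure commonly used by short read assemblers. It is a textbook construction and is not the algorithm of our prototype. The function name `build_de_bruijn`, the toy reads, and the choice of k = 4 are assumptions made for illustration only. Every distinct k-mer observed in the reads becomes an edge held in memory, so the graph grows with the amount of distinct sequence in the data.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that
    follow it in at least one read (one edge per distinct k-mer)."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Toy reads (hypothetical; real Illumina reads are far longer and more numerous).
toy_reads = ["ACGTAC", "GTACGA", "ACGATT"]
toy_graph = build_de_bruijn(toy_reads, k=4)

# Number of distinct k-mers, a rough proxy for the memory the graph needs.
toy_edge_count = sum(len(successors) for successors in toy_graph.values())
print(len(toy_graph), "nodes with outgoing edges,", toy_edge_count, "edges")
```

Overlapping reads contribute the same k-mers only once, which is why graph size tracks distinct sequence content (including sequencing errors) rather than raw read count alone. Practical assemblers use much more compact k-mer representations than Python sets of strings.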