Institute of Information Science, Academia Sinica
Topic: TIGP (BIO) -- Handling the heterogeneity in genomic datasets
Speaker: Dr. Yingying Wei (Department of Statistics, The Chinese University of Hong Kong)
Date: 2017-08-16 (Wed) 10:00 – 11:30
Location: Auditorium 101 at IIS New Building
Host: TIGP Bioinformatics Program

Abstract:

High-throughput experimental data are accumulating exponentially in public databases. However, mining valid scientific discoveries from these abundant resources is hampered by technical artifacts and inherent biological heterogeneity. Ignoring heterogeneity leads not only to low statistical power but often also to misleading scientific conclusions. In this talk, I will present two examples as illustration. In the first part, we propose a novel Bayesian hierarchical model to correct batch effects when sample groupings are unknown. We prove the model's identifiability and provide conditions on study designs under which batch effects can be corrected. Applying the proposed model to a real breast cancer dataset combined from three batches measured on two platforms offers much better biological insights than existing methods. In the second part, I will discuss transcription factor (TF) networks. TF networks are dynamic across diverse biological conditions and heterogeneous across the genome within each biological condition. We propose a Bayesian nonparametric dynamic Poisson graphical model for legitimate inference on heterogeneous TF networks. We develop an efficient parallel Markov chain Monte Carlo algorithm for posterior computation and study TF associations in ENCODE cell lines.