Seminar

Several issues in Data Mining (very) high-dimension and (very) low-sample size: an application to Metagenomics data

Lecturer: Prof. Jean-Daniel Zucker (Senior Researcher, IRD)
Host: Wen-Lian Hsu
Time: 2013-08-27 (Tue.) 14:00 ~ 15:30
Location: Auditorium 106 at New IIS Building

Abstract

High-throughput technologies, and more recently NGS (Next Generation Sequencing), have enabled the production of large genomic datasets: for instance, microarray data contain the simultaneous expression of tens of thousands of genes, whereas NGS data may cover several million genes. In this talk we will address several questions that arise when building reliable classifiers for this type of data, which corresponds to N << D problems (far fewer samples than dimensions). Feature selection and feature stability are among the issues we will discuss, mainly experimentally. We will point out a strong empirical correlation between the dimensionality/sample size ratio and selection instability. Finally, we will discuss an original algorithm for learning ternary weight classifiers that is well adapted to metagenomics data.

Keywords: High dimension data mining, abstraction, feature selection, feature stability.

BIO

Jean-Daniel Zucker (諸葛梁) is a former engineer (ESIAE, 1985) in Computer Science and Aeronautical Engineering. In 1986 he received a Master's degree in Artificial Intelligence applied to Life Science. He worked in R&D for six years at the New England Medical Center (Boston, USA), IBM and Thomson. After a Master's degree in Machine Learning in 1992, he received his PhD in Machine Learning in 1996 from Paris 6 University, where he became an associate professor focusing on representation changes and abstraction in learning. In 2002 he became Full Professor of Computer Science at Paris 13 University, where he led a CNRS (French National Centre for Scientific Research) team on Machine Learning and Transcriptomics. In 2008 he became a Senior Researcher at the National Institute of Research for Development (IRD), working on Data Mining and Complex Systems in functional genomics and metagenomics. He is deputy director of the Complex Systems Laboratory UMMISCO UMI 209 (IRD and University Pierre and Marie Curie in Paris, France). At present he is posted in Hanoi, Vietnam, and participates in several projects in South-East Asia and Europe. For the past twenty years his main research interest has been Reformulation and Abstraction in Machine Learning. His Google Scholar h-index is 25, and he has written more than 200 publications, including 46 in peer-reviewed journals. He recently co-authored a book published by Springer, “Abstraction in Artificial Intelligence and Complex Systems” (see http://abstractionthebook.com), and two papers in press in Nature related to “Dietary intervention impact on gut microbial gene richness”.

Institute of Information Science, Academia Sinica
