Institute of Information Science, Academia Sinica

Events

Seminar

TIGP--Visualization of Categorical Phenomena

  • Lecturer: Dr. Chun-houh Chen (Institute of Statistical Science, Academia Sinica)
    Host: Miss Elsa Pan
  • Time: 2010-12-02 (Thu.) 14:00 – 15:00
  • Location: Auditorium 106 at new IIS Building
Abstract
Exploratory data analysis (EDA; Tukey, 1977) has been in extensive use for more than 30 years, yet the boxplot and the scatterplot are still the major EDA tools for visualizing continuous data in the 21st century. On the other hand, multiple correspondence analysis (MCA) type methods (HOMALS: Gifi, 1990; MCA: Benzecri et al., 1973; Dual Scaling: Nishisato, 1984) and mosaic plots (Hartigan and Kleiner, 1981; Friendly, 1994) are the most popular in practice for visualizing multivariate categorical data. But all these methods lose their efficiency when data dimensionality gets really high (hundreds/thousands), particularly when the data are of nominal nature. The categorical generalized association plots (cGAP) is an extension of the generalized association plots (GAP: Chen, 2002; Tien et al., 2008; Wu et al., 2010), which was developed as a matrix visualization environment for high-dimensional categorical data. By integrating matrix visualization with HOMALS's reduced joint space for samples and variables of categorical nature, cGAP can effectively present complex patterns for thousands of categorical variables and thousands of subjects in one matrix visualization. Data generated and collected from biomedical experiments and studies are quite often of categorical nature. In this talk, cGAP will be applied to analyze several such high-dimensional categorical data sets. We believe that GAP and cGAP related matrix visualization techniques have great potential to become major data/information visualization tools for next-generation EDA. Related information can be obtained at: http://gap.stat.sinica.edu.tw/Software/index.htm

  • Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley.
  • Gifi, A. (1990). Nonlinear Multivariate Analysis. John Wiley & Sons Ltd., reprint 1996.
  • Benzecri, J. P. et al. (1973). L'analyse des données. II. L'analyse des correspondances. Dunod, Paris, 619 pp.
  • Nishisato, S. (1984). Dual scaling by reciprocal medians. Estratto Dagli Atti della XXXII Riunione Scientifica. Sorrento, 141-147.
  • Hartigan, J. A., and Kleiner, B. (1981). Mosaics for contingency tables. In W. F. Eddy (Ed.), Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface. New York: Springer-Verlag.
  • Friendly, M. (1994). Mosaic displays for multi-way contingency tables. Journal of the American Statistical Association, 89, 190-200.
  • Chen, C. H. (2002). Generalized Association Plots for Information Visualization: The applications of the convergence of iteratively formed correlation matrices. Statistica Sinica, 12, 1-23.
  • Tien, Y. J., Lee, Y. S., Wu, H. M., and Chen, C. H*. (2008). Methods for Simultaneously Identifying Coherent Local Clusters with Smooth Global Patterns in Gene Expression Profiles. BMC Bioinformatics, 9:155.
  • Wu, H. M., Tien, Y. J., and Chen, C. H*. (2010). GAP: A graphical environment for matrix visualization and cluster analysis. Computational Statistics and Data Analysis, 54 (3), 767-778.