Research Fellow  |  Tsai, Huai-Kuang  
Research Descriptions

        The transcription of genes is controlled by interactions between transcription factors (TFs) and their binding sites (TFBSs), also called cis-regulatory elements. Inferring the function of a TF and identifying its binding sites help to elucidate the mechanism of transcriptional regulation. In the past few years, we have developed a TFBS identification method, TFBSfinder, which integrates several data sources: DNA sequences, phylogenetic information, microarray data and ChIP-chip data. For a given TF, TFBSfinder rigorously selects a set of reliable target genes and a set of non-target genes (as a background set), then searches for motifs that are overrepresented and conserved in the target genes. We also proposed a new metric for measuring the degree of conservation of a binding site across species, together with methods for clustering motifs and for inferring position weight matrices.
TFBSs are usually degenerate, so we further proposed a method for discovering gapped motifs. A gapped TFBS contains one or more highly degenerate positions. Discovering gapped motifs is difficult because allowing highly degenerate positions in a motif greatly enlarges the search space and complicates the discovery process. We surmounted these obstacles by using a pattern mining technique and position concurrences. Empirical tests on known TFBSs show that the new method is highly accurate in identifying gapped motifs, outperforming current methods; it also works well on non-gapped motifs, achieving high sensitivity and specificity in predicting experimentally verified TFBSs.
In addition, we constructed a user-friendly interactive platform, MYBS, for dynamic binding site mapping that uses ChIP-chip data and phylogenetic footprinting as two filters. Based on MYBS, we further investigated the impact of nucleotide variants at DNA binding positions on yeast gene expression.
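The comparison of target genes against a background set can be illustrated with a minimal sketch. It is not the TFBSfinder implementation: the function names, the use of `N` as a fully degenerate (gap) position, and the choice of a hypergeometric test for overrepresentation are all assumptions made here for illustration.

```python
from math import comb

def matches(motif, site):
    """True if `site` matches `motif`; 'N' in the motif stands for a
    fully degenerate (gap) position (an assumed convention)."""
    return len(motif) == len(site) and all(
        m == "N" or m == s for m, s in zip(motif, site)
    )

def contains(motif, seq):
    """True if any window of `seq` matches the (possibly gapped) motif."""
    k = len(motif)
    return any(matches(motif, seq[i:i + k]) for i in range(len(seq) - k + 1))

def hypergeom_tail(k, n_targets, K, N):
    """P(X >= k): chance that at least k of the n_targets sequences carry
    the motif when K of the N sequences overall carry it."""
    upper = min(n_targets, K)
    total = comb(N, n_targets)
    return sum(
        comb(K, i) * comb(N - K, n_targets - i) for i in range(k, upper + 1)
    ) / total

def enrichment(motif, targets, background):
    """Count motif-bearing sequences and score overrepresentation in targets."""
    k = sum(contains(motif, s) for s in targets)
    K = k + sum(contains(motif, s) for s in background)
    N = len(targets) + len(background)
    return k, K, hypergeom_tail(k, len(targets), K, N)

# Toy promoter sequences (hypothetical, for illustration only).
targets = ["AACACGTGTT", "GGCATGTGCC", "TTCAAGTGAA"]
background = ["AAAAAAAAAA", "CCCCCCCCCC", "GGGGTTTTGG"]
k, K, p_value = enrichment("CANGTG", targets, background)
```

A small p-value here only indicates overrepresentation; the conservation metric and the pattern mining step described above are separate filters that this sketch does not attempt to reproduce.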
Although the prevailing assumption is that nucleotide variants at variable positions of TFBS motifs are functionally equivalent, there is increasing evidence that such variants play a role in the regulation of gene expression. We therefore proposed a method for studying, at a genome-wide scale in Saccharomyces cerevisiae, the relationship between the expression of target genes and nucleotide variants in TFBS motifs, in particular the combinatorial effects of variants at two positions. Our analysis shows that nucleotide variations at more than one-third of variable positions and at 20% of dependent position pairs are highly correlated with gene expression. We define such positions as functional. Some positions, however, are functional only as dependent pairs and not individually. In addition, a significant proportion of the functional positions are well conserved across all the related yeast species studied. Our analysis supports the importance of nucleotide variants at variable positions of TFBSs in gene regulation.
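The idea that a position can be functional only as part of a pair can be shown with a small sketch. The statistic used in our analysis is not specified in this description, so the one-way ANOVA F statistic below is an assumed stand-in, and the function names and toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def group_by_variant(sites, expression, positions):
    """Group expression values by the nucleotide(s) found at the given
    motif position(s) of each target gene's binding site."""
    groups = defaultdict(list)
    for site, value in zip(sites, expression):
        key = "".join(site[p] for p in positions)
        groups[key].append(value)
    return dict(groups)

def f_statistic(groups):
    """One-way ANOVA F statistic: between-group variance divided by
    within-group variance (assumed here as the association measure)."""
    values = [v for g in groups.values() for v in g]
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy data: expression is high only when both positions carry the same base.
sites = ["AA", "AA", "AT", "AT", "TA", "TA", "TT", "TT"]
expression = [2.0, 2.2, 0.1, -0.1, 0.0, 0.2, 2.1, 1.9]

f_pos0 = f_statistic(group_by_variant(sites, expression, [0]))
f_pos1 = f_statistic(group_by_variant(sites, expression, [1]))
f_pair = f_statistic(group_by_variant(sites, expression, [0, 1]))
```

In this toy data neither position separates the expression values on its own, while the pair does, which mirrors the case of positions that are functional only as dependent pairs. A real analysis would also need a significance threshold and a correction for multiple testing.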