Journal of Information Science and Engineering, Vol. 27 No. 3, pp. 855-868 (May 2011)

Similarity Analysis between Transcription Factor Binding Sites by Bayesian Hypothesis Test*

Department of Mathematics
School of Computer Science and Technology
Xidian University
Xi'an, 710071 P.R. China

Transcription factor binding sites (TFBS) in promoter sequences of higher eukaryotes are commonly modeled using position frequency matrices (PFMs). The ability to compare PFMs representing binding sites is especially important for de novo sequence motif discovery, where it is desirable to compare putative matrices to one another and to known matrices. We propose to identify and group similar profiles using a Bayesian hypothesis test between PFMs. We describe a column-by-column method for quantifying PFM similarity, based on the Bayes factor and the posterior probability of the null model that aligned columns are independent and identically distributed observations from the same multinomial distribution. We group TFBS frequency matrices from the less redundant JASPAR collection into matrix families by cluster analysis according to the Bayes factors and posterior probabilities of similar PFMs, and identify clusters of highly similar matrices. We further compare the performance of this method to the Pearson χ² test on simulated data. The proposed method is simple, easily implemented, and outperforms the other method in our test. Taking the Pearson product-moment correlation coefficient as an objective performance criterion, the results indicate that the Bayesian test performs better than the classical methods on average.
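The column-by-column test described above can be illustrated with a minimal sketch. The abstract does not state the prior or the exact form of the Bayes factor used in the paper, so the following assumes a symmetric Dirichlet prior on the nucleotide probabilities and compares the null model (both columns drawn from one shared multinomial) against the alternative (each column drawn from its own multinomial). The function names, the uniform prior, the equal prior odds, and the assumption that the two PFMs are already aligned and of equal width are all illustrative choices, not the authors' implementation.

```python
from math import exp, lgamma, log


def log_beta(alpha):
    """Log of the multivariate beta function B(alpha)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))


def log_bayes_factor(x, y, prior=(1.0, 1.0, 1.0, 1.0)):
    """Log Bayes factor of H0 (shared multinomial) against H1 (separate ones).

    x and y are nucleotide counts (A, C, G, T) for two aligned columns.
    The Dirichlet prior is an assumption of this sketch. The multinomial
    coefficients are identical under both hypotheses and cancel.
    """
    a_x = [a + c for a, c in zip(prior, x)]
    a_y = [a + c for a, c in zip(prior, y)]
    a_xy = [a + cx + cy for a, cx, cy in zip(prior, x, y)]
    return log_beta(a_xy) + log_beta(prior) - log_beta(a_x) - log_beta(a_y)


def posterior_null(log_bf, prior_null=0.5):
    """Posterior probability of H0 given the log Bayes factor."""
    log_odds = log_bf + log(prior_null / (1.0 - prior_null))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    e = exp(log_odds)
    return e / (1.0 + e)


def compare_pfms(pfm_a, pfm_b):
    """Sum the per-column log Bayes factors of two aligned, equal-width PFMs.

    Each PFM is a list of columns; each column is a list of four counts.
    """
    if len(pfm_a) != len(pfm_b):
        raise ValueError("this sketch assumes pre-aligned PFMs of equal width")
    return sum(log_bayes_factor(x, y) for x, y in zip(pfm_a, pfm_b))


if __name__ == "__main__":
    same = log_bayes_factor([10, 0, 0, 0], [10, 0, 0, 0])
    diff = log_bayes_factor([10, 0, 0, 0], [0, 10, 0, 0])
    print(same, posterior_null(same))
    print(diff, posterior_null(diff))
```

Under these assumptions, a positive log Bayes factor favors the null model that the two columns share one distribution, and a negative value favors distinct distributions. Summing over columns treats positions as independent; handling shifted or reverse-complemented alignments would need additional logic that this sketch omits.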

Keywords: transcription factor binding site, position frequency matrices, similarity, Bayes factor, posterior probability, cluster analysis


Received October 30, 2009; revised May 5 & July 7, 2010; accepted August 23, 2010.
Communicated by Jorng-Tzong Horng.
* This work was supported by the National Natural Science Foundation of China (Grant No. 60705004) and the Fundamental Research Funds for the Central Universities (k50510030004).