QIAN LIU+, SAN-YANG LIU AND LI-FANG LIU
+Department of Mathematics
School of Computer Science and Technology
Xidian University
Xi'an, 710071 P.R. China
Transcription factor binding sites (TFBS) in promoter sequences of higher eukaryotes
are commonly modeled using position frequency matrices (PFM). The ability to
compare PFMs representing binding sites is especially important for de novo sequence
motif discovery, where it is desirable to compare putative matrices to one another and to
known matrices. We propose to identify and group similar profiles using a Bayesian hypothesis
test between PFMs, and describe a column-by-column method for quantifying PFM similarity
based on the Bayes factor and the posterior probability of the null model that aligned
columns are independent and identically distributed observations from the same multinomial
distribution. We group TFBS frequency matrices from less redundant JASPAR into
matrix families by cluster analysis according to Bayes factors and posterior probability
of similar PFMs. Clusters of highly similar matrices are identified. We further compare
the performance of this method to the Pearson χ² test on simulated data. The proposed
method is simple, easily implemented, and outperforms the other method in our test.
Taking the Pearson product-moment correlation coefficient as an objective performance
criterion, the results indicate that the Bayesian test performs better than the classical
methods on average.
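
As an illustration of the column-wise comparison described above, the following is a minimal sketch of a Bayes factor for two aligned PFM columns. It is not the authors' implementation: the abstract does not state the prior, so a symmetric Dirichlet(1, 1, 1, 1) prior and equal prior odds for the two hypotheses are assumed here, and the function names (log_beta, log_bayes_factor, posterior_null) are hypothetical. Under these assumptions, the null model (both columns drawn from one multinomial) is compared with the alternative (each column drawn from its own multinomial) through Dirichlet-multinomial marginal likelihoods; the multinomial coefficients cancel in the ratio.

```python
from math import exp, lgamma, log


def log_beta(alpha):
    """Log of the multivariate beta function B(alpha)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))


def log_bayes_factor(col_x, col_y, prior=(1.0, 1.0, 1.0, 1.0)):
    """Log Bayes factor in favour of the null model for two count columns.

    Null: both columns come from the same multinomial distribution.
    Alternative: each column comes from its own multinomial distribution.
    The Dirichlet prior is an assumption of this sketch, not taken from
    the paper.
    """
    post_x = [a + n for a, n in zip(prior, col_x)]
    post_y = [a + n for a, n in zip(prior, col_y)]
    post_xy = [a + nx + ny for a, nx, ny in zip(prior, col_x, col_y)]
    return (log_beta(post_xy) + log_beta(prior)
            - log_beta(post_x) - log_beta(post_y))


def posterior_null(log_bf, prior_null=0.5):
    """Posterior probability of the null model given a log Bayes factor."""
    log_odds = log_bf + log(prior_null / (1.0 - prior_null))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    e = exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical counts for A, C, G, T in two aligned columns.
    similar = log_bayes_factor([10, 0, 0, 0], [10, 0, 0, 0])
    different = log_bayes_factor([10, 0, 0, 0], [0, 10, 0, 0])
    print(posterior_null(similar), posterior_null(different))
```

A positive log Bayes factor favours the null model of a shared distribution, and a negative value favours distinct distributions. How per-column values are combined into a matrix-level score is left open here, since the abstract does not specify it.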
Received October 30, 2009; revised May 5 & July 7, 2010; accepted August 23, 2010.
Communicated by Jorng-Tzong Horng.
* This work was supported by the National Natural Science Foundation of China (Grant No. 60705004) and the
Fundamental Research Funds for the Central Universities (k50510030004).