LI-FANG LIU AND LI-CHENG JIAO+
School of Computer Science and Technology
+Institute of Intelligent Information Processing
Xidian University
Xi'an, 710071 P.R. China
We present a greedy two-stage Gibbs sampling algorithm for motif discovery in DNA and
protein sequences; the accompanying software package is called Greedy Motifsam. The
algorithm is based on the position weight matrix (PWM) motif model and uses a greedy
strategy to choose the initial PWM parameters. Two sampling methods are used: the site
sampler, which finds one occurrence of the motif per sequence in the dataset, and the
motif sampler, which finds zero or more non-overlapping occurrences of the motif in each
sequence. The algorithm can discover several different motifs with differing numbers of
occurrences in a single dataset. We test our methods on the binding site (motif)
information for eukaryotic transcription factors stored in the TRANSFAC database, and
compare their prediction accuracy, scalability, and reliability with those of several
other methods. We also illustrate the proposed method by applying it to helix-turn-helix
proteins, lipocalins, and prenyltransferases. The Greedy Motifsam software is available
at http://lxy.xidian.edu.cn/math/intro/teachers/qxg/MotifSAM.zip.
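As a rough illustration of the site sampler idea (one motif occurrence per sequence, scored
with a PWM), the sketch below implements a plain textbook Gibbs site sampler. It is not the
Greedy Motifsam implementation: it starts from random positions rather than a greedy
initialization, and the function names, the uniform background model, the pseudocount, and
the DNA-only alphabet are all assumptions made for this example.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA input containing only these four letters


def build_pwm(seqs, starts, width, skip, pseudo=1.0):
    """Estimate a PWM from the current sites, leaving out sequence `skip`."""
    counts = [{a: pseudo for a in ALPHABET} for _ in range(width)]
    for i, (seq, pos) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][seq[pos + j]] += 1
    pwm = []
    for col in counts:
        total = sum(col.values())
        pwm.append({a: c / total for a, c in col.items()})
    return pwm


def site_sampler(seqs, width, iters=200, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    background = 1.0 / len(ALPHABET)  # assumption: uniform background frequencies
    # Random initialization (the paper instead describes a greedy choice).
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        for i, seq in enumerate(seqs):
            # Rebuild the motif model from all other sequences.
            pwm = build_pwm(seqs, starts, width, skip=i)
            # Weight every candidate start by its motif/background likelihood ratio.
            weights = []
            for pos in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= pwm[j][seq[pos + j]] / background
                weights.append(w)
            # Sample a new site for sequence i in proportion to the weights.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Calling `site_sampler(["ACGTTGACAGT", "TTGACATTTCC", "GGCATTGACAA"], width=6)` returns a
list with one start index per input sequence. A motif sampler in the sense of the abstract
would additionally allow zero or several non-overlapping sites per sequence, which this
sketch does not attempt.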
Received April 30, 2009; revised July 22, 2009; accepted September 3, 2009.
Communicated by Jorng-Tzong Horng.
* This work was supported by the National Natural Science Foundation of China under Grant No. 60705004
and the Fundamental Research Funds for the Central Universities, 2010.