ABDOLLAH DEHZANGI, SOMNUK PHON-AMNUAISUK AND OMID DEHZANGI*
Center of Artificial Intelligence and Intelligent Computing
Faculty of Information Technology
Multimedia University
Cyberjaya, Selangor, 63100 Malaysia
E-mail: {abdollah.dehzangi07; somnuk.amnuaisuk}@mmu.edu.my
*School of Computer Engineering
Nanyang Technological University
Nanyang Avenue, 639798 Singapore
E-mail: omid0002@ntu.edu.sg
The functioning of a protein in biological reactions crucially depends on its three-dimensional
structure. Prediction of the three-dimensional structure of a protein (tertiary
structure) from its amino acid sequence (primary structure) is considered a challenging
task in bioinformatics and molecular biology. Recently, due to tremendous advances
in the pattern recognition field, there has been a growing interest in applying
classification approaches to tackle the protein fold prediction problem. In this paper,
Random Forest, an ensemble method, is employed to address this problem.
Random Forest is a recently introduced method based on the bagging algorithm: it trains a
group of base classifiers on randomly selected sets of features and then combines the results
obtained from the base classifiers by majority voting. To investigate the effect
of the number of base learners on the performance of the Random Forest, twelve different
numbers of base classifiers (between 30 and 600) are evaluated. To
study the performance of the Random Forest and compare its results with previously reported
results, the dataset produced by Ding and Dubchak is used. Our experimental results
show that the Random Forest improves prediction accuracy (using the same set of
features proposed by Dubchak et al.) and reduces the time consumption of the protein
fold prediction task compared with previous work in the literature.
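
The ensemble idea summarized in the abstract (bootstrap sampling, random feature subsets, majority voting, and a varying number of base classifiers) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one-split decision stumps instead of full decision trees, draws one feature subset per base classifier rather than per split, and uses a tiny synthetic two-class dataset in place of the Ding and Dubchak data. The names `train_stump`, `train_forest`, `predict`, the labels `fold_A` and `fold_B`, and the three ensemble sizes are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def train_stump(X, y, feature_ids):
    """Fit a one-split decision stump restricted to the given features.

    Returns (feature, threshold, left_label, right_label).
    """
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            # If nothing falls to the right, reuse the left label.
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def train_forest(X, y, n_trees, n_features, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        forest.append(train_stump(Xb, yb, feats))
    return forest


def predict(forest, row):
    """Combine the base classifiers' outputs by majority voting."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Synthetic, well-separated toy data (an assumption; not the Ding and Dubchak dataset).
X = [[i * 0.1 + j * 0.01 for j in range(4)] for i in range(8)]
X += [[5 + i * 0.1 + j * 0.01 for j in range(4)] for i in range(8)]
y = ["fold_A"] * 8 + ["fold_B"] * 8

# Illustrative ensemble sizes within the 30-600 range mentioned in the abstract.
for n_trees in (30, 100, 600):
    forest = train_forest(X, y, n_trees=n_trees, n_features=2)
    correct = sum(predict(forest, row) == lab for row, lab in zip(X, y))
    print(n_trees, "base classifiers, training accuracy:", correct / len(X))
```

Using hard majority voting here mirrors the description in the abstract; many library implementations instead average per-class probabilities across trees, which can give slightly different predictions.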
Received November 16, 2009; revised February 4, 2010; accepted May 6, 2010.
Communicated by Jorng-Tzong Horng.