Knowledge-based Structure Prediction
For Protein Secondary Structure

(1) A knowledge-based hybrid method for protein secondary structure prediction based on local prediction confidence (Hsin-Nan Lin)

*Motivation:*

In our previous work, we proposed HYPROSP, a hybrid method for protein secondary structure prediction that combines our knowledge-based prediction algorithm, PROSP, with PSIPRED. The knowledge base constructed for PROSP contains small peptides together with their secondary structural information. The hybrid strategy of HYPROSP uses a global quantitative measure, *match rate*, to determine whether PROSP or PSIPRED is used to predict a target protein. HYPROSP achieved a slight improvement in Q3 over PSIPRED because PROSP predicts well for proteins with a match rate above 80%. However, because only a small portion of proteins have a match rate above 80% and the performance of PSIPRED has also improved, the advantage of HYPROSP is diluted. To overcome this limitation and further improve the hybrid prediction method, we present a new hybrid strategy, HYPROSP II, which is based on a new quantitative measure called *local match rate*.

*Results:*

Local match rate indicates the amount of structural information that each amino acid can extract from the knowledge base. With the local match rate, we can define a confidence level for the PROSP prediction at each amino acid. HYPROSP II works as follows: for each amino acid in a target protein, we combine the prediction results of PROSP and PSIPRED using a hybrid function defined on their respective confidence levels. Two datasets, nrDSSP and EVA, are used to perform tenfold cross-validation. The average Q3 of HYPROSP II is 81.8% on nrDSSP and 80.7% on EVA, which is 2.0% and 1.1% higher than that of PSIPRED, respectively. For local structures with a match rate higher than 80%, the average Q3 improvement is 4.4% on the nrDSSP dataset. Using the local match rate improves accuracy more than using the global match rate. Attempts to improve secondary structure prediction have a long history. We believe HYPROSP II makes effective use of the peptide knowledge base and raises prediction accuracy to a new high. The method developed here may also be broadly applicable to knowledge-base techniques in other prediction algorithms.
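
The per-residue combination described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the actual hybrid function, so the linear weighting used here, the function names `combine_residue` and `hyprosp2_sketch`, and the representation of confidence levels as per-state scores in [0, 1] are all hypothetical.

```python
from typing import Dict, Sequence

STATES = "HEC"  # three-state alphabet: helix, strand, coil


def combine_residue(prosp_conf: Dict[str, float],
                    psipred_conf: Dict[str, float],
                    prosp_weight: float) -> str:
    """Pick one state for a single residue.

    ASSUMPTION: a linear blend of the two confidence profiles; the
    hybrid function actually used by HYPROSP II is not specified here.
    """
    scores = {
        s: prosp_weight * prosp_conf[s] + (1.0 - prosp_weight) * psipred_conf[s]
        for s in STATES
    }
    return max(STATES, key=lambda s: scores[s])


def hyprosp2_sketch(prosp_profile: Sequence[Dict[str, float]],
                    psipred_profile: Sequence[Dict[str, float]],
                    local_match_rates: Sequence[float]) -> str:
    """Combine two per-residue profiles into one predicted structure string.

    local_match_rates are percentages (0-100), one per residue; a higher
    value gives the knowledge-based (PROSP-like) profile more weight.
    """
    if not (len(prosp_profile) == len(psipred_profile) == len(local_match_rates)):
        raise ValueError("all inputs must have one entry per residue")
    return "".join(
        combine_residue(p, q, rate / 100.0)
        for p, q, rate in zip(prosp_profile, psipred_profile, local_match_rates)
    )
```

With this weighting, a residue with a high local match rate follows the knowledge-based profile, while a residue with a low local match rate falls back to the machine-learning profile.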

(2) HYPROSP: a hybrid protein secondary structure prediction algorithm— a knowledge-based approach (Kuen-Pin Wu, Hsin-Nan Lin)

We develop a knowledge-based approach, called PROSP, for protein secondary structure prediction. The knowledge base contains small peptide fragments together with their secondary structural information. A quantitative measure M, called match rate, is defined to measure the amount of structural information that a target protein can extract from the knowledge base. Our experimental results show that proteins with a higher match rate are likely to be predicted more accurately by PROSP. That is, there is a roughly monotone correlation between the prediction accuracy and the amount of structure matching with the knowledge base. To fully utilize the strength of our knowledge base, we propose the following hybrid prediction method: if the match rate of a target protein is at least 80%, we use the extracted information to make the prediction; otherwise, we adopt a popular machine-learning approach. This constitutes our hybrid protein structure prediction (HYPROSP) approach. We use the DSSP and EVA data as our datasets and PSIPRED as our underlying machine-learning algorithm. For target proteins with a match rate of at least 80%, the average Q3 of PROSP is 3.96 and 7.2 percentage points higher than that of PSIPRED on the DSSP and EVA data, respectively.
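
The selection rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `prosp_predict` and `psipred_predict` are hypothetical stand-ins supplied by the caller, the match rate is assumed to be already computed as a percentage, and only the 80% threshold comes from the text. Q3 is computed as the percentage of residues whose three-state label (H, E, or C) is predicted correctly.

```python
from typing import Callable

MATCH_RATE_THRESHOLD = 80.0  # percent; the cutoff stated in the text


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H/E/C) is correct."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def hyprosp_sketch(sequence: str,
                   match_rate: float,
                   prosp_predict: Callable[[str], str],
                   psipred_predict: Callable[[str], str]) -> str:
    """Choose one predictor for the whole protein based on its match rate.

    Both predictors are hypothetical callables that map an amino acid
    sequence to a secondary structure string of the same length.
    """
    if match_rate >= MATCH_RATE_THRESHOLD:
        return prosp_predict(sequence)   # knowledge base has enough coverage
    return psipred_predict(sequence)     # fall back to machine learning
```

For example, with stub predictors that return all-helix and all-coil strings, a protein with a match rate of 85 is routed to the first predictor and one with a match rate of 79.9 to the second.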

(3) SymPred: A dictionary based protein secondary structure prediction server

When characterizing the structural topology of proteins, protein secondary structure (PSS) plays an important role in analyzing and modeling protein structures because it represents the local conformation of amino acids into regular structures. Although PSS prediction has been studied for decades, prediction accuracy has plateaued at around 80%, and further improvement is very difficult. We present an improved dictionary-based PSS prediction method called SymPred, which adopts concepts from natural language processing and proposes synonymous words to capture local sequence similarities in a group of similar proteins. A synonymous word is an n-gram pattern of amino acids that reflects the sequence variation in a protein’s evolution. We generate a protein-dependent synonymous dictionary from a set of protein sequences for PSS prediction. On a large non-redundant dataset of 8,297 protein chains, the average Q3 score of SymPred is 81.0%, which is 0.9% higher than that of PSIPRED.
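
The dictionary idea can be illustrated with a simplified sketch. It is not SymPred itself: it uses exact-match n-grams from a training set and omits the generation of synonymous words from similar proteins, which is the core of the method. The function names, the default n of 3, and the majority-vote rule are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_dictionary(training, n=3):
    """Map each n-gram to per-position counts of observed structure labels.

    training: iterable of (sequence, structure) pairs of equal length.
    """
    dictionary = defaultdict(lambda: [Counter() for _ in range(n)])
    for seq, ss in training:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure must have equal length")
        for i in range(len(seq) - n + 1):
            word = seq[i:i + n]
            for offset in range(n):
                dictionary[word][offset][ss[i + offset]] += 1
    return dictionary


def predict(sequence, dictionary, n=3, default="C"):
    """Let every matching n-gram vote on the residues it covers."""
    votes = [Counter() for _ in sequence]
    for i in range(len(sequence) - n + 1):
        word = sequence[i:i + n]
        if word in dictionary:  # membership test does not create an entry
            for offset in range(n):
                votes[i + offset].update(dictionary[word][offset])
    # Residues with no votes get the default label (coil, by assumption).
    return "".join(v.most_common(1)[0][0] if v else default for v in votes)
```

In this sketch each residue is covered by up to n overlapping words, so its label is decided by the pooled counts of all of them.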


For Protein Local Structure

(3) A Knowledge-based Approach to Protein Local Structure Prediction (Ching-Tai Chen)

Local structure prediction is an important step towards protein tertiary structure prediction. We build a sequence-structure knowledge base that stores information about local structures and amino acids. We then propose a hybrid method, based on our knowledge base and a neural network, for predicting the backbone local structure of proteins. Our method yields over 60% accuracy, which is significantly higher than that of other methods. The results also demonstrate that our approach is applicable to different local structure libraries. A related paper appears in the Journal of Bioinformatics and Computational Biology (2006).


For Protein Beta-Turn

(4) A two-stage classifier for protein beta-turn prediction using support vector machines (Hua-Sheng Chiu)

Beta-turns play an important role in protein structures, not only because of their sheer abundance, which is estimated to be approximately 25% of all protein residues, but also because of their significance in higher-order structures of proteins. In this study, we introduce a new method of beta-turn prediction that uses a two-stage classification scheme and an integrated framework for input features. Ten-fold cross-validation on a benchmark dataset of 426 non-homologous protein chains is used to evaluate our method’s performance. The experimental results demonstrate that it achieves substantial improvements over the current best method.
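
The evaluation protocol can be sketched as a ten-fold split over chain identifiers. This is an assumption-laden illustration: the text does not say how the folds were formed, so splitting by whole chain, the random shuffle, the fixed seed, and the name `ten_fold_splits` are all hypothetical.

```python
import random


def ten_fold_splits(chain_ids, k=10, seed=0):
    """Yield (train, test) lists of chain identifiers for k-fold validation.

    ASSUMPTION: folds are formed at the chain level, so every residue of a
    chain falls in the same fold.
    """
    ids = list(chain_ids)
    if len(ids) < k:
        raise ValueError("need at least k chains")
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    folds = [ids[i::k] for i in range(k)]     # near-equal fold sizes
    for i in range(k):
        test = folds[i]
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield train, test
```

For 426 chains this produces six folds of 43 chains and four folds of 42, and each chain is used for testing exactly once.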
Wen-Lian Hsu
Professor, IEEE Fellow
Research Fellow
Institute of Information Science,
Academia Sinica, Taipei,
Taiwan, R. O. C.
Phone: 886-2-27883799 ext. 1804
Fax: 886-2-27824814
E-mail: hsu@iis.sinica.edu.tw

 

 
 
Ting-Yi Sung
Research Fellow
Institute of Information Science,
Academia Sinica, Taipei,
Taiwan, R. O. C.
Phone: 886-2-27883799 ext. 1711
Fax: 886-2-27824814
E-mail: tsung@iis.sinica.edu.tw

 

 
Intelligent Agent Systems Lab., Institute of Information Science, Academia Sinica.
128 Academia Road, Sec.2, Nankang, Taipei, Taiwan, ROC
Tel: +886-2-2788-3799, Fax: 886-2-2782-4814, 886-2-2651-8660