Multi-class protein fold recognition using adaptive codes

Eugene Ie (a), Jason Weston (b), William Stafford Noble (c), Christina Leslie (a)

(a) Center for Computational Learning Systems, Columbia University, New York, NY 10115, USA
(b) NEC Research Institute, Princeton, NJ 08540, USA
(c) Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA


Abstract

We develop a novel multi-class classification method based on output codes for the problem of classifying a sequence of amino acids into one of many known protein structural classes, called folds. Our method learns relative weights between one-vs-all classifiers and encodes information about the protein structural hierarchy for multi-class prediction. Our code weighting approach significantly improves on the standard one-vs-all method for the fold recognition problem. In order to compare against widely used methods in protein sequence analysis, we also test nearest neighbor approaches based on the PSI-BLAST algorithm. Our code weight learning algorithm strongly outperforms these PSI-BLAST methods on every structure recognition problem we consider.
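The idea of learning relative weights between one-vs-all classifiers can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes a simple ranking-perceptron-style update, and the names predict, train_code_weights, F, C and W are hypothetical. Each row of F holds the real-valued outputs of the one-vs-all classifiers for one sequence, each row of C is the code vector for one class (the identity matrix reproduces plain one-vs-all), and W holds one learned weight per classifier.

```python
import numpy as np

def predict(W, C, f):
    """Return the class whose code vector best matches the weighted outputs.

    W: (d,) weights, one per base classifier (hypothetical name).
    C: (k, d) code matrix, one row per class.
    f: (d,) real-valued outputs of the base classifiers for one example.
    """
    scores = C @ (W * f)  # one score per class
    return int(np.argmax(scores))

def train_code_weights(F, y, C, epochs=10, lr=1.0):
    """Learn classifier weights with a perceptron-style ranking update.

    Assumption: whenever the predicted class differs from the true class,
    the weights move toward the true code vector and away from the
    predicted one. This is a simplified sketch, not the published algorithm.
    """
    W = np.ones(C.shape[1])
    for _ in range(epochs):
        for f, yi in zip(F, y):
            y_hat = predict(W, C, f)
            if y_hat != yi:
                W += lr * f * (C[yi] - C[y_hat])
    return W

# Toy data (made up for illustration): two classes, two one-vs-all outputs.
F = np.array([[1.0, 2.0],
              [-1.0, 0.5]])
y = [0, 1]
C = np.eye(2)  # identity codes correspond to standard one-vs-all

W = train_code_weights(F, y, C)
predictions = [predict(W, C, f) for f in F]
```

In this toy setting the second classifier produces outputs on a larger scale, so unweighted one-vs-all misclassifies the first example, while the learned weights rescale the classifiers relative to each other. Adding columns to C for classifiers trained at other levels of the structural hierarchy would be one way to encode hierarchy information, again as an assumption about how such codes could be laid out.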


This material is based upon work supported by the National Science Foundation under Grant No. 0312706.
Contact: Christina Leslie

Results
PDF
Supplementary data and code (setup with ranking perceptron code)