Abstract


Because predicting the tertiary structure of a globular protein from its sequence is so difficult in general, researchers have instead tried to predict regular substructures of proteins, known as secondary structures. Knowing where these structures lie in the sequence can significantly constrain the possible conformations of the protein. The traditional secondary structure classes are α-helices, β-sheets, and coil. Secondary structure prediction programs have been developed based on several different algorithms. Despite their varied natures, these systems all share a limit on prediction accuracy of about 65%. A possible cause of this limit is that the traditional secondary structure classes are only a coarse characterization of local structure in proteins. This work presents the results of an alternative approach in which local structure classes in proteins are derived using neural network and clustering techniques. The result is a set of local structure categories, which we call Structural Building Blocks (SBBs), that are derived from the data itself rather than from a priori categories imposed upon the data. Analysis of the SBBs shows that these categories are general classifications, and that they account for recognized helical and strand regions as well as novel categories such as N- and C-caps of helices and strands.