Pred2ary Juries

As shown in Figure 5 of the paper, the accuracy of secondary structure prediction increases with the size of the jury of networks used to make the prediction.

Three jury sizes are currently available for prediction:

Small

This is a jury containing just one set of cascading networks. The first-level network is 17 x 30 x 2, and the second-level network is 19 x 15 x 2. (The numbers refer to the number of consecutive residues examined, the size of the hidden layer, and the number of outputs.) The "prediction set estimated accuracy" method described in the paper is used to estimate the probabilities of helix, strand, and coil at each position. The expected 3-state accuracy of this method on the profile of a new, unknown sequence is 74.58%, as measured by cross-validated tests.
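
To make the "residues x hidden x outputs" notation concrete, here is a minimal Python sketch of one set of cascading networks with the small jury's dimensions. It is an illustration under stated assumptions, not Pred2ary's code: it assumes 20 profile values per residue, sigmoid units, zero padding beyond the chain ends, random untrained weights, and a second-level network that reads a 19-residue window of first-level outputs. The names `windows`, `MLP`, `cascade`, and `N_FEATURES` are hypothetical. How the two outputs become helix, strand, and coil probabilities is described in the paper and is not reproduced here.

```python
import math
import random

# Sketch only: 20 profile values per residue, sigmoid units, zero padding,
# and random untrained weights are assumptions, not Pred2ary's networks.
N_FEATURES = 20


def windows(rows, width):
    """For each position, concatenate the feature rows of `width` consecutive
    residues centred on it, padding with zeros beyond the chain ends."""
    half = width // 2
    pad = [0.0] * len(rows[0])
    out = []
    for i in range(len(rows)):
        vec = []
        for j in range(i - half, i + half + 1):
            vec.extend(rows[j] if 0 <= j < len(rows) else pad)
        out.append(vec)
    return out


class MLP:
    """One hidden layer of sigmoid units; the last weight in each row is a bias."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    @staticmethod
    def _layer(weights, x):
        # zip stops at len(x), so row[-1] (the bias) is added separately.
        return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + row[-1])))
                for row in weights]

    def __call__(self, x):
        return self._layer(self.w2, self._layer(self.w1, x))


def cascade(profile, rng):
    """Run a 17 x 30 x 2 level-1 net, then a 19 x 15 x 2 level-2 net over its outputs."""
    level1 = MLP(17 * N_FEATURES, 30, 2, rng)
    out1 = [level1(v) for v in windows(profile, 17)]
    level2 = MLP(19 * 2, 15, 2, rng)
    return [level2(v) for v in windows(out1, 19)]


if __name__ == "__main__":
    rng = random.Random(0)
    profile = [[rng.random() for _ in range(N_FEATURES)] for _ in range(50)]
    result = cascade(profile, rng)
    print(len(result), len(result[0]))  # 50 residues, 2 outputs each
```

Each network sees a window centred on the residue being predicted (8 residues on either side for level 1, 9 for level 2), so every position in the chain gets its own pair of outputs.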

Medium

This is a jury containing eight sets of cascading networks, as described in the paper and shown in Figure 5. The eight jurors are:
  Juror 1  level 1 - 17 x 30 x 2
           level 2 - 19 x 15 x 2

  Juror 2  level 1 - 17 x 30 x 2
           level 2 - 19 x 16 x 2

  Juror 3  level 1 - 17 x 30 x 2
           level 2 - 19 x 17 x 2

  Juror 4  level 1 - 17 x 30 x 2
           level 2 - 19 x 18 x 2

  Juror 5  level 1 - 17 x 30 x 2
           level 2 - 19 x 18 x 2
           Reduced training set algorithm used for this juror.

  Juror 6  level 1 - 17 x 30 x 2
           level 2 - 19 x 19 x 2

  Juror 7  level 1 - 17 x 30 x 2
           level 2 - 19 x 19 x 2

  Juror 8  level 1 - 17 x 30 x 2
           level 2 - 19 x 20 x 2

The expected 3-state accuracy of this method on the profile of a new sequence is 74.76%, as measured by cross-validation on non-homologous sequences. The Matthews correlation coefficients for the three states are 0.684 (helix), 0.544 (strand), and 0.550 (coil). A more detailed analysis of accuracy is given in Table 8 of the paper.
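
Three-state accuracy is the fraction of residues assigned the correct state, and a Matthews correlation coefficient is computed separately for each state, treating that state as positive and the other two as negative. The Python sketch below computes both from observed and predicted state strings. It assumes one-letter codes H, E, and C, and the function names `q3` and `matthews` are hypothetical; it is a generic illustration of the standard definitions, not the evaluation code used for the paper.

```python
import math


def q3(observed, predicted):
    """Fraction of residues whose predicted state matches the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must be the same length")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def matthews(observed, predicted, state):
    """Matthews correlation coefficient for one state versus the other two."""
    tp = fp = tn = fn = 0
    for o, p in zip(observed, predicted):
        if p == state:
            if o == state:
                tp += 1
            else:
                fp += 1
        elif o == state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    obs, pred = "HHHEECCC", "HHCEECCH"
    print(q3(obs, pred))                       # 0.75
    for s in "HEC":
        print(s, round(matthews(obs, pred, s), 3))
```

In the toy example, 6 of 8 residues are correct (Q3 = 0.75); strand is predicted perfectly (coefficient 1.0), while helix and coil each have one false positive and one false negative (coefficient 7/15, about 0.467).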

Large

In performing the cross-validated tests that determined the expected accuracy of the medium jury on new sequences, 120 separate sets of cascading networks were trained: for each of the eight jurors described above, networks were trained on 15 different training sets. The large jury combines all 120 network sets. Its expected accuracy on unknown sequences cannot be measured by cross-validation, because taken together its networks were trained on every sequence in the data set, leaving none for independent testing. Its accuracy will instead be determined by future application to new sequences. However, extrapolation of Figure 5 suggests that a jury of 120 network sets will be more accurate than a jury of 8.
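
A minimal Python sketch of how a jury might pool its members' outputs follows. It assumes that each network set yields per-residue (helix, strand, coil) probabilities and that the jury simply averages them and picks the most probable state; the combination rule actually used by Pred2ary is described in the paper and may differ. The names `combine`, `STATES`, and `large_jury` are hypothetical. The same pooling applies whether the jury has 1, 8, or 120 members.

```python
# Assumption: each juror returns one (P_helix, P_strand, P_coil) tuple per
# residue, and the jury combines them by plain averaging.
STATES = "HEC"


def combine(predictions):
    """Average per-residue probabilities across jurors and return the averaged
    probabilities plus the most probable state at each residue."""
    n_res = len(predictions[0])
    if any(len(p) != n_res for p in predictions):
        raise ValueError("all jurors must cover the same residues")
    n_jurors = len(predictions)
    averaged = [tuple(sum(p[i][k] for p in predictions) / n_jurors for k in range(3))
                for i in range(n_res)]
    calls = "".join(STATES[max(range(3), key=probs.__getitem__)] for probs in averaged)
    return averaged, calls


# Large-jury bookkeeping: 8 juror configurations x 15 training sets.
large_jury = [(juror, fold) for juror in range(1, 9) for fold in range(1, 16)]

if __name__ == "__main__":
    print(len(large_jury))  # 120
    two_jurors = [[(0.6, 0.1, 0.3), (0.2, 0.2, 0.6)],
                  [(0.4, 0.3, 0.3), (0.1, 0.5, 0.4)]]
    print(combine(two_jurors)[1])  # HC
```

Averaging is used here only because it is the simplest pooling rule; it keeps the combined output on the same probability scale as each juror's output.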