Pred2ary Juries
As indicated in Figure 5 of the paper, the accuracy of secondary
structure prediction increases with the size of the jury of networks
that makes the prediction.
Three jury sizes are currently available:
Small
This is a jury containing just one set of cascading networks. The first
level network is 17 x 30 x 2, and the second level network is
19 x 15 x 2. (The three numbers give the number of consecutive residues
examined, the size of the hidden layer, and the number of outputs.)
The "prediction set estimated accuracy" method described in the
paper is used to estimate the probabilities of helix, strand, and
coil at each position. The expected 3-state accuracy of this method on
a new profile of unknown sequences is 74.58%, as measured by
cross-validated tests.
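The window sizes above (17 consecutive residues at the first level, 19 at the second) can be illustrated with a minimal Python sketch. The function name `windows`, the padding symbol, and the treatment of chain ends are assumptions made for illustration only. The actual program works on profile-derived inputs rather than raw letters and may handle chain ends differently.

```python
def windows(features, width, pad):
    """Return one window of `width` consecutive items centered on each position.

    Positions beyond either end of the chain are filled with `pad`
    (an assumed convention, not taken from the paper).
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so that the window has a center")
    half = width // 2
    padded = [pad] * half + list(features) + [pad] * half
    return [padded[i:i + width] for i in range(len(features))]


# Hypothetical 10-residue sequence, used only to show the window shapes.
sequence = "MKTAYIAKQR"
first_level = windows(sequence, 17, "-")   # input windows for a 17 x 30 x 2 network
second_level = windows(sequence, 19, "-")  # input windows for a 19 x 15 x 2 network
```

Each position in the chain gets exactly one window per level, so both lists have the same length as the sequence.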
Medium
This is a jury containing eight sets of cascading networks, as
described in the paper and shown in Figure 5. The eight jurors are:
Juror    Level 1        Level 2        Notes
1        17 x 30 x 2    19 x 15 x 2
2        17 x 30 x 2    19 x 16 x 2
3        17 x 30 x 2    19 x 17 x 2
4        17 x 30 x 2    19 x 18 x 2
5        17 x 30 x 2    19 x 18 x 2    Reduced training set algorithm used for this juror.
6        17 x 30 x 2    19 x 19 x 2
7        17 x 30 x 2    19 x 19 x 2
8        17 x 30 x 2    19 x 20 x 2
The expected 3-state accuracy of this method on
a new profile of sequences is 74.76%, as measured by cross-validation on
non-homologous sequences. The Matthews correlation coefficients
are 0.684 for helix, 0.544 for strand, and 0.550 for coil.
A more detailed analysis of accuracy is given in Table 8 in the
paper.
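For readers unfamiliar with the two measures quoted above, the following Python sketch shows how a 3-state accuracy and a per-state Matthews correlation coefficient are conventionally computed from observed and predicted state strings. The function names `q3` and `matthews`, the H/E/C letter coding, and the example strings are illustrative assumptions. The paper's own evaluation code may differ in detail.

```python
import math


def q3(observed, predicted):
    """Percentage of residues whose predicted state matches the observed state."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def matthews(observed, predicted, state):
    """Matthews correlation coefficient for one state against all other states."""
    tp = tn = fp = fn = 0
    for o, p in zip(observed, predicted):
        if p == state and o == state:
            tp += 1          # predicted and observed in this state
        elif p == state:
            fp += 1          # predicted in this state, observed elsewhere
        elif o == state:
            fn += 1          # observed in this state, predicted elsewhere
        else:
            tn += 1          # neither predicted nor observed in this state
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when the coefficient is undefined (a common convention).
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up strings for illustration: H = helix, E = strand, C = coil.
observed = "HHHHEEEECCCC"
predicted = "HHHCEEECCCCC"
accuracy = q3(observed, predicted)
helix_mcc = matthews(observed, predicted, "H")
```

The coefficient ranges from -1 to 1, so it penalizes over-prediction of a common state in a way that the raw percentage does not.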
Large
In performing cross-validated tests to determine the expected accuracy
of the medium jury on new sequences, 120 separate sets of cascading
networks were trained: for each of the 8 topologies described above,
networks were trained on 15 different training sets. The large
jury combines these 120 network sets. The expected accuracy of this jury
on unknown sequences has not been tested by cross-validation, because
the jury includes the networks from every cross-validation training set
and so leaves no sequences held out. Accuracy will be determined by
future application on new sequences. However, extrapolation of Figure 5
suggests that a jury of 120 networks will be more accurate than a jury of 8.
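The composition of the large jury (8 topologies times 15 training sets) can be sketched in Python as follows. Combining the members by a plain average of their per-position probability estimates is an assumption made here for illustration; the combination rule actually used is the one described in the paper. The names `large_jury_members` and `jury_average` are hypothetical.

```python
from itertools import product

N_JURORS = 8          # topologies listed for the medium jury
N_TRAINING_SETS = 15  # training sets used in the cross-validated tests


def large_jury_members():
    """Enumerate (juror, training_set) pairs, one per trained network set."""
    return list(product(range(1, N_JURORS + 1), range(1, N_TRAINING_SETS + 1)))


def jury_average(member_probs):
    """Average per-position (helix, strand, coil) estimates over jury members.

    `member_probs` holds one list per member, each containing one
    3-tuple per residue. Plain averaging is an assumed combination rule.
    """
    n_members = len(member_probs)
    length = len(member_probs[0])
    if any(len(m) != length for m in member_probs):
        raise ValueError("all members must cover the same number of residues")
    return [
        tuple(sum(m[i][k] for m in member_probs) / n_members for k in range(3))
        for i in range(length)
    ]


members = large_jury_members()
# Two made-up members voting on a single residue.
combined = jury_average([[(1.0, 0.0, 0.0)], [(0.0, 1.0, 0.0)]])
```

The same `jury_average` function applies unchanged to a jury of 1, 8, or 120 members, since only the length of `member_probs` changes.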