Here is an example of this format (some aligned sequences for BPTI):
MSF this is my profile of bpti sequences
sequence0  RPTFCNLLPE TGRCNALIPA FYYNSHLHKC QKFNYGGCGG NANNFKTIDE CQRTC...
anotherseq ....CTSPPV TGPCRAGFKR YNYNTRTKQC EPFKYGGCKG NGNRYKSEQD CLDACSG.
sequence2  .REVCSEQAE TGPCRAMISR WYFDVTEGKC APFFYGGCGG NRNNFDTEEY CMAVCGSA
morebpti   ..EVCSEQAE TGPCRAMISR WYFDVTEGKC APFFYGGCGG NRNNFDTEEY CMAVCG..
stillmore  PPDLCQLPQA RGPCKAALLR YFYNSTSNAC EPFTYGGCQG NNBNFETTEM CLPPECIR
lotsoseqs  KPDFCFLEED PGICRGYITR YFYNNQSKQC ERFKYGGCLG NLNNFESLEE CKNTCENP

You can also load sequence data with interleaved lines and line numbers (which are ignored).
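To make the interleaved case concrete, here is a minimal sketch of a reader for this simplified layout. It is not Pred2ary's own loader: the function name `parse_interleaved` is hypothetical, and it assumes that line numbers appear as all-digit tokens and that the header line starts with `MSF`.

```python
def parse_interleaved(text):
    """Collect {name: sequence} from lines of the form 'name chunk chunk ...'.

    Assumptions (not taken from Pred2ary itself): line numbers are
    all-digit tokens, and the header line begins with 'MSF'.
    """
    sequences = {}
    for line in text.splitlines():
        # Drop all-digit tokens first, so line numbers are ignored
        # wherever they sit on the line.
        fields = [tok for tok in line.split() if not tok.isdigit()]
        # Skip blank lines, name-only lines, and the 'MSF <title>' header.
        if len(fields) < 2 or fields[0] == "MSF":
            continue
        name, chunks = fields[0], fields[1:]
        # Interleaved blocks repeat the name, so append to what we have.
        sequences[name] = sequences.get(name, "") + "".join(chunks)
    return sequences


sample = """MSF demo profile
sequence0  RPTFCNLLPE TGRCNALIPA 20
anotherseq ....CTSPPV TGPCRAGFKR 20

sequence0  FYYNSHLHKC QKFNYGGCGG 40
anotherseq YNYNTRTKQC EPFKYGGCKG 40
"""
aligned = parse_interleaved(sample)
```

Each name maps to its concatenated sequence, with the '.' characters kept exactly as they appear in the file.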
Pred2ary can't save files in this format, but it can load them correctly (except for insertions in the sequence of interest, which are ignored).
Pred2ary doesn't even try to save files in this format!
>sequence0  RPTFCNLLPE TGRCNALIPA FYYNSHLHKC QKFNYGGCGG NANNFKTIDE CQRTC...
>anotherseq ....CTSPPV TGPCRAGFKR YNYNTRTKQC EPFKYGGCKG NGNRYKSEQD CLDACSG.
>sequence2  .REVCSEQAE TGPCRAMISR WYFDVTEGKC APFFYGGCGG NRNNFDTEEY CMAVCGSA
>morebpti   ..EVCSEQAE TGPCRAMISR WYFDVTEGKC APFFYGGCGG NRNNFDTEEY CMAVCG..
>stillmore  PPDLCQLPQA RGPCKAALLR YFYNSTSNAC EPFTYGGCQG NNBNFETTEM CLPPECIR
>lotsoseqs  KPDFCFLEED PGICRGYITR YFYNNQSKQC ERFKYGGCLG NLNNFESLEE CKNTCENP
Because this format can store both the profile information and the predicted secondary structure, it is the default format for saving output. An example of YAPF output follows, with my comments after each group of records:
YAPF crambin
The file starts off with a name for the whole profile.

NALIGN 60
This line shows how many sequences are in the profile, just like in HSSP.

SEQNAME 1 emb|CAA57353| (X81709) Thionin class 1 [Tulipa gesneriana]
SEQNAME 2 bbs|85043 thionin [Hordeum jubatum, Peptide, 137 aa] >gi|246216| bbs|85042thionin [Hordeum marinum=barley, leaf, Peptide, 137 aa]
This part shows the full name of each sequence. These names get truncated in an MSF file. In YAPF, each name is on one line, so the line may be very long, but no information is lost.
(58 more sequence names deleted for clarity)

SEQ 1 T T------------T-T---T--------T-------------------T-S---------
SEQ 2 T TSSSSSSSSSSSST-TSSSTSS-SSSSSTSSSSSSSSSSSS--T---STSS-S-------
SEQ 3 C CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC--CCTCCCCCC-C-C
This is the sequence at each position for each member of the profile.
(remainder of sequence deleted for clarity)

PREDSS 1 T - 0.011084 0.022133
PREDSS 2 T - 0.019588 0.056546
PREDSS 3 C - 0.024238 0.060772
PREDSS 4 C - 0.035713 0.060518
PREDSS 5 P - 0.065223 0.061858
PREDSS 6 S - 0.069242 0.057937
This shows the predicted secondary structure at each position ('-' means coil). The first column of floating-point numbers contains the predicted helix probabilities (all 1-7% in this example), and the second contains the predicted strand probabilities. The coil probability is not stored explicitly in this file format; subtract the sum of the other two from 100% to calculate it.
(other secondary structure predictions deleted for clarity)

END
The format ends with an 'END' record, so it's easy to store multiple predictions in one file.
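As an illustration of how these records fit together, the sketch below reads one or more YAPF predictions from a single text and derives the coil probability from the two stored values. It is only a sketch under assumptions: `parse_yapf` is a hypothetical name, the field order is taken from the example above, probabilities are treated as fractions between 0 and 1, and SEQ records are skipped for brevity.

```python
def parse_yapf(text):
    """Return a list of profiles, one per YAPF ... END block.

    Field layout is inferred from the example records; SEQ records
    are ignored in this sketch.
    """
    profiles = []
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        tag = fields[0]
        if tag == "YAPF":
            # A YAPF record starts a new profile and carries its name.
            current = {"name": " ".join(fields[1:]), "nalign": None,
                       "seqnames": {}, "predss": []}
        elif current is None:
            continue  # ignore anything outside a YAPF ... END block
        elif tag == "NALIGN":
            current["nalign"] = int(fields[1])
        elif tag == "SEQNAME":
            current["seqnames"][int(fields[1])] = " ".join(fields[2:])
        elif tag == "PREDSS":
            position, residue, predicted = int(fields[1]), fields[2], fields[3]
            helix, strand = float(fields[4]), float(fields[5])
            # Coil is not stored; it is whatever probability remains.
            coil = 1.0 - helix - strand
            current["predss"].append(
                (position, residue, predicted, helix, strand, coil))
        elif tag == "END":
            # END closes the block, so several predictions can share a file.
            profiles.append(current)
            current = None
    return profiles


sample = """YAPF crambin
NALIGN 60
SEQNAME 1 emb|CAA57353| (X81709) Thionin class 1 [Tulipa gesneriana]
PREDSS 1 T - 0.011084 0.022133
PREDSS 2 T - 0.019588 0.056546
END
"""
profiles = parse_yapf(sample)
```

Because each block is closed by its own END record, concatenating several YAPF outputs into one file and passing the whole text to the function yields one entry per prediction.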
Lines beginning with * indicate the beginning of a new protein, and give the protein name.
Expected accuracy (and actual accuracy, if you loaded proteins in a format that contains the real secondary structure) are printed on the next couple of lines, in comments (comments in EA files begin with # characters).
Every subsequent line contains the sequence number, the consensus residue, the actual secondary structure (if supplied to the program; otherwise a '?' is shown), and the predicted secondary structure ('H' for helix, 'E' for extended, or strand, and '-' for coil). The final three numbers are the estimated probabilities of finding helix, strand, or coil at that position.
*9pti
# expected accuracy is 77.87%
# accuracy is 93.10%
1 R   - 1.28% 2.13% 96.59%
2 P   - 0.68% 0.68% 98.63%
3 D G - 2.79% 3.91% 93.30%
4 F G - 3.49% 16.91% 79.60%
5 C G - 3.96% 23.74% 72.30%
6 L G - 3.59% 34.08% 62.33%
7 E S - 1.26% 21.38% 77.36%
8 P   - 3.39% 7.63% 88.98%
9 P   - 0.63% 5.03% 94.34%
10 Y   - 2.79% 3.91% 93.30%
(more deleted)
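The line layout described above can be read back with a few lines of code. The following is a rough sketch, not Pred2ary's parser: `parse_ea` is a hypothetical name, and it assumes that a blank actual-structure column simply disappears when a line is split on whitespace, as in some rows of the example.

```python
def parse_ea(text):
    """Return {protein_name: [row, ...]} from EA-style text.

    Assumption: a data line has 7 whitespace-separated fields when the
    actual secondary structure is present and 6 when that column is blank.
    """
    proteins = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("*"):
            # '*' starts a new protein and gives its name.
            name = line[1:].strip()
            proteins[name] = []
            continue
        fields = line.split()
        if name is None or len(fields) < 6:
            continue  # not a data line, e.g. '(more deleted)'
        # The last three fields are helix, strand, coil percentages.
        helix, strand, coil = (float(f.rstrip("%")) for f in fields[-3:])
        proteins[name].append({
            "number": int(fields[0]),
            "residue": fields[1],
            "actual": fields[2] if len(fields) >= 7 else None,
            "predicted": fields[-4],
            "helix": helix,
            "strand": strand,
            "coil": coil,
        })
    return proteins


sample = """*9pti
# expected accuracy is 77.87%
# accuracy is 93.10%
1 R - 1.28% 2.13% 96.59%
2 P - 0.68% 0.68% 98.63%
3 D G - 2.79% 3.91% 93.30%
(more deleted)
"""
rows = parse_ea(sample)["9pti"]
```

Reading the probabilities from the end of the line keeps the sketch tolerant of the optional actual-structure column.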
Example:
>LCA_HUMAN
MRFFVPLFLV GILFPAILAK QFTKCELSQL LKDIDGYGGI ALPELICTMF
HTSGYDTQAI VENNESTEYG LFQISNKLWC KSSQVPQSRN ICDISCDKFL
DDDITDDIMC AKKILDIKGI DYWLAHKALC TEKLEQWLCE KL
Example:
SEQRES   1     58  ARG PRO ASP PHE CYS LEU GLU PRO PRO TYR THR GLY PRO      9PTI  32
SEQRES   2     58  CYS LYS ALA ARG ILE ILE ARG TYR PHE TYR ASN ALA LYS      9PTI  33
SEQRES   3     58  ALA GLY LEU CYS GLN THR PHE VAL TYR GLY GLY CYS ARG      9PTI  34
SEQRES   4     58  ALA LYS ARG ASN ASN PHE LYS SER ALA GLU ASP CYS MET      9PTI  35
SEQRES   5     58  ARG THR CYS GLY ALA                                      9PTI  36
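For readers who want the one-letter sequence out of records like these, here is a small sketch. The helper name `seqres_to_sequence` is hypothetical, and it assumes the old-style layout shown above (record name, serial number, residue count, residue names, then a trailing ID code); residue names outside the 20 standard amino acids are written as 'X'.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_sequence(text):
    """Concatenate the residues of all SEQRES records as one-letter codes.

    Assumes the old-style layout shown above, with no chain identifier:
    'SEQRES <serial> <count> <residue names...> <ID code> <line number>'.
    """
    letters = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "SEQRES":
            continue
        for token in fields[3:]:
            # Residue names are three letters; the trailing ID code
            # (e.g. '9PTI') is not, so stop reading there.
            if len(token) != 3 or not token.isalpha():
                break
            letters.append(THREE_TO_ONE.get(token.upper(), "X"))
    return "".join(letters)


sample = (
    "SEQRES 1 58 ARG PRO ASP PHE CYS LEU GLU PRO PRO TYR THR GLY PRO 9PTI 32\n"
    "SEQRES 2 58 CYS LYS ALA ARG ILE ILE ARG TYR PHE TYR ASN ALA LYS 9PTI 33\n"
)
sequence = seqres_to_sequence(sample)
```

Splitting on whitespace rather than on fixed columns keeps the sketch usable on records whose spacing was altered, at the cost of not handling a chain identifier column.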
Sequences can be saved to PDB files, but there is no standard way to put secondary structure predictions in PDB files. Pred2ary uses a non-standard method of inserting JMCSTR records (which contain the same data as the EA files above).