The validation sets for multiple structure alignments described in (citation) are available here for download as text files. File formats are as follows:
Each motif that is present in at least two members of a SCOP family or superfamily in the astral-40 subset is listed. Motifs are sorted by SCOP family (or superfamily).
Files are sorted by (super)family. The SCOP (super)family identifier, the (super)family description, and the domains in the astral-40 subset of SCOP version 1.65 are given in a line beginning with a greater-than sign (">"). Fields are separated by a vertical bar ("|"), and domain identifiers are delimited by commas.
Each line below the SCOP line describes one eMOTIF. Fields are separated by a vertical bar ("|"). The first field is the eMOTIF and the second is the eMOTIF description. Each remaining field contains a SCOP domain identifier and the number of the residue at which the motif begins. The character after the residue number is the insertion code (or a dash, if none), and the next character is the chain identifier (or a dash, if none).
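The layout described above can be read with a short parser. The following is a minimal sketch, not a reference implementation. It assumes that each SCOP domain identifier is 7 characters long (for example "d1abca_"), that any whitespace or colon between the identifier and the residue number can be discarded, and that neither the motif nor the description contains a vertical bar. The class and function names (Hit, Motif, Group, parse_hit, parse_validation_set) are hypothetical and do not come from the files themselves.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

SID_LENGTH = 7  # assumption: SCOP domain identifiers are 7 characters long


@dataclass
class Hit:
    domain: str
    start: int                      # residue number at which the motif begins
    insertion_code: Optional[str]   # None when the file gives a dash
    chain: Optional[str]            # None when the file gives a dash


@dataclass
class Motif:
    pattern: str
    description: str
    hits: List[Hit]


@dataclass
class Group:
    scop_id: str
    description: str
    domains: List[str]
    motifs: List[Motif] = field(default_factory=list)


def parse_hit(text: str) -> Hit:
    """Split one hit field into domain, start residue, insertion code, chain."""
    text = text.strip()
    domain = text[:SID_LENGTH]
    # Assumption: any separator between identifier and residue is space or colon.
    rest = text[SID_LENGTH:].strip(" :\t")
    icode, chain = rest[-2], rest[-1]
    return Hit(
        domain=domain,
        start=int(rest[:-2]),
        insertion_code=None if icode == "-" else icode,
        chain=None if chain == "-" else chain,
    )


def parse_validation_set(lines: Iterable[str]) -> List[Group]:
    """Group motif lines under the preceding line that starts with '>'."""
    groups: List[Group] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            scop_id, description, domains = line[1:].split("|", 2)
            groups.append(Group(
                scop_id.strip(),
                description.strip(),
                [d.strip() for d in domains.split(",") if d.strip()],
            ))
        else:
            if not groups:
                raise ValueError("motif line found before any '>' line")
            fields = line.split("|")
            hits = [parse_hit(f) for f in fields[2:] if f.strip()]
            groups[-1].motifs.append(
                Motif(fields[0].strip(), fields[1].strip(), hits))
    return groups
```

To use it on a downloaded file, pass an open file handle, for example parse_validation_set(open(path)), where path is wherever the text file was saved. If the real files separate the domain identifier from the residue number differently, only parse_hit should need adjusting.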
Some eMOTIFs may overlap. We suggest counting each residue covered by a motif only once.
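One way to count each covered residue only once is to collect the covered positions in a set. The sketch below assumes that each motif hit has already been reduced to a (start, length) pair of sequential positions within a single domain, where the length is derived from the motif itself; it ignores insertion codes for simplicity. The function name covered_positions is hypothetical.

```python
from typing import Iterable, Set, Tuple


def covered_positions(hits: Iterable[Tuple[int, int]]) -> Set[int]:
    """Return the positions covered by at least one motif hit.

    Each hit is a (start, length) pair. Because the result is a set,
    a position covered by several overlapping motifs appears only once.
    """
    covered: Set[int] = set()
    for start, length in hits:
        covered.update(range(start, start + length))
    return covered


# Two overlapping hits: positions 10-14 and 12-17 cover 8 distinct positions.
overlapping = [(10, 5), (12, 6)]
print(len(covered_positions(overlapping)))
```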
A domain may contain multiple copies of the same PROSITE pattern. When determining the alignment accuracy in such a case, we suggest counting only the motif copy with the highest alignment accuracy.
Many PROSITE patterns contain variable repeats. We have rewritten each pattern to specify the exact number of repeats found in each particular hit. For example, the pattern "[RK]-x(2,3)-[DE]-x(2,3)-Y." may be listed as "[RK]-x-x-[DE]-x-x-x-Y."
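The rewriting described above can be illustrated with a small sketch. This is not the procedure used to build the files; it is one possible approach, assuming the matched sequence segment of the hit is available. It handles only simple elements (x, a single residue, a bracketed set, a braced exclusion set) with optional (n) or (n,m) repeats, and it ignores PROSITE anchors. When a segment can be decomposed in more than one way, it returns whichever decomposition the regular expression engine finds first. The function name fix_repeats is hypothetical.

```python
import re

# One PROSITE element with an optional repeat count: x, A, [AB], {AB}, x(2), x(2,3)
ELEMENT = re.compile(r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$")


def element_to_regex(core: str) -> str:
    """Translate one PROSITE element (without its repeat) to a regex atom."""
    if core == "x":
        return "."
    if core.startswith("{"):
        return "[^" + core[1:-1] + "]"
    return core  # a single residue or a bracketed set is already valid regex


def fix_repeats(pattern: str, segment: str) -> str:
    """Rewrite a pattern with the exact repeat counts observed in one hit."""
    trailing_dot = pattern.endswith(".")
    cores = []
    regex_parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.match(element)
        if match is None:
            raise ValueError("unsupported pattern element: " + element)
        core, low, high = match.groups()
        low_n = int(low) if low else 1
        high_n = int(high) if high else low_n
        cores.append(core)
        # One capture group per element, so its matched length is the repeat count.
        regex_parts.append("((?:%s){%d,%d})" % (element_to_regex(core), low_n, high_n))
    hit = re.fullmatch("".join(regex_parts), segment)
    if hit is None:
        raise ValueError("segment does not match pattern")
    expanded = []
    for core, text in zip(cores, hit.groups()):
        expanded.extend([core] * len(text))
    return "-".join(expanded) + ("." if trailing_dot else "")


# Made-up segment: two residues in the first gap, three in the second.
print(fix_repeats("[RK]-x(2,3)-[DE]-x(2,3)-Y.", "KAAEGGGY"))
```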
| Family motifs | Superfamily motifs |