Structures in the PDB repository vary in quality. To maintain accuracy when classifying and clustering protein domains, it is important to include only high-quality structures. For this reason we have defined a strict set of criteria, called SIFT, that determines whether a PDB chain can be accepted into CATH.
For chains imported after version 2.x of CATH:
- resolve method must be either 'X-RAY' or 'NMR'
- fraction of non-alpha carbon must be >= 0.7
- sequence length must be >= 40 residues
For chains imported from version 2.x of CATH:
- resolve method may be unknown, provided the resolution is <= 4.0 Å
- sequence length can be < 40 residues
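The rules above can be read as a single filter. The sketch below is one possible interpretation, not CATH's actual implementation. `ChainInfo` and `passes_sift` are hypothetical names. It also assumes two things the text does not state: the 0.7 non-alpha-carbon threshold applies to chains from version 2.x as well, and those legacy chains are still rejected if they report a known method other than 'X-RAY' or 'NMR'.

```python
from dataclasses import dataclass
from typing import Optional

# Methods accepted by SIFT, as listed above.
ACCEPTED_METHODS = {"X-RAY", "NMR"}


@dataclass
class ChainInfo:
    """Hypothetical per-chain summary; field names are illustrative only."""
    method: Optional[str]        # e.g. "X-RAY", "NMR", or None if unknown
    resolution: Optional[float]  # in Angstroms; None if not reported
    non_ca_fraction: float       # fraction of non-alpha carbon, 0.0 to 1.0
    length: int                  # sequence length in residues


def passes_sift(chain: ChainInfo, from_cath_v2: bool) -> bool:
    """Return True if the chain meets SIFT as interpreted in this sketch."""
    # Assumption: the non-alpha-carbon threshold applies to every chain.
    if chain.non_ca_fraction < 0.7:
        return False

    if chain.method in ACCEPTED_METHODS:
        method_ok = True
    elif from_cath_v2 and chain.method is None:
        # Legacy chains: an unknown method is tolerated if resolution <= 4.0 A.
        method_ok = chain.resolution is not None and chain.resolution <= 4.0
    else:
        method_ok = False
    if not method_ok:
        return False

    # The 40-residue minimum applies only to chains imported after 2.x.
    if not from_cath_v2 and chain.length < 40:
        return False
    return True


print(passes_sift(ChainInfo("X-RAY", 2.1, 0.95, 120), from_cath_v2=False))  # True
print(passes_sift(ChainInfo(None, 3.5, 0.9, 30), from_cath_v2=True))        # True
print(passes_sift(ChainInfo(None, 3.5, 0.9, 30), from_cath_v2=False))       # False
```

The same unknown-method, 30-residue chain passes only when it was imported from version 2.x. This reflects the two relaxations listed for legacy chains.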