CATH Data Downloads

This page describes the data files available for download from the CATH FTP site.

See CATH Releases for more information on CATH and CATH-Plus.

CATH (daily snapshot)

ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/daily-release/newest/

File name                        Description
cath-b-newest-all.gz             Lists the latest domain boundaries and superfamily (C.A.T.H) annotations for all CATH domains
cath-b-newest-names.gz           Provides the names for each node in the CATH hierarchy
cath-b-newest-latest-release.gz  Lists the latest domain boundaries and superfamily annotations for CATH domains in the most recent release of CATH-Plus
cath-b-newest-putative.gz        Lists the latest domain boundaries and superfamily annotations for CATH domains released since the most recent release of CATH-Plus
cath-b-s35-newest.gz             Lists the latest domain boundaries and sequence family (C.A.T.H.S) annotations for all non-redundant sequence representatives
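
As a minimal sketch of fetching one of the daily snapshot files listed above, the Python below builds the file URL from the directory given on this page, downloads it, and decompresses it into text lines. The helper names (snapshot_url, decode_gz_lines, fetch_snapshot) are hypothetical, and the sketch assumes that the files are UTF-8 text and that the FTP server is reachable from your machine; it makes no assumption about the column layout inside each file.

```python
import gzip
import urllib.request

# Directory of the daily snapshot, as given on this page.
DAILY_BASE = "ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/daily-release/newest/"


def snapshot_url(file_name: str) -> str:
    """Return the full URL of a daily-snapshot file such as 'cath-b-newest-all.gz'."""
    return DAILY_BASE + file_name


def decode_gz_lines(raw: bytes) -> list[str]:
    """Decompress gzip bytes and return the non-empty text lines.

    Assumption: the decompressed content is UTF-8 (or plain ASCII) text.
    """
    text = gzip.decompress(raw).decode("utf-8")
    return [line for line in text.splitlines() if line.strip()]


def fetch_snapshot(file_name: str) -> list[str]:
    """Download one snapshot file and return its lines (needs network access)."""
    with urllib.request.urlopen(snapshot_url(file_name)) as response:
        return decode_gz_lines(response.read())
```

Calling fetch_snapshot("cath-b-newest-names.gz") would then return the lines of the names file, provided the download succeeds.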

CATH-Plus (full release)

ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/

For statistics on specific releases, see the release notes.

CATH classification data

ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/cath-classification-data/

File name                                              Description
cath-chain-list-<version>.txt                          Lists all of the PDB chain IDs in CATH, whether they are chopped into domains or not
cath-domain-boundaries-*-<version>.txt                 Description of domain and segment boundaries for domains classified into CATH
cath-domain-description-file-<version>.txt             Description of each protein domain in CATH
cath-domain-list-<S35|S60|S95|S100|all>-<version>.txt  Lists of domains classified into CATH
cath-domain-pdb-*-<version>.txt                        Description of each domain PDB classified into CATH
cath-names-<version>.txt                               Name of each node in the CATH hierarchy, along with an example domain
cath-superfamily-list-<version>.txt                    List of all the superfamilies in the CATH hierarchy
cath-unclassified-list-<version>.txt                   List of all unclassified protein chains and domains that are still being processed
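
The file name patterns above can be expanded programmatically. The following Python is only a sketch: the function names are hypothetical, the redundancy level is assumed to be written as S35, S60, S95, S100 or all, and the version string must be supplied by the caller because this page does not state its format.

```python
# Directory of the classification data, as given on this page.
CLASSIFICATION_BASE = (
    "ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/"
    "cath-classification-data/"
)

# Levels taken from the cath-domain-list file name pattern (assumed spelling).
VALID_LEVELS = ("S35", "S60", "S95", "S100", "all")


def domain_list_filename(level: str, version: str) -> str:
    """Fill in the cath-domain-list pattern for one level and version.

    The version string is whatever the release uses; it is not validated here.
    """
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown level {level!r}; expected one of {VALID_LEVELS}")
    return f"cath-domain-list-{level}-{version}.txt"


def classification_url(file_name: str) -> str:
    """Return the full URL of a file in the classification data directory."""
    return CLASSIFICATION_BASE + file_name
```

Rejecting unknown levels early keeps a mistyped level from turning into a request for a file that does not exist.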

Non-redundant data sets

ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/non-redundant-data-sets/

File name                                   Description
cath-dataset-nonredundant-S[20|40].atom.fa  The ATOM sequences of the domains in the dataset (containing only residues that have ATOM records in the PDB file)
cath-dataset-nonredundant-S[20|40].fa       The sequences of the domains in the dataset
cath-dataset-nonredundant-S[20|40].list     A list of the domains in the dataset; one domain ID per line
cath-dataset-nonredundant-S[20|40].pdb.tgz  A gzipped tar file containing the PDB files of the domains in the dataset
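
A short sketch of working with the .list and .pdb.tgz files described above, using only the Python standard library. The function names are hypothetical. The one-ID-per-line layout of the .list file comes from the table; the internal layout of the tar archive is not documented here, so the code simply reports whatever member names it finds.

```python
import io
import tarfile


def read_domain_ids(text: str) -> list[str]:
    """Return the domain IDs from a .list file: one ID per line, blanks skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def tar_member_names(tgz_bytes: bytes) -> list[str]:
    """Return the names of the regular files inside a gzipped tar archive."""
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]
```

Comparing the two results is one way to check that every listed domain has a matching PDB file, once you have confirmed how the archive names its members.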

Sequence data

ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/sequence-data/

File name                                         Description
cath-domain-seqs-*-<version>.fa                   Sequences for each CATH domain
cath-S35-<version>-hmm3.lib.gz                    HMMs for each CATH representative domain from the sequence clusters at 35% sequence identity
funfam-hmm3-<version>.lib.gz                      HMMs for each functional family (FunFam)
cath-superfamily-seqs-<superfamily>-<version>.fa  Sequences for each CATH superfamily in FASTA format
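
The .fa files above are FASTA files, so a generic FASTA reader is enough to load them. The Python below is a minimal sketch with a hypothetical function name; it returns each header line unchanged and does not assume any particular header layout, since this page does not describe one.

```python
from typing import Iterator


def parse_fasta(text: str) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    The header is returned without the leading '>' and the sequence
    is the concatenation of all lines up to the next header.
    """
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Because the function is a generator, large sequence files can be processed one record at a time if the text is fed in per file rather than collected into a list.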