FAQ: Questions about the definition of CATH code

The following email exchange may help others understand how CATH codes are numbered at the sequence cluster levels (SOLID).

Received on Aug 21, 2008 (reproduced with permission)

Hi, CATH team:

After reading the original paper published in 1997 and visiting your website, I am 
still confused about the definition of the CATH code at the sequence family levels. 
Taking the three proteins below from CATH 3.1 as an example, my questions are as 
follows:

1) The codes at the CATHSOLI levels of 2a8vA01 and 1a8vA01 are all the same, so why 
are their codes at the D level different?

    Do 2a8vA01 and 1a8vA01 not belong to the same S100 family?

2) The codes at the C, A, T, H, S, O, I and D levels of 1a8vA01 and 1a62001 are all 
the same, so why are their codes at the L level different?

    If 1a8vA01 and 1a62001 are 100% sequence identical, why were they assigned to 
    different 95% sequence groups?

Sincerely.

backy

 
                                   35%   60%   95%   100%

            C    A     T     H      S     O     L     I     D 

2a8vA01     1    10   720    10     2     1     2     1     3    47 2.400
1a8vA01     1    10   720    10     2     1     2     1     1    49 2.000
1a62001     1    10   720    10     2     1     1     1     1    44 1.550

The reply on Aug 21, 2008

Hi Backy,

Thanks for getting in touch with us. Hopefully I can answer your questions below:
 
    1) The codes at the CATHSOLI levels of 2a8vA01 and 1a8vA01 are all the same, so 
       why are their codes at the D level different?


The D level stands for "Domain Count" and is there only to provide a unique code 
for every domain. If two domains are identical in sequence (i.e. they share every 
code up to the I, or 100% Identical, level), we use the D level to differentiate 
between them. It is just a sequential counter.
 

        Do 2a8vA01 and 1a8vA01 not belong to the same S100 family?


Yes, they do. They share every code up to and including the I level, so they are 
100% identical in sequence. As mentioned above, the D level is just a counter to 
differentiate between domains in the same I cluster.

    2) The codes at the C, A, T, H, S, O, I and D levels of 1a8vA01 and 1a62001 are 
       all the same, so why are their codes at the L level different?


You need to bear in mind that CATH is a tree-like hierarchy, with the trunk of the
tree on the left of the CATHSOLID classification (the C code) and the leaves of the
tree on the right (the D code). To compare two domains, read their CATH codes from 
left to right and stop the first time one of the codes differs. In this case, they 
differ at the 'L' code, so they are in different S95 clusters. It doesn't matter 
that the numbers after this (I, D) are the same, because they refer to different 
branches of the tree.
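
The left-to-right reading rule can be sketched in a few lines of Python. This is 
only an illustration under stated assumptions, not code used by CATH: the names 
LEVELS and first_difference are hypothetical, and the codes are the three from the 
example in the question.

```python
# Level letters in CATHSOLID order, from trunk (C) to leaf (D).
LEVELS = "CATHSOLID"

def first_difference(code_a, code_b):
    """Return the letter of the first level at which two CATHSOLID codes
    differ, or None if the two codes are identical."""
    pairs = zip(LEVELS, code_a.split("."), code_b.split("."))
    for level, a, b in pairs:
        if a != b:
            # Stop here: numbers further right sit on different branches.
            return level
    return None

# Codes taken from the example in the question (hypothetical dict name).
codes = {
    "2a8vA01": "1.10.720.10.2.1.2.1.3",
    "1a8vA01": "1.10.720.10.2.1.2.1.1",
    "1a62001": "1.10.720.10.2.1.1.1.1",
}

print(first_difference(codes["2a8vA01"], codes["1a8vA01"]))  # D
print(first_difference(codes["1a8vA01"], codes["1a62001"]))  # L
```

The first pair only differs at D, so both domains are in the same I (100%) cluster. 
The second pair already differs at L, so the matching I and D numbers carry no 
meaning for that comparison.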

        If 1a8vA01 and 1a62001 are 100% sequence identical, why were they assigned 
        to different 95% sequence groups?

The simple answer is that they aren't 100% identical. They have a sequence identity 
of 94.7%, which is below the 95% threshold, so they have different L codes. As 
mentioned above, the I and D codes happen to be the same, but that doesn't mean 
anything if the L code is different (CATHSOLID needs to be read from left to right).

So for the following three domains:

2a8vA01 1.10.720.10.2.1.2.1.3
1a8vA01 1.10.720.10.2.1.2.1.1
1a62001 1.10.720.10.2.1.1.1.1

The tree/hierarchy would look something like:

C                   1
A                  10
T                  720
H                  10
S                   2
O                   1
L       1                        2
I       1                        1
D       1                  1           3
      1a62001           1a8vA01     2a8vA01
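
As a rough sketch of the same hierarchy in code, the three codes can be nested into 
a tree of dictionaries, one level per CATHSOLID letter. The helper names build_tree 
and render are hypothetical and chosen for this illustration only; this is not how 
CATH itself stores the classification.

```python
LEVELS = "CATHSOLID"

codes = {
    "2a8vA01": "1.10.720.10.2.1.2.1.3",
    "1a8vA01": "1.10.720.10.2.1.2.1.1",
    "1a62001": "1.10.720.10.2.1.1.1.1",
}

def build_tree(codes):
    """Nest domains by each successive CATHSOLID number, trunk (C) first."""
    tree = {}
    for domain, code in codes.items():
        *branches, leaf = code.split(".")
        node = tree
        for number in branches:
            node = node.setdefault(number, {})
        node[leaf] = domain  # the D count is the leaf that holds the domain
    return tree

def render(node, depth=0):
    """Return indented text lines, one per node, in numeric order."""
    lines = []
    for number, child in sorted(node.items(), key=lambda kv: int(kv[0])):
        label = "  " * depth + LEVELS[depth] + "=" + number
        if isinstance(child, dict):
            lines.append(label)
            lines.extend(render(child, depth + 1))
        else:
            lines.append(label + "  " + child)
    return lines

tree = build_tree(codes)
lines = render(tree)
print("\n".join(lines))
```

The printed tree has a single path from C down to O, then splits at L into the two 
S95 clusters, with 1a8vA01 and 2a8vA01 sharing one I cluster and differing only in 
their D counter.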

This seems like a good question/answer to add to our FAQ section of the website - would you mind?

Best wishes,

Ian Sillitoe
CATH Team
