CATH

BADASP

BADASP can produce different measures:

bad = BAD score: similar to Type II functional divergence. The threshold to choose depends on whether we want to be stringent (i.e. BAD > 4) or more relaxed (BAD > 2).
badn = BADN variant of BAD: similar to Type I functional divergence, between two groups.
badx = BADX variant of BAD: similar to Type II functional divergence, between many groups.
ssc = Livingstone & Barton method (SSC) ⇒ does not use ancestral reconstruction. It was developed prior to BAD.
pdad = Property Difference After Duplication (PDAD) method
eta = Basic Evolutionary Trace Analysis (ETA) ⇒ Strictly conserved residues = 1, else = 0.
etaq = Quantitative variant of ETA

All these methods are described in detail in the manual, chapter 3.1: Functional Specificity Prediction.
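The two BAD cut-offs quoted above (BAD > 4 for a stringent selection, BAD > 2 for a relaxed one) can be sketched as a small helper. This is only an illustration of the thresholds mentioned on this page; the function name and the labels it returns are hypothetical and are not part of BADASP.

```python
def classify_bad(score, stringent=4.0, relaxed=2.0):
    """Label a BAD score with the two cut-offs quoted on this page.

    The labels are illustrative only; BADASP itself does not emit them.
    """
    if score > stringent:
        return "stringent"
    if score > relaxed:
        return "relaxed"
    return "below threshold"


# Made-up example scores, just to show the three possible labels.
for example_score in (5.2, 3.0, 1.5):
    print(example_score, classify_bad(example_score))
```

Both comparisons are strict, so a score of exactly 4 only passes the relaxed cut-off.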

Installation

Download the badasp archive and unzip it: http://www.southampton.ac.uk/~re1u06/software/badasp/index.html

wget http://www.southampton.ac.uk/~re1u06/software/downloads/badasp.zip
unzip badasp.zip

Analysis of the V-type proton ATPase 116 kDa subunit a gene family

We want to identify the residues that differ between isoform 1 and isoform 4 of the V-type proton ATPase 116 kDa subunit a.

First, briefly inspect the multiple alignment in Jalview (file “badasp_eg.fas” in the badasp folder).
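As a complement to the visual check in Jalview, a quick sanity check of the alignment file can be scripted. The sketch below is a minimal FASTA reader written for this page (the function names are hypothetical). It counts the sequences and checks that they all have the same aligned length.

```python
import os


def read_fasta(path):
    """Return a dict mapping each FASTA header to its (aligned) sequence."""
    parts = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].strip()
                parts[name] = []
            elif name is not None:
                parts[name].append(line)
    return {header: "".join(chunks) for header, chunks in parts.items()}


def alignment_summary(seqs):
    """Return (number of sequences, set of distinct sequence lengths)."""
    return len(seqs), {len(seq) for seq in seqs.values()}


if __name__ == "__main__" and os.path.exists("badasp_eg.fas"):
    count, lengths = alignment_summary(read_fasta("badasp_eg.fas"))
    print(count, "sequences; aligned lengths:", sorted(lengths))
```

A proper alignment should give a single length. The session log further down this page mentions 11 sequences and 849 residues, so those are the figures one would hope to see for the example file.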

Execute badasp by importing the multiple alignment in FASTA format (“badasp_eg.fas”) and activating the interactive mode (i=1):

cd ./badasp  # Folder of installation
python badasp.py seqin=badasp_eg.fas i=1
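If the run has to be repeated, the command line can be assembled from a script. The sketch below only builds the argument list. It uses the `seqin=` and `i=` options shown above and the `nsfin=` option echoed in the session log on this page; the helper name is hypothetical, and passing `nsfin=` directly on the command line is an assumption inferred from that log.

```python
import subprocess


def badasp_command(seqin, interactive=1, nsfin=None):
    """Build the argument list for a badasp run (options as seen on this page)."""
    cmd = ["python", "badasp.py", f"seqin={seqin}", f"i={interactive}"]
    if nsfin:
        # Assumption: the tree file can be given up front instead of at the prompt.
        cmd.append(f"nsfin={nsfin}")
    return cmd


if __name__ == "__main__":
    cmd = badasp_command("badasp_eg.fas", nsfin="badasp_eg.nsf")
    print(" ".join(cmd))
    # To actually launch it from the parent of the installation folder:
    # subprocess.run(cmd, cwd="badasp", check=True)
```

With i=1 the program still asks its questions interactively, so the `subprocess.run` call is left commented out here.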

Badasp will ask for the associated tree, in Newick format (“badasp_eg.nsf”):

Looking for treefile badasp_eg.nsf.
Tree: ['seqin=badasp_eg.fas', 'i=1', 'nsfin=badasp_eg.nsf']  <ENTER> to continue

=> nsfin=badasp_eg.nsf

=> Press enter

The tree is displayed, with two groups of sequences from the V-type proton ATPase 116 kDa subunit a family:

VPP1 = VPP Isoform 1 (8 genes)
NVL = VPP Isoform 4 (3 genes)

Rooted Tree (1000 bootstraps). Branch Lengths given. 21 nodes.  <ENTER> to continue.
=> Press enter

Tree is rooted at node 21 => perfect
=> Press 0, then enter.

 *** Tree Menu *** 

Sequence Data are already imported => we quit the menu.

Choice [default=Q]:  q 
Quit Tree Menu? (y/n) [default=Y]:  y

The tree is now loaded and we need to define the two groups to analyse:

#*# Grouping Summary #*#

Currently 0 groups. (11 Orphans)
=> Press enter

# We need to split the tree at node 21,
# so we need to define two groups from the child nodes 20 (= VPP1 subfamily) and 19 (= VPP4 subfamily).
=> Press M, then enter.  # Manual grouping
(Tree displayed)
Choice? [default=Q]:  c  # We collapse nodes
Node [default=0]: 20
=> Type VPP1, then Press enter

Choice? [default=Q]:  c  # We collapse nodes
Node [default=0]:  19
=> Type VPP4, then Press enter

Choice? [default=Q]:  Q, then enter  # We quit the tree edit menu
Quit Tree Edit? (y/n) [default=Y]:  y

#*# Grouping Summary #*#
<ENTER> to continue.
Choice for Grouping? [default=K]: K, then enter
Keep Groups? (y/n) [default=Y]:  Y, then enter
Save groups? (y/n) [default=Y]:  y
Name of Groupfile? [default=badasp_eg.grp]:  enter
Write Group Names? (y/n) [default=N]:  N
Use badasp_eg for output filenames? (y/n) [default=Y]:  enter
Use these parameters? (y/n) [default=Y]:  enter

Badasp will now perform some computations. It will reconstruct the ancestral sequences at each node of the tree, using GASP (ref: http://dx.doi.org/10.1186/1471-2105-5-123).

Making Ancestral Sequences - Variable PAM Weighting
Reading PAM1 matrix from jones.pam

# Start computing
Saving Ancestral Sequences in badasp_eg.anc.fas...  <ENTER> to continue.
Method BADX needs query but none given. Drop BADX from specificity methods? (y/n) [default=Y]:  n
Method BADX needs query but none given. Use sequence 1 (vpp1_HUMAN/Q8N5G7)? (y/n) [default=N]: y 

Calculating ['BAD', 'BADN', 'BADX', 'SSC', 'PDAD', 'ETA', 'ETAQ'] scores... (849 residues) ...win(0)  <ENTER> to continue.
...Done!  <ENTER> to continue.
...win(0)  <ENTER> to continue. # (many times !)

Now, Badasp will ask which kinds of output you want. Let's say yes to everything.

Output additional, filtered results? (y/n) [default=N]:  y
Name for partial results file? [default=badasp_eg.partial.badasp]: enter 

Output subfam 1 (VPP4) details (pos,aa & win)? (y/n) [default=Y]:  y
Output subfam 2 (VPP1) details (pos,aa & win)? (y/n) [default=Y]:  y
Output BAD results? (y/n) [default=Y]:  
Output BADN results? (y/n) [default=Y]:  y
Output BADX results? (y/n) [default=Y]:  y
Output SSC results? (y/n) [default=Y]:  y
Output PDAD results? (y/n) [default=Y]:  y
Output ETA results? (y/n) [default=Y]:  y
Output ETAQ results? (y/n) [default=Y]:  y
Output Info results? (y/n) [default=Y]:  y
Output PCon_Abs results? (y/n) [default=Y]:  y
Output PCon_Mean results? (y/n) [default=Y]:  y
Output QPCon_Mean results? (y/n) [default=Y]:  y
Output QPCon_Abs results? (y/n) [default=Y]:  y
Filter Rows by Results VALUES? (y/n) [default=Y]:  y
Min. value for BAD? [default=-6.708333]:  
=> New value = "-6.708333"? (y/n) [default=Y]:  
Min. value for BADN? [default=-6.708333]:  
=> New value = "-6.708333"? (y/n) [default=Y]:  
Min. value for BADX? [default=-3.500000]:  
=> New value = "-3.500000"? (y/n) [default=Y]:  
Min. value for SSC? [default=0.000000]:  
=> New value = "0.000000"? (y/n) [default=Y]:  
Min. value for PDAD? [default=-0.297619]:  
=> New value = "-0.297619"? (y/n) [default=Y]: 
Min. value for ETA? [default=0.000000]:  
=> New value = "0.000000"? (y/n) [default=Y]: 
Min. value for ETAQ? [default=0.000000]:  
=> New value = "0.000000"? (y/n) [default=Y]:  
Min. value for Info? [default=0.424111]:  
=> New value = "0.424111"? (y/n) [default=Y]:  
Min. value for PCon_Abs? [default=1.000000]:  
=> New value = "1.000000"? (y/n) [default=Y]:  
Min. value for PCon_Mean? [default=5.000000]:  
=> New value = "5.000000"? (y/n) [default=Y]:  
Min. value for QPCon_Mean? [default=9.375000]:  
=> New value = "9.375000"? (y/n) [default=Y]:  
Min. value for QPCon_Abs? [default=0.000000]:  
=> New value = "0.000000"? (y/n) [default=Y]:  

BADASP Partial Results Output (badasp_eg.partial.badasp) ... Done!

#LOG    00:23:06        BADASP V:1.3 End: Thu Sep  6 13:59:24 2012

Analysis

Open the file in your spreadsheet application (or cut and paste its contents into it).

The columns are separated by a tab.

Color the “BAD”, “BADN” and “BADX” columns with conditional formatting, for values > 3.
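The same selection can be done without a spreadsheet. The sketch below reads a tab-separated results file and keeps the rows where the BAD, BADN or BADX score is above 3. It assumes that the first line of the file is a header row whose column names match the score names exactly; this has not been checked against the actual BADASP output, so adjust the names if needed. The function name is hypothetical.

```python
import csv
import os


def high_scoring_rows(path, columns=("BAD", "BADN", "BADX"), threshold=3.0):
    """Return (row, flagged_columns) for rows with any score above threshold.

    Assumes a tab-separated file with a header line; cells that are missing
    or not numeric are skipped rather than treated as errors.
    """
    hits = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            flagged = []
            for col in columns:
                try:
                    if float(row[col]) > threshold:
                        flagged.append(col)
                except (KeyError, TypeError, ValueError):
                    continue
            if flagged:
                hits.append((row, flagged))
    return hits


if __name__ == "__main__" and os.path.exists("badasp_eg.partial.badasp"):
    for row, flagged in high_scoring_rows("badasp_eg.partial.badasp"):
        print(flagged, row)
```

The threshold of 3 is the one used for the conditional formatting; the cut-offs of 2 or 4 mentioned at the top of this page can be passed through the `threshold` argument instead.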

In Jalview:

Load multiple alignment: badasp_eg.fas

Load tree: badasp_eg.nsf

Put a vertical line at the root of the tree to split the tree in two.

Some sites are interesting, for example:

Position 3 BAD
Position 762 BAD
Position 223 BADX

There are only three genes in the VPP4 group, which explains why the BADX scores are very close to the BAD scores.
