[Bioperl-l] Pfam_Scan

Dave Messina David.Messina at sbc.su.se
Sat May 1 22:28:48 UTC 2010


Hi Rad,

As far as I can tell, the Pfam_Scan output is simply tab-delimited text (see details below), so you should be able to group sequences that share a domain by sorting on the sixth column, the HMM accession. I suspect that a sequence with multiple domain hits will have multiple lines in the output, one per hit. If so, to identify sequences which share the same _set_ of domains, you will have to do the bookkeeping yourself.
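To make that bookkeeping concrete, here is a minimal Python sketch. It is not part of Pfam_Scan or BioPerl. It assumes one hit per line, with the sequence id in the first column and the HMM accession in the sixth, as in the documented format quoted below. The function names are made up for illustration, and skipping lines that start with "#" is an assumption about possible header lines. The sample lines are copied from the documentation example.

```python
from collections import defaultdict

# Sample lines copied from the Pfam_Scan documentation example.
SAMPLE = """\
  Q5NEL3.1      2    224      2    227 PB013481  Pfam-B_13481      Pfam-B     1   184   226    358.5  1.4e-107  NA NA
  O65039.1     38     93     38     93 PF08246   Inhibitor_I29     Domain     1    58    58     45.9   2.8e-12   1 No_clan
  O65039.1    126    342    126    342 PF00112   Peptidase_C1      Domain     1   216   216    296.0   1.1e-88   1 CL0125   predicted_active_site[150,285,307]
"""


def domains_by_sequence(lines):
    """Map each sequence id to the set of HMM accessions it hits."""
    domains = defaultdict(set)
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no hits.
        if not line or line.startswith("#"):
            continue
        # split() with no argument handles tabs and runs of spaces alike.
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete hit line
        seq_id, hmm_acc = fields[0], fields[5]
        domains[seq_id].add(hmm_acc)
    return dict(domains)


def group_by_domain_set(domains):
    """Group sequence ids that share exactly the same set of domains."""
    groups = defaultdict(list)
    for seq_id, accessions in domains.items():
        # frozenset is hashable, so the whole set can serve as the key.
        groups[frozenset(accessions)].append(seq_id)
    return {key: sorted(ids) for key, ids in groups.items()}


if __name__ == "__main__":
    hits = domains_by_sequence(SAMPLE.splitlines())
    for accessions, seq_ids in group_by_domain_set(hits).items():
        print(",".join(sorted(accessions)), "->", " ".join(seq_ids))
```

For a real run you would pass an open file handle instead of SAMPLE.splitlines(). Keying on the full set means two sequences land in the same group only if their domain sets match exactly; grouping on any shared domain would need a different key.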

That being said, Pfam_Scan is not part of BioPerl (it's distributed by the Pfam team), so you may want to contact them directly for help (pfam-help at sanger.ac.uk).

Dave


[from the Pfam_Scan documentation]
The output format is:
<seq id> <alignment start> <alignment end> <envelope start> <envelope end> <hmm acc> <hmm name> <type> <hmm start> <hmm end> <hmm length> <bit score> <E-value> <significance> <clan> <predicted_active_site_residues>

Example output (with -pfamB, -as options):

  Q5NEL3.1      2    224      2    227 PB013481  Pfam-B_13481      Pfam-B     1   184   226    358.5  1.4e-107  NA NA
  O65039.1     38     93     38     93 PF08246   Inhibitor_I29     Domain     1    58    58     45.9   2.8e-12   1 No_clan
  O65039.1    126    342    126    342 PF00112   Peptidase_C1      Domain     1   216   216    296.0   1.1e-88   1 CL0125   predicted_active_site[150,285,307]




