[Bioperl-l] Possible bug in Bio::Tools::SeqStats->get_mol_wt?

Chris Fields cjfields at illinois.edu
Thu Mar 24 11:47:29 EDT 2011


On Mar 24, 2011, at 8:47 AM, Roy Chaudhuri wrote:

> Hi all,
> 
> I have discovered a possible bug in Bioperl, although maybe it's my expectations that are wrong, not the code.
> 
> I noticed that when calculating molecular weights for a bunch of protein sequences using Bio::Tools::SeqStats->get_mol_wt, the values I was getting were slightly different from the ones given by Emboss pepstats. This was due to my protein sequences ending with *, since they were derived from translating annotated genes including the stop codon. Surprisingly (to me, at least) Bio::Seq->length gives a value that counts the terminal *, so one greater than the number of amino acids. SeqStats->get_mol_wt calls Bio::Seq->length to determine the number of water molecules to subtract from the total molecular weight, so the reported weights for my sequence were the weight of one water molecule less than they should have been. I'm not sure if this is a bug in get_mol_wt, in Bio::Seq->length, or if it's bad practice to use protein sequences with a terminal asterisk (I've never had a problem doing so before).

The method should account for the possibility that '*' is present; it should be easy enough to fix with something like:

my $len = $seq =~ tr/A-Za-z/A-Za-z/;
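
To make the off-by-one concrete, here is a minimal standalone sketch in plain Perl (no BioPerl required). It assumes $seq is the raw sequence string rather than a Bio::Seq object; the helper name count_residues and the sequence 'MKV*' are made up for illustration and are not part of BioPerl. It relies only on the fact that a linear chain of n residues has n - 1 peptide bonds, so n - 1 waters are lost; it is not a copy of what get_mol_wt does internally.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): count only the letters in a
# raw sequence string, so a terminal '*' (stop) is not counted as a residue.
sub count_residues {
    my ($seq) = @_;
    # With an empty replacement list and no /d flag, tr/// leaves the
    # string unchanged and returns the number of matching characters.
    return ($seq =~ tr/A-Za-z//);
}

my $protein  = 'MKV*';                     # made-up sequence with a terminal stop
my $raw_len  = length $protein;            # 4, counts the '*'
my $residues = count_residues($protein);   # 3, letters only

# A linear chain of n residues has n - 1 peptide bonds, so n - 1 water
# molecules are lost on condensation.
my $waters_raw  = $raw_len - 1;
my $waters_true = $residues - 1;

print "raw length: $raw_len, residues: $residues\n";
print "waters subtracted: $waters_raw (raw length) vs $waters_true (letters only)\n";
```

With a sequence object, the string could presumably be taken from $seqobj->seq before counting. Note that a letters-only count still includes ambiguity codes such as X, so it only addresses the '*' case.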

I'm not able to do this right away (I'm on a family vacation); can you file this on our new bug server?

http://redmine.open-bio.org

> Cheers,
> Roy.

chris


