[Bioperl-l] Bio::SeqIO::gcg bug

Stefan Kirov stefan.kirov at bms.com
Tue Nov 7 15:54:06 UTC 2006


Bio::SeqIO::gcg validates the checksum it computes against the one 
generated by GCG. There are two problems with the way this is done:
1. Bio::SeqIO::gcg removes all characters except [A-Za-z] (which, by the 
way, is always wrong).
2. GCG calculates the checksum on the uppercased sequence.

I assume Hilmar removed the $_ = uc($_); line for a very good reason, 
but the call to validate should be:
_validate_checksum(uc($sequence), $chksum)
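
To illustrate why the case matters, here is a minimal sketch. It assumes the 
commonly described GCG checksum scheme (ASCII code times a position weight 
cycling 1..57, summed modulo 10000). gcg_checksum is a hypothetical 
standalone helper for this example, not the code in Bio::SeqIO::gcg.

```perl
use strict;
use warnings;

# Hypothetical standalone helper, not the Bio::SeqIO::gcg implementation.
# Assumes the commonly described GCG scheme: each character's ASCII code is
# multiplied by a position weight that cycles 1..57, and the total is taken
# modulo 10000.
sub gcg_checksum {
    my ($seq) = @_;
    my ($weight, $sum) = (0, 0);
    for my $code (unpack 'C*', $seq) {
        $weight++;
        $sum += $weight * $code;
        $weight = 0 if $weight == 57;
    }
    return $sum % 10000;
}

my $sequence = 'acgt';
printf "as read:   %d\n", gcg_checksum($sequence);      # 1068
printf "uppercase: %d\n", gcg_checksum(uc $sequence);   # 748
```

The weights multiply the raw character codes, so a lowercase sequence can 
only match a checksum computed on uppercase if it is uppercased before the 
comparison.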

Also, I believe the regexp used to clean up the sequence should explicitly 
remove only digits and whitespace. Removing everything else is not a good 
idea, because gap and end-of-translation characters are removed as well, 
and genuine parsing errors may be silently suppressed.
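
A rough sketch of the difference between the two cleanups. The input line is 
made up, and treating '.' and '~' as gaps and '*' as end of translation is 
an assumption for this example, as is the '#' standing in for a stray 
character.

```perl
use strict;
use warnings;

# Hypothetical sequence line; '.' as a gap and '*' as end of translation are
# assumptions about the input, and '#' stands in for a stray character.
my $line = '  51  MKV..LA#  GT*';

(my $letters_only = $line) =~ s/[^a-zA-Z]//g;   # drops gaps, '*', and '#'
(my $stripped     = $line) =~ s/[\s\d]+//g;     # drops only whitespace and digits

print "$letters_only\n";   # MKVLAGT
print "$stripped\n";       # MKV..LA#GT*

# With the narrower cleanup, anything unexpected is still there to be reported.
# The allowed set below is an assumption for this example.
my @unexpected = $stripped =~ /[^a-zA-Z.~*-]/g;
print "unexpected: @unexpected\n" if @unexpected;   # unexpected: #
```
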
Let me know if I am missing some other considerations here. If not I 
will commit these changes.
Stefan


