[Biopython-dev] Code to submit: CRC64

Peter biopython-dev at maubp.freeserve.co.uk
Thu Jun 21 15:37:05 UTC 2007


Sebastian Bassi wrote:
> On 6/21/07, Peter <biopython-dev at maubp.freeserve.co.uk> wrote:
>> Please could you file an enhancement bug, and attach the code to it -
> 
> By attach do you mean to include it into the "description" field? Or
> is there an attach option in the bug report form that I am missing?

You have to file the bug first, and then you can attach files 
afterwards. It's a bit odd - possibly a limitation in Bugzilla, or just 
the way it's set up here.

> 1) Check if the data you have is the same as data in a public DB
> without downloading the whole sequences, just download the CRC info
> and calculate the CRC with your local sequences and compare them.
> There is a chance of a random match, but it's very low.

Could you give a couple of explicit examples (URLs), as I personally 
don't remember ever noticing CRC information.

> 2)  You have your own sequences and want to store them in fasta format
> and want to include CRC64 in the description, to retrieve it later to
> check for consistency.

Nice example; this should work on any sequence format that can store 
annotations (provided the CRC64 is calculated purely from the sequence).
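
To make that second use case concrete, here is a rough sketch rather than the code under discussion. It assumes the checksum is the reflected CRC64 with polynomial 0xD800000000000000 (which I believe is the ISO 3309 variant used for Swiss-Prot style checksums; please check against a known record before relying on it). The names crc64, fasta_record and check_record are made up for illustration.

```python
# Sketch only: POLY is an assumed constant (ISO 3309 style CRC64, reflected).
POLY = 0xD800000000000000


def _build_table():
    """Precompute the 256-entry lookup table for the reflected CRC64."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY
            else:
                crc >>= 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc64(sequence):
    """Return 'CRC-' plus 16 hex digits, computed purely from the letters."""
    crc = 0
    for letter in str(sequence):
        crc = (crc >> 8) ^ _TABLE[(crc ^ ord(letter)) & 0xFF]
    return "CRC-%016X" % crc


def fasta_record(name, sequence):
    """Build a FASTA record whose description line ends with the checksum."""
    return ">%s %s\n%s\n" % (name, crc64(sequence), sequence)


def check_record(record):
    """Recompute the checksum and compare it with the one in the header."""
    header, body = record.strip().split("\n", 1)
    sequence = body.replace("\n", "")
    stored = header.split()[-1]
    return stored == crc64(sequence)


record = fasta_record("seq1", "MKTAYIAKQR")
print(record, end="")
print(check_record(record))
```

Because the checksum depends only on the letters, the same check would work for any file format that keeps a free-text annotation alongside the sequence.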

>> In typical usage, does the case of the sequences matter?
> 
> Case matters. AA is checksummed in uppercase and DNA in lowercase. I
> will see if I can force this for seq objects (and leave it alone if it
> is a plain string).

Seq objects can have upper or lower case letters (or a mixture) 
regardless of the alphabet. Rather than writing some complicated code to 
convert amino acids into upper case and DNA into lowercase, maybe just 
state these conventions in the function's doc string and leave it up to 
the user.

>> Looking at the code, it looks like it would fail when used on
>> sequences (Seq objects) where the "letters" are not single characters
>> (e.g. sequences using the three letter amino acid codes). This is
>> probably not a big problem.
> 
> CRC is always calculated in one letter code.

Fine. I would state this explicitly in the function's doc string (the 
comment at the beginning which documents the arguments etc).
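
For what it's worth, this is the sort of doc string I have in mind. It is only a sketch under my own assumptions: the function name crc64, the bit-by-bit loop and the polynomial constant 0xD800000000000000 are illustrative and not taken from the submitted code.

```python
# Assumed constant: reflected polynomial of an ISO 3309 style CRC64.
POLY = 0xD800000000000000


def crc64(sequence):
    """Return a CRC64 checksum string for a sequence.

    The argument may be a plain string or any object whose str() gives
    the letters of the sequence.

    Conventions (not enforced here, left to the caller):
    - the sequence must use the one-letter code, one character per residue;
    - case matters: amino acids are normally checksummed in upper case
      and DNA in lower case.
    """
    crc = 0
    for letter in str(sequence):
        crc ^= ord(letter)
        for _ in range(8):
            # Shift out one bit at a time, applying the polynomial when
            # the bit that falls off is set.
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    return "CRC-%016X" % crc


dna = "ACGTACGT"
# The caller applies the convention explicitly before checksumming.
print(crc64(dna.lower()))
# The same letters in a different case give a different checksum.
print(crc64(dna) != crc64(dna.lower()))
```

Leaving the case conversion to the caller keeps the function identical for Seq objects and plain strings.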

Peter



