[Biopython-dev] Code to submit: CRC64

Sebastian Bassi sbassi at gmail.com
Thu Jun 21 18:06:29 UTC 2007


On 6/21/07, Peter <biopython-dev at maubp.freeserve.co.uk> wrote:
> You have to file the bug first, and then you can attach files
> afterwards. It's a bit odd - possibly a limitation in Bugzilla, or just
> the way it's set up here.

OK, I am working on it.

> Could you give a couple of explicit examples (URLs), as I personally
> don't remember ever noticing CRC information.

I noticed because the first bioinformatics program I ever used was
DNAstar, and it included checksum information.
http://expasy.org/uniprot/P04293 (near the bottom, in Sequence information).
http://bioinformatics.anl.gov/seguid/overview.aspx (seguid proposed as
a stronger id than crc64).
http://lists.open-bio.org/pipermail/bioruby-cvs/2007-February.txt
On this page there are some formats that use some kind of checksum
(including crc64):
http://www.ebi.ac.uk/help/formats_frame.html

BTW, I could also code the GCG checksum, based on the BioPerl implementation.

And from the Swiss-Prot manual:

"The SQ (SeQuence header) line marks the beginning of the sequence
data and gives a quick summary of its content.
The format of the SQ line is:
    SQ   SEQUENCE XXXX AA; XXXXX MW; XXXXXXXXXXXXXXXX CRC64;
The line contains the length of the sequence in amino acids ('AA')
followed by the molecular weight ('MW') rounded to the nearest mass
unit (Dalton) and the sequence 64-bit CRC (Cyclic Redundancy Check)
value ('CRC64')."
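
To make the SQ line layout concrete, here is a minimal sketch of how the
three fields could be pulled out of such a line. The function name
parse_sq_line and the regular expression are my own illustration, not
existing Biopython code, and the numbers in the usage example are made-up
placeholder values rather than a real database entry.

```python
import re

# Hypothetical pattern for the SQ line described in the manual excerpt:
#   SQ   SEQUENCE XXXX AA; XXXXX MW; XXXXXXXXXXXXXXXX CRC64;
SQ_LINE = re.compile(
    r"^SQ\s+SEQUENCE\s+(?P<length>\d+)\s+AA;"
    r"\s+(?P<mw>\d+)\s+MW;"
    r"\s+(?P<crc64>[0-9A-F]{16})\s+CRC64;\s*$"
)


def parse_sq_line(line):
    """Return length, molecular weight and CRC64 from a Swiss-Prot SQ line."""
    match = SQ_LINE.match(line)
    if match is None:
        raise ValueError("not a recognisable SQ line: %r" % line)
    return {
        "length": int(match.group("length")),  # number of amino acids
        "mw": int(match.group("mw")),          # molecular weight in Dalton
        "crc64": match.group("crc64"),         # 16 hex digits
    }


# Placeholder values, for illustration only.
example = "SQ   SEQUENCE   486 AA;  55639 MW;  D7862E867AD74383 CRC64;"
fields = parse_sq_line(example)
```

The CRC64 field is kept as a string so that it can be compared directly
with whatever hex string a checksum function returns.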


> Seq objects can have upper or lower case letters (or a mixture)
> regardless of the alphabet. Rather than writing some complicated code to
> convert amino acids into upper case and DNA into lowercase, maybe just
> state these conventions in the function's doc string and leave it up to
> the user.

I am doing both versions so you can choose between them.

Best,
SB.

-- 
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318


