[Biopython-dev] sff reader

Fri Aug 14 09:36:31 UTC 2009

On Fri, Aug 14, 2009 at 7:01 AM, Blanca Postigo Jose
Miguel<jblanca at btc.upv.es> wrote:
>
>> The coding style is quite different, but it looks like the essential
>> idea is the same - we both scan the file to find each record, and use
>> a dictionary to record the offset. Interestingly you and Peio also
>> keep the record's length in the dictionary, which will double the
>> memory requirements - for something you don't actually need.
>
> We keep the record length to be able to return the record without
> having to scan the file again.

If you want to be able to extract the raw record, that makes sense.
It is still a trade-off between memory usage and speed of access,
and either choice can be right depending on your requirements.

For Bio.SeqIO, I want to parse the raw record on access via the
key in order to return a SeqRecord, so I have no need to keep
the raw record length in memory. I'm using this github branch:
http://github.com/peterjc/biopython/commits/index
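
To make the two approaches concrete, here is a rough sketch. It is not
the code from either branch, and it uses a toy FASTA-like format rather
than SFF so that it stays short. The function names (build_index,
get_raw, get_parsed) are made up for illustration.

```python
import io


def build_index(handle, keep_length=False):
    """Scan a FASTA-like binary handle once.

    Maps record id -> offset, or id -> (offset, length) if keep_length.
    """
    index = {}
    key = None
    start = 0
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line or line.startswith(b">"):
            # A new record (or end of file) closes the previous one.
            if key is not None:
                index[key] = (start, offset - start) if keep_length else start
            if not line:
                break
            key = line[1:].split()[0].decode()
            start = offset
    return index


def get_raw(handle, index, key):
    """With (offset, length) stored: one seek and one read, no parsing."""
    offset, length = index[key]
    handle.seek(offset)
    return handle.read(length)


def get_parsed(handle, index, key):
    """With only the offset stored: parse from there to the next record."""
    handle.seek(index[key])
    title = handle.readline()[1:].strip().decode()
    seq = []
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break
        seq.append(line.strip().decode())
    return title, "".join(seq)


# Toy data standing in for a real file opened in binary mode.
data = io.BytesIO(b">r1 first\nACGT\nAC\n>r2 second\nGGTT\n")
offsets = build_index(data)
data.seek(0)
spans = build_index(data, keep_length=True)
```

The offset-only index holds one integer per record and has to parse on
access anyway, which is all that is needed to build a parsed record.
The (offset, length) index costs an extra integer and a tuple per
record, but can hand back the raw bytes without looking for the end of
the record again.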

Peter