[Biopython-dev] sff reader

Fri Aug 14 13:25:43 UTC 2009

Jose wrote:
>>> We keep the record length to be able to return the record without
>>> having to scan the file again.

Peter wrote:
>> If you want to be able to extract the raw record, that makes sense.
>> It is still a trade-off between memory usage and speed of access,
>> and depending on your requirements either way makes sense.
>>
>> For Bio.SeqIO, I want to parse the raw record on access via the
>> key in order to return a SeqRecord, so I have no need to keep
>> the raw record length in memory. I'm using this github branch:
>> http://github.com/peterjc/biopython/commits/index

Jose wrote:
> We want the raw record because we plan to use this FileIndex on several
> different files, not just for sequences. In fact you have an example of how to
> use it for sequences in SequenceFileIndex, a class that uses the general
> FileIndex. I think that this FileIndex class will even be able to index XML
> files. This is the motivation for the design.

I see - that makes sense.
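
To make the trade-off concrete, here is a minimal sketch of the two
approaches. It is neither the FileIndex class discussed above nor the
Bio.SeqIO code on the branch. The function names (build_index, get_raw,
get_parsed) and the simple ">"-delimited record format are assumptions
made purely for illustration, and a real SFF file is binary and laid out
differently.

```python
import io


def build_index(handle, keep_length=True):
    """Map each record key to (offset, length), or to the offset alone.

    Hypothetical helper: a record starts at a line beginning with b">"
    and its key is the first word after the ">".
    """
    handle.seek(0)
    index = {}
    key = None
    start = 0
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line or line.startswith(b">"):
            # The previous record (if any) ends at pos.
            if key is not None:
                index[key] = (start, pos - start) if keep_length else start
            if not line:
                break
            key = line[1:].split()[0].decode()
            start = pos
    return index


def get_raw(handle, index, key):
    """Offset plus length: one seek and one read, with no rescanning."""
    offset, length = index[key]
    handle.seek(offset)
    return handle.read(length)


def get_parsed(handle, index, key):
    """Offset only: seek, then parse forward until the next record."""
    handle.seek(index[key])
    title = handle.readline()[1:].strip().decode()
    chunks = []
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break
        chunks.append(line.strip().decode())
    return title, "".join(chunks)


data = b">r1 first\nACGT\nAC\n>r2\nGG\n"
handle = io.BytesIO(data)

with_lengths = build_index(handle)                     # more memory per key
offsets_only = build_index(handle, keep_length=False)  # less memory per key

raw_r2 = get_raw(handle, with_lengths, "r2")
parsed_r1 = get_parsed(handle, offsets_only, "r1")
```

Keeping the length costs one extra integer per record, but the raw bytes
can be returned without knowing anything about the file format. Keeping
only the offset suits the case where the record is parsed on access
anyway, since the parser finds the end of the record itself.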

Peter