[Biopython-dev] Bio.SeqIO.index extension, Bio.SeqIO.index_many

Brad Chapman chapmanb at 50mail.com
Tue Dec 7 13:59:41 UTC 2010


Peter;

> You may recall some previous discussion about extending the
> Bio.SeqIO.index functionality. I'm particularly interested in
> keeping the index on disk to reduce the memory overhead
> and thus support NGS files with many millions of reads. e.g.
[...]
> I've been working on the following idea on branches in github,
> and have something workable using SQLite3 to store a
> table of record identifiers, file offset, and file number
> (for where we have multiple files indexed together).
[...]
> https://github.com/peterjc/biopython/tree/index-many

This is great and definitely needed. The implementation
looks nice and fits with the current index functionality,
and SQLite definitely seems like the right choice.
So a big +1 on all of this.
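
As a rough illustration of the layout described above (record
identifier, file offset, file number), here is a minimal sketch using
only the Python standard library. It is not the code on the index-many
branch: the table name offset_data, its column names, and the helpers
build_index and get_raw are all hypothetical, and it assumes plain
FASTA files in which every header line carries an identifier.

```python
import sqlite3


def build_index(db_path, filenames):
    """Store (key, file_number, offset) for each FASTA record on disk.

    Hypothetical schema, for illustration only.
    """
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE offset_data ("
        "key TEXT PRIMARY KEY, file_number INTEGER, offset INTEGER)"
    )
    for file_number, filename in enumerate(filenames):
        with open(filename, "rb") as handle:
            while True:
                offset = handle.tell()  # byte position of the line about to be read
                line = handle.readline()
                if not line:
                    break
                if line.startswith(b">"):
                    # The identifier is the first word after '>'.
                    key = line[1:].split(None, 1)[0].decode()
                    con.execute(
                        "INSERT INTO offset_data VALUES (?, ?, ?)",
                        (key, file_number, offset),
                    )
    con.commit()
    return con


def get_raw(con, filenames, key):
    """Return the raw bytes of one record, found by seeking to its offset."""
    row = con.execute(
        "SELECT file_number, offset FROM offset_data WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        raise KeyError(key)
    file_number, offset = row
    with open(filenames[file_number], "rb") as handle:
        handle.seek(offset)
        lines = [handle.readline()]  # the '>' header line
        for line in handle:
            if line.startswith(b">"):  # next record starts here
                break
            lines.append(line)
    return b"".join(lines)
```

Only the key table lives in SQLite; the sequence data stays in the
original files, so memory use does not grow with the number of reads.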

My only suggestion is about the naming: index_file would make the
intent a little clearer than index_many. (Ideally 'index' would name
this functionality and 'index_memory' the in-memory indexing, but that
ship has probably sailed.)

Thanks much for taking this on,
Brad


