[Biopython-dev] Storing Bio.SeqIO.index() offsets in SQLite

Peter biopython at maubp.freeserve.co.uk
Mon Jun 7 18:23:05 UTC 2010


Peter wrote:
>...
>
> http://github.com/peterjc/biopython/tree/index-sqlite
>
> ... an SQLite index is used to hold
> the offsets. This means very low RAM requirements, but is a lot
> slower because the offsets are written to disk and the SQLite
> index is updated as we go. I expect this part can be optimised
> (e.g. try to build the index at the end, try committing in batches).

I have now tried this on some files with tens of millions of
records, and tuning how we use SQLite is clearly going to be important.
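
As a rough illustration of the two ideas mentioned above (committing in
batches, and building the SQLite index only at the end), here is a minimal
sketch using Python's standard sqlite3 module. The table name, column
names, function names and batch size are all made up for illustration;
this is not the code on the branch, just one way the tuning might look.

```python
import sqlite3
from itertools import islice


def build_offset_index(db_path, records, batch_size=100000):
    """Store (key, file_offset) pairs in SQLite, committing in batches.

    'records' is any iterable of (key, offset) tuples. The schema used
    here (table offset_data, index key_index) is hypothetical.
    """
    con = sqlite3.connect(db_path)
    try:
        # No index yet, so each insert is a cheap append to the table.
        con.execute(
            "CREATE TABLE offset_data (key TEXT, file_offset INTEGER)"
        )
        it = iter(records)
        while True:
            batch = list(islice(it, batch_size))
            if not batch:
                break
            con.executemany(
                "INSERT INTO offset_data (key, file_offset) VALUES (?, ?)",
                batch,
            )
            # One commit per batch rather than one per record.
            con.commit()
        # Build the index once, after all the rows are in place.
        # UNIQUE also makes SQLite reject duplicate keys at this point.
        con.execute("CREATE UNIQUE INDEX key_index ON offset_data (key)")
        con.commit()
    finally:
        con.close()


def lookup_offset(db_path, key):
    """Return the stored offset for key, or raise KeyError."""
    con = sqlite3.connect(db_path)
    try:
        row = con.execute(
            "SELECT file_offset FROM offset_data WHERE key = ?", (key,)
        ).fetchone()
    finally:
        con.close()
    if row is None:
        raise KeyError(key)
    return row[0]
```

Only one batch of offsets is held in memory at a time, so RAM use should
stay low, while the number of commits drops from one per record to one
per batch. Whether this is actually faster, and what batch size suits
files of this size, would need to be measured.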

Peter


