[Biopython-dev] Storing Bio.SeqIO.index() offsets in SQLite
Peter
biopython at maubp.freeserve.co.uk
Mon Jun 7 18:23:05 UTC 2010
Peter wrote:
>...
>
> http://github.com/peterjc/biopython/tree/index-sqlite
>
> ... an SQLite index is used to hold
> the offsets. This means very low RAM requirements, but is a lot
> slower because the offsets are written to disk and the SQLite
> index is updated as we go. I expect this part can be optimised
> (e.g. try to build the index at the end, try committing in batches).
Having now tried this on some files with tens of millions of
records, I think tuning how we use SQLite is going to be important.
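
To make the two ideas from the quoted text concrete (commit in
batches, and only build the SQLite index at the end), here is a rough
sketch using Python's standard sqlite3 module. This is not the code on
the branch: the table name, column names, function names and the
default batch size are all made up for illustration, and the input is
assumed to be any iterable of (key, offset) pairs.

```python
import sqlite3


def build_offset_index(db_path, records, batch_size=100000):
    """Store (key, offset) pairs in SQLite, committing in batches.

    Hypothetical helper for illustration only. The schema (table
    "offsets" with columns "key" and "file_offset") is an assumption.
    """
    con = sqlite3.connect(db_path)
    try:
        # No index yet, so each insert is a plain append to the table.
        con.execute("CREATE TABLE offsets (key TEXT, file_offset INTEGER)")
        insert_sql = "INSERT INTO offsets (key, file_offset) VALUES (?, ?)"
        batch = []
        for pair in records:
            batch.append(pair)
            if len(batch) >= batch_size:
                # One transaction per batch rather than one per record.
                con.executemany(insert_sql, batch)
                con.commit()
                batch = []
        if batch:
            con.executemany(insert_sql, batch)
            con.commit()
        # Build the index once, after all the rows have been loaded.
        # UNIQUE means a duplicate key raises sqlite3.IntegrityError here.
        con.execute("CREATE UNIQUE INDEX key_index ON offsets (key)")
        con.commit()
    finally:
        con.close()


def lookup_offset(db_path, key):
    """Return the stored offset for key, or None if it is not present."""
    con = sqlite3.connect(db_path)
    try:
        row = con.execute(
            "SELECT file_offset FROM offsets WHERE key = ?", (key,)
        ).fetchone()
    finally:
        con.close()
    return None if row is None else row[0]
```

Only one batch of pairs is held in memory at a time, so RAM use
should stay low. Whether this is actually faster than updating the
index as we go would need measuring on real files.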
Peter