[Biopython-dev] Storing Bio.SeqIO.index() offsets in SQLite

Kevin Jacobs <jacobs@bioinformed.com> bioinformed at gmail.com
Mon Jun 7 21:10:42 UTC 2010


On Mon, Jun 7, 2010 at 2:23 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> Peter wrote:
> >...
> >
> > http://github.com/peterjc/biopython/tree/index-sqlite
> >
> > ... an SQLite index is used to hold
> > the offsets. This means very low RAM requirements, but is a lot
> > slower because the offsets are written to disk and the SQLite
> > index is updated as we go. I expect this part can be optimised
> > (e.g. try to build the index at the end, try committing in batches).
>
> Having now tried using this on some files with tens of millions of
> records, tuning how we use SQLite is going to be important.
>
>
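The two SQLite optimisations floated in the quoted text (building the lookup index only at the end, and committing in batches rather than row by row) could look roughly like the sketch below. It is a minimal illustration using only Python's standard `sqlite3` module, not the code on the index-sqlite branch. The table name `offsets`, the columns `record_key`/`file_offset`, the index name, the `batch_size` default, and the `records` iterable of `(key, offset)` pairs are all hypothetical choices made for this example.

```python
import sqlite3


def build_offset_index(db_path, records, batch_size=10000):
    """Store (key, offset) pairs in SQLite (hypothetical schema).

    Rows are inserted with executemany() and committed once per batch,
    and the lookup index is created only after every row is loaded,
    instead of being updated incrementally on each insert.
    """
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS offsets "
        "(record_key TEXT NOT NULL, file_offset INTEGER NOT NULL)"
    )
    insert_sql = "INSERT INTO offsets (record_key, file_offset) VALUES (?, ?)"
    batch = []
    for key, offset in records:
        batch.append((key, offset))
        if len(batch) >= batch_size:
            con.executemany(insert_sql, batch)
            con.commit()  # one commit per batch, not per record
            batch.clear()
    if batch:  # flush the final partial batch
        con.executemany(insert_sql, batch)
        con.commit()
    # Build the index once, at the end. UNIQUE means a repeated key
    # makes this statement raise sqlite3.IntegrityError.
    con.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS offsets_key_idx "
        "ON offsets (record_key)"
    )
    con.commit()
    return con


def lookup_offset(con, key):
    """Return the stored offset for key, or None if it is absent."""
    row = con.execute(
        "SELECT file_offset FROM offsets WHERE record_key = ?", (key,)
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; a real index would use a file.
    con = build_offset_index(":memory:", [("rec1", 0), ("rec2", 120)], batch_size=1)
    print(lookup_offset(con, "rec2"))
```

The right `batch_size` for files with tens of millions of records is something to measure rather than assume.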
Wouldn't Berkeley DB be much, much faster for constructing simple
key-to-offset mappings?
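For comparison, a plain key/value store for the same mapping might look like the sketch below. It uses Python's standard `dbm` module as a stand-in. Note that this is an assumption: the old `bsddb` Berkeley DB bindings are not in the Python 3 standard library, and whether `dbm` is backed by GNU dbm, an ndbm-compatible library, or the pure-Python `dbm.dumb` depends on how Python was built. The function names and the decimal-string encoding of offsets are hypothetical choices for illustration, and no speed claim is implied.

```python
import dbm
import os
import tempfile


def build_dbm_index(path, records):
    """Write (key, offset) pairs into a new dbm key/value file.

    Flag "n" always creates a fresh, empty database. dbm stores bytes,
    so each integer offset is saved as its decimal string.
    """
    with dbm.open(path, "n") as db:
        for key, offset in records:
            db[key] = str(offset)


def lookup_dbm_offset(db, key):
    """Return the offset for key from an open dbm handle, or None."""
    value = db.get(key)
    # int() accepts the ASCII-digit bytes that dbm returns.
    return None if value is None else int(value)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "offsets")
        build_dbm_index(path, [("rec1", 0), ("rec2", 120)])
        with dbm.open(path, "r") as db:
            print(lookup_dbm_offset(db, "rec2"))
```

Unlike the SQLite table, there is no separate index to build here: the key/value file is itself the lookup structure.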

-Kevin


