[Biopython-dev] Biopython-dev Digest, Vol 89, Issue 8

Tue Jun 8 07:00:10 UTC 2010

On 08/06/10 08:39, biopython-dev-request at lists.open-bio.org wrote:
> On Mon, Jun 7, 2010 at 2:23 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
>> >  Peter wrote:
>>> >  >...
>>> >  >
>>> >  >  http://github.com/peterjc/biopython/tree/index-sqlite
>>> >  >
>>> >  >  ... an SQLite index is used to hold
>>> >  >  the offsets. This means very low RAM requirements, but is a lot
>>> >  >  slower because the offsets are written to disk and the SQLite
>>> >  >  index is updated as we go. I expect this part can be optimised
>>> >  >  (e.g. try to build the index at the end, try committing in batches).
>> >
>> >  Having now tried using this on some files with tens of millions of
>> >  records, tuning how we use SQLite is going to be important.
>> >
>> >
> Wouldn't a Berkeley database be much much faster for constructing simple key
> to offset mappings?
>
> -Kevin
>

Yes. If all you need is a key/value associative structure, the NoSQL
stores will be faster. Tokyo Cabinet seems to be one of the fastest,
reportedly up to 100x faster than Berkeley DB (see
http://www.ioremap.net/node/235).
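
For reference, the two SQLite tunings suggested in the quoted text
(committing in batches, and building the index at the end) can be
sketched roughly as below. This is only an illustration and not the
code on the index-sqlite branch: the function names, the table and
column names, the batch_size default and the FASTA-only parsing are
all assumptions made for the example.

```python
import sqlite3


def build_offset_index(fasta_path, db_path, batch_size=10000):
    """Store record id -> byte offset pairs for a FASTA file in SQLite.

    Hypothetical helper for illustration; the names are not Biopython API.
    """
    con = sqlite3.connect(db_path)
    # No PRIMARY KEY or index yet, so the bulk load is plain appends.
    con.execute("CREATE TABLE records (record_id TEXT, file_offset INTEGER)")
    insert_sql = "INSERT INTO records VALUES (?, ?)"
    batch = []
    with open(fasta_path, "rb") as handle:
        while True:
            offset = handle.tell()  # byte position where this line starts
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                parts = line[1:].split(None, 1)
                if parts:
                    batch.append((parts[0].decode("ascii"), offset))
                if len(batch) >= batch_size:
                    # One commit per batch instead of one per record.
                    con.executemany(insert_sql, batch)
                    con.commit()
                    batch = []
    if batch:
        con.executemany(insert_sql, batch)
        con.commit()
    # Build the lookup index once, after all rows have been loaded.
    con.execute("CREATE UNIQUE INDEX record_id_index ON records (record_id)")
    con.commit()
    con.close()


def get_record(fasta_path, db_path, record_id):
    """Fetch one raw FASTA record by seeking to its stored offset."""
    con = sqlite3.connect(db_path)
    row = con.execute(
        "SELECT file_offset FROM records WHERE record_id = ?", (record_id,)
    ).fetchone()
    con.close()
    if row is None:
        raise KeyError(record_id)
    with open(fasta_path, "rb") as handle:
        handle.seek(row[0])
        lines = [handle.readline()]  # the ">" title line
        for line in handle:
            if line.startswith(b">"):
                break  # reached the next record
            lines.append(line)
    return b"".join(lines).decode("ascii")
```

Calling build_offset_index("reads.fasta", "reads.idx") once and then
get_record("reads.fasta", "reads.idx", some_id) keeps only one batch
of offsets in RAM at a time. Whether this closes the gap to a
dedicated key/value store would need measuring on the large files.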

L.