[Biopython-dev] Biopython-dev Digest, Vol 89, Issue 8
Laurent
lgautier at gmail.com
Tue Jun 8 03:00:10 EDT 2010
On 08/06/10 08:39, biopython-dev-request at lists.open-bio.org wrote:
> On Mon, Jun 7, 2010 at 2:23 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
>> Peter wrote:
>>> ...
>>>
>>> http://github.com/peterjc/biopython/tree/index-sqlite
>>>
>>> ... an SQLite index is used to hold
>>> the offsets. This means very low RAM requirements, but is a lot
>>> slower because the offsets are written to disk and the SQLite
>>> index is updated as we go. I expect this part can be optimised
>>> (e.g. try to build the index at the end, try committing in batches).
>>
>> Having now tried using this on some files with tens of millions of
>> records, tuning how we use SQLite is going to be important.
>>
>
> Wouldn't a Berkeley database be much much faster for constructing simple key
> to offset mappings?
>
> -Kevin
>
Yes. If one is only looking for a key/value associative structure, the
NoSQL solutions will be faster (Tokyo Cabinet seems to be one of the
fastest, reportedly up to 100x faster than Berkeley DB:
http://www.ioremap.net/node/235).
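
For illustration only, here is a minimal sketch of the kind of key-to-offset
mapping being discussed. It uses Python's standard-library dbm module as a
stand-in for a key/value store; it is not Tokyo Cabinet or the bsddb bindings,
and it says nothing about their relative speed. The function names
build_offset_index and fetch_record are hypothetical, and FASTA input is an
assumption.

```python
import dbm


def build_offset_index(fasta_path, index_path):
    """Store record ID -> byte offset of its '>' header line (hypothetical helper)."""
    # Binary mode keeps tell() values valid as seek() targets later.
    with open(fasta_path, "rb") as handle, dbm.open(index_path, "n") as db:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                fields = line[1:].split(None, 1)
                if not fields:
                    continue  # header with no ID; skipped in this sketch
                # Keys and values in dbm are bytes; the offset is stored as ASCII digits.
                db[fields[0]] = str(offset).encode("ascii")


def fetch_record(fasta_path, index_path, key):
    """Look up the offset for key, seek there, and return the raw record text."""
    with dbm.open(index_path, "r") as db:
        offset = int(db[key.encode("ascii")])
    with open(fasta_path, "rb") as handle:
        handle.seek(offset)
        lines = [handle.readline()]  # the header line itself
        for line in handle:
            if line.startswith(b">"):
                break  # next record starts here
            lines.append(line)
    return b"".join(lines).decode("ascii")
```

Only the small index lives in the key/value store; the sequence data stays in
the original file and is read on demand, which is what keeps RAM use low.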
L.