[Open-bio-l] OBDA redux?
Raoul Bonnal
bonnal at ingm.org
Fri Nov 18 04:35:56 EST 2011
Dear all,
Would it be possible to have a test dataset and a clear list of
requirements and functionality? Not a huge doc, just a few points for
benchmarking.
On 17/11/11 18.11, "Pjotr Prins" <pjotr.public41 at thebird.nl> wrote:
> On Thu, Nov 17, 2011 at 02:39:49PM +0000, Peter Cock wrote:
>>> +1. This will only get worse, with the projections for upcoming HiSeq
>>> upgrades, it is possible 1-2 channel runs would hit that limit.
>>
>> That's a useful scale to aim to cover in profiling then, 100M to 500M
>> records. Jason, do you have any more details about the slowdown
>> you found with SQLite? For this use case we want to write the index
>> once, and read it many times. I found it is quicker to populate the
>> offset table before creating the index - perhaps you were seeing the
>> index being updated while adding records?
>
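As one possible benchmark point, the "populate first, index afterwards"
pattern Peter describes could be timed with a small script like the one
below. This is only a sketch: the table and column names, the synthetic
read IDs and the record count are made up for illustration, and an
in-memory database will not show the disk effects that matter at 100M+
records.

```python
import sqlite3
import time


def make_records(n):
    """Yield n synthetic (key, file_offset) pairs; the IDs are hypothetical."""
    for i in range(n):
        yield ("read_%d" % i, i * 100)


def build_offset_index(records, index_first):
    """Load records into SQLite, creating the index before or after the load.

    Returns the open connection and the elapsed time in seconds.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE offsets (key TEXT, file_offset INTEGER)")
    start = time.perf_counter()
    if index_first:
        # Index exists during the load, so it is updated on every insert.
        con.execute("CREATE UNIQUE INDEX key_idx ON offsets (key)")
    with con:
        # One transaction for the whole bulk load.
        con.executemany("INSERT INTO offsets VALUES (?, ?)", records)
    if not index_first:
        # Index is built once, after all rows are in place.
        con.execute("CREATE UNIQUE INDEX key_idx ON offsets (key)")
    return con, time.perf_counter() - start


def lookup(con, key):
    """Return the stored file offset for key, or None if it is absent."""
    row = con.execute(
        "SELECT file_offset FROM offsets WHERE key = ?", (key,)
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    for index_first in (True, False):
        con, elapsed = build_offset_index(make_records(100000), index_first)
        print("index_first=%s: %.3f s" % (index_first, elapsed))
        con.close()
```

To say anything about the 100M to 500M range, the same script would need
a file-backed database and real read IDs in the order they occur in the
sequence file.
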
> I have also found that hammering SQLite quickly deteriorates
> performance. Rather too quickly in fact. Don't forget that SQL is
> inherently slower than 'simple' indexers. Also SQLite is a convenience
> library, rather than a library designed for optimized performance. We
> used to run sleepycat/bdb for that reason, now it is Tokyo/Kyoto
> cabinet.
>
> In the (rather) near future we will be looking at parallel feeds from
> multiple machines, to keep it somewhat interesting. Hadoop has
> indexing support. In fact, Hadoop should be ideal for indexed sequence
> information, though I have not used it. Still, when the time comes, I
> am kinda interested in parallelized NoSQL solutions for scaling up.
> Hadoop kills me because of its complexity. I hate complexity (one
> reason I have tried to avoid SQL servers).
>
> BTW 500M records takes significant RAM for an in-memory index. Quite a
> number of solutions, to retain their performance, have to have the
> indexes in memory. 500M records now, will grow to 500G records soon.
> Just a thing to keep in mind. I would opt for a non-RAM solution.
>
> Pj.
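
To put a rough number on the RAM point, here is a back-of-envelope
estimate. The 30-byte key and 8-byte offset are my assumptions, not
measured values, and real hash tables or B-trees add per-entry overhead
on top, so treat the result as a lower bound.

```python
def index_ram_bytes(n_records, key_bytes=30, offset_bytes=8):
    """Lower-bound size of an in-memory key -> offset index, in bytes.

    key_bytes and offset_bytes are assumed averages; container overhead
    is deliberately ignored.
    """
    return n_records * (key_bytes + offset_bytes)


if __name__ == "__main__":
    for n in (500 * 10**6, 500 * 10**9):
        print("%d records: %.1f GB" % (n, index_ram_bytes(n) / 1e9))
    # 500000000 records: 19.0 GB
    # 500000000000 records: 19000.0 GB
```

Even under these optimistic assumptions, 500M records is already beyond
what a typical workstation has in RAM, which supports going for an
on-disk index.
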
> _______________________________________________
> Open-Bio-l mailing list
> Open-Bio-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/open-bio-l