[Open-bio-l] OBDA redux?

Peter Cock p.j.a.cock at googlemail.com
Mon Nov 14 13:14:18 EST 2011


Hi Chris,

[Did you mean to CC BioPerl-l? Should I have?]

On Mon, Nov 14, 2011 at 5:59 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Nov 13, 2011, at 6:24 AM, Peter Cock wrote:
>
>> So, Chris and I seem in general agreement that an OBDA v2
>> using SQLite but based on essentially the same approach as
>> the BDB or flat file based OBDA v1 is a good idea. i.e. Tables
>> mapping record identifiers to file offsets in the original sequence
>> files.
>
> The worry I have is adhering to a specific backend (e.g. SQLite).
> The reason I say this is b/c BDB in its time was the go-to way
> of storing simple index data, but that is no longer feasible for
> very large data sets.  Who's to say something similar won't
> happen to SQLite, or that it is the best option available?

Right now I would think SQLite is one of the best options (if
not the best). If supporting the old back ends is important for
cross-project compatibility, I'm willing to have another go
at using BDB in Biopython, but I had limited success last
time I tried.

> Maybe we should focus on the data storage schema, as
> simple as it may be, then indicate the default backend
> must be SQLite but others are allowed (maybe with a
> mention that SQLite can be replaced by alternatives in
> the future if needed).

It would make sense to talk about an SQL schema if
the "other options" would also be SQL based, but they
might not be. Either way, we should certainly keep
potential alternative back ends in mind.
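
For illustration, here is a minimal sketch of what such an index
could look like with SQLite as the back end, using Python's standard
sqlite3 module. The table name record_index, its columns, and the
helper functions build_index and fetch_raw are hypothetical names
chosen only for this example; they are not the OBDA specification or
the schema Biopython currently uses. It assumes a single FASTA file
in which every header line carries an identifier.

```python
import sqlite3


def build_index(fasta_path, db_path):
    """Index a FASTA file: one row per record (key, byte offset, byte length).

    Hypothetical schema for illustration only. Duplicate identifiers
    raise sqlite3.IntegrityError because key is the primary key.
    """
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE record_index ("
        "key TEXT PRIMARY KEY, offset INTEGER, length INTEGER)"
    )
    rows = []
    key, start, pos = None, 0, 0
    # Binary mode so that offsets are true byte positions in the file.
    with open(fasta_path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                if key is not None:
                    rows.append((key, start, pos - start))
                # Identifier = first word after ">" (assumed to be present).
                key = line[1:].split()[0].decode("ascii")
                start = pos
            pos += len(line)
    if key is not None:
        rows.append((key, start, pos - start))
    with con:  # commits the inserts as one transaction
        con.executemany("INSERT INTO record_index VALUES (?, ?, ?)", rows)
    return con


def fetch_raw(con, fasta_path, key):
    """Return the raw bytes of one record from the original file."""
    row = con.execute(
        "SELECT offset, length FROM record_index WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        raise KeyError(key)
    offset, length = row
    with open(fasta_path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)
```

Only these two functions touch SQL, so a non-SQL back end would just
need to provide the same identifier to (offset, length) lookup.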

>> I hope to get BioRuby on board, they already have OBDA
>> v1 support so that shouldn't be too hard.
>>
>> Right now I don't recall if BioJava has/had OBDA v1 support,
>> and if they did if it was affected in their recent move to BioJava
>> v3 (I understand from their mailing list that some older lower
>> priority functionality has not all been ported yet).
>
> I wouldn't be surprised at that, OBDA kind of lingered for a
> while, and I'm not sure how widely adopted it became
> (maybe others can shed light on that?)

Well, OBDA went beyond just indexing flat files - it also
tried to standardise things like remote database access.
I don't think we ever really had that side working in
Biopython, so I am less familiar with it. I know EMBOSS
has something fairly extensive for online databases,
but have not checked if it uses the OBDA style or their
own.

For now I was only planning to tackle indexing sequence
files in this "OBDA redux".

>> Also EMBOSS are likely to be interested, certainly Peter Rice
>> was interested in the SQLite indexing we're already using in
>> Biopython for sequence files (i.e. what is effectively the
>> prototype for OBDA v2).
>>
>> Note that in addition to simple indexing of text files, we are
>> already using the same simple offset + length approach for
>> indexing binary files (e.g. SFF).
>
> I think that's the general idea, that is how all bioperl data
> was indexed, before with the Bio::Index modules and with
> the OBDA implementations as well.

Good.
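
To make the offset + length idea concrete, here is a small sketch in
Python. Nothing in it depends on the file being text: the index only
stores where a record starts and how many bytes it spans, so the same
lookup works for a binary format. The function name read_record and
the in-memory dict used as the index are assumptions for illustration,
not part of any OBDA or Bio* API, and the byte strings are made-up
data rather than a real SFF file.

```python
def read_record(path, offset, length):
    """Return the raw bytes of one record given its offset and length."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read(length)
    if len(data) != length:
        # A short read suggests the file changed after it was indexed.
        raise ValueError("record extends past end of file; index may be stale")
    return data


if __name__ == "__main__":
    import os
    import tempfile

    # Made-up binary file: a 6 byte header followed by two records.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(b"\x00\x01HEAD" + b"\xff\xfe\x00rec1" + b"rec2\x00")

    # Hypothetical index: identifier -> (offset, length).
    index = {"rec1": (6, 7), "rec2": (13, 5)}
    print(read_record(path, *index["rec2"]))
    os.remove(path)
```

Parsing the returned bytes into a sequence record is then left to the
existing format-specific parser.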

>> On the immediate practical side, I think I can edit the
>> current OBDA website of http://obda.open-bio.org/
>> via /home/websites/obda.open-bio.org/html on the
>> server.
>
> See below w/ regards to my thoughts on the wiki.
>
>> We need to work out where the current OBDA indexing
>> specification lives (CVS or SVN?) and perhaps move
>> that to github. We may need a general OBF organisation
>> account on GitHub for this and any other cross-project
>> repositories.
>
> +1 to a move to github, but maybe this belongs in an
> OBF-specific organization.

Yes, definitely under an OBF github account (not under
Biopython, BioPerl, etc).

> And maybe we should take advantage of the simple
> wiki or project homepage that GitHub offers and move
> everything (docs) there.

That could work. We'd have to go through all the old
documentation and relocate it, then we could make the
obda.open-bio.org domain point at the github pages.

>> I see there is already an OBDA project on RedMine,
>> (Chris can you add me to that please?)
>> https://redmine.open-bio.org/projects/obda
>>
>> Peter
>
> Done (last night actually, but I didn't have time to respond
> immediately).
>
> chris

Thanks,

Peter


