[BioSQL-l] Consistency between bio* projects

Hilmar Lapp hlapp at gnf.org
Mon Jan 17 15:00:22 EST 2005


On Jan 16, 2005, at 6:41 PM, mark.schreiber at group.novartis.com wrote:

> It would seem that what is needed is a mapping of each field from a file
> format to a field in a BioSQL table. I think initially this would only
> need to be done for EMBL, SwissProt and GenBank.

Note that this has at least been started for bioperl, to the extent that
the destination in the bioperl object model is documented. (Jason,
Brian, anything you wanted to comment?)

Once you know where a field ends up in the bioperl object model, it is
relatively straightforward to predict where it ends up in the schema;
still, as far as I know, this isn't written down in plain text anywhere.
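As a rough illustration of what such a plain-text contract could look like, here is a minimal Java sketch. The class `GenbankToBiosql` and its method `columnFor` are hypothetical and are not part of bioperl, bioperl-db or BioJava. The `name` and `identifier` rows follow what is stated later in this thread; the `accession`, `version` and `description` rows are assumptions based on the columns of the BioSQL `bioentry` table, and the qualifier fallback mirrors the BioJava approach Mark describes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: a written-down GenBank-field-to-BioSQL-column contract. */
public class GenbankToBiosql {

    // GenBank field -> BioSQL column. Only the "name" and "identifier" rows
    // are stated in this thread; the other rows are assumptions.
    private static final Map<String, String> FIELD_TO_COLUMN = new LinkedHashMap<>();
    static {
        FIELD_TO_COLUMN.put("LOCUS", "bioentry.name");           // locus name, not the GI
        FIELD_TO_COLUMN.put("ACCESSION", "bioentry.accession");
        FIELD_TO_COLUMN.put("VERSION", "bioentry.version");
        FIELD_TO_COLUMN.put("GI", "bioentry.identifier");        // identifier in the external db
        FIELD_TO_COLUMN.put("DEFINITION", "bioentry.description");
    }

    // Anything not mapped explicitly falls back to a qualifier row
    // ("store everything that isn't a feature as a qualifier").
    static final String FALLBACK = "bioentry_qualifier_value";

    public static String columnFor(String genbankField) {
        return FIELD_TO_COLUMN.getOrDefault(genbankField, FALLBACK);
    }

    public static void main(String[] args) {
        for (String field : new String[] {"LOCUS", "GI", "KEYWORDS"}) {
            System.out.println(field + " -> " + columnFor(field));
        }
    }
}
```

Keeping the contract as data rather than logic means bioperl-db and a BioJava adaptor could, in principle, be checked against the same table.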

> In many ways I prefer the idea of developing a SQL API, which would be
> more robust and would serve to define what is expected of each procedure
> call. However, I think it should be achievable for the schema. In fact
> there is no reason why both cannot co-exist. For any API there should be
> a possible implementation, so naturally the schema could be used to
> generate an API. People could then happily make other schemata that fit
> the API and that may be optimised for their needs.

Right - I guess so far my idea was that the object model is the API, 
and the OR mapper implements the bridge to your chosen schema.
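A minimal sketch of that layering, assuming hypothetical `SeqRecord` and `SchemaMapper` interfaces (neither is a real bioperl or BioJava type): client code programs against the object model, and each target schema gets its own mapper. It also makes the limitation visible: the mapper is written against one particular object model, so a project with a different model needs its own mapper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ObjectModelBridge {

    // Hypothetical object model: the "API" that client code programs against.
    interface SeqRecord {
        String displayName(); // locus name
        String accession();
        String primaryId();   // e.g. a GI number
    }

    // Hypothetical OR-mapper contract: one implementation per target schema.
    interface SchemaMapper {
        Map<String, String> toRow(SeqRecord record);
    }

    // One possible mapper targeting columns of the BioSQL bioentry table.
    static final class BiosqlMapper implements SchemaMapper {
        @Override
        public Map<String, String> toRow(SeqRecord r) {
            Map<String, String> row = new LinkedHashMap<>();
            row.put("name", r.displayName());
            row.put("accession", r.accession());
            row.put("identifier", r.primaryId());
            return row;
        }
    }

    // Trivial in-memory implementation of the object model.
    static final class SimpleRecord implements SeqRecord {
        private final String name;
        private final String accession;
        private final String id;

        SimpleRecord(String name, String accession, String id) {
            this.name = name;
            this.accession = accession;
            this.id = id;
        }

        @Override public String displayName() { return name; }
        @Override public String accession() { return accession; }
        @Override public String primaryId() { return id; }
    }

    public static void main(String[] args) {
        SchemaMapper mapper = new BiosqlMapper();
        // Made-up example values.
        SeqRecord rec = new SimpleRecord("MYLOCUS", "AB000001", "123456");
        // Prints {name=MYLOCUS, accession=AB000001, identifier=123456}
        System.out.println(mapper.toRow(rec));
    }
}
```

Swapping in another schema means writing another `SchemaMapper`; swapping in another object model means rewriting the mappers, which is why this level of API does not carry across bio* projects.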

Clearly, the problem with this level of API is that it's not cross-bio* 
by definition since we don't use the exact same object model.

>
> Does anyone have a recent UML or similar diagram for the schema?

There is an ERD in the doc directory. Other than that, there is no UML
model.

> I can then use this to suggest mappings from GenBank fields to the API.
> I think it may be easier in many cases to follow bioperl's lead. BioJava
> seems to follow the 'store everything that isn't a feature as a
> bioentry_qualifier' approach, so I just need to add some special cases.
>
> Hilmar, would you be prepared to do any work on the BioPerl side for
> synchronization of the two?

Certainly, if it is really needed. Generally speaking, I would not want 
to introduce object model and genbank-to-object model mapping changes 
to bioperl if they openly break backward compatibility unless everybody 
agrees to go forward. It's also not necessarily needed; the OR mapping 
code (bioperl-db) may be the better place, depending on scope and 
what's involved.

	-hilmar


>
> - Mark
>
> Hilmar Lapp <hlapp at gnf.org>
> 01/15/2005 01:58 AM
>
>
>         To:     Mark Schreiber/GP/Novartis at PH
>         cc:     biosql-l at open-bio.org
>         Subject:        Re: [BioSQL-l] Consistency between bio* projects
>
>
>
> On Friday, January 14, 2005, at 01:10  AM,
> mark.schreiber at group.novartis.com wrote:
>>  Unfortunately, Bioperl stores identifiers as
>> follows:
>>
>> Bioentry.bioentry_id is the unique internal reference number
>> Bioentry.name is the GI number
>
> The GI number goes to Bioentry.identifier, which was designated for
> storing the identifier within an external database.
>
> Bioentry.name should hold the locus name, which for contigs and many
> other entries will be identical to the accession (but not to the GI
> number!).
>
> If you find it in Bioentry.name, then I suspect you weren't loading from
> GenBank- or EMBL-formatted input?
>
>> From memory, the basic idea of BioSQL was to define a schema that bio*
>> projects could both read from and write to in a language-independent
>> manner. For reasons best left to the designers (mostly, I think, because
>> MySQL couldn't handle stored procedures), the level of interaction is
>> right down at the schema level.
>
> Right. Also, not all database drivers in all languages support stored
> procedure calls equally well. In e.g. PostgreSQL and Oracle you can
> always get around this by writing a view and putting an INSTEAD OF
> INSERT (or UPDATE) trigger on it that will then call the procedure, but
> this is clearly not even close to an option in MySQL.
>
> It may be worth considering whether to open a dichotomy here between
> MySQL and the rest, in order to provide people who need it with a
> SQL-level API that both perl and java would use. People who are
> interested in this will by definition not be interested in MySQL anyway.
>
>> Unfortunately this means that the way data is stored needs to be very
>> consistent between projects if any APIs that use BioSQL are to be
>> portable. My biojava API cannot be applied to a DB previously set up
>> with bioperl, which was the original idea behind BioSQL in the first
>> place.
>>
>> Help!!
>
> I think you're raising a great point. Indeed, such a contract hasn't
> really been written. We're probably one of few who use both perl and
> java to access a biosql database (and I'm not using biojava as the
> object model on the java side, which is why I'm not running into this
> problem). (Note as an aside that you could also write adaptors that
> transform between the SymGene and the Biojava model when storing or
> retrieving objects from/to the database.)
>
> It'd be great if you were willing to take the lead for getting this all
> spelled out and laid down in a document?
>
>                  -hilmar
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------