[Bioperl-l] Re: BioSQL or chado

Hilmar Lapp hlapp at gnf.org
Wed Jul 30 00:15:22 EDT 2003


Excellent reply Chris.

There's almost nothing I can meaningfully add. Biosql has essentially 
been frozen for a few months, and the only changes during this time 
have been minor bug fixes (and some bigger ones in the Oracle 
part).

The choice really depends a lot on your use case(s) and on the pool of 
expertise you can rely on.

As for the tight language bindings of biosql in bioperl, the OR layer 
is relatively flexible (with some limitations) about what the underlying 
schema is. If you guys should choose chado but still want to have the 
tight language binding, I can offer guidance on how best to extend 
bioperl-db to persist to chado; this has been on my wish list for a while.

I guess one of the advantages of biosql is the bio* support (there are 
bindings in biojava, biopython, and bioruby, apart from bioperl), but 
the question is whether this helps you. Also, biosql is supported on 
mysql, Pg, and Oracle, but again this may or may not help you.

I'm also x-posting this to biosql-l, in case someone wants to report 
experiences with using biosql or share testimonials ...

I gave an extensive talk at BOSC03 on biosql and bioperl-db, and I can 
email you the slides if you are interested. (Chris, is there actually a 
website for speakers to post their slides?)

	-hilmar

On Tuesday, July 29, 2003, at 06:31  PM, Chris Mungall wrote:

>
> [x-posting to GMOD-schema]
>
> On Tue, 29 Jul 2003, Nathan (Nat) Goodman wrote:
>
>> I'm thinking about converting our homegrown relational schema to one of
>> the emerging BioPerl-friendly "standard" schemas.  I'm looking for
>> something that (1) works now, and (2) is likely to be popular in the
>> BioPerl world for some time to come.
>>
>> I think the choices are BioSQL and chado.  Are there others?  Is one of
>> these the obvious right choice?
>
> ensembl and GUS are the other main choices
>
> I originally viewed BioSQL as a way of doing relational queries over data
> slurped from EMBL/GenBank and SwissProt. It has since evolved into
> something more generic and resembles chado in many ways. Hilmar and Dave
> Block use it at GNF all the time for a lot more than slurping genbank.
>
> Chado encompasses more than BioSQL - genetics, expression, publications.
> But I'm assuming it's the sequence part you are interested in here? There
> is nothing to stop BioSQL moving into this area.
>
> BioSQL certainly has the tightest integration with bioperl (and the other
> bio* projects). This is through an O/R layer.
>
> chado has no direct integration with bioperl. I don't think there is any
> O/R layer or OO API planned (although some biojava folks have expressed an
> interest in this). Scott Cain has written a chado adapter for gbrowse
> (which uses bioperl objects) which could be extracted to form an API in
> its own right (although it is currently limited to the kind of API
> calls you need to make a genome viewer).
>
> Many of the chado developers favour XML over objects. The Chado-XML DTD is
> derived directly from the relational schema. The chado developers at
> Harvard have written a generic XML<->DB tool, which can be used in place
> of an API or O/R mapping. Of course, we still want to be able to use
> bioperl objects, so there are Bio::SeqIO::chadoxml classes being
> developed. The most likely route will be DB<->ChadoXML<->bioperl.
>
> BioSQL is semantically almost identical to the bioperl object model,
> whereas there are some differences with chado, specifically with respect
> to locations.
>
> chado does not allow discontinuous/split feature locations
> chado does not support the full fuzzy genbank model
> chado allows multiple redundant locations
>   (eg a SNP on a protein vs genomic; features on clone and chromosome)
> chado uses interbase
> chado uses a different mechanism for 'remote' locations
>   the source feature (ie the one which start/end is relative to)
>   is part of the location in chado, unlike bioSQL
> chado abandons the artificial distinction between 'sequence' and 'feature',
>   there is only one entity 'feature' in the _logical_ model
> chado has no equivalent of biosql.bioentry (other than 'feature')
>
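
A minimal sketch of a few of the location points in the list above,
written in Python purely for illustration. It assumes "interbase" means
numbering the positions between bases (so a 1-based inclusive range
start..end becomes start-1..end), and it treats the source feature as part
of each location. The class, function, and feature names are hypothetical
and are not taken from the chado or BioSQL schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical interbase location; not an actual chado table."""
    source: str  # the feature that fmin/fmax are relative to
    fmin: int    # interbase: 0 is the position before the first base
    fmax: int

    def length(self) -> int:
        # In interbase coordinates the length is a plain difference.
        return self.fmax - self.fmin


def from_base_coords(source: str, start: int, end: int) -> Location:
    """Convert a 1-based inclusive range to interbase coordinates."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return Location(source, start - 1, end)


def to_base_coords(loc: Location) -> tuple:
    """Convert back to a 1-based inclusive (start, end) pair."""
    if loc.length() == 0:
        raise ValueError("a zero-length location has no start/end pair")
    return (loc.fmin + 1, loc.fmax)


# One feature, two redundant locations, each carrying its own source
# feature (assumed example: a SNP placed on a chromosome and a protein).
snp_locations = [
    from_base_coords("chromosome_2L", 1200, 1200),
    from_base_coords("protein_X", 57, 57),
]

# A zero-length site between bases 1200 and 1201 of the same chromosome.
insertion_site = Location("chromosome_2L", 1200, 1200)
```

Because interbase positions fall between bases, a zero-length location
such as the insertion site can be stated directly, which a plain 1-based
inclusive start/end pair cannot do.
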
> These aspects of chado are more fully documented in the SQL DDL, and in a
> document which is.... Stan/Dave.... whereabouts is that doc?
>
> Other than that, there are more similarities than differences. E.g., both
> allow arbitrary feature graphs (preferably conforming to SO partonomies),
> features are typed by an ontology, etc. I'm sure migrating data one way or
> the other wouldn't be too much of a problem.
>
> ensembl is a different kettle of fish altogether. The main difference is
> that typing is enforced at the relational layer in ensembl. This has many
> advantages and disadvantages which have been discussed to death; it
> depends on your project really.
>
> ensembl is the most mature, and chado is the new kid on the block.
> However, chado 1_01 has just been frozen, and that's what most apps will
> be targeting.
>
> chado has a lot riding on it right now; flybase will become completely
> chado-dependent for its genome annotation data by the end of this year.
>
>> Thanks,
>> Nat
>
> Cheers
> Chris
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



