[BioSQL-l] BioJavaX ready for testing

Hilmar Lapp hlapp at gnf.org
Wed Nov 2 02:55:49 EST 2005


Sounds pretty cool! -hilmar

On Oct 31, 2005, at 1:28 AM, Richard HOLLAND wrote:

> Hello people!
>
> Mark is away so I'm taking the liberty of sneaking this one out... :)
>
> I've cross-posted this to both BioJava and BioSQL as much of what is 
> new in BioJavaX will probably be of interest to BioSQL users too.
>
> We've been doing a lot of work recently on creating some extensions to 
> BioJava called BioJavaX. Primarily the purpose of these extensions is 
> to provide better interaction with BioSQL databases, which has been 
> achieved using Hibernate (www.hibernate.org). You can now fully 
> interact with every column of every table in BioSQL, using Hibernate's 
> own HQL language to construct queries that result in sets of BioJavaX 
> objects. Selects, inserts, updates, primary key assignment, foreign 
> key relations, and deletes are all handled transparently by Hibernate, 
> removing the need for any SQL at all to be included in BioJavaX.
>
> As a side effect of constructing a Hibernate-compatible extension to 
> the BioJava object model, we were required to define objects that hold 
> much more detailed information about themselves. For instance, a 
> Sequence object cannot tell you which namespace it lives in within the 
> BioSQL database, but our extension to it, RichSequence, can. As 
> RichSequence extends Sequence and doesn't replace it, this means you 
> can use the new objects with your existing code without the hassle 
> of casting them.
>
> To be able to load information from files into these new RichSequence 
> objects in a meaningful way, we had to create a more detailed 
> SeqIOListener, called RichSeqIOListener. Then, we had to create new 
> file parsers for the common file formats which were able to extract 
> more detailed information than before in order to satisfy the 
> RichSeqIOListener.
>
> It's pretty safe to say that the file parsers in BioJavaX are leagues 
> ahead of the existing ones in BioJava, even if I do say so myself. :P 
> The downside of this extra detail though is that the parsers are much 
> more sensitive and will not play well at all with incomplete or 
> incorrectly formed files. If someone can edit them to be less 
> sensitive whilst still retaining the level of detail required, that'd 
> be great.
>
> We've included parsers for FASTA, GenBank, EMBL, UniProt, INSDseq, 
> EMBLxml, UniProtXML, and an extra one for parsing NCBI Taxonomy data.
>
> Do note that BioJavaX cannot fully convert sequences created using the 
> old BioJava model into the new BioJavaX model. It'll do its best, but 
> the RichSequence object you'll end up with will have lots of 
> properties set to null and a tonne of annotations instead, pretty much 
> the same as the original Sequence object I suppose. So it's best to try 
> to avoid conversions and deal with RichSequence objects from the 
> ground up. This is particularly important to consider when converting 
> a BioSQL database previously used with BioJava into one for use with 
> BioJavaX. You'll also find that if you pass a converted old-style 
> Sequence object to one of the new file parsers for writing, it may fail 
> or produce output with lots of missing fields, as it will not find the 
> information it is looking for in the places it expects.
>
> The whole lot is specifically designed to mimic and be compatible with 
> BioSQL, but you don't need to have a BioSQL database to use it. 
> Everything is standalone and will work just fine without a backing 
> data source. Also there is no reason why you couldn't create a new set 
> of Hibernate mappings that map the BioJavaX object model to some other 
> relational database schema of your choice.
>
> The upshot of it all is the org.biojavax package, which you can find 
> in the biojava-live branch on CVS. Development is pretty much complete, 
> and it now needs some serious testing.
>
> We need volunteers to:
>
> 	a) test the BioSQL interaction via Hibernate with the various 
> database flavours supported (HSQL, Oracle, MySQL, PostgreSQL)
> 	b) test the various file formats, particularly looking for 
> special-case exceptions which the parsers may not be aware of yet
> 	c) do some load-testing and help us find ways to improve it if it 
> turns out to be too slow when under pressure
>
> Documentation of the new features can be found in DocBook XML format 
> in docs/docbook/BioJavaX.xml in the biojava-live branch of CVS. It's 
> as detailed as I could make it without getting bored to death writing 
> it. I've never been the world's best documentation writer, so if 
> anyone would like to help improve it you're more than welcome.
>
> Our plan is to make all this an official part of BioJava come the 1.5 
> release, whenever that may be. For now though it is very very much a 
> testing-stage thing, not even an alpha release.
>
> Questions on a postcard to either Mark or myself. Feedback most 
> welcome.
>
> cheers,
> Richard
>
>
> Richard Holland
> Bioinformatics Specialist
> Genome Institute of Singapore
> 60 Biopolis Street, #02-01 Genome, Singapore 138672
> Tel: (65) 6478 8000   DID: (65) 6478 8199
> Email: hollandr at gis.a-star.edu.sg
> ---------------------------------------------
> This email is confidential and may be privileged. If you are not the 
> intended recipient, please delete it and notify us immediately. Please 
> do not copy or use it for any purpose, or disclose its content to any 
> other person. Thank you.
> ---------------------------------------------
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
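
On the point above that RichSequence extends Sequence rather than replacing it, here is a minimal, self-contained sketch of why existing code should keep working. The two interfaces below are hypothetical stand-ins written for illustration only, not the real BioJava or BioJavaX types, and the method names and the String-valued namespace are assumptions.

```java
public class RichSequenceSketch {

    // Hypothetical stand-in for the existing BioJava Sequence type.
    interface Sequence {
        String getName();
        String seqString();
    }

    // Hypothetical stand-in for the BioJavaX extension: same contract as
    // Sequence, plus the namespace the record lives in.
    interface RichSequence extends Sequence {
        String getNamespace();
    }

    // Simple immutable implementation used only for this illustration.
    static final class SimpleRichSequence implements RichSequence {
        private final String namespace;
        private final String name;
        private final String residues;

        SimpleRichSequence(String namespace, String name, String residues) {
            this.namespace = namespace;
            this.name = name;
            this.residues = residues;
        }

        @Override public String getNamespace() { return namespace; }
        @Override public String getName() { return name; }
        @Override public String seqString() { return residues; }
    }

    // "Existing code": written against Sequence only, knows nothing
    // about RichSequence.
    static String describe(Sequence seq) {
        return seq.getName() + ":" + seq.seqString().length();
    }

    public static void main(String[] args) {
        RichSequence rich = new SimpleRichSequence("lcl", "seq1", "ACGT");
        // A RichSequence is accepted wherever a Sequence is expected,
        // with no cast at the call site.
        System.out.println(describe(rich));
        // The extra detail is available when the richer type is in hand.
        System.out.println(rich.getNamespace());
    }
}
```

The cast is only needed in the other direction: code that is handed a plain Sequence and wants the namespace would have to check for, or convert to, the richer type.
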
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



