[Biojava-l] BioJavaX ready for testing
mark.schreiber at novartis.com
Fri Nov 4 05:29:00 EST 2005
Richard has done a really excellent job of making some pretty
comprehensive docs here with lots of examples. You should be able to use
it to take biojavax out for a spin!
- Mark
"Richard HOLLAND" <hollandr at gis.a-star.edu.sg>
Sent by: biojava-l-bounces at portal.open-bio.org
10/31/2005 05:28 PM
To: <biojava-l at biojava.org>
cc: Biosql <biosql-l at open-bio.org>, (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] BioJavaX ready for testing
Hello people!
Mark is away so I'm taking the liberty of sneaking this one out... :)
I've cross-posted this to both BioJava and BioSQL as much of what is new
in BioJavaX will probably be of interest to BioSQL users too.
We've been doing a lot of work recently on creating some extensions to
BioJava called BioJavaX. Primarily the purpose of these extensions is to
provide better interaction with BioSQL databases, which has been achieved
using Hibernate (www.hibernate.org). You can now fully interact with every
column of every table in BioSQL, using Hibernate's own HQL language to
construct queries that result in sets of BioJavaX objects. Selects,
inserts, updates, primary key assignment, foreign key relations, and
deletes are all handled transparently by Hibernate, removing the need for
any SQL at all to be included in BioJavaX.
As a side effect of constructing a Hibernate-compatible extension to the
BioJava object model, we were required to define objects that hold much
more detailed information about themselves. For instance, a Sequence
object cannot tell you which namespace it belongs to in the BioSQL
database, but our extension to it, RichSequence, can. As RichSequence
extends Sequence and doesn't replace it, you can use the new objects with
your existing code without having to cast them.
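As a rough illustration of why extending rather than replacing keeps old code working, here is a toy sketch. PlainSeq, RichSeq, SimpleRichSeq and ExtensionDemo are hypothetical stand-ins invented for this example; they are not the real BioJava Sequence or BioJavaX RichSequence types, and the getNamespace() accessor only assumes the general shape of the idea.

```java
// Toy stand-ins invented for this sketch; NOT the real BioJava or BioJavaX types.
interface PlainSeq {
    String getName();
    String seqString();
}

// The richer type extends the plain one, adding detail without replacing anything.
interface RichSeq extends PlainSeq {
    String getNamespace();
}

final class SimpleRichSeq implements RichSeq {
    private final String namespace;
    private final String name;
    private final String residues;

    SimpleRichSeq(String namespace, String name, String residues) {
        this.namespace = namespace;
        this.name = name;
        this.residues = residues;
    }

    @Override public String getName() { return name; }
    @Override public String seqString() { return residues; }
    @Override public String getNamespace() { return namespace; }
}

final class ExtensionDemo {
    // "Existing" code that only knows about the plain type.
    static String describe(PlainSeq seq) {
        return seq.getName() + " (" + seq.seqString().length() + " residues)";
    }

    public static void main(String[] args) {
        RichSeq rich = new SimpleRichSeq("lcl", "seq1", "ACGT");
        System.out.println(describe(rich));       // accepted as-is, no cast
        System.out.println(rich.getNamespace());  // extra detail on the rich type
    }
}
```

Code written against the plain type keeps compiling and running unchanged; only code that wants the extra detail needs to know about the richer type.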
To be able to load information from files into these new RichSequence
objects in a meaningful way, we had to create a more detailed
SeqIOListener, called RichSeqIOListener. Then, we had to create new file
parsers for the common file formats which were able to extract more
detailed information than before in order to satisfy the
RichSeqIOListener.
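The listener arrangement described above can be sketched roughly as follows. This is a toy under stated assumptions, not the BioJavaX code: BasicListener, DetailedListener, CollectingListener and ToyFastaParser are hypothetical names made up for the example, and the real SeqIOListener and RichSeqIOListener interfaces have different methods. It only shows the pattern of a parser pushing each piece of a record to a listener, with the more detailed listener accepting extra callbacks.

```java
// Hypothetical callback interfaces, invented for this sketch.
interface BasicListener {
    void setName(String name);
    void addResidues(String residues);
}

// The more detailed listener accepts everything the basic one does, plus more.
interface DetailedListener extends BasicListener {
    void setDescription(String description);
}

// Collects whatever the parser reports.
final class CollectingListener implements DetailedListener {
    String name;
    String description;
    final StringBuilder residues = new StringBuilder();

    @Override public void setName(String name) { this.name = name; }
    @Override public void setDescription(String description) { this.description = description; }
    @Override public void addResidues(String chunk) { residues.append(chunk); }
}

final class ToyFastaParser {
    // Parses a single FASTA record, pushing each piece to the listener.
    // Strict on purpose: a missing '>' header is rejected rather than guessed at.
    static void parse(String text, DetailedListener listener) {
        String[] lines = text.split("\\r?\\n");
        if (lines.length == 0 || !lines[0].startsWith(">")) {
            throw new IllegalArgumentException("Expected a header line starting with '>'");
        }
        String[] header = lines[0].substring(1).trim().split("\\s+", 2);
        listener.setName(header[0]);
        listener.setDescription(header.length > 1 ? header[1] : "");
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i].trim();
            if (!line.isEmpty()) {
                listener.addResidues(line);
            }
        }
    }

    public static void main(String[] args) {
        CollectingListener listener = new CollectingListener();
        parse(">seq1 an example record\nACGT\nTTGA\n", listener);
        System.out.println(listener.name + " | " + listener.description + " | " + listener.residues);
    }
}
```

Because the parser depends on the detailed listener, it has to extract every field that listener expects, which is the same reason a parser built this way tends to be strict about its input.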
It's pretty safe to say that the file parsers in BioJavaX are leagues
ahead of the existing ones in BioJava, even if I do say so myself. :P The
downside of this extra detail though is that the parsers are much more
sensitive and will not play well at all with incomplete or incorrectly
formed files. If someone can edit them to be less sensitive whilst still
retaining the level of detail required, that'd be great.
We've included parsers for FASTA, GenBank, EMBL, UniProt, INSDseq,
EMBLxml, UniProtXML, and an extra one for parsing NCBI Taxonomy data.
Do note that BioJavaX cannot fully convert sequences created using the old
BioJava model into the new BioJavaX model. It'll do its best, but the
RichSequence object you'll end up with will have lots of properties set to
null and a tonne of annotations instead, pretty much the same as the
original Sequence object I suppose. So it's best to try to avoid
conversions and deal with RichSequence objects from the ground up. This is
particularly important to consider when converting a BioSQL database
previously used with BioJava into one for use with BioJavaX. You'll also
find that if you pass a converted old-style Sequence object to one of the
new file parsers for writing it may fail or produce output with lots of
missing fields, as it will not find the information it is looking for in
the places it expects.
The whole lot is specifically designed to mimic and be compatible with
BioSQL, but you don't need to have a BioSQL database to use it. Everything
is standalone and will work just fine without a backing data source. Also
there is no reason why you couldn't create a new set of Hibernate mappings
that map the BioJavaX object model to some other relational database
schema of your choice.
The upshot of it all is the org.biojavax package, which you can find in
the biojava-live branch on CVS. Development is pretty much complete, and it
now needs some serious testing.
We need volunteers to:
a) test the BioSQL interaction via Hibernate with the
various database flavours supported (HSQL, Oracle, MySQL, PostgreSQL)
b) test the various file formats, particularly looking
for special-case exceptions which the parsers may not be aware of yet
c) do some load-testing and help us find ways to improve
it if it turns out to be too slow when under pressure
Documentation of the new features can be found in DocBook XML format in
docs/docbook/BioJavaX.xml in the biojava-live branch of CVS. It's as
detailed as I could make it without getting bored to death writing it.
I've never been the world's best documentation writer, so if anyone would
like to help improve it you're more than welcome.
Our plan is to make all this an official part of BioJava come the 1.5
release, whenever that may be. For now though it is very very much a
testing-stage thing, not even an alpha release.
Questions on a postcard to either Mark or myself. Feedback most welcome.
cheers,
Richard
Richard Holland
Bioinformatics Specialist
Genome Institute of Singapore
60 Biopolis Street, #02-01 Genome, Singapore 138672
Tel: (65) 6478 8000 DID: (65) 6478 8199
Email: hollandr at gis.a-star.edu.sg
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please do
not copy or use it for any purpose, or disclose its content to any other
person. Thank you.
---------------------------------------------