[Biojava-l] Schema and Docs for BioSQL
Brian Gilman
gilmanb@genome.wi.mit.edu
Wed, 20 Feb 2002 18:16:12 -0500 (EST)
Hello Thomas,
What did you use to generate the PostgreSQL DDL? I haven't found
anything that works very well...
Thanks!
-b
-----------------------
Brian Gilman <gilmanb@genome.wi.mit.edu>
Sr. Software Engineer MIT/Whitehead Inst. Center for Genome Research
One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
phone +1 617 252 1069 / fax +1 617 252 1902
On Wed, 20 Feb 2002, Thomas Down wrote:
> On Wed, Feb 20, 2002 at 01:52:10PM -0500, Marc Colosimo wrote:
> > Hi,
> >
> > Is there any information about using the BioSQL classes in BioJava, such
> > as the schema for the database or examples of using it? I am interested in
> > using PostgreSQL and BioJava to store lots of sequence data.
>
> BioSQL is based on bioperl-db. There's a little bit about
> it in the document from the first (O'Reilly) hackathon meeting:
>
> http://www.technophage.com/open-bio-database.pdf
>
> The BioJava code's quite new -- I've got a little tutorial
> planned, but I'm afraid (ahem) it's not written yet.
>
> In the meantime, the code is integrated into the main
> trunk version of biojava-live (although it didn't quite
> make it into 1.2), and hopefully shouldn't be too
> problematic to use (touch wood!).
>
> You can get schemas (MySQL and PostgreSQL) from:
>
> http://www.biojava.org/download/biosql/
>
> Right now, there are actually two PostgreSQL schemas --
> one was auto-generated from the MySQL one, the other was
> hand edited by me (identified by the -thomasd suffix).
> For now, I'd recommend the hand-edited version, but it
> should go away in future once the automated conversion has
> been perfected.
>
> If you're using PostgreSQL, note the following:
>
> - You need at least version 7.1 -- previous versions didn't
> support storing large strings in normal table attributes.
>
> - There's a file of stored procedures (biosqlprocs.sql)
> which you can load into the database after loading the
> schema. These are auto-detected by the BioJava code,
> and can increase write performance by a significant
> amount (a factor of 3, using my test setup).
>
>
> On the BioJava side, there isn't really any API for BioSQL
> as such. You can just do something like:
>
>   SequenceDB seqs = new BioSQLSequenceDB(
>       "jdbc:postgresql://dbbox.mydomain.org/biosql_db",
>       "username",
>       "password",
>       "database-name",
>       true
>   );
>
> The first three arguments are just standard JDBC-style database
> connection details. There's a `database name' parameter because
> BioSQL allows each `physical' SQL database to contain a number of
> `logical' databases. Perhaps namespace would be a better term
> for these (but hey, I didn't write the original schema). The final
> argument specifies whether the namespace should be created if it
> doesn't already exist. Note that right now, the BioJava code
> won't create the actual SQL database, or load the schema, for you.
> You'll have to do this manually using your database's normal tools.
>
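
To make the `logical database' (namespace) idea and the final boolean
argument concrete, here is a minimal in-memory sketch. It is not BioJava
code and does not touch the BioSQL schema: NamespaceRegistry, open() and
namespaceCount() are hypothetical names invented for illustration, and
throwing an exception when the namespace is missing and the flag is false
is an assumption, since the message above does not say what BioJava does
in that case.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical model of one "physical" database holding several "logical"
// databases (namespaces). NOT BioJava code and NOT the BioSQL schema.
class NamespaceRegistry {
    // namespace name -> (sequence ID -> sequence string)
    private final Map<String, Map<String, String>> namespaces = new HashMap<>();

    // Returns the named namespace. If it does not exist, it is created only
    // when 'create' is true; otherwise this sketch throws (an assumption).
    Map<String, String> open(String name, boolean create) {
        Map<String, String> ns = namespaces.get(name);
        if (ns == null) {
            if (!create) {
                throw new NoSuchElementException("No such namespace: " + name);
            }
            ns = new HashMap<>();
            namespaces.put(name, ns);
        }
        return ns;
    }

    int namespaceCount() {
        return namespaces.size();
    }

    public static void main(String[] args) {
        NamespaceRegistry physicalDb = new NamespaceRegistry();
        // The same sequence ID can live in two namespaces without clashing.
        physicalDb.open("nucleotide", true).put("ID0001", "acgt");
        physicalDb.open("protein", true).put("ID0001", "MKV");
        System.out.println(physicalDb.open("nucleotide", false).get("ID0001"));
    }
}
```

The point of the sketch is only that IDs are scoped per namespace, so two
collections loaded into the same SQL database need not collide.
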
> Having connected to the database, you can write complete
> Sequence entries using the addSequence(Sequence) method.
>
> You can retrieve sequences by ID using the getSequence(String)
> method. Objects extracted by this method retain live connections
> to the database. Alterations to the sequence (for instance,
> using the createFeature(Feature.Template) method) are immediately
> reflected in the database (in a transactionally safe manner, if
> the database supports this -- PostgreSQL does). So they're true
> persistent implementations of the BioJava interfaces.
>
> The aim is to have everything work just like in-memory
> SequenceDB, Sequence, and Feature objects. For many purposes,
> BioSQL is now pretty close to this ideal.
>
> Basic BioSQL doesn't support hierarchical features, so these
> get flattened when adding a sequence to a database (and attempts
> to create new child features on a BioSQL sequence will fail).
> However, I've got an /experimental/ extension for handling
> this. There's an extra table (seqfeature_hierarchy) in my
> schema. Once again, this is autodetected by the client code
> and used if available.
>
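
As a rough illustration of what `flattened' could mean here, the sketch
below walks a small feature tree depth-first and emits every feature as a
childless top-level entry. It is one plausible reading only; the actual
BioJava behaviour may differ in ordering or detail. FeatureNode and
flatten() are hypothetical names, not BioJava's Feature interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal feature model; NOT BioJava's Feature interface.
class FeatureNode {
    final String type;
    final int start;
    final int end;
    final List<FeatureNode> children = new ArrayList<>();

    FeatureNode(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }

    // Attaches a child feature and returns this node so calls can be chained.
    FeatureNode add(FeatureNode child) {
        children.add(child);
        return this;
    }

    // Depth-first, parent-before-child walk. Each feature is copied without
    // its children, so the result has no parent/child structure left.
    static List<FeatureNode> flatten(List<FeatureNode> roots) {
        List<FeatureNode> flat = new ArrayList<>();
        for (FeatureNode root : roots) {
            collect(root, flat);
        }
        return flat;
    }

    private static void collect(FeatureNode node, List<FeatureNode> out) {
        out.add(new FeatureNode(node.type, node.start, node.end));
        for (FeatureNode child : node.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        FeatureNode gene = new FeatureNode("gene", 1, 100)
            .add(new FeatureNode("exon", 1, 40))
            .add(new FeatureNode("exon", 60, 100));
        List<FeatureNode> roots = new ArrayList<>();
        roots.add(gene);
        for (FeatureNode f : flatten(roots)) {
            System.out.println(f.type + " " + f.start + ".." + f.end);
        }
    }
}
```

Note that the nesting information (which exon belongs to which gene) is
lost in the flat list, which is the gap a table such as
seqfeature_hierarchy would be needed to fill.
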
>
> Let me know how you get on,
>
> Thomas.
> _______________________________________________
> Biojava-l mailing list - Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>