[BioSQL-l] Consistency between bio* projects

Fri Jan 14 04:10:32 EST 2005

Hello all -

I've begun using BioSQL for a project here and have noticed a consistency 
problem in the way that BioJava and BioPerl persist objects. Basically, I 
want to know whether there is a contract for how common formats should be 
stored in the DB. If not, there really needs to be one, for the reasons 
outlined below.

For example, I'm developing an object that keeps my DB in sync with a small 
subset of GenBank (Dengue virus sequences). To do this, the object fetches 
the GenBank GIs for dengue virus and compares them with the DB to see 
whether any new ones need adding. Unfortunately, BioPerl stores identifiers 
as follows:

Bioentry.bioentry_id is the unique internal reference number
Bioentry.name is the GI number
Bioentry.accession is the accession
Bioentry.biodatabase_id refers to Biodatabase, from which (if you set this 
correctly whilst loading the sequence) you can check to see which database 
the sequence came from (e.g. GenBank), so that you know it really is a GI 
number.
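
A minimal sketch of the comparison step described above, assuming the 
BioPerl-style layout (Bioentry.name holds the GI). It uses Python with an 
in-memory SQLite database and cut-down versions of the Biodatabase and 
Bioentry tables; the function names, GI values and accessions are made up 
for illustration and are not taken from either toolkit.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory DB with cut-down versions of two BioSQL tables.

    The real schema has more columns and constraints; only the columns
    discussed above are kept here.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE biodatabase (
            biodatabase_id INTEGER PRIMARY KEY,
            name           TEXT NOT NULL
        );
        CREATE TABLE bioentry (
            bioentry_id    INTEGER PRIMARY KEY,
            biodatabase_id INTEGER NOT NULL,
            name           TEXT NOT NULL,
            accession      TEXT NOT NULL
        );
    """)
    conn.execute("INSERT INTO biodatabase VALUES (1, 'GenBank')")
    # Made-up rows laid out as described above: name = GI number.
    conn.executemany(
        "INSERT INTO bioentry VALUES (?, 1, ?, ?)",
        [(1, "1001", "XX000001"), (2, "1002", "XX000002")],
    )
    return conn


def missing_gis(conn, db_name, remote_gis):
    """Return the GIs in remote_gis that are not yet stored for db_name."""
    rows = conn.execute(
        """SELECT e.name
             FROM bioentry e
             JOIN biodatabase d ON d.biodatabase_id = e.biodatabase_id
            WHERE d.name = ?""",
        (db_name,),
    )
    stored = {row[0] for row in rows}
    return sorted(set(remote_gis) - stored)
```

Calling missing_gis(conn, "GenBank", [...]) with the list of GIs fetched 
from GenBank gives the entries that still need loading. The query is only 
meaningful if every loader puts the GI in Bioentry.name.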

BioJava is the same, except that Bioentry.name holds the accession, as does 
Bioentry.accession, and (oddly) GIs are stored as Annotations. Arguably 
BioPerl's system is better, but it's all meaningless if the two don't 
agree. What if I use BioPerl to load some sequences and later change to 
BioJava? Suddenly my first set of records has GIs in a different place from 
my second. 
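
To illustrate the kind of workaround a mixed DB forces, here is a rough 
sketch in Python with an in-memory SQLite database. It assumes the 
annotation ends up in bioentry_qualifier_value keyed by a term; the term 
name "gi", the digits-only fallback and all row values are assumptions made 
for the example, not what either toolkit is known to write.

```python
import sqlite3

GI_TERM = "gi"  # hypothetical annotation key; check what your loader writes


def make_mixed_db():
    """In-memory DB holding one entry in each of the two layouts."""
    conn = sqlite3.connect(":memory:")
    # Cut-down tables; the real BioSQL schema has more columns.
    conn.executescript("""
        CREATE TABLE bioentry (
            bioentry_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            accession   TEXT NOT NULL
        );
        CREATE TABLE term (
            term_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL
        );
        CREATE TABLE bioentry_qualifier_value (
            bioentry_id INTEGER NOT NULL,
            term_id     INTEGER NOT NULL,
            value       TEXT
        );
    """)
    # Entry 1: name holds the GI (first layout described above).
    conn.execute("INSERT INTO bioentry VALUES (1, '1001', 'XX000001')")
    # Entry 2: name repeats the accession; the GI is an annotation.
    conn.execute("INSERT INTO bioentry VALUES (2, 'XX000002', 'XX000002')")
    conn.execute("INSERT INTO term VALUES (1, ?)", (GI_TERM,))
    conn.execute("INSERT INTO bioentry_qualifier_value VALUES (2, 1, '1002')")
    return conn


def find_gi(conn, bioentry_id):
    """Look for a GI in the annotation first, then fall back to the name."""
    row = conn.execute(
        """SELECT q.value
             FROM bioentry_qualifier_value q
             JOIN term t ON t.term_id = q.term_id
            WHERE q.bioentry_id = ? AND t.name = ?""",
        (bioentry_id, GI_TERM),
    ).fetchone()
    if row is not None:
        return row[0]
    row = conn.execute(
        "SELECT name FROM bioentry WHERE bioentry_id = ?", (bioentry_id,)
    ).fetchone()
    # Heuristic: treat an all-digit name as a GI. This is a guess, not a rule.
    if row is not None and row[0].isdigit():
        return row[0]
    return None
```

Every client would need the same guesswork to read the DB reliably, which 
is exactly what an agreed storage convention would make unnecessary.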

From memory, the basic idea of BioSQL was to define a schema that the bio* 
projects could all read from and write to in a language-independent manner. 
For reasons best left to the designers (mostly, I think, because MySQL 
couldn't handle stored procedures), the level of interaction is right down 
at the schema level. Unfortunately, this means that the way data is stored 
needs to be very consistent between projects if any APIs that use BioSQL 
are to be portable. My BioJava API cannot be applied to a DB previously set 
up with BioPerl, even though that kind of interoperability was the original 
idea behind BioSQL in the first place.

Help!!

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910