From hlapp at gmx.net  Sun Jul  2 09:20:53 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 2 Jul 2006 09:20:53 -0400
Subject: [BioSQL-l] BioSQL Schema problem
In-Reply-To: <44A275E5.2040104@librophyt.com>
References: <44A275E5.2040104@librophyt.com>
Message-ID: <2F4506F2-84FC-412A-9BC5-8E3C92E086C8@gmx.net>

I notice that biosqldb-views-pg.sql is badly outdated. Sorry about
that. Are you sure you need it? (Most applications will not.)

I probably shouldn't just delete it but rather try to update it. The
offending seqfeature_key table has long been removed from the schema,
and you can safely delete the view definition from the file, but there
may be a few more errors given its age.

I need to investigate the script's failure on inserting nodes - this  
is assuming that you put the file by hand in the right place.  
Apparently there is an alphanumerical value that gets parsed as the  
taxon id (which must be numeric indeed).
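
If that guess is right, a check along the following lines would catch the offending record before it reaches the database. This is only a minimal Python sketch, not what load_ncbi_taxonomy.pl does; it assumes the nodes.dmp layout of fields separated by tab, bar, tab with a trailing tab and bar, and the function name parse_node_line is made up for illustration.

```python
def parse_node_line(line):
    """Return (tax_id, parent_tax_id, rank) from one nodes.dmp record.

    Hypothetical helper: raises ValueError when either id is not numeric,
    so a bad record is reported instead of being sent to the database.
    """
    record = line.rstrip("\n")
    # Each record ends with a trailing "\t|" terminator.
    if record.endswith("\t|"):
        record = record[:-2]
    fields = [field.strip() for field in record.split("\t|\t")]
    if len(fields) < 3:
        raise ValueError("too few fields in record: %r" % (line,))
    tax_id, parent_id, rank = fields[0], fields[1], fields[2]
    if not (tax_id.isdigit() and parent_id.isdigit()):
        raise ValueError("non-numeric taxon id in record: %r" % (fields[:3],))
    return int(tax_id), int(parent_id), rank


# The root node, as it appears in the failing insert reported below.
print(parse_node_line("1\t|\t1\t|\tno rank\t|\n"))
```

Converting to int at parse time means the value bound to taxon_id is an integer rather than a string, which is the type the column expects.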

--download is a switch and hence does not take any arguments, so
--download 0 still asks for the download, which is why you see the
error. I don't know why the download fails; maybe there's a problem
with extended FTP mode (EPSV/EPRT commands), but I don't know offhand
how to disable them in Net::FTP.
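
To illustrate the switch-versus-argument point, here is a small Python analogy using argparse. The real script is Perl, so this only mirrors the behaviour described above under that assumption; it is not the script's own option handling.

```python
import argparse

parser = argparse.ArgumentParser()
# A switch: its presence turns the option on, and it consumes no value.
parser.add_argument("--download", action="store_true")
parser.add_argument("--dbname")

# "--download 0" still enables the switch; the "0" is left over unused.
args, leftover = parser.parse_known_args(
    ["--dbname", "bioseqdb", "--download", "0"]
)
print(args.download, leftover)

# Leaving the switch off is the only way to disable the download.
args_off, _ = parser.parse_known_args(["--dbname", "bioseqdb"])
print(args_off.download)
```

So to skip the download, omit --download entirely rather than passing 0.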

	-hilmar

On Jun 28, 2006, at 8:28 AM, Samuel Thoraval wrote:

>
> Hello,
>
> I am new to biosql and I have 2 problems installing the latest CVS
> version (1.4.2.1, Sun Jun 16):
> - running biosqldb-views-pg.sql after biosqldb-pg.sql gives errors, the
> first one being:
> psql:biosqldb-views-pg.sql:6: ERROR:  relation "seqfeature_key" does not exist
> - running load_ncbi_taxonomy.pl with
> ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz (the script
> download option set to 1 doesn't download anything) gives the following
> error:
> ------------------------------------------------------------------
>  ./scripts/load_ncbi_taxonomy.pl --dbname bioseqdb --driver Pg --download 0
> gunzip: taxdata/taxdump.tar.gz: No such file or directory
> tar: taxdump.tar: Cannot open: No such file or directory
> tar: Error is not recoverable: exiting now
> Loading NCBI taxon database in taxdata:
>         ... retrieving all taxon nodes in the database
>         ... reading in taxon nodes from nodes.dmp
>         ... insert / update / delete taxon nodes
> failed to insert node (1;1;1;no rank;1;0): ERROR:  column "taxon_id" is of type integer but expression is of type character varying
> HINT:  You will need to rewrite or cast the expression.
> ------------------------------------------------------------------
>
> The schema expected by biosqldb-views-pg.sql or the taxonomy dump
> file does not match the one in biosqldb-pg.sql.
>
>
> Best regards,
>
> -- 
> Samuel Thoraval
> LIBROPHYT, Bioinformatique
> Centre de Cadarache
> Bâtiment 185, DEVM
> 13108 St Paul-Lez-Durance
> France
> Tél:  +33 442 574 799
> Fax: +33 442 574 439
> e-mail : samuel.thoraval at librophyt.com
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sun Jul  2 13:44:21 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 2 Jul 2006 13:44:21 -0400
Subject: [BioSQL-l] Versioning of features
In-Reply-To: <s4a5406d.040@ohsu.edu>
References: <s4a5406d.040@ohsu.edu>
Message-ID: <39FD8AB6-26F2-40B6-A3BC-42A42A42A06F@gmx.net>

It should be straightforward. In essence you control it through the
source type, which as you say is an ontology term.

You can, for instance, include the software version in the source term.
This is what I did for the BLAT-derived genome mappings in SymAtlas
(which runs on top of BioSQL). This wouldn't even require you to
'obsolete' a previous source term.

You'd only have to do that if you wanted the exact same name for the
source term and wanted the old and new 'version' terms in the same
ontology. I probably wouldn't be much in favor of doing so, because
then you don't have an explicit version anywhere. Of course, if you
include the version in the name, then two source types compared by name
appear different even though they are effectively the same (e.g., same
algorithm), just different versions. You can take care of that, though,
by introducing 'parent' source (e.g. algorithm) terms that have the
versioned ones as children.
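
As a rough illustration of the parent-term idea, here is a minimal Python sketch. The term names and version numbers are made up, and the plain dictionary merely stands in for term relationships that would live in the ontology tables.

```python
# Hypothetical versioned source terms, each pointing at its parent
# (algorithm-level) term. In a real database this link would be a
# relationship between ontology terms, not a Python dict.
PARENT_TERM = {
    "BLAT v32": "BLAT",
    "BLAT v33": "BLAT",
    "BLAST 2.2.13": "BLAST",
}


def algorithm_of(source_term):
    """Return the parent term, or the term itself if it has no parent."""
    return PARENT_TERM.get(source_term, source_term)


def same_algorithm(term_a, term_b):
    """True if two source terms differ at most by version."""
    return algorithm_of(term_a) == algorithm_of(term_b)


print(same_algorithm("BLAT v32", "BLAT v33"))      # same parent term
print(same_algorithm("BLAT v33", "BLAST 2.2.13"))  # different parents
```

Comparing by name keeps the versions distinct, while comparing by parent treats them as the same source.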

Let me know if this doesn't help.

	-hilmar

On Jun 30, 2006, at 6:16 PM, Sandie Peters wrote:

> In the BioSQL v. 1.0 schema overview, the author briefly mentions  
> the possibility of feature set versioning using "dated" source  
> ontology terms.  Has anyone tried this or any other versioning  
> methods with seqfeatures in BioSQL?
>
> Thanks,
> Sandie Peters
> Vollum Institute/OHSU
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From darin.london at duke.edu  Mon Jul  3 08:41:33 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 03 Jul 2006 08:41:33 -0400
Subject: [BioSQL-l] Call For Birds of a Feather Suggestions
Message-ID: <44A9107D.2050304@duke.edu>

The BOSC organizing committee is currently seeking suggestions for Birds
of a Feather meeting ideas. Birds of a Feather meetings are one of the
more popular activities at BOSC, occurring at the end of each day's
session. These are free-form meetings organized by the attendees
themselves to discuss one or a few topics of interest in greater detail.
BOFs have been formed to allow developers and users of individual OBF
software to meet each other face-to-face to discuss the project, or to
discuss completely new ideas, and even start new software development
projects. These meetings offer a unique opportunity for individuals to
explore more about the activities of the various Open Source Projects,
and, in some cases, even take an active role in influencing the future of
Open Source Software development. If you would like to create a BOF,
just sign up for a wiki account, log in, and edit the BOSC 2006 Birds of
a Feather page:
http://www.open-bio.org/wiki/BOSC_2006/Birds-of-a-Feather

From hlapp at gmx.net  Mon Jul  3 13:04:48 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 3 Jul 2006 13:04:48 -0400
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <44996380.6060300@autohandle.com>
References: <44996380.6060300@autohandle.com>
Message-ID: <D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>

Hi David, sorry for dropping (or rather, not ever picking up) the
ball on this ... it got lost in my inbox stack.

The earlier consensus was, if I recall correctly, to include
is_circular as a biosequence attribute in the 1.1 version.

isTaxonHidden is new to me and I don't even understand what it would  
mean. Can you elaborate?

	-hilmar

On Jun 21, 2006, at 11:19 AM, David Scott wrote:

> biojavax is using hibernate to o/r map the biosql database to biojavax
> objects. biojavax is planning support in the biojavax objects for  
> fields
> not directly supported in the biosql database (e.g. isCircular,
> isTaxonHidden). in order to conform to the current biosql database,  
> the
> default mapping file from biosql to biojavax will comment out the
> unsupported fields (so the object fields will not be initialized) and
> the objects will default an appropriate conforming value (e.g.  
> false for
> isCircular and isTaxonHidden). for users wishing to localize biojavax:
> the user would uncomment the mapping file and alter the database  
> tables.
> altering the database would require running ddl on the existing  
> database
> to create the new table columns. what is the best way to review and  
> then
> distribute the alter/create ddl for users to localize their database?

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Mon Jul  3 14:07:10 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 3 Jul 2006 14:07:10 -0400
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <44A95A2E.8000203@autohandle.com>
References: <44996380.6060300@autohandle.com>
	<D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>
	<44A95A2E.8000203@autohandle.com>
Message-ID: <DCF4100B-5C15-4C16-9013-68DEC5B929FB@gmx.net>

Hi David, I wish I were in the south of France soaking up sun ...  
although there is no shortage of sun (or heat for that matter, and  
throw humidity in there too) where I am.

Is_Circular is a general attribute that will apply to any sequence  
(given the fact that many sequences are indeed circular). This, and  
the fact that one may even want to search for it, would justify  
inclusion directly as a column in the biosequence table.
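
To make the "searchable column" argument concrete, here is a small Python sketch using an in-memory SQLite database. The biosequence table is cut down to two columns for illustration, and the is_circular definition (integer flag, default 0) is an assumption, not the agreed 1.1 DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for the biosequence table.
conn.execute(
    "CREATE TABLE biosequence (bioentry_id INTEGER PRIMARY KEY, length INTEGER)"
)
# Assumed column definition: existing rows default to 'not circular'.
conn.execute(
    "ALTER TABLE biosequence ADD COLUMN is_circular INTEGER NOT NULL DEFAULT 0"
)
conn.executemany(
    "INSERT INTO biosequence (bioentry_id, length, is_circular) VALUES (?, ?, ?)",
    [(1, 4639675, 1), (2, 1200, 0)],
)

# With a real column, the search is a plain WHERE clause.
circular_ids = [
    row[0]
    for row in conn.execute(
        "SELECT bioentry_id FROM biosequence WHERE is_circular = 1"
    )
]
print(circular_ids)
```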

Is_Taxon_Hidden is one of those attributes that BioSQL by design  
handles through attribute/value associations, that is, using ontology  
term associations that have a value (the term is the attribute name).

However, there is no taxon_qualifier_value table in BioSQL, so in  
essence you are asking for adding that table.
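
For discussion, such a table might look roughly like the following Python/SQLite sketch. It assumes the table would follow the same shape as the existing qualifier_value tables (entity key, term key, value, rank); the taxon and term tables are reduced to stubs, and the term name genbank_hidden_flag is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stubs standing in for the real taxon and term tables.
    CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY);
    CREATE TABLE term  (term_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

    -- Hypothetical table; not part of the released schema.
    CREATE TABLE taxon_qualifier_value (
        taxon_id INTEGER NOT NULL REFERENCES taxon (taxon_id),
        term_id  INTEGER NOT NULL REFERENCES term (term_id),
        value    TEXT,
        rank     INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (taxon_id, term_id, rank)
    );
""")
conn.execute("INSERT INTO taxon (taxon_id) VALUES (1)")
conn.execute("INSERT INTO term (term_id, name) VALUES (10, 'genbank_hidden_flag')")
conn.execute(
    "INSERT INTO taxon_qualifier_value (taxon_id, term_id, value) VALUES (1, 10, '1')"
)


def taxon_qualifiers(conn, taxon_id):
    """Return (attribute name, value) pairs attached to one taxon."""
    rows = conn.execute(
        "SELECT t.name, q.value FROM taxon_qualifier_value q "
        "JOIN term t ON t.term_id = q.term_id "
        "WHERE q.taxon_id = ? ORDER BY t.name, q.rank",
        (taxon_id,),
    )
    return rows.fetchall()


print(taxon_qualifiers(conn, 1))
```

The attribute name lives in the term table, so new taxon attributes need a new term rather than a schema change.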

Does anybody else have ideas for taxon attributes for which this  
table may be used?

I don't really favor a proliferation of 'localized' versions of
BioSQL - this tends to defeat both the rationale behind a
standardized persistence interface and the design of the schema for
ultimate extensibility through weak typing and the use of controlled
vocabularies.

Any thoughts to this end welcome.

	-hilmar

On Jul 3, 2006, at 1:55 PM, David Scott wrote:

> sure hilmar-
>
> in the genbank taxonomy file - nodes.dmp:
> ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
> there is a field:
>
> GenBank hidden flag (1 or 0)            -- 1 if name is suppressed  
> in GenBank entry lineage
>
> this field controls whether the level is included in the taxonomy  
> hierarchy when the genbank ORGANISM section is generated - but the  
> more general problem trying to be solved is:
> o parse genbank entries
> o store parsed entry in biosql
> o pull parsed entry from biosql
> o (re)create the genbank entry
> o compare the recreated entry with the source document for  
> identity. well - ok - almost identical.
>
> there are several parameters missing from biosql to make this  
> possible. the general approach to a solution has been:
> o alter the biosql table to add a new column (a sql ddl file)
> o add a private get/set for the column in the biojavax object (a  
> java file)
> o add the column to the biojavax hibernate o/r mapping (an xml file)
>
> to help others that might have the same objective, and to  
> accommodate those that don't wish these nonstandard columns - it is
> planned to release the o/r mapping files with the additional  
> columns/fields commented out - these xml files along with the java  
> files are checked out with cvs. it was not clear what to do with  
> the ddl files - and it would be helpful to have them reviewed - no  
> matter what is done with them.
>
> thanks for helping me - i just assumed you were late in responding  
> because it is summer - and, well - you were in the south of
> france soaking up the sun.
>
> looking to you for suggestions-
> david
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From david at autohandle.com  Mon Jul  3 13:55:58 2006
From: david at autohandle.com (David Scott)
Date: Mon, 03 Jul 2006 10:55:58 -0700
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>
References: <44996380.6060300@autohandle.com>
	<D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>
Message-ID: <44A95A2E.8000203@autohandle.com>

sure hilmar-

in the genbank taxonomy file - nodes.dmp:
ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
there is a field:

GenBank hidden flag (1 or 0)            -- 1 if name is suppressed in GenBank entry lineage
 

this field controls whether the level is included in the taxonomy 
hierarchy when the genbank ORGANISM section is generated - but the more 
general problem trying to be solved is:
o parse genbank entries
o store parsed entry in biosql
o pull parsed entry from biosql
o (re)create the genbank entry
o compare the recreated entry with the source document for identity. 
well - ok - almost identical.
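
the last step could be sketched like this in python. it is only an illustration of an "almost identical" comparison, not biojavax code: the function names are made up, and treating runs of whitespace and blank lines as insignificant is an assumption about what differences are acceptable.

```python
import difflib


def normalize(text):
    """Collapse whitespace runs and drop blank lines before comparing."""
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def roundtrip_diff(original, regenerated):
    """Return unified-diff lines; an empty list means 'almost identical'."""
    return list(
        difflib.unified_diff(
            normalize(original),
            normalize(regenerated),
            fromfile="original",
            tofile="regenerated",
            lineterm="",
        )
    )


source = "LOCUS       AB000001   1200 bp\nDEFINITION  example entry.\n"
rebuilt = "LOCUS  AB000001  1200 bp\n\nDEFINITION  example entry.\n"
print(roundtrip_diff(source, rebuilt))
```

any line that survives normalization and still differs shows up in the diff, which points at the parameter that was lost on the way through the database.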

there are several parameters missing from biosql to make this possible. 
the general approach to a solution has been:
o alter the biosql table to add a new column (a sql ddl file)
o add a private get/set for the column in the biojavax object (a java file)
o add the column to the biojavax hibernate o/r mapping (an xml file)

to help others that might have the same objective, and to accommodate 
those that don't wish these nonstandard columns  - it is planned to 
release the o/r mapping files with the additional columns/fields 
commented out - these xml files along with the java files are checked 
out with cvs. it was not clear what to do with the ddl files - and it 
would be helpful to have them reviewed - no matter what is done with them.

thanks for helping me - i just assumed you were late in responding 
because it is summer - and, well - you were in the south of france 
soaking up the sun.

looking to you for suggestions-
david




From mark.schreiber at novartis.com  Tue Jul  4 01:48:43 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 4 Jul 2006 13:48:43 +0800
Subject: [BioSQL-l] a biosql/biojavax localization question
Message-ID: <OF8A1EB55F.3E5ED684-ON482571A1.001E8840-482571A1.001FED0B@EU.novartis.net>

>Is_Circular is a general attribute that will apply to any sequence 
>(given the fact that many sequences are indeed circular). This, and 
>the fact that one may even want to search for it, would justify 
>inclusion directly as a column in the biosequence table.
>
>Is_Taxon_Hidden is one of those attributes that BioSQL by design 
>handles through attribute/value associations, that is, using ontology 
>term associations that have a value (the term is the attribute name).
>
>However, there is no taxon_qualifier_value table in BioSQL, so in 
>essence you are asking for adding that table.
>
>Does anybody else have ideas for taxon attributes for which this 
>table may be used?

A taxon_qualifier_value table would be potentially useful. One may want to 
have conflicting taxa (taxonomists never agree) that could be 
differentiated by use of a qualifier. The hidden attribute could also be 
one. 

>I don't really favor a proliferation of 'localized' versions of 
>BioSQL - this tends to defeat the purpose both of the rationale 
>behind a standardized persistence interface, as well as the design of 
>the schema for ultimate extensibility through weak typing and the use 
>of controlled vocabularies.
>
>Any thoughts to this end welcome.

I think that the best way to avoid localized versions might be to release 
a BioSQL 1.1 as soon as possible. The is_circular column has been on the 
todo list for a very long time. The above taxon_qualifier_value table 
would also be required to give more complete persistence of genbank data. 
Is there any reason why 1.1 cannot be released promptly?

I also wonder how likely a standardised persistence interface is 
when there is the possibility of using custom ontologies. Biojavax is much 
better at using the correct tables in BioSQL, but we use our own ontology 
terms for all kinds of qualifiers. The way we persist data to BioSQL is 
undoubtedly closer to BioPerlDB than the old biojava mapping, but whenever 
ontologies come into it there are bound to be breaks. To be truly unified, 
the two projects (and all the other bio*s) would need to use a common 
ontology. I guess what I am asking is: what do you mean by standardised 
persistence?

- Mark


From richard.holland at ebi.ac.uk  Tue Jul  4 04:13:02 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 04 Jul 2006 09:13:02 +0100
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <DCF4100B-5C15-4C16-9013-68DEC5B929FB@gmx.net>
References: <44996380.6060300@autohandle.com>
	<D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>
	<44A95A2E.8000203@autohandle.com>
	<DCF4100B-5C15-4C16-9013-68DEC5B929FB@gmx.net>
Message-ID: <1152000782.3948.36.camel@texas.ebi.ac.uk>

Personally I'd like to see *_qualifier_value tables for all BioSQL
tables that represent an entity of any kind, be it term, feature,
location, sequence, taxon, or anything else. 

In the case of is_taxon_hidden, this is specific to an individual taxon,
and I can see cases where it would be appropriate to search by it (for
instance, pulling out all ancestors of a given taxon that are visible).
So I think this should be an additional column.
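
As a sketch of the kind of search I mean, here is a Python/SQLite example that walks up from a taxon and keeps only the visible ancestors. The taxon table is reduced to a parent pointer plus a hypothetical is_hidden column, and the sample lineage is invented; the real table has more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced taxon table; is_hidden is the hypothetical extra column.
conn.execute(
    "CREATE TABLE taxon ("
    " taxon_id INTEGER PRIMARY KEY,"
    " parent_taxon_id INTEGER,"
    " name TEXT,"
    " is_hidden INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?)",
    [
        (1, None, "root", 1),
        (2, 1, "Bacteria", 0),
        (3, 2, "Proteobacteria", 0),
        (4, 3, "unnamed group", 1),
        (5, 4, "Escherichia coli", 0),
    ],
)


def visible_ancestors(conn, taxon_id):
    """Names of non-hidden ancestors, ordered from the top of the tree down."""
    rows = conn.execute(
        """
        WITH RECURSIVE lineage(taxon_id, parent_taxon_id, name, is_hidden, depth) AS (
            SELECT taxon_id, parent_taxon_id, name, is_hidden, 0
            FROM taxon WHERE taxon_id = ?
            UNION ALL
            SELECT t.taxon_id, t.parent_taxon_id, t.name, t.is_hidden, l.depth + 1
            FROM taxon t
            JOIN lineage l
              ON t.taxon_id = l.parent_taxon_id
             AND t.taxon_id <> l.taxon_id  -- guard against a self-parented root
        )
        SELECT name FROM lineage
        WHERE depth > 0 AND is_hidden = 0
        ORDER BY depth DESC
        """,
        (taxon_id,),
    )
    return [row[0] for row in rows]


print(visible_ancestors(conn, 5))
```

With the flag stored as a column, the filter is a simple predicate inside the lineage query.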

By the way, is there a document somewhere detailing all the changes that
are planned for 1.1?

cheers,
Richard


-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From hlapp at gmx.net  Wed Jul  5 00:04:12 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 5 Jul 2006 00:04:12 -0400
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <1152000782.3948.36.camel@texas.ebi.ac.uk>
References: <44996380.6060300@autohandle.com>
	<D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>
	<44A95A2E.8000203@autohandle.com>
	<DCF4100B-5C15-4C16-9013-68DEC5B929FB@gmx.net>
	<1152000782.3948.36.camel@texas.ebi.ac.uk>
Message-ID: <D3C41A52-6ED2-4FD7-B285-D35A6980DB48@gmx.net>


On Jul 4, 2006, at 4:13 AM, Richard Holland wrote:

> Personally I'd like to see *_qualifier_value tables for all BioSQL
> tables that represent an entity of any kind, be it term, feature,
> location, sequence, taxon, or anything else.

I can see that making sense. Basically what it would say is that  
every entity in BioSQL is derivable, as opposed to final, in an OO  
sense.

In fact, there aren't many entities that don't have a qualifier_value  
association table yet. Adding one for biodatabase would have been in  
my book of 1.1 changes as I use it in SymAtlas already.

>
>
> In the case of is_taxon_hidden, this is specific to an individual taxon,
> and I can see cases where it would be appropriate to search by it (for
> instance, pulling out all ancestors of a given taxon that are visible).
> So I think this should be an additional column.

I would like to ask a systematist about that. I have not seen it in
any taxonomy other than NCBI's. I'm not convinced it's a good idea to
elevate NCBI's (or anybody else's) idiosyncrasies to columns in the
Bio* persistence interface.

>
> By the way, is there a document somewhere detailing all the changes that
> are planned for 1.1?

No, not yet. Good point though. Volunteers for starting one are  
welcome ... :-)

	-hilmar


>
> cheers,
> Richard
>
>
> On Mon, 2006-07-03 at 14:07 -0400, Hilmar Lapp wrote:
>> Hi David, I wish I were in the south of France soaking up sun ...
>> although there is no shortage of sun (or heat for that matter, and
>> throw humidity in there too) where I am.
>>
>> Is_Circular is a general attribute that will apply to any sequence
>> (given the fact that many sequences are indeed circular). This, and
>> the fact that one may even want to search for it, would justify
>> inclusion directly as a column in the biosequence table.
>>
>> Is_Taxon_Hidden is one of those attributes that BioSQL by design
>> handles through attribute/value associations, that is, using ontology
>> term associations that have a value (the term is the attribute name).
>>
>> However, there is no taxon_qualifier_value table in BioSQL, so in
>> essence you are asking for adding that table.
>>
>> Does anybody else have ideas for taxon attributes for which this
>> table may be used?
>>
>> I don't really favor a proliferation of 'localized' versions of
>> BioSQL - this tends to defeat the purpose both of the rationale
>> behind a standardized persistence interface, as well as the design of
>> the schema for ultimate extensibility through weak typing and the use
>> of controlled vocabularies.
>>
>> Any thoughts to this end welcome.
>>
>> 	-hilmar
>>
>> On Jul 3, 2006, at 1:55 PM, David Scott wrote:
>>
>>> sure hilmar-
>>>
>>> in the genbank taxonomy file - nodes.dmp:
>>> ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
>>> there is a field:
>>>
>>> GenBank hidden flag (1 or 0)            -- 1 if name is suppressed
>>> in GenBank entry lineage
>>>
>>> this field controls whether the level is included in the taxonomy
>>> hierarchy when the genbank ORGANISM section is generated - but the
>>> more general problem trying to be solved is:
>>> o parse genbank entries
>>> o store parsed entry in biosql
>>> o pull parsed entry from biosql
>>> o (re)create the genbank entry
>>> o compare the recreated entry with the source document for
>>> identity. well - ok - almost identical.
>>>
>>> there are several parameters missing from biosql to make this
>>> possible. the general approach to a solution has been:
>>> o alter the biosql table to add a new column (a sql ddl file)
>>> o add a private get/set for the column in the biojavax object (a
>>> java file)
>>> o add the column to the biojavax hibernate o/r mapping (an xml file)
>>>
>>> to help others that might have the same objective, and to
>>> accomodate those that don't wish these nonstandard columns  - it is
>>> planned to release the o/r mapping files with the additional
>>> columns/fields commented out - these xml files along with the java
>>> files are checked out with cvs. it was not clear what to do with
>>> the ddl files - and it would be helpful to have them reviewed - no
>>> matter what is done with them.
>>>
>>> thanks for helping me - i just assumed you were late in responding
>>> because it is summer - and, well - you were in the the south of
>>> france soaking up the sun.
>>>
>>> looking to you for suggestions-
>>> david
>>>
>>>
>>> Hilmar Lapp wrote:
>>>> Hi David, sorry for dropping (or rather, not ever picking up) the
>>>> ball on this ... got lost in inbox stack.
>>>>
>>>> The earlier consensus was if I recall correctly to include
>>>> is_circular as a biosequence attribute in the 1.1 version.
>>>>
>>>> isTaxonHidden is new to me and I don't even understand what it
>>>> would mean. Can you elaborate?
>>>>
>>>>     -hilmar
>>>>
>>>> On Jun 21, 2006, at 11:19 AM, David Scott wrote:
>>>>
>>>>> biojavax is using hibernate to o/r map the biosql database to
>>>>> biojavax
>>>>> objects. biojavax is planning support in the biojavax objects for
>>>>> fields
>>>>> not directly supported in the biosql database (e.g. isCircular,
>>>>> isTaxonHidden). in order to conform to the current biosql
>>>>> database, the
>>>>> default mapping file from biosql to biojavax will comment out the
>>>>> unsupported fields (so the object fields will not be initialized)
>>>>> and
>>>>> the objects will default an appropriate conforming value (e.g.
>>>>> false for
>>>>> isCircular and isTaxonHidden). for users wishing to localize
>>>>> biojavax:
>>>>> the user would uncomment the mapping file and alter the database
>>>>> tables.
>>>>> altering the database would require running ddl on the existing
>>>>> database
>>>>> to create the new table columns. what is the best way to review
>>>>> and then
>>>>> distribute the alter/create ddl for users to localize their
>>>>> database?
>>>>> _______________________________________________
>>>>> BioSQL-l mailing list
>>>>> BioSQL-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>>>
>>>>
>>>> --===========================================================
>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>> ===========================================================
>>>>
>>>
>>
> -- 
> Richard Holland (BioMart Team)
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> UNITED KINGDOM
> Tel: +44-(0)1223-494416
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed Jul  5 08:47:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 5 Jul 2006 08:47:05 -0400
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <1152093096.3948.82.camel@texas.ebi.ac.uk>
References: <44996380.6060300@autohandle.com>
	<D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>
	<44A95A2E.8000203@autohandle.com>
	<DCF4100B-5C15-4C16-9013-68DEC5B929FB@gmx.net>
	<1152000782.3948.36.camel@texas.ebi.ac.uk>
	<D3C41A52-6ED2-4FD7-B285-D35A6980DB48@gmx.net>
	<1152093096.3948.82.camel@texas.ebi.ac.uk>
Message-ID: <B251A89A-5BE9-4F74-BC0E-32F615C1CAE6@gmx.net>

Alright - but it was a nice try, no?

On Jul 5, 2006, at 5:51 AM, Richard Holland wrote:

> I think you should create it as you are the only one at present who
> knows what is already planned and what is not! :)
>
> cheers,
> Richard
>
> On Wed, 2006-07-05 at 00:04 -0400, Hilmar Lapp wrote:
>> On Jul 4, 2006, at 4:13 AM, Richard Holland wrote:
>>
>>> Personally I'd like to see *_qualifier_value tables for all BioSQL
>>> tables that represent an entity of any kind, be it term, feature,
>>> location, sequence, taxon, or anything else.
>>
>> I can see that making sense. Basically what it would say is that
>> every entity in BioSQL is derivable, as opposed to final, in an OO
>> sense.
>>
>> In fact, there aren't many entities that don't have a qualifier_value
>> association table yet. Adding one for biodatabase would have been in
>> my book of 1.1 changes as I use it in SymAtlas already.
>>
>>>
>>>
>>> In the case of is_taxon_hidden, this is specific to an individual
>>> taxon,
>>> and I can see cases where it would be appropriate to search by it  
>>> (for
>>> instance, pulling out all ancestors of a given taxon that are
>>> visible).
>>> So I think this should be an additional column.
>>
>> I would like to ask a systematist about that. I have not seen it anywhere
>> else in a taxonomy other than NCBI's. I'm not convinced it's a good
>> idea to elevate NCBI's (or anybody else's) idiosyncrasies to columns
>> in the Bio* persistence interface.
>>
>>>
>>> By the way, is there a document somewhere detailing all the changes
>>> that
>>> are planned for 1.1?
>>
>> No, not yet. Good point though. Volunteers for starting one are
>> welcome ... :-)
>>
>> 	-hilmar
>>
>>
>>>
>>> cheers,
>>> Richard
>>>
>>>
>>> On Mon, 2006-07-03 at 14:07 -0400, Hilmar Lapp wrote:
>>>> Hi David, I wish I were in the south of France soaking up sun ...
>>>> although there is no shortage of sun (or heat for that matter, and
>>>> throw humidity in there too) where I am.
>>>>
>>>> Is_Circular is a general attribute that will apply to any sequence
>>>> (given the fact that many sequences are indeed circular). This, and
>>>> the fact that one may even want to search for it, would justify
>>>> inclusion directly as a column in the biosequence table.
>>>>
>>>> Is_Taxon_Hidden is one of those attributes that BioSQL by design
>>>> handles through attribute/value associations, that is, using  
>>>> ontology
>>>> term associations that have a value (the term is the attribute  
>>>> name).
>>>>
>>>> However, there is no taxon_qualifier_value table in BioSQL, so in
>>>> essence you are asking for adding that table.
>>>>
>>>> Does anybody else have ideas for taxon attributes for which this
>>>> table may be used?
>>>>
>>>> I don't really favor a proliferation of 'localized' versions of
>>>> BioSQL - this tends to defeat the purpose both of the rationale
>>>> behind a standardized persistence interface, as well as the  
>>>> design of
>>>> the schema for ultimate extensibility through weak typing and  
>>>> the use
>>>> of controlled vocabularies.
>>>>
>>>> Any thoughts to this end welcome.
>>>>
>>>> 	-hilmar
>>>>
>>>> On Jul 3, 2006, at 1:55 PM, David Scott wrote:
>>>>
>>>>> sure hilmar-
>>>>>
>>>>> in the genbank taxonomy file - nodes.dmp:
>>>>> ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
>>>>> there is a field:
>>>>>
>>>>> GenBank hidden flag (1 or 0)            -- 1 if name is suppressed
>>>>> in GenBank entry lineage
>>>>>
>>>>> this field controls whether the level is included in the taxonomy
>>>>> hierarchy when the genbank ORGANISM section is generated - but the
>>>>> more general problem trying to be solved is:
>>>>> o parse genbank entries
>>>>> o store parsed entry in biosql
>>>>> o pull parsed entry from biosql
>>>>> o (re)create the genbank entry
>>>>> o compare the recreated entry with the source document for
>>>>> identity. well - ok - almost identical.
>>>>>
>>>>> there are several parameters missing from biosql to make this
>>>>> possible. the general approach to a solution has been:
>>>>> o alter the biosql table to add a new column (a sql ddl file)
>>>>> o add a private get/set for the column in the biojavax object (a
>>>>> java file)
>>>>> o add the column to the biojavax hibernate o/r mapping (an xml  
>>>>> file)
>>>>>
>>>>> to help others that might have the same objective, and to
>>>>> accommodate those that don't wish these nonstandard columns  -
>>>>> it is
>>>>> planned to release the o/r mapping files with the additional
>>>>> columns/fields commented out - these xml files along with the java
>>>>> files are checked out with cvs. it was not clear what to do with
>>>>> the ddl files - and it would be helpful to have them reviewed - no
>>>>> matter what is done with them.
>>>>>
>>>>> thanks for helping me - i just assumed you were late in responding
>>>>> because it is summer - and, well - you were in the south of
>>>>> france soaking up the sun.
>>>>>
>>>>> looking to you for suggestions-
>>>>> david
>>>>>
>>>>>
>>>>> Hilmar Lapp wrote:
>>>>>> Hi David, sorry for dropping (or rather, not ever picking up) the
>>>>>> ball on this ... got lost in inbox stack.
>>>>>>
>>>>>> The earlier consensus was if I recall correctly to include
>>>>>> is_circular as a biosequence attribute in the 1.1 version.
>>>>>>
>>>>>> isTaxonHidden is new to me and I don't even understand what it
>>>>>> would mean. Can you elaborate?
>>>>>>
>>>>>>     -hilmar
>>>>>>
>>>>>> On Jun 21, 2006, at 11:19 AM, David Scott wrote:
>>>>>>
>>>>>>> biojavax is using hibernate to o/r map the biosql database to
>>>>>>> biojavax
>>>>>>> objects. biojavax is planning support in the biojavax objects  
>>>>>>> for
>>>>>>> fields
>>>>>>> not directly supported in the biosql database (e.g. isCircular,
>>>>>>> isTaxonHidden). in order to conform to the current biosql
>>>>>>> database, the
>>>>>>> default mapping file from biosql to biojavax will comment out  
>>>>>>> the
>>>>>>> unsupported fields (so the object fields will not be  
>>>>>>> initialized)
>>>>>>> and
>>>>>>> the objects will default an appropriate conforming value (e.g.
>>>>>>> false for
>>>>>>> isCircular and isTaxonHidden). for users wishing to localize
>>>>>>> biojavax:
>>>>>>> the user would uncomment the mapping file and alter the database
>>>>>>> tables.
>>>>>>> altering the database would require running ddl on the existing
>>>>>>> database
>>>>>>> to create the new table columns. what is the best way to review
>>>>>>> and then
>>>>>>> distribute the alter/create ddl for users to localize their
>>>>>>> database?
>>>>>>> _______________________________________________
>>>>>>> BioSQL-l mailing list
>>>>>>> BioSQL-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>>>>>
>>>>>>
>>>>>> --===========================================================
>>>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>>>> ===========================================================
>>>>>>
>>>>>
>>>>
>>> -- 
>>> Richard Holland (BioMart Team)
>>> EMBL-EBI
>>> Wellcome Trust Genome Campus
>>> Hinxton
>>> Cambridge CB10 1SD
>>> UNITED KINGDOM
>>> Tel: +44-(0)1223-494416
>>>
>>
> -- 
> Richard Holland (BioMart Team)
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> UNITED KINGDOM
> Tel: +44-(0)1223-494416
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================
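
The taxon_qualifier_value idea discussed in this thread can be sketched as follows. This is only an illustration under stated assumptions: the table layouts are trimmed-down stand-ins rather than the real BioSQL DDL, taxon_qualifier_value does not exist in BioSQL at the time of this thread, and the term name 'is_taxon_hidden' and the taxon names are hypothetical. SQLite is used only so that the example runs without a server.

```python
import sqlite3

# Hypothetical, simplified schema. The real BioSQL taxon and term tables have
# more columns; taxon_qualifier_value is the table proposed in the thread.
TAXON_SCHEMA = """
CREATE TABLE taxon (
    taxon_id        INTEGER PRIMARY KEY,
    parent_taxon_id INTEGER REFERENCES taxon (taxon_id),
    name            TEXT NOT NULL
);
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE taxon_qualifier_value (
    taxon_id INTEGER NOT NULL REFERENCES taxon (taxon_id),
    term_id  INTEGER NOT NULL REFERENCES term (term_id),
    value    TEXT,
    PRIMARY KEY (taxon_id, term_id)
);
"""


def visible_lineage(conn, taxon_id):
    """Return names from the root down to taxon_id, skipping hidden taxa."""
    rows = conn.execute(
        """
        WITH RECURSIVE lineage (taxon_id, parent_taxon_id, name, depth) AS (
            SELECT taxon_id, parent_taxon_id, name, 0
            FROM taxon WHERE taxon_id = ?
            UNION ALL
            SELECT t.taxon_id, t.parent_taxon_id, t.name, l.depth + 1
            FROM taxon t JOIN lineage l ON t.taxon_id = l.parent_taxon_id
        )
        SELECT l.name
        FROM lineage l
        LEFT JOIN taxon_qualifier_value q
               ON q.taxon_id = l.taxon_id
              AND q.term_id = (SELECT term_id FROM term
                               WHERE name = 'is_taxon_hidden')
        WHERE COALESCE(q.value, '0') <> '1'   -- no qualifier row means visible
        ORDER BY l.depth DESC                 -- deepest ancestor (root) first
        """,
        (taxon_id,),
    ).fetchall()
    return [name for (name,) in rows]


def demo_connection():
    """Build an in-memory database with four made-up taxa, one of them hidden."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(TAXON_SCHEMA)
    conn.executemany(
        "INSERT INTO taxon (taxon_id, parent_taxon_id, name) VALUES (?, ?, ?)",
        [(1, None, "root"), (2, 1, "hidden_level"),
         (3, 2, "genus_x"), (4, 3, "species_y")],
    )
    conn.execute("INSERT INTO term (term_id, name) VALUES (1, 'is_taxon_hidden')")
    conn.execute(
        "INSERT INTO taxon_qualifier_value (taxon_id, term_id, value) "
        "VALUES (2, 1, '1')"
    )
    return conn
```

With the demo data, visible_lineage(demo_connection(), 4) skips "hidden_level". The same query against a dedicated is_taxon_hidden column would need one join less, which is the trade-off being argued in the thread.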


From richard.holland at ebi.ac.uk  Wed Jul  5 05:51:35 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Wed, 05 Jul 2006 10:51:35 +0100
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <D3C41A52-6ED2-4FD7-B285-D35A6980DB48@gmx.net>
References: <44996380.6060300@autohandle.com>
	<D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>
	<44A95A2E.8000203@autohandle.com>
	<DCF4100B-5C15-4C16-9013-68DEC5B929FB@gmx.net>
	<1152000782.3948.36.camel@texas.ebi.ac.uk>
	<D3C41A52-6ED2-4FD7-B285-D35A6980DB48@gmx.net>
Message-ID: <1152093096.3948.82.camel@texas.ebi.ac.uk>

I think you should create it as you are the only one at present who
knows what is already planned and what is not! :)

cheers,
Richard

On Wed, 2006-07-05 at 00:04 -0400, Hilmar Lapp wrote:
> On Jul 4, 2006, at 4:13 AM, Richard Holland wrote:
> 
> > Personally I'd like to see *_qualifier_value tables for all BioSQL
> > tables that represent an entity of any kind, be it term, feature,
> > location, sequence, taxon, or anything else.
> 
> I can see that making sense. Basically what it would say is that  
> every entity in BioSQL is derivable, as opposed to final, in an OO  
> sense.
> 
> In fact, there aren't many entities that don't have a qualifier_value  
> association table yet. Adding one for biodatabase would have been in  
> my book of 1.1 changes as I use it in SymAtlas already.
> 
> >
> >
> > In the case of is_taxon_hidden, this is specific to an individual  
> > taxon,
> > and I can see cases where it would be appropriate to search by it (for
> > instance, pulling out all ancestors of a given taxon that are  
> > visible).
> > So I think this should be an additional column.
> 
> I would like to ask a systematist about that. I have not seen it anywhere
> else in a taxonomy other than NCBI's. I'm not convinced it's a good  
> idea to elevate NCBI's (or anybody else's) idiosyncrasies to columns  
> in the Bio* persistence interface.
> 
> >
> > By the way, is there a document somewhere detailing all the changes  
> > that
> > are planned for 1.1?
> 
> No, not yet. Good point though. Volunteers for starting one are  
> welcome ... :-)
> 
> 	-hilmar
> 
> 
> >
> > cheers,
> > Richard
> >
> >
> > On Mon, 2006-07-03 at 14:07 -0400, Hilmar Lapp wrote:
> >> Hi David, I wish I were in the south of France soaking up sun ...
> >> although there is no shortage of sun (or heat for that matter, and
> >> throw humidity in there too) where I am.
> >>
> >> Is_Circular is a general attribute that will apply to any sequence
> >> (given the fact that many sequences are indeed circular). This, and
> >> the fact that one may even want to search for it, would justify
> >> inclusion directly as a column in the biosequence table.
> >>
> >> Is_Taxon_Hidden is one of those attributes that BioSQL by design
> >> handles through attribute/value associations, that is, using ontology
> >> term associations that have a value (the term is the attribute name).
> >>
> >> However, there is no taxon_qualifier_value table in BioSQL, so in
> >> essence you are asking for adding that table.
> >>
> >> Does anybody else have ideas for taxon attributes for which this
> >> table may be used?
> >>
> >> I don't really favor a proliferation of 'localized' versions of
> >> BioSQL - this tends to defeat the purpose both of the rationale
> >> behind a standardized persistence interface, as well as the design of
> >> the schema for ultimate extensibility through weak typing and the use
> >> of controlled vocabularies.
> >>
> >> Any thoughts to this end welcome.
> >>
> >> 	-hilmar
> >>
> >> On Jul 3, 2006, at 1:55 PM, David Scott wrote:
> >>
> >>> sure hilmar-
> >>>
> >>> in the genbank taxonomy file - nodes.dmp:
> >>> ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
> >>> there is a field:
> >>>
> >>> GenBank hidden flag (1 or 0)            -- 1 if name is suppressed
> >>> in GenBank entry lineage
> >>>
> >>> this field controls whether the level is included in the taxonomy
> >>> hierarchy when the genbank ORGANISM section is generated - but the
> >>> more general problem trying to be solved is:
> >>> o parse genbank entries
> >>> o store parsed entry in biosql
> >>> o pull parsed entry from biosql
> >>> o (re)create the genbank entry
> >>> o compare the recreated entry with the source document for
> >>> identity. well - ok - almost identical.
> >>>
> >>> there are several parameters missing from biosql to make this
> >>> possible. the general approach to a solution has been:
> >>> o alter the biosql table to add a new column (a sql ddl file)
> >>> o add a private get/set for the column in the biojavax object (a
> >>> java file)
> >>> o add the column to the biojavax hibernate o/r mapping (an xml file)
> >>>
> >>> to help others that might have the same objective, and to
> >>> accommodate those that don't wish these nonstandard columns - it is
> >>> planned to release the o/r mapping files with the additional
> >>> columns/fields commented out - these xml files along with the java
> >>> files are checked out with cvs. it was not clear what to do with
> >>> the ddl files - and it would be helpful to have them reviewed - no
> >>> matter what is done with them.
> >>>
> >>> thanks for helping me - i just assumed you were late in responding
> >>> because it is summer - and, well - you were in the south of
> >>> france soaking up the sun.
> >>>
> >>> looking to you for suggestions-
> >>> david
> >>>
> >>>
> >>> Hilmar Lapp wrote:
> >>>> Hi David, sorry for dropping (or rather, not ever picking up) the
> >>>> ball on this ... got lost in inbox stack.
> >>>>
> >>>> The earlier consensus was if I recall correctly to include
> >>>> is_circular as a biosequence attribute in the 1.1 version.
> >>>>
> >>>> isTaxonHidden is new to me and I don't even understand what it
> >>>> would mean. Can you elaborate?
> >>>>
> >>>>     -hilmar
> >>>>
> >>>> On Jun 21, 2006, at 11:19 AM, David Scott wrote:
> >>>>
> >>>>> biojavax is using hibernate to o/r map the biosql database to
> >>>>> biojavax
> >>>>> objects. biojavax is planning support in the biojavax objects for
> >>>>> fields
> >>>>> not directly supported in the biosql database (e.g. isCircular,
> >>>>> isTaxonHidden). in order to conform to the current biosql
> >>>>> database, the
> >>>>> default mapping file from biosql to biojavax will comment out the
> >>>>> unsupported fields (so the object fields will not be initialized)
> >>>>> and
> >>>>> the objects will default an appropriate conforming value (e.g.
> >>>>> false for
> >>>>> isCircular and isTaxonHidden). for users wishing to localize
> >>>>> biojavax:
> >>>>> the user would uncomment the mapping file and alter the database
> >>>>> tables.
> >>>>> altering the database would require running ddl on the existing
> >>>>> database
> >>>>> to create the new table columns. what is the best way to review
> >>>>> and then
> >>>>> distribute the alter/create ddl for users to localize their
> >>>>> database?
> >>>>> _______________________________________________
> >>>>> BioSQL-l mailing list
> >>>>> BioSQL-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
> >>>>>
> >>>>
> >>>> --===========================================================
> >>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >>>> ===========================================================
> >>>>
> >>>
> >>
> > -- 
> > Richard Holland (BioMart Team)
> > EMBL-EBI
> > Wellcome Trust Genome Campus
> > Hinxton
> > Cambridge CB10 1SD
> > UNITED KINGDOM
> > Tel: +44-(0)1223-494416
> >
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From pim.van.nierop at falw.vu.nl  Wed Jul  5 09:53:39 2006
From: pim.van.nierop at falw.vu.nl (Pim van Nierop)
Date: Wed, 05 Jul 2006 15:53:39 +0200
Subject: [BioSQL-l] Prolem with loading bioseqsql scheme
Message-ID: <85343C76-6149-4439-B410-4D04B642D567@falw.vu.nl>

Hello all,

I have just started exploring BioSQL in combination with Perl
scripting to run a local instance of GenBank on MySQL at my lab. I have
to apologize for my ignorance beforehand, as I do not know much about
MySQL.

I followed the instructions provided on the BioPerl wiki page on how
to start using BioSQL with BioPerl. Unfortunately, I get stuck
when loading the "biosqldb-mysql.sql" file into my newly created
database named "bioseqdb".

I use this command:
> mysql -u root -p bioseqdb < c:\biosqldb-mysql.sql

This generates the following error:
ERROR 1005 (HY000) at line 39: Can't create table
'.\bioseqdb\biodatabase.frm' (errno: 121)

I looked up on the internet what ERROR 1005 with errno: 121 means.
It seems to have something to do with foreign keys, but I have no clue
how to proceed from here.

Could someone please explain what I am doing wrong?

Oh yeah, I use a Windows XP system.

All the best,

Pim

--  
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
*-*-*-*-*-

       Pim van Nierop

       Department of Molecular and Cellular Neurobiology
       Faculty of Earth and Life Sciences
       Vrije Universiteit
       Amsterdam

       Tel. +31 (0)20 5987114
       Fax. +31 (0)20 5987112

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
*-*-*-*-*-

_______________________________________________
Open-Bio-l mailing list
Open-Bio-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/open-bio-l


From hlapp at gmx.net  Thu Jul  6 07:44:38 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Jul 2006 07:44:38 -0400
Subject: [BioSQL-l] [Open-bio-l] [Fwd: Prolem with loading bioseqsql
	scheme]
In-Reply-To: <44ACD59C.3020604@falw.vu.nl>
References: <44ACD59C.3020604@falw.vu.nl>
Message-ID: <E70F87D6-4E67-4D02-B828-B5945C7DC7F9@gmx.net>

Hi Pim, I forwarded your email to biosql-l at lists.open-bio.org, which
is where the BioSQL discussions take place. I wanted to respond
yesterday but didn't get to it.

The page to subscribe to biosql-l is at
http://obda.open-bio.org/mailman/listinfo/biosql-l

	-hilmar


On Jul 6, 2006, at 5:19 AM, Pim van Nierop wrote:

> I am resending this message as I sent it before my subscription to this
> mailing list was confirmed. I am sorry if it's a double post.
>
> -------- Original Message --------
> Subject: 	Prolem with loading bioseqsql scheme
> Date: 	Wed, 05 Jul 2006 15:53:39 +0200
> From: 	Pim van Nierop <pim.van.nierop at falw.vu.nl>
> To: 	open-bio-l at lists.open-bio.org
>
>
>
> Hello all,
>
> I have just started exploring BioSQL in combination with Perl
> scripting to run a local instance of GenBank on MySQL at my lab. I have
> to apologize for my ignorance beforehand, as I do not know much about
> MySQL.
>
> I followed the instructions provided on the BioPerl wiki page on how
> to start using BioSQL with BioPerl. Unfortunately, I get stuck
> when loading the "biosqldb-mysql.sql" file into my newly created
> database named "bioseqdb".
>
> I use this command:
>> mysql -u root -p bioseqdb < c:\biosqldb-mysql.sql
>
> This generates the following error:
> ERROR 1005 (HY000) at line 39: Can't create table
> '.\bioseqdb\biodatabase.frm' (errno: 121)
>
> I looked up on the internet what ERROR 1005 with errno: 121 means.
> It seems to have something to do with foreign keys, but I have no clue
> how to proceed from here.
>
> Could someone please explain what I am doing wrong?
>
> Oh yeah, I use a Windows XP system.
>
> All the best,
>
> Pim
>
> -- 
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
> *-*-*-*-*-*-
>
>      Pim van Nierop
>
>      Department of Molecular and Cellular Neurobiology
>      Faculty of Earth and Life Sciences
>      Vrije Universiteit
>      Amsterdam
>
>      Tel. +31 (0)20 5987114
>      Fax. +31 (0)20 5987112
>
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
> *-*-*-*-*-*-
>
>
> _______________________________________________
> Open-Bio-l mailing list
> Open-Bio-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/open-bio-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From pim.van.nierop at falw.vu.nl  Sat Jul  8 07:19:04 2006
From: pim.van.nierop at falw.vu.nl (Pim van Nierop)
Date: Sat, 08 Jul 2006 13:19:04 +0200
Subject: [BioSQL-l]  Prolem with loading bioseqsql scheme
Message-ID: <44AF94A8.8030501@falw.vu.nl>

Hello all,

I have been experimenting a little myself, and it turns out that the
problem (InnoDB Error 1005 errno 121) occurs with MySQL 5.0, but not
with MySQL 4.1.

I will continue to use 4.1 to create a bioseq database instead. I guess
the 5.0 version is buggy.

Greetz, Pim

From mark.schreiber at novartis.com  Sun Jul  9 23:03:10 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Mon, 10 Jul 2006 11:03:10 +0800
Subject: [BioSQL-l] null title and CRC
Message-ID: <OF612183F3.63299C76-ON482571A7.00103304-482571A7.0010C525@EU.novartis.net>

Hi -

We are having a problem in biojava parsing some genbank records that 
contain references with no title. These cannot have a CRC value which is 
required in BioSQL. If we make the title an empty string then we quickly 
get non-unique CRC numbers.

What does BioPerl do in these cases?

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910


From hlapp at gmx.net  Sun Jul  9 23:22:26 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 9 Jul 2006 23:22:26 -0400
Subject: [BioSQL-l] null title and CRC
In-Reply-To: <OF612183F3.63299C76-ON482571A7.00103304-482571A7.0010C525@EU.novartis.net>
References: <OF612183F3.63299C76-ON482571A7.00103304-482571A7.0010C525@EU.novartis.net>
Message-ID: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net>

The CRC for references uses the authors, title, and location  
attributes in Bioperl-db, and empty (or null) strings default to the  
string "<undef>".
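
A minimal sketch of that behavior, for illustration only: the function name, the plain concatenation of the three fields, and the use of CRC-32 from zlib are assumptions made here. Bioperl-db's actual checksum algorithm and field ordering may well differ; only the "<undef>" default comes from the description above.

```python
import zlib

UNDEF = "<undef>"  # placeholder used for empty or null strings


def reference_crc(authors, title, location):
    """Checksum over authors, title and location of a reference.

    Empty strings and None are replaced by UNDEF before hashing, so a
    reference without a title still gets a well-defined value.
    """
    parts = [value if value else UNDEF for value in (authors, title, location)]
    # Assumption: fields are concatenated as-is and hashed with CRC-32.
    checksum = zlib.crc32("".join(parts).encode("utf-8"))
    return format(checksum, "08X")
```

Under this scheme a missing title and an empty title give the same checksum, and two untitled references only collide when their authors and location are also the same.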

If title is empty and authors and location do not distinguish two
references, then why do you want to have two rows for those
references? Basically, they are identical for all intents and
purposes, or are they not?

	-hilmar

On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote:

> Hi -
>
> We are having a problem in biojava parsing some genbank records that
> contain references with no title. These cannot have a CRC value  
> which is
> required in BioSQL. If we make the title an empty string then we  
> quickly
> get non-unique CRC numbers.
>
> What does BioPerl do in these cases?
>
> - Mark
>
> Mark Schreiber
> Research Investigator (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax  +65 6722 2910
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mark.schreiber at novartis.com  Thu Jul 13 01:23:18 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 13 Jul 2006 13:23:18 +0800
Subject: [BioSQL-l] Abstracts and Full Text on References
Message-ID: <OFE521208C.0D5B6141-ON482571AA.001D4655-482571AA.001D9990@EU.novartis.net>

Hi -

As an enhancement for a future version of BioSQL it would be nice to have
CLOB columns for abstract and full text (full text might need to be a BLOB
depending on format). Obviously they could both be null.

Alternatively they could be in another table linked to Reference. I don't 
know if it could be done via the term relationship method??

Any thoughts?

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910


From hlapp at gmx.net  Thu Jul 13 12:59:04 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 13 Jul 2006 12:59:04 -0400
Subject: [BioSQL-l] Abstracts and Full Text on References
In-Reply-To: <OFE521208C.0D5B6141-ON482571AA.001D4655-482571AA.001D9990@EU.novartis.net>
References: <OFE521208C.0D5B6141-ON482571AA.001D4655-482571AA.001D9990@EU.novartis.net>
Message-ID: <21289F28-309E-4A81-B326-E939838A5820@gmx.net>

Sounds reasonable to me. Attribute association wouldn't be desirable  
I think (it would only bloat and overload the value field).

The only thing I'd be concerned about is accumulating stuff that is  
not supported by the language bindings ... i.e., bioperl doesn't  
support this, and so there isn't a way for bioperl-db to do so  
either. What are the plans for Biojava?

Are any Biopython or Bioruby folks on this list? Any comments from  
those fronts?

	-hilmar

On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote:

> Hi -
>
> As an enhancement for a future version of BioSQL it would be nice  
> to have
> CLOB rows for abstract and full text (Full text might need to be a  
> BLOB
> depending on format). Obviously they could both be null.
>
> Alternatively they could be in another table linked to Reference. I  
> don't
> know if it could be done via the term relationship method??
>
> Any thoughts?
>
> - Mark
>
> Mark Schreiber
> Research Investigator (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax  +65 6722 2910
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mark.schreiber at novartis.com  Thu Jul 13 21:56:13 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 14 Jul 2006 09:56:13 +0800
Subject: [BioSQL-l] Abstracts and Full Text on References
Message-ID: <OF1C5F6CCB.3A1DB798-ON482571AB.000A4D90-482571AB.000AA3DF@EU.novartis.net>

Hello -

There are no specific plans for biojava although the Reference object 
could easily be modified to contain 

String getAbstract()
void setAbstract(String abs)
etc.

I wonder if the full text of an article should be a byte[]/BLOB or a
String/CLOB. Are people more likely to want to store a PDF (usually more
readily available) or a parsed String?

- Mark


Hilmar Lapp <hlapp at gmx.net>
07/14/2006 12:59 AM

 
        To:     mark.schreiber at novartis.com
        cc:     biosql-l at open-bio.org
        Subject:        Re: [BioSQL-l] Abstracts and Full Text on References


Sounds reasonable to me. Attribute association wouldn't be desirable 
I think (it would only bloat and overload the value field).

The only thing I'd be concerned about is accumulating stuff that is 
not supported by the language bindings ... i.e., bioperl doesn't 
support this, and so there isn't a way for bioperl-db to do so 
either. What are the plans for Biojava?

Are any Biopython or Bioruby folks on this list? Any comments from 
those fronts?

                 -hilmar

On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote:

> Hi -
>
> As an enhancement for a future version of BioSQL it would be nice 
> to have
> CLOB rows for abstract and full text (Full text might need to be a 
> BLOB
> depending on format). Obviously they could both be null.
>
> Alternatively they could be in another table linked to Reference. I 
> don't
> know if it could be done via the term relationship method??
>
> Any thoughts?
>
> - Mark
>
> Mark Schreiber
> Research Investigator (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax  +65 6722 2910
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Fri Jul 14 07:24:19 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 14 Jul 2006 07:24:19 -0400
Subject: [BioSQL-l] Abstracts and Full Text on References
In-Reply-To: <1152864626.3943.61.camel@texas.ebi.ac.uk>
References: <OF1C5F6CCB.3A1DB798-ON482571AB.000A4D90-482571AB.000AA3DF@EU.novartis.net>
	<1152864626.3943.61.camel@texas.ebi.ac.uk>
Message-ID: <748F3120-1FD3-4DF8-A0D7-EF9EE0414A14@gmx.net>

Right. I like this. However, it also suggests having an additional
table. Who knows what other fields one will want to store for an
abstract. Also, plenty of references will never have an abstract,
e.g. automatic submissions, ontology term references etc.
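
One way the additional table could look, as a sketch only: the table name reference_abstract and the helper functions are hypothetical, the reference table is reduced to two columns, and SQLite stands in for whichever RDBMS is actually used. The abstract and abstract_mime_type columns follow Richard's suggestion quoted below.

```python
import sqlite3

# Hypothetical DDL: a side table keyed by reference_id, so that references
# without an abstract simply have no row here.
ABSTRACT_SCHEMA = """
CREATE TABLE reference (
    reference_id INTEGER PRIMARY KEY,
    title        TEXT
);
CREATE TABLE reference_abstract (
    reference_id       INTEGER PRIMARY KEY
                       REFERENCES reference (reference_id),
    abstract           BLOB NOT NULL,
    abstract_mime_type VARCHAR(64) NOT NULL
);
"""


def store_abstract(conn, reference_id, payload, mime_type):
    """Insert or overwrite the abstract (raw bytes) for one reference."""
    conn.execute(
        "INSERT OR REPLACE INTO reference_abstract "
        "(reference_id, abstract, abstract_mime_type) VALUES (?, ?, ?)",
        (reference_id, sqlite3.Binary(payload), mime_type),
    )


def load_abstract(conn, reference_id):
    """Return (bytes, mime_type), or None if the reference has no abstract."""
    row = conn.execute(
        "SELECT abstract, abstract_mime_type FROM reference_abstract "
        "WHERE reference_id = ?",
        (reference_id,),
    ).fetchone()
    return None if row is None else (bytes(row[0]), row[1])
```

Keeping the payload in its own table means the reference table itself stays unchanged for bindings that do not support abstracts, and further per-abstract fields can be added later without touching it.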

	-hilmar

On Jul 14, 2006, at 4:10 AM, Richard Holland wrote:

> Make it a BLOB and add another column indicating the MIME type of the
> BLOB.
>
> 	BLOB abstract
> 	VARCHAR abstract_mime_type
>
> Then if you stored a PDF in it you could set abstract_mime_type to
> 'application/pdf', or if it was plain text, you could set the
> abstract_mime_type to 'text/plain'.
>
> cheers,
> Richard
>
> On Fri, 2006-07-14 at 09:56 +0800, mark.schreiber at novartis.com wrote:
>> Hello -
>>
>> There are no specific plans for biojava although the Reference object
>> could easily be modified to contain
>>
>> String getAbstract()
>> void setAbstract(String abs)
>> etc.
>>
>> I wonder if the full text of an article should be a byte[] or BLOB  
>> or a
>> String/ CLOB. Are people more likely to want to store a PDF  
>> (usually more
>> available) or a parsed String?
>>
>> - Mark
>>
>>
>>
>>
>>
>> Hilmar Lapp <hlapp at gmx.net>
>> 07/14/2006 12:59 AM
>>
>>
>>         To:     mark.schreiber at novartis.com
>>         cc:     biosql-l at open-bio.org
>>         Subject:        Re: [BioSQL-l] Abstracts and Full Text on  
>> References
>>
>>
>> Sounds reasonable to me. Attribute association wouldn't be desirable
>> I think (it would only bloat and overload the value field).
>>
>> The only thing I'd be concerned about is accumulating stuff that is
>> not supported by the language bindings ... i.e., bioperl doesn't
>> support this, and so there isn't a way for bioperl-db to do so
>> either. What are the plans for Biojava?
>>
>> Are any Biopython or Bioruby folks on this list? Any comments from
>> those fronts?
>>
>>                  -hilmar
>>
>> On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote:
>>
>>> Hi -
>>>
>>> As an enhancement for a future version of BioSQL it would be nice
>>> to have
>>> CLOB rows for abstract and full text (Full text might need to be a
>>> BLOB
>>> depending on format). Obviously they could both be null.
>>>
>>> Alternatively they could be in another table linked to Reference. I
>>> don't
>>> know if it could be done via the term relationship method??
>>>
>>> Any thoughts?
>>>
>>> - Mark
>>>
>>> Mark Schreiber
>>> Research Investigator (Bioinformatics)
>>>
>>> Novartis Institute for Tropical Diseases (NITD)
>>> 10 Biopolis Road
>>> #05-01 Chromos
>>> Singapore 138670
>>> www.nitd.novartis.com
>>>
>>> phone +65 6722 2973
>>> fax  +65 6722 2910
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>
>>
> -- 
> Richard Holland (BioMart Team)
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> UNITED KINGDOM
> Tel: +44-(0)1223-494416
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From david at autohandle.com  Fri Jul 14 13:48:50 2006
From: david at autohandle.com (David Scott)
Date: Fri, 14 Jul 2006 10:48:50 -0700
Subject: [BioSQL-l] null title and CRC
In-Reply-To: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net>
References: <OF612183F3.63299C76-ON482571A7.00103304-482571A7.0010C525@EU.novartis.net>
	<92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net>
Message-ID: <44B7D902.6040804@autohandle.com>

we are currently using "<undef>" in the crc calculation for the case 
where the title is empty (or null) - i can extend that for authors and 
location - what should we be storing in the table: "<undef>", empty, or 
null?

thanks-
david

p.s. fog for sale:
http://www.sfgate.com/liveviews/


Hilmar Lapp wrote:
> The CRC for references uses the authors, title, and location  
> attributes in Bioperl-db, and empty (or null) strings default to the  
> string "<undef>".
>
> If title is empty and authors and location do not distinguish two  
> references, then why do you want to have two rows for those  
> references? Basically, they are identical for all intents and  
> purposes, or are they not?
>
> 	-hilmar
>
> On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote:
>
>   
>> Hi -
>>
>> We are having a problem in biojava parsing some genbank records that
>> contain references with no title. These cannot have a CRC value  
>> which is
>> required in BioSQL. If we make the title an empty string then we  
>> quickly
>> get non-unique CRC numbers.
>>
>> What does BioPerl do in these cases?
>>
>> - Mark
>>
>> Mark Schreiber
>> Research Investigator (Bioinformatics)
>>
>> Novartis Institute for Tropical Diseases (NITD)
>> 10 Biopolis Road
>> #05-01 Chromos
>> Singapore 138670
>> www.nitd.novartis.com
>>
>> phone +65 6722 2973
>> fax  +65 6722 2910
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>>     
>
>   


From hlapp at gmx.net  Fri Jul 14 14:31:44 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 14 Jul 2006 14:31:44 -0400
Subject: [BioSQL-l] null title and CRC
In-Reply-To: <44B7D902.6040804@autohandle.com>
References: <OF612183F3.63299C76-ON482571A7.00103304-482571A7.0010C525@EU.novartis.net>
	<92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net>
	<44B7D902.6040804@autohandle.com>
Message-ID: <AC06A099-46F8-40F9-A90D-FD7A5EF49087@gmx.net>

In the table you store the value of the attribute, not a default that  
substitutes for it in some calculation. I.e., either null or an empty  
string, depending on what the value is. (in Oracle an empty string is  
treated as null.)
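
The distinction can be sketched in a few lines of Python. This is a
hedged illustration only: zlib.crc32 stands in for whatever checksum the
language bindings really use, and the plain concatenation of authors,
title and location is an assumption made for the example.

```python
import zlib

UNDEF = "<undef>"  # placeholder used only while computing the checksum


def reference_crc(authors, title, location):
    """Checksum over authors, title and location.

    Empty or None fields are replaced by the placeholder for the
    calculation only (zlib.crc32 is a stand-in checksum here).
    """
    parts = [value if value else UNDEF for value in (authors, title, location)]
    return format(zlib.crc32("".join(parts).encode("utf-8")), "08X")


def row_values(authors, title, location):
    """Values as they would be stored: the attribute itself, never the placeholder."""
    return (authors, title, location, reference_crc(authors, title, location))


# A reference without a title: the title stays None in the row,
# while the checksum is computed as if it were "<undef>".
row = row_values("Smith J.", None, "Unpublished")
```

Here the stored title remains null, and a null and an empty title yield
the same checksum because both fall back to the placeholder.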

	-hilmar
On Jul 14, 2006, at 1:48 PM, David Scott wrote:

> we are currently using "<undef>" in the crc calculation for the  
> case where the title is empty (or null) - i can extend that for  
> authors and location - what should we be storing in the table:  
> "<undef>", empty, or null?
>
> thanks-
> david
>
> p.s. fog for sale:
> http://www.sfgate.com/liveviews/
>
>
> Hilmar Lapp wrote:
>> The CRC for references uses the authors, title, and location
>> attributes in Bioperl-db, and empty (or null) strings default to the
>> string "<undef>".
>>
>> If title is empty and authors and location do not distinguish two
>> references, then why do you want to have two rows for those
>> references? Basically, they are identical for all intents and
>> purposes, or are they not?
>>
>> 	-hilmar
>>
>> On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote:
>>
>>
>>> Hi -
>>>
>>> We are having a problem in biojava parsing some genbank records that
>>> contain references with no title. These cannot have a CRC value
>>> which is
>>> required in BioSQL. If we make the title an empty string then we
>>> quickly
>>> get non-unique CRC numbers.
>>>
>>> What does BioPerl do in these cases?
>>>
>>> - Mark
>>>
>>> Mark Schreiber
>>> Research Investigator (Bioinformatics)
>>>
>>> Novartis Institute for Tropical Diseases (NITD)
>>> 10 Biopolis Road
>>> #05-01 Chromos
>>> Singapore 138670
>>> www.nitd.novartis.com
>>>
>>> phone +65 6722 2973
>>> fax  +65 6722 2910
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>
>>>
>>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From david at autohandle.com  Fri Jul 14 14:51:18 2006
From: david at autohandle.com (David Scott)
Date: Fri, 14 Jul 2006 11:51:18 -0700
Subject: [BioSQL-l] null title and CRC
In-Reply-To: <AC06A099-46F8-40F9-A90D-FD7A5EF49087@gmx.net>
References: <OF612183F3.63299C76-ON482571A7.00103304-482571A7.0010C525@EU.novartis.net>
	<92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net>
	<44B7D902.6040804@autohandle.com>
	<AC06A099-46F8-40F9-A90D-FD7A5EF49087@gmx.net>
Message-ID: <44B7E7A6.9040300@autohandle.com>

ok, then - in the case of genbank: i'm going to try to treat missing 
titles as null - store them in the object as null - and provide them to 
the hibernate o/r mapping as null - presumably they will go into the 
table as null.

best-

Hilmar Lapp wrote:
> In the table you store the value of the attribute, not a default that 
> substitutes for it in some calculation. I.e., either null or an empty 
> string, depending on what the value is. (in Oracle an empty string is 
> treated as null.)
>
> -hilmar
> On Jul 14, 2006, at 1:48 PM, David Scott wrote:
>
>> we are currently using "<undef>" in the crc calculation for the case 
>> where the title is empty (or null) - i can extend that for authors 
>> and location - what should we be storing in the table: "<undef>", 
>> empty, or null?
>>
>> thanks-
>> david
>>
>> p.s. fog for sale:
>> http://www.sfgate.com/liveviews/
>>
>>
>> Hilmar Lapp wrote:
>>> The CRC for references uses the authors, title, and location  
>>> attributes in Bioperl-db, and empty (or null) strings default to the  
>>> string "<undef>".
>>>
>>> If title is empty and authors and location do not distinguish two  
>>> references, then why do you want to have two rows for those  
>>> references? Basically, they are identical for all intents and  
>>> purposes, or are they not?
>>>
>>> 	-hilmar
>>>
>>> On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote:
>>>
>>>   
>>>> Hi -
>>>>
>>>> We are having a problem in biojava parsing some genbank records that
>>>> contain references with no title. These cannot have a CRC value  
>>>> which is
>>>> required in BioSQL. If we make the title an empty string then we  
>>>> quickly
>>>> get non-unique CRC numbers.
>>>>
>>>> What does BioPerl do in these cases?
>>>>
>>>> - Mark
>>>>
>>>> Mark Schreiber
>>>> Research Investigator (Bioinformatics)
>>>>
>>>> Novartis Institute for Tropical Diseases (NITD)
>>>> 10 Biopolis Road
>>>> #05-01 Chromos
>>>> Singapore 138670
>>>> www.nitd.novartis.com
>>>>
>>>> phone +65 6722 2973
>>>> fax  +65 6722 2910
>>>>
>>>> _______________________________________________
>>>> BioSQL-l mailing list
>>>> BioSQL-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>>
>>>>     
>>>   
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>


From richard.holland at ebi.ac.uk  Thu Jul 13 04:14:55 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Thu, 13 Jul 2006 09:14:55 +0100
Subject: [BioSQL-l] Abstracts and Full Text on References
In-Reply-To: <OFE521208C.0D5B6141-ON482571AA.001D4655-482571AA.001D9990@EU.novartis.net>
References: <OFE521208C.0D5B6141-ON482571AA.001D4655-482571AA.001D9990@EU.novartis.net>
Message-ID: <1152778496.3943.51.camel@texas.ebi.ac.uk>

I'd like to enhance that request by asking for individual author records
instead of a single string, and a flag indicating the type of
publication - eg. journal, book, article, conference paper, etc.

On Thu, 2006-07-13 at 13:23 +0800, mark.schreiber at novartis.com wrote:
> Hi -
> 
> As an enhancement for a future version of BioSQL it would be nice to have 
> CLOB rows for abstract and full text (Full text might need to be a BLOB 
> depending on format). Obviously they could both be null.
> 
> Alternatively they could be in another table linked to Reference. I don't 
> know if it could be done via the term relationship method??
> 
> Any thoughts?
> 
> - Mark
> 
> Mark Schreiber
> Research Investigator (Bioinformatics)
> 
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
> 
> phone +65 6722 2973
> fax  +65 6722 2910
> 
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From richard.holland at ebi.ac.uk  Fri Jul 14 04:10:25 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 14 Jul 2006 09:10:25 +0100
Subject: [BioSQL-l] Abstracts and Full Text on References
In-Reply-To: <OF1C5F6CCB.3A1DB798-ON482571AB.000A4D90-482571AB.000AA3DF@EU.novartis.net>
References: <OF1C5F6CCB.3A1DB798-ON482571AB.000A4D90-482571AB.000AA3DF@EU.novartis.net>
Message-ID: <1152864626.3943.61.camel@texas.ebi.ac.uk>

Make it a BLOB and add another column indicating the MIME type of the
BLOB.

	BLOB abstract
	VARCHAR abstract_mime_type

Then if you stored a PDF in it you could set abstract_mime_type to
'application/x-pdf', or if it was plain text, you could set the
abstract_mime_type to 'text/plain'.

cheers,
Richard

On Fri, 2006-07-14 at 09:56 +0800, mark.schreiber at novartis.com wrote:
> Hello -
> 
> There are no specific plans for biojava although the Reference object 
> could easily be modified to contain 
> 
> String getAbstract()
> void setAbstract(String abs)
> etc.
> 
> I wonder if the full text of an article should be a byte[] or BLOB or a 
> String/ CLOB. Are people more likely to want to store a PDF (usually more 
> available) or a parsed String?
> 
> - Mark
> 
> 
> 
> 
> 
> Hilmar Lapp <hlapp at gmx.net>
> 07/14/2006 12:59 AM
> 
>  
>         To:     mark.schreiber at novartis.com
>         cc:     biosql-l at open-bio.org
>         Subject:        Re: [BioSQL-l] Abstracts and Full Text on References
> 
> 
> Sounds reasonable to me. Attribute association wouldn't be desirable 
> I think (it would only bloat and overload the value field).
> 
> The only thing I'd be concerned about is accumulating stuff that is 
> not supported by the language bindings ... i.e., bioperl doesn't 
> support this, and so there isn't a way for bioperl-db to do so 
> either. What are the plans for Biojava?
> 
> Are any Biopython or Bioruby folks on this list? Any comments from 
> those fronts?
> 
>                  -hilmar
> 
> On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote:
> 
> > Hi -
> >
> > As an enhancement for a future version of BioSQL it would be nice 
> > to have
> > CLOB rows for abstract and full text (Full text might need to be a 
> > BLOB
> > depending on format). Obviously they could both be null.
> >
> > Alternatively they could be in another table linked to Reference. I 
> > don't
> > know if it could be done via the term relationship method??
> >
> > Any thoughts?
> >
> > - Mark
> >
> > Mark Schreiber
> > Research Investigator (Bioinformatics)
> >
> > Novartis Institute for Tropical Diseases (NITD)
> > 10 Biopolis Road
> > #05-01 Chromos
> > Singapore 138670
> > www.nitd.novartis.com
> >
> > phone +65 6722 2973
> > fax  +65 6722 2910
> >
> > _______________________________________________
> > BioSQL-l mailing list
> > BioSQL-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biosql-l
> >
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From richard.holland at ebi.ac.uk  Mon Jul 17 04:57:55 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Mon, 17 Jul 2006 09:57:55 +0100
Subject: [BioSQL-l] null title and CRC
In-Reply-To: <AC06A099-46F8-40F9-A90D-FD7A5EF49087@gmx.net>
References: <OF612183F3.63299C76-ON482571A7.00103304-482571A7.0010C525@EU.novartis.net>
	<92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net>
	<44B7D902.6040804@autohandle.com>
	<AC06A099-46F8-40F9-A90D-FD7A5EF49087@gmx.net>
Message-ID: <1153126675.3957.17.camel@texas.ebi.ac.uk>

Sounds good.

cheers,
Richard

On Fri, 2006-07-14 at 14:31 -0400, Hilmar Lapp wrote:
> In the table you store the value of the attribute, not a default that
> substitutes for it in some calculation. I.e., either null or an empty
> string, depending on what the value is. (in Oracle an empty string is
> treated as null.)
> 
> 
> -hilmar
> On Jul 14, 2006, at 1:48 PM, David Scott wrote:
> 
> > we are currently using "<undef>" in the crc calculation for the case
> > where the title is empty (or null) - i can extend that for authors
> > and location - what should we be storing in the table: "<undef>",
> > empty, or null?
> > 
> > thanks-
> > david
> > 
> > p.s. fog for sale:
> > http://www.sfgate.com/liveviews/
> > 
> > 
> > Hilmar Lapp wrote: 
> > > The CRC for references uses the authors, title, and location  
> > > attributes in Bioperl-db, and empty (or null) strings default to the  
> > > string "<undef>".
> > > 
> > > If title is empty and authors and location do not distinguish two  
> > > references, then why do you want to have two rows for those  
> > > references? Basically, they are identical for all intents and  
> > > purposes, or are they not?
> > > 
> > > 	-hilmar
> > > 
> > > On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote:
> > > 
> > >   
> > > > Hi -
> > > > 
> > > > We are having a problem in biojava parsing some genbank records that
> > > > contain references with no title. These cannot have a CRC value  
> > > > which is
> > > > required in BioSQL. If we make the title an empty string then we  
> > > > quickly
> > > > get non-unique CRC numbers.
> > > > 
> > > > What does BioPerl do in these cases?
> > > > 
> > > > - Mark
> > > > 
> > > > Mark Schreiber
> > > > Research Investigator (Bioinformatics)
> > > > 
> > > > Novartis Institute for Tropical Diseases (NITD)
> > > > 10 Biopolis Road
> > > > #05-01 Chromos
> > > > Singapore 138670
> > > > www.nitd.novartis.com
> > > > 
> > > > phone +65 6722 2973
> > > > fax  +65 6722 2910
> > > > 
> > > > _______________________________________________
> > > > BioSQL-l mailing list
> > > > BioSQL-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/biosql-l
> > > > 
> > > >     
> > >   
> > 
> 
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> 
> 
> 
> 
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From mark.schreiber at novartis.com  Tue Jul 18 16:41:34 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 19 Jul 2006 04:41:34 +0800
Subject: [BioSQL-l] Abstracts and Full Text on References
Message-ID: <OF9DAFEBD1.5D661206-ON482571AF.00719004-482571AF.0071AB9B@EU.novartis.net>

Another table is probably best.

Is there a working version of BioSQL 1.1 this can be added to?

- Mark


Hilmar Lapp <hlapp at gmx.net>
07/14/2006 07:24 PM

 
        To:     Richard Holland <richard.holland at ebi.ac.uk>
        cc:     Mark Schreiber <mark.schreiber at novartis.com>, biosql-l at open-bio.org
        Subject:        Re: [BioSQL-l] Abstracts and Full Text on References


Right. I like this. However, it also suggests having an additional 
table. Who knows what other fields one will want to know for an 
abstract. Also, plenty of references will never have an abstract, 
e.g. automatic submissions, ontology term references etc.

                 -hilmar

On Jul 14, 2006, at 4:10 AM, Richard Holland wrote:

> Make it a BLOB and add another column indicating the MIME type of the
> BLOB.
>
>                BLOB abstract
>                VARCHAR abstract_mime_type
>
> Then if you stored a PDF in it you could set abstract_mime_type to
> 'application/x-pdf', or if it was plain text, you could set the
> abstract_mime_type to 'text/plain'.
>
> cheers,
> Richard
>
> On Fri, 2006-07-14 at 09:56 +0800, mark.schreiber at novartis.com wrote:
>> Hello -
>>
>> There are no specific plans for biojava although the Reference object
>> could easily be modified to contain
>>
>> String getAbstract()
>> void setAbstract(String abs)
>> etc.
>>
>> I wonder if the full text of an article should be a byte[] or BLOB 
>> or a
>> String/ CLOB. Are people more likely to want to store a PDF 
>> (usually more
>> available) or a parsed String?
>>
>> - Mark
>>
>>
>>
>>
>>
>> Hilmar Lapp <hlapp at gmx.net>
>> 07/14/2006 12:59 AM
>>
>>
>>         To:     mark.schreiber at novartis.com
>>         cc:     biosql-l at open-bio.org
>>         Subject:        Re: [BioSQL-l] Abstracts and Full Text on 
>> References
>>
>>
>> Sounds reasonable to me. Attribute association wouldn't be desirable
>> I think (it would only bloat and overload the value field).
>>
>> The only thing I'd be concerned about is accumulating stuff that is
>> not supported by the language bindings ... i.e., bioperl doesn't
>> support this, and so there isn't a way for bioperl-db to do so
>> either. What are the plans for Biojava?
>>
>> Are any Biopython or Bioruby folks on this list? Any comments from
>> those fronts?
>>
>>                  -hilmar
>>
>> On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote:
>>
>>> Hi -
>>>
>>> As an enhancement for a future version of BioSQL it would be nice
>>> to have
>>> CLOB rows for abstract and full text (Full text might need to be a
>>> BLOB
>>> depending on format). Obviously they could both be null.
>>>
>>> Alternatively they could be in another table linked to Reference. I
>>> don't
>>> know if it could be done via the term relationship method??
>>>
>>> Any thoughts?
>>>
>>> - Mark
>>>
>>> Mark Schreiber
>>> Research Investigator (Bioinformatics)
>>>
>>> Novartis Institute for Tropical Diseases (NITD)
>>> 10 Biopolis Road
>>> #05-01 Chromos
>>> Singapore 138670
>>> www.nitd.novartis.com
>>>
>>> phone +65 6722 2973
>>> fax  +65 6722 2910
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>
>>
> -- 
> Richard Holland (BioMart Team)
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> UNITED KINGDOM
> Tel: +44-(0)1223-494416
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Tue Jul 18 16:50:25 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 18 Jul 2006 16:50:25 -0400
Subject: [BioSQL-l] Abstracts and Full Text on References
In-Reply-To: <OF9DAFEBD1.5D661206-ON482571AF.00719004-482571AF.0071AB9B@EU.novartis.net>
References: <OF9DAFEBD1.5D661206-ON482571AF.00719004-482571AF.0071AB9B@EU.novartis.net>
Message-ID: <99FEA1E7-8540-46DE-8025-9F34D8026D0C@gmx.net>

Yes and no. I was working on one at GNF. I'll have to create this in  
the repository.

	-hilmar

On Jul 18, 2006, at 4:41 PM, mark.schreiber at novartis.com wrote:

> Another table is probably best.
>
> Is there a working version of BioSQL 1.1 this can be added to?
>
> - Mark
>
>
>
>
>
> Hilmar Lapp <hlapp at gmx.net>
> 07/14/2006 07:24 PM
>
>
>         To:     Richard Holland <richard.holland at ebi.ac.uk>
>         cc:     Mark Schreiber <mark.schreiber at novartis.com>,  
> biosql-l at open-bio.org
>         Subject:        Re: [BioSQL-l] Abstracts and Full Text on  
> References
>
>
> Right. I like this. However, it also suggests having an additional
> table. Who knows what other fields one will want to know for an
> abstract. Also, plenty of references will never have an abstract,
> e.g. automatic submissions, ontology term references etc.
>
>                  -hilmar
>
> On Jul 14, 2006, at 4:10 AM, Richard Holland wrote:
>
>> Make it a BLOB and add another column indicating the MIME type of the
>> BLOB.
>>
>>                BLOB abstract
>>                VARCHAR abstract_mime_type
>>
>> Then if you stored a PDF in it you could set abstract_mime_type to
>> 'application/x-pdf', or if it was plain text, you could set the
>> abstract_mime_type to 'text/plain'.
>>
>> cheers,
>> Richard
>>
>> On Fri, 2006-07-14 at 09:56 +0800, mark.schreiber at novartis.com wrote:
>>> Hello -
>>>
>>> There are no specific plans for biojava although the Reference  
>>> object
>>> could easily be modified to contain
>>>
>>> String getAbstract()
>>> void setAbstract(String abs)
>>> etc.
>>>
>>> I wonder if the full text of an article should be a byte[] or BLOB
>>> or a
>>> String/ CLOB. Are people more likely to want to store a PDF
>>> (usually more
>>> available) or a parsed String?
>>>
>>> - Mark
>>>
>>>
>>>
>>>
>>>
>>> Hilmar Lapp <hlapp at gmx.net>
>>> 07/14/2006 12:59 AM
>>>
>>>
>>>         To:     mark.schreiber at novartis.com
>>>         cc:     biosql-l at open-bio.org
>>>         Subject:        Re: [BioSQL-l] Abstracts and Full Text on
>>> References
>>>
>>>
>>> Sounds reasonable to me. Attribute association wouldn't be desirable
>>> I think (it would only bloat and overload the value field).
>>>
>>> The only thing I'd be concerned about is accumulating stuff that is
>>> not supported by the language bindings ... i.e., bioperl doesn't
>>> support this, and so there isn't a way for bioperl-db to do so
>>> either. What are the plans for Biojava?
>>>
>>> Are any Biopython or Bioruby folks on this list? Any comments from
>>> those fronts?
>>>
>>>                  -hilmar
>>>
>>> On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote:
>>>
>>>> Hi -
>>>>
>>>> As an enhancement for a future version of BioSQL it would be nice
>>>> to have
>>>> CLOB rows for abstract and full text (Full text might need to be a
>>>> BLOB
>>>> depending on format). Obviously they could both be null.
>>>>
>>>> Alternatively they could be in another table linked to Reference. I
>>>> don't
>>>> know if it could be done via the term relationship method??
>>>>
>>>> Any thoughts?
>>>>
>>>> - Mark
>>>>
>>>> Mark Schreiber
>>>> Research Investigator (Bioinformatics)
>>>>
>>>> Novartis Institute for Tropical Diseases (NITD)
>>>> 10 Biopolis Road
>>>> #05-01 Chromos
>>>> Singapore 138670
>>>> www.nitd.novartis.com
>>>>
>>>> phone +65 6722 2973
>>>> fax  +65 6722 2910
>>>>
>>>> _______________________________________________
>>>> BioSQL-l mailing list
>>>> BioSQL-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>>
>>>
>> -- 
>> Richard Holland (BioMart Team)
>> EMBL-EBI
>> Wellcome Trust Genome Campus
>> Hinxton
>> Cambridge CB10 1SD
>> UNITED KINGDOM
>> Tel: +44-(0)1223-494416
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sun Jul  2 13:20:53 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 2 Jul 2006 09:20:53 -0400
Subject: [BioSQL-l] BioSQL Schema problem
In-Reply-To: <44A275E5.2040104@librophyt.com>
References: <44A275E5.2040104@librophyt.com>
Message-ID: <2F4506F2-84FC-412A-9BC5-8E3C92E086C8@gmx.net>

The biosqldb-views-pg.sql is badly outdated I notice. Sorry about  
that. Are you sure you need it? (Most applications will not.)

I probably shouldn't just delete it but try to update it. The offending  
seqfeature_key table has long been removed from the schema and you  
can safely delete the view definition from the file, but there may be  
a few more errors given its age.

I need to investigate the script's failure on inserting nodes - this  
is assuming that you put the file by hand in the right place.  
Apparently there is an alphanumerical value that gets parsed as the  
taxon id (which must be numeric indeed).

--download is a switch and hence does not take any arguments, so  
--download 0 still asks to download, which is why you see the error. I  
don't know why the download fails; maybe there's a problem with  
extended FTP mode (EPSV/EPRT commands), but I don't know offhand how  
you disable them in Net::FTP.
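
The switch behaviour can be shown with a small analogy in Python. The
loader script itself is Perl; this sketch uses Python's argparse module
purely to illustrate how a boolean switch ignores a value that follows
it. Only the option name is taken from the script, the rest is
illustrative.

```python
import argparse

parser = argparse.ArgumentParser()
# A switch: its presence alone turns the option on; it consumes no value.
parser.add_argument("--download", action="store_true")

# "--download 0" still enables the switch; the "0" is left over as a stray argument.
args, leftover = parser.parse_known_args(["--download", "0"])

# Leaving the switch out is the way to turn downloading off.
args_off, _ = parser.parse_known_args([])
```

In other words, the way to skip the download is to omit the switch
altogether rather than to pass a 0 after it.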

	-hilmar

On Jun 28, 2006, at 8:28 AM, Samuel Thoraval wrote:

>
> Hello,
>
> I am new to biosql and I have 2 problems installing last CVS version
> (1.4.2.1, Sun Jun 16):
> - running biosqldb-views-pg.sql after biosqldb-pg.sql gives errors,  
> the
> first one being:
> psql:biosqldb-views-pg.sql:6: ERROR:  relation "seqfeature_key"  
> does not
> exist
> - running load_ncbi_taxonomy.pl with
> ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz (the script
> download option set to 1 doesn't download anything) gives the  
> following
> error :
> ---------------------------------------------------------------------- 
> ------------------------------------------------------------------
>  ./scripts/load_ncbi_taxonomy.pl --dbname bioseqdb --driver Pg --download 0
> gunzip: taxdata/taxdump.tar.gz: No such file or directory
> tar: taxdump.tar: ne peut open: Aucun fichier ou répertoire de ce type
> tar: Erreur non récupérable: fin de l'exécution immédiate
> Loading NCBI taxon database in taxdata:
>         ... retrieving all taxon nodes in the database
>         ... reading in taxon nodes from nodes.dmp
>         ... insert / update / delete taxon nodes
> failed to insert node (1;1;1;no rank;1;0): ERROR:  column  
> "taxon_id" is
> of type integer but expression is of type character varying
> HINT:  You will need to rewrite or cast the expression.
> ---------------------------------------------------------------------- 
> ------------------------------------------------------------------
>
> The schema expected from the biosqldb-views-pg.sql or taxonomy dump
> file  does not match  the one in biosqldb-pg.sql.
>
>
> Best regards,
>
> -- 
> Samuel Thoraval
> LIBROPHYT, Bioinformatique
> Centre de Cadarache
> Bâtiment 185, DEVM
> 13108 St Paul-Lez-Durance
> France
> Tél:  +33 442 574 799
> Fax: +33 442 574 439
> e-mail : samuel.thoraval at librophyt.com
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sun Jul  2 17:44:21 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 2 Jul 2006 13:44:21 -0400
Subject: [BioSQL-l] Versioning of features
In-Reply-To: <s4a5406d.040@ohsu.edu>
References: <s4a5406d.040@ohsu.edu>
Message-ID: <39FD8AB6-26F2-40B6-A3BC-42A42A42A06F@gmx.net>

It should be straightforward. In essence you control it through the  
source type which as you say is an ontology term.

You can for instance include the software version in the source term.  
This is what I did for the BLAT-derived genome mappings in SymAtlas  
(which runs on top of BioSQL). This wouldn't even require you to  
'obsolete' a previous source term.

You'd only have to do that if you wanted to have the exact same name  
for the source term, and have old and new 'version' term in the same  
ontology. I probably wouldn't be in much favor of doing so because  
then you don't have an explicit version anywhere. However, of course  
if you include it into the name then if compared by name two source  
types appear different even though they are effectively the same  
(e.g., same algorithm), just different versions. You can take care of  
that though by introducing 'parent' source (e.g. algorithm) terms  
that would have the versioned ones as children.
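
The parent/child arrangement can be sketched in memory as follows. This
is a minimal illustration, not the BioSQL schema: the SourceTerm class
and the version labels "v1" and "v2" are hypothetical, and in a database
the same links would live in the ontology term tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceTerm:
    """A source term; versioned terms point at an unversioned parent."""
    name: str
    parent: Optional["SourceTerm"] = None

    def root(self):
        # Walk up to the top-most (algorithm-level) term.
        term = self
        while term.parent is not None:
            term = term.parent
        return term


blat = SourceTerm("BLAT")                      # parent: the algorithm
blat_v1 = SourceTerm("BLAT v1", parent=blat)   # hypothetical version labels
blat_v2 = SourceTerm("BLAT v2", parent=blat)

# Compared by name the versioned terms differ, but they share one parent.
same_algorithm = blat_v1.root() == blat_v2.root()
```

Features from different runs keep distinct, explicitly versioned source
terms, while a query against the parent term still finds all of them.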

Let me know if this doesn't help.

	-hilmar

On Jun 30, 2006, at 6:16 PM, Sandie Peters wrote:

> In the BioSQL v. 1.0 schema overview, the author briefly mentions  
> the possibility of feature set versioning using "dated" source  
> ontology terms.  Has anyone tried this or any other versioning  
> methods with seqfeatures in BioSQL?
>
> Thanks,
> Sandie Peters
> Vollum Institute/OHSU
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From darin.london at duke.edu  Mon Jul  3 12:41:33 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 03 Jul 2006 08:41:33 -0400
Subject: [BioSQL-l] Call For Birds of a Feather Suggestions
Message-ID: <44A9107D.2050304@duke.edu>

The BOSC organizing committee is currently seeking suggestions for Birds
of a Feather meeting ideas. Birds of a Feather meetings are one of the
more popular activities at BOSC, occurring at the end of each day's
session. These are free-form meetings organized by the attendees
themselves to discuss one or a few topics of interest in greater detail.
BOFs have been formed to allow developers and users of individual OBF
software to meet each other face-to-face to discuss the project, or to
discuss completely new ideas, and even start new software development
projects. These meetings offer a unique opportunity for individuals to
explore more about the activities of the various Open Source Projects,
and, in some cases, even take an active role influencing the future of
Open Source Software development. If you would like to create a BOF,
just sign up for a wiki account, log in, and edit the BOSC 2006 Birds
of a Feather page:
http://www.open-bio.org/wiki/BOSC_2006/Birds-of-a-Feather


From hlapp at gmx.net  Mon Jul  3 17:04:48 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 3 Jul 2006 13:04:48 -0400
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <44996380.6060300@autohandle.com>
References: <44996380.6060300@autohandle.com>
Message-ID: <D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>

Hi David, sorry for dropping (or rather, not ever picking up) the  
ball on this ... got lost in inbox stack.

The earlier consensus was, if I recall correctly, to include
is_circular as a biosequence attribute in the 1.1 version.

isTaxonHidden is new to me and I don't even understand what it would  
mean. Can you elaborate?

	-hilmar

On Jun 21, 2006, at 11:19 AM, David Scott wrote:

> biojavax is using hibernate to o/r map the biosql database to biojavax
> objects. biojavax is planning support in the biojavax objects for  
> fields
> not directly supported in the biosql database (e.g. isCircular,
> isTaxonHidden). in order to conform to the current biosql database,  
> the
> default mapping file from biosql to biojavax will comment out the
> unsupported fields (so the object fields will not be initialized) and
> the objects will default an appropriate conforming value (e.g.  
> false for
> isCircular and isTaxonHidden). for users wishing to localize biojavax:
> the user would uncomment the mapping file and alter the database  
> tables.
> altering the database would require running ddl on the existing  
> database
> to create the new table columns. what is the best way to review and  
> then
> distribute the alter/create ddl for users to localize their database?
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Mon Jul  3 18:07:10 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 3 Jul 2006 14:07:10 -0400
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <44A95A2E.8000203@autohandle.com>
References: <44996380.6060300@autohandle.com>
	<D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>
	<44A95A2E.8000203@autohandle.com>
Message-ID: <DCF4100B-5C15-4C16-9013-68DEC5B929FB@gmx.net>

Hi David, I wish I were in the south of France soaking up sun ...  
although there is no shortage of sun (or heat for that matter, and  
throw humidity in there too) where I am.

Is_Circular is a general attribute that will apply to any sequence  
(given the fact that many sequences are indeed circular). This, and  
the fact that one may even want to search for it, would justify  
inclusion directly as a column in the biosequence table.

Is_Taxon_Hidden is one of those attributes that BioSQL by design  
handles through attribute/value associations, that is, using ontology  
term associations that have a value (the term is the attribute name).

However, there is no taxon_qualifier_value table in BioSQL, so in  
essence you are asking for adding that table.
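
To make the idea concrete, here is a rough sketch of what such a table
could look like, run against an in-memory SQLite database from Python.
The column layout (taxon_id, term_id, value) is only an assumption
modeled loosely on the attribute/value pattern described above, the
taxon and term tables are reduced to the bare minimum, and the helper
functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY);
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
-- Hypothetical table: one value per (taxon, attribute term) pair.
CREATE TABLE taxon_qualifier_value (
    taxon_id INTEGER NOT NULL REFERENCES taxon(taxon_id),
    term_id  INTEGER NOT NULL REFERENCES term(term_id),
    value    TEXT,
    PRIMARY KEY (taxon_id, term_id)
);
""")
conn.execute("INSERT INTO taxon (taxon_id) VALUES (1)")  # made-up taxon row


def term_id(name):
    """Look up the attribute term by name, creating it on first use."""
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT term_id FROM term WHERE name = ?", (name,)
    ).fetchone()[0]


def set_qualifier(taxon, name, value):
    """Attach (or overwrite) an attribute value for a taxon."""
    conn.execute(
        "INSERT OR REPLACE INTO taxon_qualifier_value VALUES (?, ?, ?)",
        (taxon, term_id(name), value),
    )


def get_qualifier(taxon, name):
    """Return the attribute value for a taxon, or None if it is not set."""
    row = conn.execute(
        """
        SELECT q.value
        FROM taxon_qualifier_value q
        JOIN term t ON t.term_id = q.term_id
        WHERE q.taxon_id = ? AND t.name = ?
        """,
        (taxon, name),
    ).fetchone()
    return row[0] if row else None


set_qualifier(1, "is_taxon_hidden", "1")
```

The attribute name lives in the term table, so new taxon attributes
need no further schema change.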

Does anybody else have ideas for taxon attributes for which this  
table may be used?

I don't really favor a proliferation of 'localized' versions of
BioSQL - this tends to defeat both the rationale behind a
standardized persistence interface and the design of the schema
for ultimate extensibility through weak typing and the use
of controlled vocabularies.

Any thoughts to this end welcome.

	-hilmar

On Jul 3, 2006, at 1:55 PM, David Scott wrote:

> sure hilmar-
>
> in the genbank taxonomy file - nodes.dmp:
> ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
> there is a field:
>
> GenBank hidden flag (1 or 0)            -- 1 if name is suppressed  
> in GenBank entry lineage
>
> this field controls whether the level is included in the taxonomy  
> hierarchy when the genbank ORGANISM section is generated - but the  
> more general problem trying to be solved is:
> o parse genbank entries
> o store parsed entry in biosql
> o pull parsed entry from biosql
> o (re)create the genbank entry
> o compare the recreated entry with the source document for  
> identity. well - ok - almost identical.
>
> there are several parameters missing from biosql to make this  
> possible. the general approach to a solution has been:
> o alter the biosql table to add a new column (a sql ddl file)
> o add a private get/set for the column in the biojavax object (a  
> java file)
> o add the column to the biojavax hibernate o/r mapping (an xml file)
>
> to help others that might have the same objective, and to  
> accommodate those that don't wish these nonstandard columns  - it is  
> planned to release the o/r mapping files with the additional  
> columns/fields commented out - these xml files along with the java  
> files are checked out with cvs. it was not clear what to do with  
> the ddl files - and it would be helpful to have them reviewed - no  
> matter what is done with them.
>
> thanks for helping me - i just assumed you were late in responding  
> because it is summer - and, well - you were in the south of  
> france soaking up the sun.
>
> looking to you for suggestions-
> david
>
>
> Hilmar Lapp wrote:
>> Hi David, sorry for dropping (or rather, not ever picking up) the  
>> ball on this ... got lost in inbox stack.
>>
>> The earlier consensus was if I recall correctly to include  
>> is_circular as a biosequence attribute in the 1.1 version.
>>
>> isTaxonHidden is new to me and I don't even understand what it  
>> would mean. Can you elaborate?
>>
>>     -hilmar
>>
>> On Jun 21, 2006, at 11:19 AM, David Scott wrote:
>>
>>> biojavax is using hibernate to o/r map the biosql database to  
>>> biojavax
>>> objects. biojavax is planning support in the biojavax objects for  
>>> fields
>>> not directly supported in the biosql database (e.g. isCircular,
>>> isTaxonHidden). in order to conform to the current biosql  
>>> database, the
>>> default mapping file from biosql to biojavax will comment out the
>>> unsupported fields (so the object fields will not be initialized)  
>>> and
>>> the objects will default an appropriate conforming value (e.g.  
>>> false for
>>> isCircular and isTaxonHidden). for users wishing to localize  
>>> biojavax:
>>> the user would uncomment the mapping file and alter the database  
>>> tables.
>>> altering the database would require running ddl on the existing  
>>> database
>>> to create the new table columns. what is the best way to review  
>>> and then
>>> distribute the alter/create ddl for users to localize their  
>>> database?
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>
>>
>> --===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>>
>>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From david at autohandle.com  Mon Jul  3 17:55:58 2006
From: david at autohandle.com (David Scott)
Date: Mon, 03 Jul 2006 10:55:58 -0700
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>
References: <44996380.6060300@autohandle.com>
	<D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>
Message-ID: <44A95A2E.8000203@autohandle.com>

sure hilmar-

in the genbank taxonomy file - nodes.dmp:
ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
there is a field:

GenBank hidden flag (1 or 0)            -- 1 if name is suppressed in GenBank entry lineage
 

this field controls whether the level is included in the taxonomy 
hierarchy when the genbank ORGANISM section is generated - but the more 
general problem trying to be solved is:
o parse genbank entries
o store parsed entry in biosql
o pull parsed entry from biosql
o (re)create the genbank entry
o compare the recreated entry with the source document for identity. 
well - ok - almost identical.

there are several parameters missing from biosql to make this possible. 
the general approach to a solution has been:
o alter the biosql table to add a new column (a sql ddl file)
o add a private get/set for the column in the biojavax object (a java file)
o add the column to the biojavax hibernate o/r mapping (an xml file)
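
as a rough illustration of the first step only, here is a python sketch
of a re-runnable ddl script against an in-memory sqlite database. the
helper name, the column type, and the cut-down biosequence table are
assumptions for the example, not the ddl that would actually be
distributed, and other database engines would need their own catalog
query in place of the sqlite PRAGMA.

```python
import sqlite3


def add_column_if_missing(conn, table, column, ddl_type):
    """Add a column unless it already exists; return True if it was added.

    Table and column names are interpolated directly, so they must come
    from a trusted script, never from user input.
    """
    # PRAGMA table_info yields one row per column; index 1 is the name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True


conn = sqlite3.connect(":memory:")
# Cut-down stand-in for the biosequence table, with one existing row.
conn.execute("CREATE TABLE biosequence (bioentry_id INTEGER PRIMARY KEY, seq TEXT)")
conn.execute("INSERT INTO biosequence VALUES (1, 'ACGT')")

# The default keeps existing rows valid, matching the 'false' default
# that the objects would otherwise supply.
added = add_column_if_missing(
    conn, "biosequence", "is_circular", "INTEGER NOT NULL DEFAULT 0"
)
```

running the helper a second time is a no-op, which makes it safer to
hand the same script to users whose databases are in different states.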

to help others that might have the same objective, and to accommodate 
those that don't wish these nonstandard columns  - it is planned to 
release the o/r mapping files with the additional columns/fields 
commented out - these xml files along with the java files are checked 
out with cvs. it was not clear what to do with the ddl files - and it 
would be helpful to have them reviewed - no matter what is done with them.

thanks for helping me - i just assumed you were late in responding 
because it is summer - and, well - you were in the south of france 
soaking up the sun.

looking to you for suggestions-
david


Hilmar Lapp wrote:
> Hi David, sorry for dropping (or rather, not ever picking up) the ball 
> on this ... got lost in inbox stack.
>
> The earlier consensus was if I recall correctly to include is_circular 
> as a biosequence attribute in the 1.1 version.
>
> isTaxonHidden is new to me and I don't even understand what it would 
> mean. Can you elaborate?
>
>     -hilmar
>
> On Jun 21, 2006, at 11:19 AM, David Scott wrote:
>
>> biojavax is using hibernate to o/r map the biosql database to biojavax
>> objects. biojavax is planning support in the biojavax objects for fields
>> not directly supported in the biosql database (e.g. isCircular,
>> isTaxonHidden). in order to conform to the current biosql database, the
>> default mapping file from biosql to biojavax will comment out the
>> unsupported fields (so the object fields will not be initialized) and
>> the objects will default an appropriate conforming value (e.g. false for
>> isCircular and isTaxonHidden). for users wishing to localize biojavax:
>> the user would uncomment the mapping file and alter the database tables.
>> altering the database would require running ddl on the existing database
>> to create the new table columns. what is the best way to review and then
>> distribute the alter/create ddl for users to localize their database?
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>
> --===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
>
>


From mark.schreiber at novartis.com  Tue Jul  4 05:48:43 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 4 Jul 2006 13:48:43 +0800
Subject: [BioSQL-l] a biosql/biojavax localization question
Message-ID: <OF8A1EB55F.3E5ED684-ON482571A1.001E8840-482571A1.001FED0B@EU.novartis.net>

>Is_Circular is a general attribute that will apply to any sequence 
>(given the fact that many sequences are indeed circular). This, and 
>the fact that one may even want to search for it, would justify 
>inclusion directly as a column in the biosequence table.
>
>Is_Taxon_Hidden is one of those attributes that BioSQL by design 
>handles through attribute/value associations, that is, using ontology 
>term associations that have a value (the term is the attribute name).
>
>However, there is no taxon_qualifier_value table in BioSQL, so in 
>essence you are asking for adding that table.
>
>Does anybody else have ideas for taxon attributes for which this 
>table may be used?

A taxon_qualifier_value table would be potentially useful. One may want to 
have conflicting taxa (taxonomists never agree) that could be 
differentiated by use of a qualifier. The hidden attribute could also be 
one. 

>I don't really favor a proliferation of 'localized' versions of 
>BioSQL - this tends to defeat the purpose both of the rationale 
>behind a standardized persistence interface, as well as the design of 
>the schema for ultimate extensibility through weak typing and the use 
>of controlled vocabularies.
>
>Any thoughts to this end welcome.

I think that the best way to avoid localized versions might be to release 
a BioSQL 1.1 as soon as possible. The is_circular column has been on the 
todo list for a very long time. The above taxon_qualifier_value table 
would also be required to give more complete persistence of genbank data. 
Is there any reason why 1.1 cannot be released promptly?

I also wonder how likely a standardised persistence interface is 
when there is the possibility of using custom ontologies. Biojavax is much 
better at using the correct tables in BioSQL but we use our own ontology 
terms for all kinds of qualifiers. The way we persist data to BioSQL is 
undoubtedly closer to BioPerlDB than the old biojava mapping but whenever 
ontology comes into it there are bound to be breaks. To be truly unified 
the two projects (and all the other bio*s) would need to use a common 
ontology. I guess what I am asking is: what do you mean by standardised 
persistence?

- Mark


From richard.holland at ebi.ac.uk  Tue Jul  4 08:13:02 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 04 Jul 2006 09:13:02 +0100
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <DCF4100B-5C15-4C16-9013-68DEC5B929FB@gmx.net>
References: <44996380.6060300@autohandle.com>
	<D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>
	<44A95A2E.8000203@autohandle.com>
	<DCF4100B-5C15-4C16-9013-68DEC5B929FB@gmx.net>
Message-ID: <1152000782.3948.36.camel@texas.ebi.ac.uk>

Personally I'd like to see *_qualifier_value tables for all BioSQL
tables that represent an entity of any kind, be it term, feature,
location, sequence, taxon, or anything else. 

In the case of is_taxon_hidden, this is specific to an individual taxon,
and I can see cases where it would be appropriate to search by it (for
instance, pulling out all ancestors of a given taxon that are visible).
So I think this should be an additional column.
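
As an illustration of that kind of search, here is a small Python
sketch that walks up from a taxon and keeps only the visible ancestors.
The dictionary of parent pointers, the set of hidden nodes, and the
node names are invented for the example; in practice both would come
from the database.

```python
def visible_ancestors(taxon, parent_of, hidden):
    """Return the ancestors of `taxon`, nearest first, skipping hidden ones."""
    result = []
    seen = set()  # guards against accidental cycles in the parent pointers
    current = parent_of.get(taxon)
    while current is not None and current not in seen:
        seen.add(current)
        if current not in hidden:
            result.append(current)
        current = parent_of.get(current)
    return result


# Made-up lineage: species -> genus -> subfamily -> family (root).
parent_of = {
    "species": "genus",
    "genus": "subfamily",
    "subfamily": "family",
    "family": None,
}
hidden = {"subfamily"}  # the node that carries the hidden flag

lineage = visible_ancestors("species", parent_of, hidden)
```

Whether the flag is stored as a column or as a qualifier value, the
walk itself stays the same; only the lookup behind `hidden` changes.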

By the way, is there a document somewhere detailing all the changes that
are planned for 1.1?

cheers,
Richard


On Mon, 2006-07-03 at 14:07 -0400, Hilmar Lapp wrote:
> Hi David, I wish I were in the south of France soaking up sun ...  
> although there is no shortage of sun (or heat for that matter, and  
> throw humidity in there too) where I am.
> 
> Is_Circular is a general attribute that will apply to any sequence  
> (given the fact that many sequences are indeed circular). This, and  
> the fact that one may even want to search for it, would justify  
> inclusion directly as a column in the biosequence table.
> 
> Is_Taxon_Hidden is one of those attributes that BioSQL by design  
> handles through attribute/value associations, that is, using ontology  
> term associations that have a value (the term is the attribute name).
> 
> However, there is no taxon_qualifier_value table in BioSQL, so in  
> essence you are asking for adding that table.
> 
> Does anybody else have ideas for taxon attributes for which this  
> table may be used?
> 
> I don't really favor a proliferation of 'localized' versions of  
> BioSQL - this tends to defeat the purpose both of the rationale  
> behind a standardized persistence interface, as well as the design of  
> the schema for ultimate extensibility through weak typing and the use  
> of controlled vocabularies.
> 
> Any thoughts to this end welcome.
> 
> 	-hilmar
> 
> On Jul 3, 2006, at 1:55 PM, David Scott wrote:
> 
> > sure hilmar-
> >
> > in the genbank taxonomy file - nodes.dmp:
> > ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
> > there is a field:
> >
> > GenBank hidden flag (1 or 0)            -- 1 if name is suppressed  
> > in GenBank entry lineage
> >
> > this field controls whether the level is included in the taxonomy  
> > hierarchy when the genbank ORGANISM section is generated - but the  
> > more general problem trying to be solved is:
> > o parse genbank entries
> > o store parsed entry in biosql
> > o pull parsed entry from biosql
> > o (re)create the genbank entry
> > o compare the recreated entry with the source document for  
> > identity. well - ok - almost identical.
> >
> > there are several parameters missing from biosql to make this  
> > possible. the general approach to a solution has been:
> > o alter the biosql table to add a new column (a sql ddl file)
> > o add a private get/set for the column in the biojavax object (a  
> > java file)
> > o add the column to the biojavax hibernate o/r mapping (an xml file)
> >
> > to help others that might have the same objective, and to  
> > accommodate those that don't wish these nonstandard columns  - it is  
> > planned to release the o/r mapping files with the additional  
> > columns/fields commented out - these xml files along with the java  
> > files are checked out with cvs. it was not clear what to do with  
> > the ddl files - and it would be helpful to have them reviewed - no  
> > matter what is done with them.
> >
> > thanks for helping me - i just assumed you were late in responding  
> > because it is summer - and, well - you were in the south of  
> > france soaking up the sun.
> >
> > looking to you for suggestions-
> > david
> >
> >
> > Hilmar Lapp wrote:
> >> Hi David, sorry for dropping (or rather, not ever picking up) the  
> >> ball on this ... got lost in inbox stack.
> >>
> >> The earlier consensus was if I recall correctly to include  
> >> is_circular as a biosequence attribute in the 1.1 version.
> >>
> >> isTaxonHidden is new to me and I don't even understand what it  
> >> would mean. Can you elaborate?
> >>
> >>     -hilmar
> >>
> >> On Jun 21, 2006, at 11:19 AM, David Scott wrote:
> >>
> >>> biojavax is using hibernate to o/r map the biosql database to  
> >>> biojavax
> >>> objects. biojavax is planning support in the biojavax objects for  
> >>> fields
> >>> not directly supported in the biosql database (e.g. isCircular,
> >>> isTaxonHidden). in order to conform to the current biosql  
> >>> database, the
> >>> default mapping file from biosql to biojavax will comment out the
> >>> unsupported fields (so the object fields will not be initialized)  
> >>> and
> >>> the objects will default an appropriate conforming value (e.g.  
> >>> false for
> >>> isCircular and isTaxonHidden). for users wishing to localize  
> >>> biojavax:
> >>> the user would uncomment the mapping file and alter the database  
> >>> tables.
> >>> altering the database would require running ddl on the existing  
> >>> database
> >>> to create the new table columns. what is the best way to review  
> >>> and then
> >>> distribute the alter/create ddl for users to localize their  
> >>> database?
> >>> _______________________________________________
> >>> BioSQL-l mailing list
> >>> BioSQL-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/biosql-l
> >>>
> >>
> >> --===========================================================
> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >> ===========================================================
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From hlapp at gmx.net  Wed Jul  5 04:04:12 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 5 Jul 2006 00:04:12 -0400
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <1152000782.3948.36.camel@texas.ebi.ac.uk>
References: <44996380.6060300@autohandle.com>
	<D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>
	<44A95A2E.8000203@autohandle.com>
	<DCF4100B-5C15-4C16-9013-68DEC5B929FB@gmx.net>
	<1152000782.3948.36.camel@texas.ebi.ac.uk>
Message-ID: <D3C41A52-6ED2-4FD7-B285-D35A6980DB48@gmx.net>


On Jul 4, 2006, at 4:13 AM, Richard Holland wrote:

> Personally I'd like to see *_qualifier_value tables for all BioSQL
> tables that represent an entity of any kind, be it term, feature,
> location, sequence, taxon, or anything else.

I can see that making sense. Basically what it would say is that  
every entity in BioSQL is derivable, as opposed to final, in an OO  
sense.

In fact, there aren't many entities that don't have a qualifier_value  
association table yet. Adding one for biodatabase would have been in  
my book of 1.1 changes as I use it in SymAtlas already.

>
>
> In the case of is_taxon_hidden, this is specific to an individual  
> taxon,
> and I can see cases where it would be appropriate to search by it (for
> instance, pulling out all ancestors of a given taxon that are  
> visible).
> So I think this should be an additional column.

I would like to ask a systematist about that. I have not seen it anywhere  
else in a taxonomy other than NCBI's. I'm not convinced it's a good  
idea to elevate NCBI's (or anybody else's) idiosyncrasies to columns  
in the Bio* persistence interface.

>
> By the way, is there a document somewhere detailing all the changes  
> that
> are planned for 1.1?

No, not yet. Good point though. Volunteers for starting one are  
welcome ... :-)

	-hilmar


>
> cheers,
> Richard
>
>
> On Mon, 2006-07-03 at 14:07 -0400, Hilmar Lapp wrote:
>> Hi David, I wish I were in the south of France soaking up sun ...
>> although there is no shortage of sun (or heat for that matter, and
>> throw humidity in there too) where I am.
>>
>> Is_Circular is a general attribute that will apply to any sequence
>> (given the fact that many sequences are indeed circular). This, and
>> the fact that one may even want to search for it, would justify
>> inclusion directly as a column in the biosequence table.
>>
>> Is_Taxon_Hidden is one of those attributes that BioSQL by design
>> handles through attribute/value associations, that is, using ontology
>> term associations that have a value (the term is the attribute name).
>>
>> However, there is no taxon_qualifier_value table in BioSQL, so in
>> essence you are asking for adding that table.
>>
>> Does anybody else have ideas for taxon attributes for which this
>> table may be used?
>>
>> I don't really favor a proliferation of 'localized' versions of
>> BioSQL - this tends to defeat the purpose both of the rationale
>> behind a standardized persistence interface, as well as the design of
>> the schema for ultimate extensibility through weak typing and the use
>> of controlled vocabularies.
>>
>> Any thoughts to this end welcome.
>>
>> 	-hilmar
>>
>> On Jul 3, 2006, at 1:55 PM, David Scott wrote:
>>
>>> sure hilmar-
>>>
>>> in the genbank taxonomy file - nodes.dmp:
>>> ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
>>> there is a field:
>>>
>>> GenBank hidden flag (1 or 0)            -- 1 if name is suppressed
>>> in GenBank entry lineage
>>>
>>> this field controls whether the level is included in the taxonomy
>>> hierarchy when the genbank ORGANISM section is generated - but the
>>> more general problem trying to be solved is:
>>> o parse genbank entries
>>> o store parsed entry in biosql
>>> o pull parsed entry from biosql
>>> o (re)create the genbank entry
>>> o compare the recreated entry with the source document for
>>> identity. well - ok - almost identical.
>>>
>>> there are several parameters missing from biosql to make this
>>> possible. the general approach to a solution has been:
>>> o alter the biosql table to add a new column (a sql ddl file)
>>> o add a private get/set for the column in the biojavax object (a
>>> java file)
>>> o add the column to the biojavax hibernate o/r mapping (an xml file)
>>>
>>> to help others that might have the same objective, and to
>>> accommodate those that don't wish these nonstandard columns  - it is
>>> planned to release the o/r mapping files with the additional
>>> columns/fields commented out - these xml files along with the java
>>> files are checked out with cvs. it was not clear what to do with
>>> the ddl files - and it would be helpful to have them reviewed - no
>>> matter what is done with them.
>>>
>>> thanks for helping me - i just assumed you were late in responding
>>> because it is summer - and, well - you were in the south of
>>> france soaking up the sun.
>>>
>>> looking to you for suggestions-
>>> david
>>>
>>>
>>> Hilmar Lapp wrote:
>>>> Hi David, sorry for dropping (or rather, not ever picking up) the
>>>> ball on this ... got lost in inbox stack.
>>>>
>>>> The earlier consensus was if I recall correctly to include
>>>> is_circular as a biosequence attribute in the 1.1 version.
>>>>
>>>> isTaxonHidden is new to me and I don't even understand what it
>>>> would mean. Can you elaborate?
>>>>
>>>>     -hilmar
>>>>
>>>> On Jun 21, 2006, at 11:19 AM, David Scott wrote:
>>>>
>>>>> biojavax is using hibernate to o/r map the biosql database to
>>>>> biojavax
>>>>> objects. biojavax is planning support in the biojavax objects for
>>>>> fields
>>>>> not directly supported in the biosql database (e.g. isCircular,
>>>>> isTaxonHidden). in order to conform to the current biosql
>>>>> database, the
>>>>> default mapping file from biosql to biojavax will comment out the
>>>>> unsupported fields (so the object fields will not be initialized)
>>>>> and
>>>>> the objects will default an appropriate conforming value (e.g.
>>>>> false for
>>>>> isCircular and isTaxonHidden). for users wishing to localize
>>>>> biojavax:
>>>>> the user would uncomment the mapping file and alter the database
>>>>> tables.
>>>>> altering the database would require running ddl on the existing
>>>>> database
>>>>> to create the new table columns. what is the best way to review
>>>>> and then
>>>>> distribute the alter/create ddl for users to localize their
>>>>> database?
>>>>> _______________________________________________
>>>>> BioSQL-l mailing list
>>>>> BioSQL-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>>>
>>>>
>>>> --===========================================================
>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>> ===========================================================
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
> -- 
> Richard Holland (BioMart Team)
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> UNITED KINGDOM
> Tel: +44-(0)1223-494416
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed Jul  5 12:47:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 5 Jul 2006 08:47:05 -0400
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <1152093096.3948.82.camel@texas.ebi.ac.uk>
References: <44996380.6060300@autohandle.com>
	<D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>
	<44A95A2E.8000203@autohandle.com>
	<DCF4100B-5C15-4C16-9013-68DEC5B929FB@gmx.net>
	<1152000782.3948.36.camel@texas.ebi.ac.uk>
	<D3C41A52-6ED2-4FD7-B285-D35A6980DB48@gmx.net>
	<1152093096.3948.82.camel@texas.ebi.ac.uk>
Message-ID: <B251A89A-5BE9-4F74-BC0E-32F615C1CAE6@gmx.net>

Alright - but it was a nice try, no?

On Jul 5, 2006, at 5:51 AM, Richard Holland wrote:

> I think you should create it as you are the only one at present who
> knows what is already planned and what is not! :)
>
> cheers,
> Richard
>
> On Wed, 2006-07-05 at 00:04 -0400, Hilmar Lapp wrote:
>> On Jul 4, 2006, at 4:13 AM, Richard Holland wrote:
>>
>>> Personally I'd like to see *_qualifier_value tables for all BioSQL
>>> tables that represent an entity of any kind, be it term, feature,
>>> location, sequence, taxon, or anything else.
>>
>> I can see that making sense. Basically what it would say is that
>> every entity in BioSQL is derivable, as opposed to final, in an OO
>> sense.
>>
>> In fact, there aren't many entities that don't have a qualifier_value
>> association table yet. Adding one for biodatabase would have been in
>> my book of 1.1 changes as I use it in SymAtlas already.
>>
>>>
>>>
>>> In the case of is_taxon_hidden, this is specific to an individual
>>> taxon,
>>> and I can see cases where it would be appropriate to search by it  
>>> (for
>>> instance, pulling out all ancestors of a given taxon that are
>>> visible).
>>> So I think this should be an additional column.
>>
>> I would like to ask a systematist about that. I have not seen it anywhere
>> else in a taxonomy other than NCBI's. I'm not convinced it's a good
>> idea to elevate NCBI's (or anybody else's) idiosyncrasies to columns
>> in the Bio* persistence interface.
>>
>>>
>>> By the way, is there a document somewhere detailing all the changes
>>> that
>>> are planned for 1.1?
>>
>> No, not yet. Good point though. Volunteers for starting one are
>> welcome ... :-)
>>
>> 	-hilmar
>>
>>
>>>
>>> cheers,
>>> Richard
>>>
>>>
>>> On Mon, 2006-07-03 at 14:07 -0400, Hilmar Lapp wrote:
>>>> Hi David, I wish I were in the south of France soaking up sun ...
>>>> although there is no shortage of sun (or heat for that matter, and
>>>> throw humidity in there too) where I am.
>>>>
>>>> Is_Circular is a general attribute that will apply to any sequence
>>>> (given the fact that many sequences are indeed circular). This, and
>>>> the fact that one may even want to search for it, would justify
>>>> inclusion directly as a column in the biosequence table.
>>>>
>>>> Is_Taxon_Hidden is one of those attributes that BioSQL by design
>>>> handles through attribute/value associations, that is, using  
>>>> ontology
>>>> term associations that have a value (the term is the attribute  
>>>> name).
>>>>
>>>> However, there is no taxon_qualifier_value table in BioSQL, so in
>>>> essence you are asking for adding that table.
>>>>
>>>> Does anybody else have ideas for taxon attributes for which this
>>>> table may be used?
>>>>
>>>> I don't really favor a proliferation of 'localized' versions of
>>>> BioSQL - this tends to defeat the purpose both of the rationale
>>>> behind a standardized persistence interface, as well as the  
>>>> design of
>>>> the schema for ultimate extensibility through weak typing and  
>>>> the use
>>>> of controlled vocabularies.
>>>>
>>>> Any thoughts to this end welcome.
>>>>
>>>> 	-hilmar
>>>>
>>>> On Jul 3, 2006, at 1:55 PM, David Scott wrote:
>>>>
>>>>> sure hilmar-
>>>>>
>>>>> in the genbank taxonomy file - nodes.dmp:
>>>>> ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
>>>>> there is a field:
>>>>>
>>>>> GenBank hidden flag (1 or 0)            -- 1 if name is suppressed
>>>>> in GenBank entry lineage
>>>>>
>>>>> this field controls whether the level is included in the taxonomy
>>>>> hierarchy when the genbank ORGANISM section is generated - but the
>>>>> more general problem trying to be solved is:
>>>>> o parse genbank entries
>>>>> o store parsed entry in biosql
>>>>> o pull parsed entry from biosql
>>>>> o (re)create the genbank entry
>>>>> o compare the recreated entry with the source document for
>>>>> identity. well - ok - almost identical.
>>>>>
>>>>> there are several parameters missing from biosql to make this
>>>>> possible. the general approach to a solution has been:
>>>>> o alter the biosql table to add a new column (a sql ddl file)
>>>>> o add a private get/set for the column in the biojavax object (a
>>>>> java file)
>>>>> o add the column to the biojavax hibernate o/r mapping (an xml file)
>>>>>
>>>>> to help others that might have the same objective, and to
>>>>> accommodate those that don't wish these nonstandard columns - it is
>>>>> planned to release the o/r mapping files with the additional
>>>>> columns/fields commented out - these xml files along with the java
>>>>> files are checked out with cvs. it was not clear what to do with
>>>>> the ddl files - and it would be helpful to have them reviewed - no
>>>>> matter what is done with them.
>>>>>
>>>>> thanks for helping me - i just assumed you were late in responding
>>>>> because it is summer - and, well - you were in the south of
>>>>> france soaking up the sun.
>>>>>
>>>>> looking to you for suggestions-
>>>>> david
>>>>>
>>>>>
>>>>> Hilmar Lapp wrote:
>>>>>> Hi David, sorry for dropping (or rather, not ever picking up) the
>>>>>> ball on this ... got lost in inbox stack.
>>>>>>
>>>>>> The earlier consensus was if I recall correctly to include
>>>>>> is_circular as a biosequence attribute in the 1.1 version.
>>>>>>
>>>>>> isTaxonHidden is new to me and I don't even understand what it
>>>>>> would mean. Can you elaborate?
>>>>>>
>>>>>>     -hilmar
>>>>>>
>>>>>> On Jun 21, 2006, at 11:19 AM, David Scott wrote:
>>>>>>
>>>>>>> biojavax is using hibernate to o/r map the biosql database to
>>>>>>> biojavax objects. biojavax is planning support in the biojavax
>>>>>>> objects for fields not directly supported in the biosql database
>>>>>>> (e.g. isCircular, isTaxonHidden). in order to conform to the
>>>>>>> current biosql database, the default mapping file from biosql to
>>>>>>> biojavax will comment out the unsupported fields (so the object
>>>>>>> fields will not be initialized) and the objects will default an
>>>>>>> appropriate conforming value (e.g. false for isCircular and
>>>>>>> isTaxonHidden). for users wishing to localize biojavax: the user
>>>>>>> would uncomment the mapping file and alter the database tables.
>>>>>>> altering the database would require running ddl on the existing
>>>>>>> database to create the new table columns. what is the best way to
>>>>>>> review and then distribute the alter/create ddl for users to
>>>>>>> localize their database?
>>>>>>> _______________________________________________
>>>>>>> BioSQL-l mailing list
>>>>>>> BioSQL-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>>>>>
>>>>>>
>>>>>> --===========================================================
>>>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>>>> ===========================================================
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>> -- 
>>> Richard Holland (BioMart Team)
>>> EMBL-EBI
>>> Wellcome Trust Genome Campus
>>> Hinxton
>>> Cambridge CB10 1SD
>>> UNITED KINGDOM
>>> Tel: +44-(0)1223-494416
>>>
>>
> -- 
> Richard Holland (BioMart Team)
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> UNITED KINGDOM
> Tel: +44-(0)1223-494416
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From richard.holland at ebi.ac.uk  Wed Jul  5 09:51:35 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Wed, 05 Jul 2006 10:51:35 +0100
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <D3C41A52-6ED2-4FD7-B285-D35A6980DB48@gmx.net>
References: <44996380.6060300@autohandle.com>
	<D58A55AA-3849-483F-9726-D0E3C8FB2EB5@gmx.net>
	<44A95A2E.8000203@autohandle.com>
	<DCF4100B-5C15-4C16-9013-68DEC5B929FB@gmx.net>
	<1152000782.3948.36.camel@texas.ebi.ac.uk>
	<D3C41A52-6ED2-4FD7-B285-D35A6980DB48@gmx.net>
Message-ID: <1152093096.3948.82.camel@texas.ebi.ac.uk>

I think you should create it as you are the only one at present who
knows what is already planned and what is not! :)

cheers,
Richard

On Wed, 2006-07-05 at 00:04 -0400, Hilmar Lapp wrote:
> On Jul 4, 2006, at 4:13 AM, Richard Holland wrote:
> 
> > Personally I'd like to see *_qualifier_value tables for all BioSQL
> > tables that represent an entity of any kind, be it term, feature,
> > location, sequence, taxon, or anything else.
> 
> I can see that making sense. Basically what it would say is that  
> every entity in BioSQL is derivable, as opposed to final, in an OO  
> sense.
> 
> In fact, there aren't many entities that don't have a qualifier_value  
> association table yet. Adding one for biodatabase would have been in  
> my book of 1.1 changes as I use it in SymAtlas already.
> 
> >
> >
> > In the case of is_taxon_hidden, this is specific to an individual taxon,
> > and I can see cases where it would be appropriate to search by it (for
> > instance, pulling out all ancestors of a given taxon that are visible).
> > So I think this should be an additional column.
> 
> I would like to ask that a systematist. I have not seen it anywhere  
> else in a taxonomy other than NCBI's. I'm not convinced it's a good  
> idea to elevate NCBI's (or anybody else's) idiosyncrasies to columns  
> in the Bio* persistence interface.
> 
> >
> > By the way, is there a document somewhere detailing all the changes
> > that are planned for 1.1?
> 
> No, not yet. Good point though. Volunteers for starting one are  
> welcome ... :-)
> 
> 	-hilmar
> 
> 
> >
> > cheers,
> > Richard
> >
> >
> > On Mon, 2006-07-03 at 14:07 -0400, Hilmar Lapp wrote:
> >> Hi David, I wish I were in the south of France soaking up sun ...
> >> although there is no shortage of sun (or heat for that matter, and
> >> throw humidity in there too) where I am.
> >>
> >> Is_Circular is a general attribute that will apply to any sequence
> >> (given the fact that many sequences are indeed circular). This, and
> >> the fact that one may even want to search for it, would justify
> >> inclusion directly as a column in the biosequence table.
> >>
> >> Is_Taxon_Hidden is one of those attributes that BioSQL by design
> >> handles through attribute/value associations, that is, using ontology
> >> term associations that have a value (the term is the attribute name).
> >>
> >> However, there is no taxon_qualifier_value table in BioSQL, so in
> >> essence you are asking for adding that table.
> >>
> >> Does anybody else have ideas for taxon attributes for which this
> >> table may be used?
> >>
> >> I don't really favor a proliferation of 'localized' versions of
> >> BioSQL - this tends to defeat the purpose both of the rationale
> >> behind a standardized persistence interface, as well as the design of
> >> the schema for ultimate extensibility through weak typing and the use
> >> of controlled vocabularies.
> >>
> >> Any thoughts to this end welcome.
> >>
> >> 	-hilmar
> >>
> >> On Jul 3, 2006, at 1:55 PM, David Scott wrote:
> >>
> >>> sure hilmar-
> >>>
> >>> in the genbank taxonomy file - nodes.dmp:
> >>> ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
> >>> there is a field:
> >>>
> >>> GenBank hidden flag (1 or 0)            -- 1 if name is suppressed
> >>> in GenBank entry lineage
> >>>
> >>> this field controls whether the level is included in the taxonomy
> >>> hierarchy when the genbank ORGANISM section is generated - but the
> >>> more general problem trying to be solved is:
> >>> o parse genbank entries
> >>> o store parsed entry in biosql
> >>> o pull parsed entry from biosql
> >>> o (re)create the genbank entry
> >>> o compare the recreated entry with the source document for
> >>> identity. well - ok - almost identical.
> >>>
> >>> there are several parameters missing from biosql to make this
> >>> possible. the general approach to a solution has been:
> >>> o alter the biosql table to add a new column (a sql ddl file)
> >>> o add a private get/set for the column in the biojavax object (a
> >>> java file)
> >>> o add the column to the biojavax hibernate o/r mapping (an xml file)
> >>>
> >>> to help others that might have the same objective, and to
> >>> accommodate those that don't wish these nonstandard columns - it is
> >>> planned to release the o/r mapping files with the additional
> >>> columns/fields commented out - these xml files along with the java
> >>> files are checked out with cvs. it was not clear what to do with
> >>> the ddl files - and it would be helpful to have them reviewed - no
> >>> matter what is done with them.
> >>>
> >>> thanks for helping me - i just assumed you were late in responding
> >>> because it is summer - and, well - you were in the south of
> >>> france soaking up the sun.
> >>>
> >>> looking to you for suggestions-
> >>> david
> >>>
> >>>
> >>> Hilmar Lapp wrote:
> >>>> Hi David, sorry for dropping (or rather, not ever picking up) the
> >>>> ball on this ... got lost in inbox stack.
> >>>>
> >>>> The earlier consensus was if I recall correctly to include
> >>>> is_circular as a biosequence attribute in the 1.1 version.
> >>>>
> >>>> isTaxonHidden is new to me and I don't even understand what it
> >>>> would mean. Can you elaborate?
> >>>>
> >>>>     -hilmar
> >>>>
> >>>> On Jun 21, 2006, at 11:19 AM, David Scott wrote:
> >>>>
> >>>>> biojavax is using hibernate to o/r map the biosql database to
> >>>>> biojavax objects. biojavax is planning support in the biojavax
> >>>>> objects for fields not directly supported in the biosql database
> >>>>> (e.g. isCircular, isTaxonHidden). in order to conform to the
> >>>>> current biosql database, the default mapping file from biosql to
> >>>>> biojavax will comment out the unsupported fields (so the object
> >>>>> fields will not be initialized) and the objects will default an
> >>>>> appropriate conforming value (e.g. false for isCircular and
> >>>>> isTaxonHidden). for users wishing to localize biojavax: the user
> >>>>> would uncomment the mapping file and alter the database tables.
> >>>>> altering the database would require running ddl on the existing
> >>>>> database to create the new table columns. what is the best way to
> >>>>> review and then distribute the alter/create ddl for users to
> >>>>> localize their database?
> >>>>> _______________________________________________
> >>>>> BioSQL-l mailing list
> >>>>> BioSQL-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
> >>>>>
> >>>>
> >>>> --===========================================================
> >>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >>>> ===========================================================
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> > -- 
> > Richard Holland (BioMart Team)
> > EMBL-EBI
> > Wellcome Trust Genome Campus
> > Hinxton
> > Cambridge CB10 1SD
> > UNITED KINGDOM
> > Tel: +44-(0)1223-494416
> >
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From pim.van.nierop at falw.vu.nl  Wed Jul  5 13:53:39 2006
From: pim.van.nierop at falw.vu.nl (Pim van Nierop)
Date: Wed, 05 Jul 2006 15:53:39 +0200
Subject: [BioSQL-l] Prolem with loading bioseqsql scheme
Message-ID: <85343C76-6149-4439-B410-4D04B642D567@falw.vu.nl>

Hello all,

I have just started exploring BioSQL in combination with Perl
scripting to run a local instance of GenBank on MySQL at my lab. I have
to apologize for my ignorance beforehand, as I do not know much about
MySQL.

I followed the instructions provided on the BioPerl wiki page on how
to start using BioSQL with BioPerl. Unfortunately, I get stuck
when loading the "biosqldb-mysql.sql" file into my newly created
database named "bioseqdb".

I use this command:
> mysql -u root -p bioseqdb < c:\biosqldb-mysql.sql

This generates the following error:
ERROR 1005 (HY000) at line 39: Can't create table
'.\bioseqdb\biodatabase.frm' (errno: 121)

I looked on the internet for what the error code ERROR 1005 (errno: 121)
means. It seems to have something to do with foreign keys, but I have no
clue how to proceed from here.

Could someone please explain what I am doing wrong?

Oh yeah, I use a Windows XP system.

All the best,

Pim

--  
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
*-*-*-*-*-

       Pim van Nierop

       Department of Molecular and Cellular Neurobiology
       Faculty of Earth and Life Sciences
       Vrije Universiteit
       Amsterdam

       Tel. +31 (0)20 5987114
       Fax. +31 (0)20 5987112

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
*-*-*-*-*-

_______________________________________________
Open-Bio-l mailing list
Open-Bio-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/open-bio-l


From hlapp at gmx.net  Thu Jul  6 11:44:38 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Jul 2006 07:44:38 -0400
Subject: [BioSQL-l] [Open-bio-l] [Fwd: Prolem with loading bioseqsql
	scheme]
In-Reply-To: <44ACD59C.3020604@falw.vu.nl>
References: <44ACD59C.3020604@falw.vu.nl>
Message-ID: <E70F87D6-4E67-4D02-B828-B5945C7DC7F9@gmx.net>

Hi Pim, I forwarded your email to biosql-l at lists.open-bio.org, which
is where the BioSQL discussions take place. I wanted to respond
yesterday but didn't get to it.

The page to subscribe to biosql-l is at
http://obda.open-bio.org/mailman/listinfo/biosql-l

	-hilmar


On Jul 6, 2006, at 5:19 AM, Pim van Nierop wrote:

> I am resending this message because I sent it before my subscription to
> this mailing list was confirmed. I am sorry if it's a double post.
>
> -------- Original Message --------
> Subject: 	Prolem with loading bioseqsql scheme
> Date: 	Wed, 05 Jul 2006 15:53:39 +0200
> From: 	Pim van Nierop <pim.van.nierop at falw.vu.nl>
> To: 	open-bio-l at lists.open-bio.org
>
>
>
> Hello all,
>
> I have just started exploring BioSQL in combination with Perl
> scripting to run a local instance of GenBank on MySQL at my lab. I have
> to apologize for my ignorance beforehand, as I do not know much about
> MySQL.
>
> I followed the instructions provided on the BioPerl wiki page on how
> to start using BioSQL with BioPerl. Unfortunately, I get stuck
> when loading the "biosqldb-mysql.sql" file into my newly created
> database named "bioseqdb".
>
> I use this command:
>> mysql -u root -p bioseqdb < c:\biosqldb-mysql.sql
>
> This generates the following error:
> ERROR 1005 (HY000) at line 39: Can't create table
> '.\bioseqdb\biodatabase.frm' (errno: 121)
>
> I looked on the internet for what the error code ERROR 1005 (errno: 121)
> means. It seems to have something to do with foreign keys, but I have no
> clue how to proceed from here.
>
> Could someone please explain what I am doing wrong?
>
> Oh yeah, I use a Windows XP system.
>
> All the best,
>
> Pim
>
> -- 
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
> *-*-*-*-*-*-
>
>      Pim van Nierop
>
>      Department of Molecular and Cellular Neurobiology
>      Faculty of Earth and Life Sciences
>      Vrije Universiteit
>      Amsterdam
>
>      Tel. +31 (0)20 5987114
>      Fax. +31 (0)20 5987112
>
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
> *-*-*-*-*-*-
>
> _______________________________________________
> Open-Bio-l mailing list
> Open-Bio-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/open-bio-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From pim.van.nierop at falw.vu.nl  Sat Jul  8 11:19:04 2006
From: pim.van.nierop at falw.vu.nl (Pim van Nierop)
Date: Sat, 08 Jul 2006 13:19:04 +0200
Subject: [BioSQL-l]  Prolem with loading bioseqsql scheme
Message-ID: <44AF94A8.8030501@falw.vu.nl>

Hello all,

I have been experimenting a little myself, and it turns out that the
problem (InnoDB error 1005, errno 121) occurs with MySQL 5.0, but not
with MySQL 4.1.

I will continue to use 4.1 to create a bioseq database instead. I guess
the 5.0 version is buggy.

Greetz, Pim


From mark.schreiber at novartis.com  Mon Jul 10 03:03:10 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Mon, 10 Jul 2006 11:03:10 +0800
Subject: [BioSQL-l] null title and CRC
Message-ID: <OF612183F3.63299C76-ON482571A7.00103304-482571A7.0010C525@EU.novartis.net>

Hi -

We are having a problem in biojava parsing some genbank records that 
contain references with no title. These cannot have a CRC value which is 
required in BioSQL. If we make the title an empty string then we quickly 
get non-unique CRC numbers.

What does BioPerl do in these cases?

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910


From hlapp at gmx.net  Mon Jul 10 03:22:26 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 9 Jul 2006 23:22:26 -0400
Subject: [BioSQL-l] null title and CRC
In-Reply-To: <OF612183F3.63299C76-ON482571A7.00103304-482571A7.0010C525@EU.novartis.net>
References: <OF612183F3.63299C76-ON482571A7.00103304-482571A7.0010C525@EU.novartis.net>
Message-ID: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net>

The CRC for references uses the authors, title, and location  
attributes in Bioperl-db, and empty (or null) strings default to the  
string "<undef>".

If title is empty and authors and location do not distinguish two
references, then why do you want to have two rows for those
references? Basically, they are identical for all intents and
purposes, or are they not?
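
A minimal sketch of that defaulting, written in Python rather than Perl.
The concatenation order, the absence of a separator, and the use of
CRC-32 from zlib are assumptions made for illustration only; this is not
Bioperl-db's actual code, and reference_crc is a hypothetical name.

```python
import zlib

UNDEF = "<undef>"  # placeholder string named in the message above


def reference_crc(authors, title, location):
    """Checksum over authors, title and location.

    Empty or None fields are replaced by "<undef>" before hashing.
    Concatenation order and CRC-32 are illustrative assumptions.
    """
    parts = [value if value else UNDEF for value in (authors, title, location)]
    return zlib.crc32("".join(parts).encode("utf-8"))


# A missing title and an empty title give the same key ...
crc_none = reference_crc("Doe A.", None, "Unpublished")
crc_empty = reference_crc("Doe A.", "", "Unpublished")

# ... while a different author string gives a different key.
crc_other = reference_crc("Roe A.", None, "Unpublished")
```

With a key built like this, two untitled references only collide when
their authors and location are the same as well.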

	-hilmar

On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote:

> Hi -
>
> We are having a problem in biojava parsing some genbank records that
> contain references with no title. These cannot have a CRC value  
> which is
> required in BioSQL. If we make the title an empty string then we  
> quickly
> get non-unique CRC numbers.
>
> What does BioPerl do in these cases?
>
> - Mark
>
> Mark Schreiber
> Research Investigator (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax  +65 6722 2910
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mark.schreiber at novartis.com  Thu Jul 13 05:23:18 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 13 Jul 2006 13:23:18 +0800
Subject: [BioSQL-l] Abstracts and Full Text on References
Message-ID: <OFE521208C.0D5B6141-ON482571AA.001D4655-482571AA.001D9990@EU.novartis.net>

Hi -

As an enhancement for a future version of BioSQL it would be nice to have
CLOB columns for abstract and full text (full text might need to be a BLOB
depending on format). Obviously they could both be null.

Alternatively they could be in another table linked to Reference. I don't
know if it could be done via the term relationship method?

Any thoughts?

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910


From hlapp at gmx.net  Thu Jul 13 16:59:04 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 13 Jul 2006 12:59:04 -0400
Subject: [BioSQL-l] Abstracts and Full Text on References
In-Reply-To: <OFE521208C.0D5B6141-ON482571AA.001D4655-482571AA.001D9990@EU.novartis.net>
References: <OFE521208C.0D5B6141-ON482571AA.001D4655-482571AA.001D9990@EU.novartis.net>
Message-ID: <21289F28-309E-4A81-B326-E939838A5820@gmx.net>

Sounds reasonable to me. Attribute association wouldn't be desirable  
I think (it would only bloat and overload the value field).

The only thing I'd be concerned about is accumulating stuff that is  
not supported by the language bindings ... i.e., bioperl doesn't  
support this, and so there isn't a way for bioperl-db to do so  
either. What are the plans for Biojava?

Are any Biopython or Bioruby folks on this list? Any comments from  
those fronts?

	-hilmar

On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote:

> Hi -
>
> As an enhancement for a future version of BioSQL it would be nice  
> to have
> CLOB rows for abstract and full text (Full text might need to be a  
> BLOB
> depending on format). Obviously they could both be null.
>
> Alternatively they could be in another table linked to Reference. I  
> don't
> know if it could be done via the term relationship method??
>
> Any thoughts?
>
> - Mark
>
> Mark Schreiber
> Research Investigator (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax  +65 6722 2910
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mark.schreiber at novartis.com  Fri Jul 14 01:56:13 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 14 Jul 2006 09:56:13 +0800
Subject: [BioSQL-l] Abstracts and Full Text on References
Message-ID: <OF1C5F6CCB.3A1DB798-ON482571AB.000A4D90-482571AB.000AA3DF@EU.novartis.net>

Hello -

There are no specific plans for biojava although the Reference object 
could easily be modified to contain 

String getAbstract()
void setAbstract(String abs)
etc.

I wonder if the full text of an article should be a byte[] or BLOB or a 
String/ CLOB. Are people more likely to want to store a PDF (usually more 
available) or a parsed String?

- Mark


Hilmar Lapp <hlapp at gmx.net>
07/14/2006 12:59 AM

 
        To:     mark.schreiber at novartis.com
        cc:     biosql-l at open-bio.org
        Subject:        Re: [BioSQL-l] Abstracts and Full Text on References


Sounds reasonable to me. Attribute association wouldn't be desirable 
I think (it would only bloat and overload the value field).

The only thing I'd be concerned about is accumulating stuff that is 
not supported by the language bindings ... i.e., bioperl doesn't 
support this, and so there isn't a way for bioperl-db to do so 
either. What are the plans for Biojava?

Are any Biopython or Bioruby folks on this list? Any comments from 
those fronts?

                 -hilmar

On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote:

> Hi -
>
> As an enhancement for a future version of BioSQL it would be nice 
> to have
> CLOB rows for abstract and full text (Full text might need to be a 
> BLOB
> depending on format). Obviously they could both be null.
>
> Alternatively they could be in another table linked to Reference. I 
> don't
> know if it could be done via the term relationship method??
>
> Any thoughts?
>
> - Mark
>
> Mark Schreiber
> Research Investigator (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax  +65 6722 2910
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Fri Jul 14 11:24:19 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 14 Jul 2006 07:24:19 -0400
Subject: [BioSQL-l] Abstracts and Full Text on References
In-Reply-To: <1152864626.3943.61.camel@texas.ebi.ac.uk>
References: <OF1C5F6CCB.3A1DB798-ON482571AB.000A4D90-482571AB.000AA3DF@EU.novartis.net>
	<1152864626.3943.61.camel@texas.ebi.ac.uk>
Message-ID: <748F3120-1FD3-4DF8-A0D7-EF9EE0414A14@gmx.net>

Right. I like this. However, it also suggests having an additional
table. Who knows what other fields one will want to store for an
abstract. Also, plenty of references will never have an abstract,
e.g. automatic submissions, ontology term references etc.
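
A rough sketch of what such a side table could look like, run against an
in-memory SQLite database from Python. The table name reference_abstract
is hypothetical, the reference table is cut down to two columns, and the
column names abstract and abstract_mime_type are taken from Richard's
suggestion quoted below; none of this is part of the BioSQL schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reference (
    reference_id INTEGER PRIMARY KEY,
    title        TEXT
);
-- Hypothetical side table: at most one row per reference,
-- and no row at all for references without an abstract.
CREATE TABLE reference_abstract (
    reference_id       INTEGER PRIMARY KEY
                       REFERENCES reference (reference_id),
    abstract           BLOB NOT NULL,
    abstract_mime_type TEXT NOT NULL
);
""")

conn.execute("INSERT INTO reference VALUES (1, 'Paper with abstract')")
conn.execute("INSERT INTO reference VALUES (2, 'Automatic submission')")
conn.execute(
    "INSERT INTO reference_abstract VALUES (?, ?, ?)",
    (1, b"Plain-text abstract.", "text/plain"),
)

# References without an abstract simply have no matching row.
rows = conn.execute("""
    SELECT r.reference_id, a.abstract_mime_type
    FROM reference r
    LEFT JOIN reference_abstract a ON a.reference_id = r.reference_id
    ORDER BY r.reference_id
""").fetchall()
```

Keeping the abstract in its own table leaves the reference rows small
and makes it easy to add further abstract-related columns later.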

	-hilmar

On Jul 14, 2006, at 4:10 AM, Richard Holland wrote:

> Make it a BLOB and add another column indicating the MIME type of the
> BLOB.
>
> 	BLOB abstract
> 	VARCHAR abstract_mime_type
>
> Then if you stored a PDF in it you could set abstract_mime_type to
> 'application/pdf', or if it was plain text, you could set the
> abstract_mime_type to 'text/plain'.
>
> cheers,
> Richard
>
> On Fri, 2006-07-14 at 09:56 +0800, mark.schreiber at novartis.com wrote:
>> Hello -
>>
>> There are no specific plans for biojava although the Reference object
>> could easily be modified to contain
>>
>> String getAbstract()
>> void setAbstract(String abs)
>> etc.
>>
>> I wonder if the full text of an article should be a byte[] or BLOB  
>> or a
>> String/ CLOB. Are people more likely to want to store a PDF  
>> (usually more
>> available) or a parsed String?
>>
>> - Mark
>>
>>
>>
>>
>>
>> Hilmar Lapp <hlapp at gmx.net>
>> 07/14/2006 12:59 AM
>>
>>
>>         To:     mark.schreiber at novartis.com
>>         cc:     biosql-l at open-bio.org
>>         Subject:        Re: [BioSQL-l] Abstracts and Full Text on  
>> References
>>
>>
>> Sounds reasonable to me. Attribute association wouldn't be desirable
>> I think (it would only bloat and overload the value field).
>>
>> The only thing I'd be concerned about is accumulating stuff that is
>> not supported by the language bindings ... i.e., bioperl doesn't
>> support this, and so there isn't a way for bioperl-db to do so
>> either. What are the plans for Biojava?
>>
>> Are any Biopython or Bioruby folks on this list? Any comments from
>> those fronts?
>>
>>                  -hilmar
>>
>> On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote:
>>
>>> Hi -
>>>
>>> As an enhancement for a future version of BioSQL it would be nice
>>> to have
>>> CLOB rows for abstract and full text (Full text might need to be a
>>> BLOB
>>> depending on format). Obviously they could both be null.
>>>
>>> Alternatively they could be in another table linked to Reference. I
>>> don't
>>> know if it could be done via the term relationship method??
>>>
>>> Any thoughts?
>>>
>>> - Mark
>>>
>>> Mark Schreiber
>>> Research Investigator (Bioinformatics)
>>>
>>> Novartis Institute for Tropical Diseases (NITD)
>>> 10 Biopolis Road
>>> #05-01 Chromos
>>> Singapore 138670
>>> www.nitd.novartis.com
>>>
>>> phone +65 6722 2973
>>> fax  +65 6722 2910
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>
>>
> -- 
> Richard Holland (BioMart Team)
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> UNITED KINGDOM
> Tel: +44-(0)1223-494416
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From david at autohandle.com  Fri Jul 14 17:48:50 2006
From: david at autohandle.com (David Scott)
Date: Fri, 14 Jul 2006 10:48:50 -0700
Subject: [BioSQL-l] null title and CRC
In-Reply-To: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net>
References: <OF612183F3.63299C76-ON482571A7.00103304-482571A7.0010C525@EU.novartis.net>
	<92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net>
Message-ID: <44B7D902.6040804@autohandle.com>

we are currently using "<undef>" in the crc calculation for the case
where the title is empty (or null) - i can extend that for authors and
location - what should we be storing in the table: "<undef>", empty, or
null?

thanks-
david

p.s. fog for sale:
http://www.sfgate.com/liveviews/


Hilmar Lapp wrote:
> The CRC for references uses the authors, title, and location  
> attributes in Bioperl-db, and empty (or null) strings default to the  
> string "<undef>".
>
> If title is empty and authors and location do not distinguish two
> references, then why do you want to have two rows for those
> references? Basically, they are identical for all intents and
> purposes, or are they not?
>
> 	-hilmar
>
> On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote:
>
>   
>> Hi -
>>
>> We are having a problem in biojava parsing some genbank records that
>> contain references with no title. These cannot have a CRC value  
>> which is
>> required in BioSQL. If we make the title an empty string then we  
>> quickly
>> get non-unique CRC numbers.
>>
>> What does BioPerl do in these cases?
>>
>> - Mark
>>
>> Mark Schreiber
>> Research Investigator (Bioinformatics)
>>
>> Novartis Institute for Tropical Diseases (NITD)
>> 10 Biopolis Road
>> #05-01 Chromos
>> Singapore 138670
>> www.nitd.novartis.com
>>
>> phone +65 6722 2973
>> fax  +65 6722 2910
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>>     
>
>   


From hlapp at gmx.net  Fri Jul 14 18:31:44 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 14 Jul 2006 14:31:44 -0400
Subject: [BioSQL-l] null title and CRC
In-Reply-To: <44B7D902.6040804@autohandle.com>
References: <OF612183F3.63299C76-ON482571A7.00103304-482571A7.0010C525@EU.novartis.net>
	<92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net>
	<44B7D902.6040804@autohandle.com>
Message-ID: <AC06A099-46F8-40F9-A90D-FD7A5EF49087@gmx.net>

In the table you store the value of the attribute, not a default that
substitutes for it in some calculation. I.e., either null or an empty
string, depending on what the value is. (In Oracle an empty string is
treated as null.)
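
To illustrate the distinction, here is a small Python sketch using an
in-memory SQLite database. The cut-down reference table, the helper name
crc_input, and the use of CRC-32 are assumptions for illustration; the
point is only that "<undef>" feeds the checksum while the stored title
stays null.

```python
import sqlite3
import zlib


def crc_input(value):
    # The placeholder is used only while computing the checksum.
    return value if value else "<undef>"


authors, title, location = "Doe A.", None, "Unpublished"  # no title given
crc = zlib.crc32(
    (crc_input(authors) + crc_input(title) + crc_input(location)).encode("utf-8")
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reference (authors TEXT, title TEXT, location TEXT, crc TEXT UNIQUE)"
)
# The title column receives the real value (NULL), not the placeholder.
conn.execute(
    "INSERT INTO reference VALUES (?, ?, ?, ?)",
    (authors, title, location, str(crc)),
)

stored_title = conn.execute("SELECT title FROM reference").fetchone()[0]
placeholder_rows = conn.execute(
    "SELECT COUNT(*) FROM reference WHERE title = '<undef>'"
).fetchone()[0]
```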

	-hilmar
On Jul 14, 2006, at 1:48 PM, David Scott wrote:

> we are currently using "<undef>" in the crc calculation for the
> case where the title is empty (or null) - i can extend that for
> authors and location - what should we be storing in the table:
> "<undef>", empty, or null?
>
> thanks-
> david
>
> p.s. fog for sale:
> http://www.sfgate.com/liveviews/
>
>
> Hilmar Lapp wrote:
>> The CRC for references uses the authors, title, and location
>> attributes in Bioperl-db, and empty (or null) strings default to the
>> string "<undef>".
>>
>> If title is empty and authors and location do not distinguish two
>> references, then why do you want to have two rows for those
>> references? Basically, they are identical for all intents and
>> purposes, or are they not?
>>
>> 	-hilmar
>>
>> On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote:
>>
>>
>>> Hi -
>>>
>>> We are having a problem in biojava parsing some genbank records that
>>> contain references with no title. These cannot have a CRC value
>>> which is
>>> required in BioSQL. If we make the title an empty string then we
>>> quickly
>>> get non-unique CRC numbers.
>>>
>>> What does BioPerl do in these cases?
>>>
>>> - Mark
>>>
>>> Mark Schreiber
>>> Research Investigator (Bioinformatics)
>>>
>>> Novartis Institute for Tropical Diseases (NITD)
>>> 10 Biopolis Road
>>> #05-01 Chromos
>>> Singapore 138670
>>> www.nitd.novartis.com
>>>
>>> phone +65 6722 2973
>>> fax  +65 6722 2910
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>
>>>
>>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From david at autohandle.com  Fri Jul 14 18:51:18 2006
From: david at autohandle.com (David Scott)
Date: Fri, 14 Jul 2006 11:51:18 -0700
Subject: [BioSQL-l] null title and CRC
In-Reply-To: <AC06A099-46F8-40F9-A90D-FD7A5EF49087@gmx.net>
References: <OF612183F3.63299C76-ON482571A7.00103304-482571A7.0010C525@EU.novartis.net>
	<92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net>
	<44B7D902.6040804@autohandle.com>
	<AC06A099-46F8-40F9-A90D-FD7A5EF49087@gmx.net>
Message-ID: <44B7E7A6.9040300@autohandle.com>

ok, then - in the case of genbank: i'm going to try to treat missing 
titles as null - store them in the object as null - and provide them to 
the hibernate o/r mapping as null - presumably they will go into the 
table as null.

best-

Hilmar Lapp wrote:
> In the table you store the value of the attribute, not a default that 
> substitutes for it in some calculation. I.e., either null or an empty 
> string, depending on what the value is. (in Oracle an empty string is 
> treated as null.)
>
> -hilmar
> On Jul 14, 2006, at 1:48 PM, David Scott wrote:
>
>> we are currently using "<undef>" in the crc calculation for the case 
>> where the title is empty (or null) - i can extend that for authors 
>> and location - what should we be storing in the table: "<undef>",
>> empty, or null?
>>
>> thanks-
>> david
>>
>> p.s. fog for sale:
>> http://www.sfgate.com/liveviews/
>>
>>
>> Hilmar Lapp wrote:
>>> The CRC for references uses the authors, title, and location  
>>> attributes in Bioperl-db, and empty (or null) strings default to the  
>>> string "<undef>".
>>>
>>> If title is empty and authors and location do not distinguish two  
>>> references, then why do you want to have two rows for those  
>>> references? Basically, they are identical for all intents and
>>> purposes, or are they not?
>>>
>>> 	-hilmar
>>>
>>> On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote:
>>>
>>>   
>>>> Hi -
>>>>
>>>> We are having a problem in biojava parsing some genbank records that
>>>> contain references with no title. These cannot have a CRC value  
>>>> which is
>>>> required in BioSQL. If we make the title an empty string then we  
>>>> quickly
>>>> get non-unique CRC numbers.
>>>>
>>>> What does BioPerl do in these cases?
>>>>
>>>> - Mark
>>>>
>>>> Mark Schreiber
>>>> Research Investigator (Bioinformatics)
>>>>
>>>> Novartis Institute for Tropical Diseases (NITD)
>>>> 10 Biopolis Road
>>>> #05-01 Chromos
>>>> Singapore 138670
>>>> www.nitd.novartis.com
>>>>
>>>> phone +65 6722 2973
>>>> fax  +65 6722 2910
>>>>
>>>> _______________________________________________
>>>> BioSQL-l mailing list
>>>> BioSQL-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>>
>>>>     
>>>   
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================


From richard.holland at ebi.ac.uk  Thu Jul 13 08:14:55 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Thu, 13 Jul 2006 09:14:55 +0100
Subject: [BioSQL-l] Abstracts and Full Text on References
In-Reply-To: <OFE521208C.0D5B6141-ON482571AA.001D4655-482571AA.001D9990@EU.novartis.net>
References: <OFE521208C.0D5B6141-ON482571AA.001D4655-482571AA.001D9990@EU.novartis.net>
Message-ID: <1152778496.3943.51.camel@texas.ebi.ac.uk>

I'd like to enhance that request by asking for individual author records
instead of a single string, and a flag indicating the type of
publication - eg. journal, book, article, conference paper, etc.

On Thu, 2006-07-13 at 13:23 +0800, mark.schreiber at novartis.com wrote:
> Hi -
> 
> As an enhancement for a future version of BioSQL it would be nice to have 
> CLOB rows for abstract and full text (Full text might need to be a BLOB 
> depending on format). Obviously they could both be null.
> 
> Alternatively they could be in another table linked to Reference. I don't 
> know if it could be done via the term relationship method??
> 
> Any thoughts?
> 
> - Mark
> 
> Mark Schreiber
> Research Investigator (Bioinformatics)
> 
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
> 
> phone +65 6722 2973
> fax  +65 6722 2910
> 
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From richard.holland at ebi.ac.uk  Fri Jul 14 08:10:25 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 14 Jul 2006 09:10:25 +0100
Subject: [BioSQL-l] Abstracts and Full Text on References
In-Reply-To: <OF1C5F6CCB.3A1DB798-ON482571AB.000A4D90-482571AB.000AA3DF@EU.novartis.net>
References: <OF1C5F6CCB.3A1DB798-ON482571AB.000A4D90-482571AB.000AA3DF@EU.novartis.net>
Message-ID: <1152864626.3943.61.camel@texas.ebi.ac.uk>

Make it a BLOB and add another column indicating the MIME type of the
BLOB.

	BLOB abstract
	VARCHAR abstract_mime_type

Then if you stored a PDF in it you could set abstract_mime_type to
'application/pdf', or if it was plain text, you could set the
abstract_mime_type to 'text/plain'.
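
A rough sketch of how the two columns could work together, using an in-memory SQLite database so that it runs on its own. The table name reference_demo and the helper names store_abstract and load_abstract are hypothetical, and nothing here is part of the BioSQL schema. Only the abstract and abstract_mime_type columns come from the proposal above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reference_demo (        -- hypothetical toy table
        reference_id       INTEGER PRIMARY KEY,
        title              TEXT,
        abstract           BLOB,         -- raw bytes, may be NULL
        abstract_mime_type VARCHAR(64)   -- says how to interpret the bytes
    )
    """
)
conn.executemany(
    "INSERT INTO reference_demo (reference_id, title) VALUES (?, ?)",
    [(1, "Plain text example"), (2, "PDF example"), (3, "No abstract")],
)


def store_abstract(conn, reference_id, payload, mime_type):
    """Store the abstract bytes together with their MIME type."""
    conn.execute(
        "UPDATE reference_demo SET abstract = ?, abstract_mime_type = ? "
        "WHERE reference_id = ?",
        (payload, mime_type, reference_id),
    )


def load_abstract(conn, reference_id):
    """Return str for text/plain, raw bytes otherwise, None if absent."""
    row = conn.execute(
        "SELECT abstract, abstract_mime_type FROM reference_demo "
        "WHERE reference_id = ?",
        (reference_id,),
    ).fetchone()
    if row is None or row[0] is None:
        return None
    payload, mime_type = row
    if mime_type == "text/plain":
        return payload.decode("utf-8")
    return payload


store_abstract(conn, 1, "An example abstract.".encode("utf-8"), "text/plain")
# application/pdf is the registered MIME type for PDF documents
store_abstract(conn, 2, b"%PDF-1.4 example bytes", "application/pdf")
```

The reader of the row decides how to decode the bytes from the MIME type alone, and a reference without an abstract simply keeps both columns null.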

cheers,
Richard

On Fri, 2006-07-14 at 09:56 +0800, mark.schreiber at novartis.com wrote:
> Hello -
> 
> There are no specific plans for biojava although the Reference object 
> could easily be modified to contain 
> 
> String getAbstract()
> void setAbstract(String abs)
> etc.
> 
> I wonder if the full text of an article should be a byte[] or BLOB or a 
> String/ CLOB. Are people more likely to want to store a PDF (usually more 
> available) or a parsed String?
> 
> - Mark
> 
> 
> 
> 
> 
> Hilmar Lapp <hlapp at gmx.net>
> 07/14/2006 12:59 AM
> 
>  
>         To:     mark.schreiber at novartis.com
>         cc:     biosql-l at open-bio.org
>         Subject:        Re: [BioSQL-l] Abstracts and Full Text on References
> 
> 
> Sounds reasonable to me. Attribute association wouldn't be desirable 
> I think (it would only bloat and overload the value field).
> 
> The only thing I'd be concerned about is accumulating stuff that is 
> not supported by the language bindings ... i.e., bioperl doesn't 
> support this, and so there isn't a way for bioperl-db to do so 
> either. What are the plans for Biojava?
> 
> Are any Biopython or Bioruby folks on this list? Any comments from 
> those fronts?
> 
>                  -hilmar
> 
> On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote:
> 
> > Hi -
> >
> > As an enhancement for a future version of BioSQL it would be nice 
> > to have
> > CLOB rows for abstract and full text (Full text might need to be a 
> > BLOB
> > depending on format). Obviously they could both be null.
> >
> > Alternatively they could be in another table linked to Reference. I 
> > don't
> > know if it could be done via the term relationship method??
> >
> > Any thoughts?
> >
> > - Mark
> >
> > Mark Schreiber
> > Research Investigator (Bioinformatics)
> >
> > Novartis Institute for Tropical Diseases (NITD)
> > 10 Biopolis Road
> > #05-01 Chromos
> > Singapore 138670
> > www.nitd.novartis.com
> >
> > phone +65 6722 2973
> > fax  +65 6722 2910
> >
> > _______________________________________________
> > BioSQL-l mailing list
> > BioSQL-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biosql-l
> >
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From richard.holland at ebi.ac.uk  Mon Jul 17 08:57:55 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Mon, 17 Jul 2006 09:57:55 +0100
Subject: [BioSQL-l] null title and CRC
In-Reply-To: <AC06A099-46F8-40F9-A90D-FD7A5EF49087@gmx.net>
References: <OF612183F3.63299C76-ON482571A7.00103304-482571A7.0010C525@EU.novartis.net>
	<92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net>
	<44B7D902.6040804@autohandle.com>
	<AC06A099-46F8-40F9-A90D-FD7A5EF49087@gmx.net>
Message-ID: <1153126675.3957.17.camel@texas.ebi.ac.uk>

Sounds good.

cheers,
Richard

On Fri, 2006-07-14 at 14:31 -0400, Hilmar Lapp wrote:
> In the table you store the value of the attribute, not a default that
> substitutes for it in some calculation. I.e., either null or an empty
> string, depending on what the value is. (in Oracle an empty string is
> treated as null.)
> 
> 
> -hilmar
> On Jul 14, 2006, at 1:48 PM, David Scott wrote:
> 
> > we are currently using "<undef>" in the crc calculation for the case
> > where the title is empty (or null) - i can extend that for authors
> > and location - what should we be storing in the table: "<undef>",
> > empty, or null?
> > 
> > thanks-
> > david
> > 
> > p.s. fog for sale:
> > http://www.sfgate.com/liveviews/
> > 
> > 
> > Hilmar Lapp wrote: 
> > > The CRC for references uses the authors, title, and location  
> > > attributes in Bioperl-db, and empty (or null) strings default to the  
> > > string "<undef>".
> > > 
> > > If title is empty and authors and location do not distinguish two  
> > > references, then why do you want to have two rows for those  
> > > references? Basically, they are identical for all intents and
> > > purposes, or are they not?
> > > 
> > > 	-hilmar
> > > 
> > > On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote:
> > > 
> > >   
> > > > Hi -
> > > > 
> > > > We are having a problem in biojava parsing some genbank records that
> > > > contain references with no title. These cannot have a CRC value  
> > > > which is
> > > > required in BioSQL. If we make the title an empty string then we  
> > > > quickly
> > > > get non-unique CRC numbers.
> > > > 
> > > > What does BioPerl do in these cases?
> > > > 
> > > > - Mark
> > > > 
> > > > Mark Schreiber
> > > > Research Investigator (Bioinformatics)
> > > > 
> > > > Novartis Institute for Tropical Diseases (NITD)
> > > > 10 Biopolis Road
> > > > #05-01 Chromos
> > > > Singapore 138670
> > > > www.nitd.novartis.com
> > > > 
> > > > phone +65 6722 2973
> > > > fax  +65 6722 2910
> > > > 
> > > > _______________________________________________
> > > > BioSQL-l mailing list
> > > > BioSQL-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/biosql-l
> > > > 
> > > >     
> > >   
> > 
> 
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From mark.schreiber at novartis.com  Tue Jul 18 20:41:34 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 19 Jul 2006 04:41:34 +0800
Subject: [BioSQL-l] Abstracts and Full Text on References
Message-ID: <OF9DAFEBD1.5D661206-ON482571AF.00719004-482571AF.0071AB9B@EU.novartis.net>

Another table is probably best.

Is there a working version of BioSQL 1.1 this can be added to?

- Mark


Hilmar Lapp <hlapp at gmx.net>
07/14/2006 07:24 PM

 
        To:     Richard Holland <richard.holland at ebi.ac.uk>
        cc:     Mark Schreiber <mark.schreiber at novartis.com>, biosql-l at open-bio.org
        Subject:        Re: [BioSQL-l] Abstracts and Full Text on References


Right. I like this. However, it also suggests having an additional
table. Who knows what other fields one will want to know for an 
abstract. Also, plenty of references will never have an abstract, 
e.g. automatic submissions, ontology term references etc.

                 -hilmar

On Jul 14, 2006, at 4:10 AM, Richard Holland wrote:

> Make it a BLOB and add another column indicating the MIME type of the
> BLOB.
>
>                BLOB abstract
>                VARCHAR abstract_mime_type
>
> Then if you stored a PDF in it you could set abstract_mime_type to
> 'application/pdf', or if it was plain text, you could set the
> abstract_mime_type to 'text/plain'.
>
> cheers,
> Richard
>
> On Fri, 2006-07-14 at 09:56 +0800, mark.schreiber at novartis.com wrote:
>> Hello -
>>
>> There are no specific plans for biojava although the Reference object
>> could easily be modified to contain
>>
>> String getAbstract()
>> void setAbstract(String abs)
>> etc.
>>
>> I wonder if the full text of an article should be a byte[] or BLOB 
>> or a
>> String/ CLOB. Are people more likely to want to store a PDF 
>> (usually more
>> available) or a parsed String?
>>
>> - Mark
>>
>>
>>
>>
>>
>> Hilmar Lapp <hlapp at gmx.net>
>> 07/14/2006 12:59 AM
>>
>>
>>         To:     mark.schreiber at novartis.com
>>         cc:     biosql-l at open-bio.org
>>         Subject:        Re: [BioSQL-l] Abstracts and Full Text on 
>> References
>>
>>
>> Sounds reasonable to me. Attribute association wouldn't be desirable
>> I think (it would only bloat and overload the value field).
>>
>> The only thing I'd be concerned about is accumulating stuff that is
>> not supported by the language bindings ... i.e., bioperl doesn't
>> support this, and so there isn't a way for bioperl-db to do so
>> either. What are the plans for Biojava?
>>
>> Are any Biopython or Bioruby folks on this list? Any comments from
>> those fronts?
>>
>>                  -hilmar
>>
>> On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote:
>>
>>> Hi -
>>>
>>> As an enhancement for a future version of BioSQL it would be nice
>>> to have
>>> CLOB rows for abstract and full text (Full text might need to be a
>>> BLOB
>>> depending on format). Obviously they could both be null.
>>>
>>> Alternatively they could be in another table linked to Reference. I
>>> don't
>>> know if it could be done via the term relationship method??
>>>
>>> Any thoughts?
>>>
>>> - Mark
>>>
>>> Mark Schreiber
>>> Research Investigator (Bioinformatics)
>>>
>>> Novartis Institute for Tropical Diseases (NITD)
>>> 10 Biopolis Road
>>> #05-01 Chromos
>>> Singapore 138670
>>> www.nitd.novartis.com
>>>
>>> phone +65 6722 2973
>>> fax  +65 6722 2910
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>
>>
> -- 
> Richard Holland (BioMart Team)
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> UNITED KINGDOM
> Tel: +44-(0)1223-494416
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Tue Jul 18 20:50:25 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 18 Jul 2006 16:50:25 -0400
Subject: [BioSQL-l] Abstracts and Full Text on References
In-Reply-To: <OF9DAFEBD1.5D661206-ON482571AF.00719004-482571AF.0071AB9B@EU.novartis.net>
References: <OF9DAFEBD1.5D661206-ON482571AF.00719004-482571AF.0071AB9B@EU.novartis.net>
Message-ID: <99FEA1E7-8540-46DE-8025-9F34D8026D0C@gmx.net>

Yes and no. I was working on one at GNF. I'll have to create this in  
the repository.

	-hilmar

On Jul 18, 2006, at 4:41 PM, mark.schreiber at novartis.com wrote:

> Another table is probably best.
>
> Is there a working version of BioSQL 1.1 this can be added to?
>
> - Mark
>
>
>
>
>
> Hilmar Lapp <hlapp at gmx.net>
> 07/14/2006 07:24 PM
>
>
>         To:     Richard Holland <richard.holland at ebi.ac.uk>
>         cc:     Mark Schreiber <mark.schreiber at novartis.com>,  
> biosql-l at open-bio.org
>         Subject:        Re: [BioSQL-l] Abstracts and Full Text on  
> References
>
>
> Right. I like this. However, it also suggests having an additional
> table. Who knows what other fields one will want to know for an
> abstract. Also, plenty of references will never have an abstract,
> e.g. automatic submissions, ontology term references etc.
>
>                  -hilmar
>
> On Jul 14, 2006, at 4:10 AM, Richard Holland wrote:
>
>> Make it a BLOB and add another column indicating the MIME type of the
>> BLOB.
>>
>>                BLOB abstract
>>                VARCHAR abstract_mime_type
>>
>> Then if you stored a PDF in it you could set abstract_mime_type to
>> 'application/pdf', or if it was plain text, you could set the
>> abstract_mime_type to 'text/plain'.
>>
>> cheers,
>> Richard
>>
>> On Fri, 2006-07-14 at 09:56 +0800, mark.schreiber at novartis.com wrote:
>>> Hello -
>>>
>>> There are no specific plans for biojava although the Reference  
>>> object
>>> could easily be modified to contain
>>>
>>> String getAbstract()
>>> void setAbstract(String abs)
>>> etc.
>>>
>>> I wonder if the full text of an article should be a byte[] or BLOB
>>> or a
>>> String/ CLOB. Are people more likely to want to store a PDF
>>> (usually more
>>> available) or a parsed String?
>>>
>>> - Mark
>>>
>>>
>>>
>>>
>>>
>>> Hilmar Lapp <hlapp at gmx.net>
>>> 07/14/2006 12:59 AM
>>>
>>>
>>>         To:     mark.schreiber at novartis.com
>>>         cc:     biosql-l at open-bio.org
>>>         Subject:        Re: [BioSQL-l] Abstracts and Full Text on
>>> References
>>>
>>>
>>> Sounds reasonable to me. Attribute association wouldn't be desirable
>>> I think (it would only bloat and overload the value field).
>>>
>>> The only thing I'd be concerned about is accumulating stuff that is
>>> not supported by the language bindings ... i.e., bioperl doesn't
>>> support this, and so there isn't a way for bioperl-db to do so
>>> either. What are the plans for Biojava?
>>>
>>> Are any Biopython or Bioruby folks on this list? Any comments from
>>> those fronts?
>>>
>>>                  -hilmar
>>>
>>> On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote:
>>>
>>>> Hi -
>>>>
>>>> As an enhancement for a future version of BioSQL it would be nice
>>>> to have
>>>> CLOB rows for abstract and full text (Full text might need to be a
>>>> BLOB
>>>> depending on format). Obviously they could both be null.
>>>>
>>>> Alternatively they could be in another table linked to Reference. I
>>>> don't
>>>> know if it could be done via the term relationship method??
>>>>
>>>> Any thoughts?
>>>>
>>>> - Mark
>>>>
>>>> Mark Schreiber
>>>> Research Investigator (Bioinformatics)
>>>>
>>>> Novartis Institute for Tropical Diseases (NITD)
>>>> 10 Biopolis Road
>>>> #05-01 Chromos
>>>> Singapore 138670
>>>> www.nitd.novartis.com
>>>>
>>>> phone +65 6722 2973
>>>> fax  +65 6722 2910
>>>>
>>>> _______________________________________________
>>>> BioSQL-l mailing list
>>>> BioSQL-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>>
>>>
>> -- 
>> Richard Holland (BioMart Team)
>> EMBL-EBI
>> Wellcome Trust Genome Campus
>> Hinxton
>> Cambridge CB10 1SD
>> UNITED KINGDOM
>> Tel: +44-(0)1223-494416
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================