From hlapp at gmx.net Sun Jul 2 09:20:53 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 2 Jul 2006 09:20:53 -0400
Subject: [BioSQL-l] BioSQL Schema problem
In-Reply-To: <44A275E5.2040104@librophyt.com>
References: <44A275E5.2040104@librophyt.com>
Message-ID: <2F4506F2-84FC-412A-9BC5-8E3C92E086C8@gmx.net>

I notice that biosqldb-views-pg.sql is badly outdated. Sorry about
that. Are you sure you need it? (Most applications will not.) I
probably shouldn't just delete it but rather try to update it. The
offending seqfeature_key table was removed from the schema long ago,
so you can safely delete that view definition from the file, but given
the file's age there may be a few more errors.

I need to investigate the script's failure on inserting nodes - this
is assuming that you put the file by hand in the right place.
Apparently there is an alphanumeric value that gets parsed as the
taxon id (which must indeed be numeric).

--download is a switch and hence does not take any arguments;
--download 0 still asks for a download, which is why you see the
error. I don't know why the download fails. Maybe there is a problem
with extended FTP mode (the EPSV/EPRT commands), but I don't know
offhand how to disable them in Net::FTP.
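The switch behaviour described above can be illustrated with a small
sketch. It uses Python's argparse as an analogy only; it is not the
option handling of load_ncbi_taxonomy.pl (which is a Perl script), and
the function name parse_download_flag is made up for this example.

```python
import argparse


def parse_download_flag(argv):
    """Parse argv with a boolean --download switch.

    Returns (download, leftovers), where leftovers are tokens the
    parser did not consume.
    """
    parser = argparse.ArgumentParser()
    # store_true makes --download a pure switch: its presence means
    # True, and it never consumes a following value.
    parser.add_argument("--download", action="store_true")
    # parse_known_args hands back unconsumed tokens instead of failing.
    args, leftovers = parser.parse_known_args(argv)
    return args.download, leftovers


if __name__ == "__main__":
    print(parse_download_flag([]))                   # switch absent
    print(parse_download_flag(["--download"]))       # switch present
    print(parse_download_flag(["--download", "0"]))  # still on; "0" is left over
```

The point of the sketch is that a trailing 0 does not turn a switch
off: the switch is on as soon as it appears, and the 0 is just a stray
token.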
-hilmar

On Jun 28, 2006, at 8:28 AM, Samuel Thoraval wrote:

> Hello,
>
> I am new to biosql and I have 2 problems installing the last CVS
> version (*1.4.2.1*, /Sun Jun 16/):
> - running biosqldb-views-pg.sql after biosqldb-pg.sql gives errors,
>   the first one being:
>     psql:biosqldb-views-pg.sql:6: ERROR: relation "seqfeature_key"
>     does not exist
> - running load_ncbi_taxonomy.pl with
>   ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
>   (the script download option set to 1 doesn't download anything)
>   gives the following error:
> ----------------------------------------------------------------------
> ./scripts/load_ncbi_taxonomy.pl --dbname bioseqdb --driver Pg --download 0
> gunzip: taxdata/taxdump.tar.gz: No such file or directory
> tar: taxdump.tar: ne peut open: Aucun fichier ou répertoire de ce type
> tar: Erreur non récupérable: fin de l'exécution immédiate
> Loading NCBI taxon database in taxdata:
> ... retrieving all taxon nodes in the database
> ... reading in taxon nodes from nodes.dmp
> ... insert / update / delete taxon nodes
> failed to insert node (1;1;1;no rank;1;0): ERROR: column "taxon_id" is
> of type integer but expression is of type character varying
> HINT: You will need to rewrite or cast the expression.
> ----------------------------------------------------------------------
>
> The schema expected from the biosqldb-views-pg.sql or taxonomy dump
> file does not match the one in biosqldb-pg.sql.
>
> Best regards,
>
> --
> Samuel Thoraval
> LIBROPHYT, Bioinformatique
> Centre de Cadarache
> Bâtiment 185, DEVM
> 13108 St Paul-Lez-Durance
> France
> Tél: +33 442 574 799
> Fax: +33 442 574 439
> e-mail : samuel.thoraval at librophyt.com
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sun Jul 2 13:44:21 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 2 Jul 2006 13:44:21 -0400
Subject: [BioSQL-l] Versioning of features
In-Reply-To:
References:
Message-ID: <39FD8AB6-26F2-40B6-A3BC-42A42A42A06F@gmx.net>

It should be straightforward. In essence you control it through the
source type, which as you say is an ontology term. You can, for
instance, include the software version in the source term. This is
what I did for the BLAT-derived genome mappings in SymAtlas (which
runs on top of BioSQL).

This wouldn't even require you to 'obsolete' a previous source term.
You'd only have to do that if you wanted the exact same name for the
source term and wanted the old and new 'version' terms in the same
ontology. I probably wouldn't be much in favor of doing so, because
then you don't have an explicit version anywhere.

Of course, if you include the version in the name, then two source
types compared by name appear different even though they are
effectively the same (e.g., same algorithm), just different versions.
You can take care of that, though, by introducing 'parent' source
terms (e.g. for the algorithm) that have the versioned ones as
children.

Let me know if this doesn't help.

-hilmar

On Jun 30, 2006, at 6:16 PM, Sandie Peters wrote:

> In the BioSQL v.
> 1.0 schema overview, the author briefly mentions
> the possibility of feature set versioning using "dated" source
> ontology terms. Has anyone tried this or any other versioning
> methods with seqfeatures in BioSQL?
>
> Thanks,
> Sandie Peters
> Vollum Institute/OHSU

From darin.london at duke.edu Mon Jul 3 08:41:33 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 03 Jul 2006 08:41:33 -0400
Subject: [BioSQL-l] Call For Birds of a Feather Suggestions
Message-ID: <44A9107D.2050304@duke.edu>

The BOSC organizing committee is currently seeking suggestions for
Birds of a Feather meeting ideas. Birds of a Feather (BOF) meetings
are one of the more popular activities at BOSC, occurring at the end
of each day's session. These are free-form meetings organized by the
attendees themselves to discuss one or a few topics of interest in
greater detail. BOFs have been formed to allow developers and users of
individual OBF software to meet each other face-to-face to discuss the
project, or to discuss completely new ideas, and even start new
software development projects. These meetings offer a unique
opportunity for individuals to explore more about the activities of
the various Open Source Projects, and, in some cases, even take an
active role in influencing the future of Open Source Software
development.

If you would like to create a BOF, just sign up for a wiki account,
log in, and edit the BOSC 2006 Birds of a Feather page.
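
Regarding the versioned source terms discussed in the "Versioning of
features" message above, the parent/child arrangement can be sketched
as follows. This is a minimal illustration only, not SymAtlas or
BioSQL code: the SourceTerm class, the same_algorithm helper and the
version labels are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceTerm:
    """A source ontology term; versioned terms point at a parent term."""
    name: str
    parent: Optional["SourceTerm"] = None

    def root(self) -> "SourceTerm":
        # Walk up the parent links until the unversioned term is reached.
        term = self
        while term.parent is not None:
            term = term.parent
        return term


def same_algorithm(a: SourceTerm, b: SourceTerm) -> bool:
    """Two terms denote the same algorithm if they share a root term."""
    return a.root() == b.root()


# Hypothetical terms: one parent for the algorithm, one child per version.
blat = SourceTerm("BLAT")
blat_old = SourceTerm("BLAT (version A)", parent=blat)
blat_new = SourceTerm("BLAT (version B)", parent=blat)
```

Compared by name, blat_old and blat_new differ, so each feature set
keeps an explicit version; compared through same_algorithm, they are
recognized as the same method.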
From hlapp at gmx.net Mon Jul 3 13:04:48 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 3 Jul 2006 13:04:48 -0400
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <44996380.6060300@autohandle.com>
References: <44996380.6060300@autohandle.com>
Message-ID:

Hi David, sorry for dropping (or rather, never picking up) the ball on
this ... it got lost in my inbox stack.

The earlier consensus was, if I recall correctly, to include
is_circular as a biosequence attribute in the 1.1 version.

isTaxonHidden is new to me and I don't even understand what it would
mean. Can you elaborate?

-hilmar

On Jun 21, 2006, at 11:19 AM, David Scott wrote:

> biojavax is using hibernate to o/r map the biosql database to
> biojavax objects. biojavax is planning support in the biojavax
> objects for fields not directly supported in the biosql database
> (e.g. isCircular, isTaxonHidden). in order to conform to the current
> biosql database, the default mapping file from biosql to biojavax
> will comment out the unsupported fields (so the object fields will
> not be initialized) and the objects will default an appropriate
> conforming value (e.g. false for isCircular and isTaxonHidden). for
> users wishing to localize biojavax: the user would uncomment the
> mapping file and alter the database tables. altering the database
> would require running ddl on the existing database to create the new
> table columns. what is the best way to review and then distribute
> the alter/create ddl for users to localize their database?
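
One way to picture the "run ddl on the existing database" step above
is a small, repeatable migration. The sketch below is an assumption
for illustration, not the biojavax or BioSQL DDL: it uses SQLite from
Python's standard library as a stand-in for the real database, a
cut-down biosequence table, and a made-up helper named
add_column_if_missing.

```python
import sqlite3


def add_column_if_missing(conn, table, column, ddl_type):
    """Add `column` to `table` unless it already exists.

    Returns True if the column was added. Table and column names are
    interpolated into the SQL, so pass trusted identifiers only.
    """
    # PRAGMA table_info yields one row per column; index 1 is the name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True


def demo():
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the real biosequence table.
    conn.execute(
        "CREATE TABLE biosequence (bioentry_id INTEGER PRIMARY KEY, seq TEXT)"
    )
    first = add_column_if_missing(
        conn, "biosequence", "is_circular", "INTEGER DEFAULT 0"
    )
    # Running the same migration again is a no-op.
    second = add_column_if_missing(
        conn, "biosequence", "is_circular", "INTEGER DEFAULT 0"
    )
    conn.execute("INSERT INTO biosequence (bioentry_id, seq) VALUES (1, 'ACGT')")
    value = conn.execute("SELECT is_circular FROM biosequence").fetchone()[0]
    return first, second, value
```

Making the migration safe to run twice is one possible answer to the
distribution question: the same file can be shipped to users whether
or not they have already altered their tables.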
From hlapp at gmx.net Mon Jul 3 14:07:10 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 3 Jul 2006 14:07:10 -0400
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <44A95A2E.8000203@autohandle.com>
References: <44996380.6060300@autohandle.com> <44A95A2E.8000203@autohandle.com>
Message-ID:

Hi David, I wish I were in the south of France soaking up sun ...
although there is no shortage of sun (or heat for that matter, and
throw humidity in there too) where I am.

Is_Circular is a general attribute that applies to any sequence (many
sequences are indeed circular). This, and the fact that one may even
want to search by it, justifies including it directly as a column in
the biosequence table.

Is_Taxon_Hidden is one of those attributes that BioSQL by design
handles through attribute/value associations, that is, ontology term
associations that carry a value (the term is the attribute name).

However, there is no taxon_qualifier_value table in BioSQL, so in
essence you are asking to add that table.

Does anybody else have ideas for taxon attributes for which this table
could be used?

I don't really favor a proliferation of 'localized' versions of
BioSQL. This tends to defeat both the rationale behind a standardized
persistence interface and the design of the schema for ultimate
extensibility through weak typing and the use of controlled
vocabularies.

Any thoughts to this end are welcome.
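
The attribute/value association described above could look roughly
like the sketch below. It is a guess at the shape of a
taxon_qualifier_value table, modelled loosely on the idea of "term
plus value"; no such table exists in BioSQL at the time of this
thread, and the column names, the term name genbank_hidden and the
sample taxon ids are all hypothetical. SQLite is used only so the
example runs standalone.

```python
import sqlite3

# Hypothetical, heavily simplified schema: the term row names the
# attribute, and the association row carries its value for one taxon.
SCHEMA = """
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, ncbi_taxon_id INTEGER);
CREATE TABLE taxon_qualifier_value (
    taxon_id INTEGER NOT NULL REFERENCES taxon(taxon_id),
    term_id  INTEGER NOT NULL REFERENCES term(term_id),
    value    TEXT,
    PRIMARY KEY (taxon_id, term_id)
);
"""


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO term (term_id, name) VALUES (1, 'genbank_hidden')")
    conn.executemany(
        "INSERT INTO taxon (taxon_id, ncbi_taxon_id) VALUES (?, ?)",
        [(1, 101), (2, 102), (3, 103)],  # made-up ids
    )
    # Only taxon 2 carries the qualifier; the others simply have no row.
    conn.execute("INSERT INTO taxon_qualifier_value VALUES (2, 1, '1')")
    return conn


def taxa_with_qualifier(conn, term_name, value):
    """Return taxon ids whose qualifier `term_name` equals `value`."""
    rows = conn.execute(
        "SELECT q.taxon_id FROM taxon_qualifier_value q "
        "JOIN term t ON t.term_id = q.term_id "
        "WHERE t.name = ? AND q.value = ? ORDER BY q.taxon_id",
        (term_name, value),
    )
    return [row[0] for row in rows]
```

New attributes then need a new term row rather than a schema change,
which is the extensibility argument made above; the cost is a join
whenever the attribute is searched.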
-hilmar

On Jul 3, 2006, at 1:55 PM, David Scott wrote:

> sure hilmar-
>
> in the genbank taxonomy file - nodes.dmp:
>     ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
> there is a field:
>
>     GenBank hidden flag (1 or 0) -- 1 if name is suppressed
>     in GenBank entry lineage
>
> this field controls whether the level is included in the taxonomy
> hierarchy when the genbank ORGANISM section is generated - but the
> more general problem trying to be solved is:
>   o parse genbank entries
>   o store parsed entry in biosql
>   o pull parsed entry from biosql
>   o (re)create the genbank entry
>   o compare the recreated entry with the source document for
>     identity. well - ok - almost identical.
>
> there are several parameters missing from biosql to make this
> possible. the general approach to a solution has been:
>   o alter the biosql table to add a new column (a sql ddl file)
>   o add a private get/set for the column in the biojavax object (a
>     java file)
>   o add the column to the biojavax hibernate o/r mapping (an xml file)
>
> to help others that might have the same objective, and to
> accommodate those that don't wish these nonstandard columns - it is
> planned to release the o/r mapping files with the additional
> columns/fields commented out - these xml files along with the java
> files are checked out with cvs. it was not clear what to do with
> the ddl files - and it would be helpful to have them reviewed - no
> matter what is done with them.
>
> thanks for helping me - i just assumed you were late in responding
> because it is summer - and, well - you were in the south of
> france soaking up the sun.
>
> looking to you for suggestions-
> david
From david at autohandle.com Mon Jul 3 13:55:58 2006
From: david at autohandle.com (David Scott)
Date: Mon, 03 Jul 2006 10:55:58 -0700
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To:
References: <44996380.6060300@autohandle.com>
Message-ID: <44A95A2E.8000203@autohandle.com>

sure hilmar-

in the genbank taxonomy file - nodes.dmp:
    ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
there is a field:

    GenBank hidden flag (1 or 0) -- 1 if name is suppressed
    in GenBank entry lineage

this field controls whether the level is included in the taxonomy
hierarchy when the genbank ORGANISM section is generated - but the
more general problem trying to be solved is:
  o parse genbank entries
  o store parsed entry in biosql
  o pull parsed entry from biosql
  o (re)create the genbank entry
  o compare the recreated entry with the source document for
    identity. well - ok - almost identical.

there are several parameters missing from biosql to make this
possible. the general approach to a solution has been:
  o alter the biosql table to add a new column (a sql ddl file)
  o add a private get/set for the column in the biojavax object (a
    java file)
  o add the column to the biojavax hibernate o/r mapping (an xml file)

to help others that might have the same objective, and to accommodate
those that don't wish these nonstandard columns - it is planned to
release the o/r mapping files with the additional columns/fields
commented out - these xml files along with the java files are checked
out with cvs. it was not clear what to do with the ddl files - and it
would be helpful to have them reviewed - no matter what is done with
them.

thanks for helping me - i just assumed you were late in responding
because it is summer - and, well - you were in the south of france
soaking up the sun.

looking to you for suggestions-
david

Hilmar Lapp wrote:
> Hi David, sorry for dropping (or rather, not ever picking up) the ball
> on this ... got lost in inbox stack.
>
> The earlier consensus was if I recall correctly to include is_circular
> as a biosequence attribute in the 1.1 version.
>
> isTaxonHidden is new to me and I don't even understand what it would
> mean. Can you elaborate?
>
> -hilmar

From mark.schreiber at novartis.com Tue Jul 4 01:48:43 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 4 Jul 2006 13:48:43 +0800
Subject: [BioSQL-l] a biosql/biojavax localization question
Message-ID:

>Is_Circular is a general attribute that will apply to any sequence
>(given the fact that many sequences are indeed circular). This, and
>the fact that one may even want to search for it, would justify
>inclusion directly as a column in the biosequence table.
>
>Is_Taxon_Hidden is one of those attributes that BioSQL by design
>handles through attribute/value associations, that is, using ontology
>term associations that have a value (the term is the attribute name).
>
>However, there is no taxon_qualifier_value table in BioSQL, so in
>essence you are asking for adding that table.
>
>Does anybody else have ideas for taxon attributes for which this
>table may be used?

A taxon_qualifier_value table would potentially be useful. One may
want to have conflicting taxa (taxonomists never agree) that could be
differentiated by use of a qualifier. The hidden attribute could also
be one.
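
To make the 'hidden' qualifier concrete, here is a minimal sketch of
what it is described as controlling: whether a level appears when a
lineage is written out. Everything in it is an assumption for
illustration (the visible_lineage function, the dictionaries and the
sample names are made up); it is not how any Bio* toolkit or NCBI
actually generates the ORGANISM section.

```python
def visible_lineage(taxon_id, parents, names, hidden):
    """Return taxon names from the root down to taxon_id, skipping hidden ones.

    parents maps taxon id -> parent id, names maps taxon id -> name,
    hidden is the set of taxon ids flagged as hidden.
    """
    lineage = []
    current = taxon_id
    while current is not None:
        if current not in hidden:
            lineage.append(names[current])
        parent = parents.get(current)
        # Stop at a missing parent or at a root that is its own parent.
        current = None if parent == current else parent
    lineage.reverse()
    return lineage


# Made-up three-level example: the middle level is flagged hidden.
PARENTS = {1: 1, 2: 1, 3: 2}
NAMES = {1: "top", 2: "middle", 3: "leaf"}
HIDDEN = {2}
```

With HIDDEN set as above, the middle level drops out of the lineage;
with an empty set, all three levels are returned.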

>I don't really favor a proliferation of 'localized' versions of
>BioSQL - this tends to defeat the purpose both of the rationale
>behind a standardized persistence interface, as well as the design of
>the schema for ultimate extensibility through weak typing and the use
>of controlled vocabularies.
>
>Any thoughts to this end welcome.

I think that the best way to avoid localized versions might be to
release a BioSQL 1.1 as soon as possible. The is_circular column has
been on the todo list for a very long time. The above
taxon_qualifier_value table would also be required to give more
complete persistence of genbank data. Is there any reason why 1.1
cannot be released promptly?

I also wonder how likely a standardised persistence interface is when
there is the possibility of using custom ontologies. Biojavax is much
better at using the correct tables in BioSQL, but we use our own
ontology terms for all kinds of qualifiers. The way we persist data to
BioSQL is undoubtedly closer to BioPerlDB than the old biojava
mapping, but whenever ontology comes into it there are bound to be
breaks. To be truly unified, the two projects (and all the other
bio*s) would need to use a common ontology.

I guess what I am asking is: what do you mean by standardised
persistence?

- Mark

From richard.holland at ebi.ac.uk Tue Jul 4 04:13:02 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 04 Jul 2006 09:13:02 +0100
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To:
References: <44996380.6060300@autohandle.com> <44A95A2E.8000203@autohandle.com>
Message-ID: <1152000782.3948.36.camel@texas.ebi.ac.uk>

Personally I'd like to see *_qualifier_value tables for all BioSQL
tables that represent an entity of any kind, be it term, feature,
location, sequence, taxon, or anything else.

In the case of is_taxon_hidden, this is specific to an individual
taxon, and I can see cases where it would be appropriate to search by
it (for instance, pulling out all ancestors of a given taxon that are
visible). So I think this should be an additional column.

By the way, is there a document somewhere detailing all the changes
that are planned for 1.1?

cheers,
Richard

On Mon, 2006-07-03 at 14:07 -0400, Hilmar Lapp wrote:
> Hi David, I wish I were in the south of France soaking up sun ...
> although there is no shortage of sun (or heat for that matter, and
> throw humidity in there too) where I am.
>
> Is_Circular is a general attribute that will apply to any sequence
> (given the fact that many sequences are indeed circular). This, and
> the fact that one may even want to search for it, would justify
> inclusion directly as a column in the biosequence table.
>
> Is_Taxon_Hidden is one of those attributes that BioSQL by design
> handles through attribute/value associations, that is, using ontology
> term associations that have a value (the term is the attribute name).
>
> However, there is no taxon_qualifier_value table in BioSQL, so in
> essence you are asking for adding that table.
>
> Does anybody else have ideas for taxon attributes for which this
> table may be used?
>
> I don't really favor a proliferation of 'localized' versions of
> BioSQL - this tends to defeat the purpose both of the rationale
> behind a standardized persistence interface, as well as the design of
> the schema for ultimate extensibility through weak typing and the use
> of controlled vocabularies.
>
> Any thoughts to this end welcome.
>
> -hilmar

--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From hlapp at gmx.net Wed Jul 5 00:04:12 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 5 Jul 2006 00:04:12 -0400
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <1152000782.3948.36.camel@texas.ebi.ac.uk>
References: <44996380.6060300@autohandle.com> <44A95A2E.8000203@autohandle.com> <1152000782.3948.36.camel@texas.ebi.ac.uk>
Message-ID:

On Jul 4, 2006, at 4:13 AM, Richard Holland wrote:

> Personally I'd like to see *_qualifier_value tables for all BioSQL
> tables that represents an entity of any kind, be it term, feature,
> location, sequence, taxon, or anything else.

I can see that making sense. Basically what it would say is that every
entity in BioSQL is derivable, as opposed to final, in an OO sense.

In fact, there aren't many entities that don't have a qualifier_value
association table yet. Adding one for biodatabase would have been in
my book of 1.1 changes, as I use it in SymAtlas already.

> In the case of is_taxon_hidden, this is specific to an individual
> taxon, and I can see cases where it would be appropriate to search
> by it (for instance, pulling out all ancestors of a given taxon that
> are visible). So I think this should be an additional column.

I would like to ask a systematist about that. I have not seen it in
any taxonomy other than NCBI's. I'm not convinced it's a good idea to
elevate NCBI's (or anybody else's) idiosyncrasies to columns in the
Bio* persistence interface.

> By the way, is there a document somewhere detailing all the changes
> that are planned for 1.1?

No, not yet. Good point though. Volunteers for starting one are
welcome ... :-)

-hilmar

> cheers,
> Richard

From hlapp at gmx.net Wed Jul 5 08:47:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 5 Jul 2006 08:47:05 -0400
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <1152093096.3948.82.camel@texas.ebi.ac.uk>
References: <44996380.6060300@autohandle.com> <44A95A2E.8000203@autohandle.com> <1152000782.3948.36.camel@texas.ebi.ac.uk> <1152093096.3948.82.camel@texas.ebi.ac.uk>
Message-ID:

Alright - but was a nice try, no?

On Jul 5, 2006, at 5:51 AM, Richard Holland wrote:

> I think you should create it as you are the only one at present who
> knows what is already planned and what is not! :)
>
> cheers,
> Richard
>
> On Wed, 2006-07-05 at 00:04 -0400, Hilmar Lapp wrote:
>> On Jul 4, 2006, at 4:13 AM, Richard Holland wrote:
>>
>>> Personally I'd like to see *_qualifier_value tables for all BioSQL
>>> tables that represents an entity of any kind, be it term, feature,
>>> location, sequence, taxon, or anything else.
>>
>> I can see that making sense. Basically what it would say is that
>> every entity in BioSQL is derivable, as opposed to final, in an OO
>> sense.
>>
>> In fact, there aren't many entities that don't have a qualifier_value
>> association table yet.
Adding one for biodatabase would have been in >> my book of 1.1 changes as I use it in SymAtlas already. >> >>> >>> >>> In the case of is_taxon_hidden, this is specific to an individual >>> taxon, >>> and I can see cases where it would be appropriate to search by it >>> (for >>> instance, pulling out all ancestors of a given taxon that are >>> visible). >>> So I think this should be an additional column. >> >> I would like to ask that a systematist. I have not seen it anywhere >> else in a taxonomy other than NCBI's. I'm not convinced it's a good >> idea to elevate NCBI's (or anybody else's) idiosyncrasies to columns >> in the Bio* persistence interface. >> >>> >>> By the way, is there a document somewhere detailing all the changes >>> that >>> are planned for 1.1? >> >> No, not yet. Good point though. Volunteers for starting one are >> welcome ... :-) >> >> -hilmar >> >> >>> >>> cheers, >>> Richard >>> >>> >>> On Mon, 2006-07-03 at 14:07 -0400, Hilmar Lapp wrote: >>>> Hi David, I wish I were in the south of France soaking up sun ... >>>> although there is no shortage of sun (or heat for that matter, and >>>> throw humidity in there too) where I am. >>>> >>>> Is_Circular is a general attribute that will apply to any sequence >>>> (given the fact that many sequences are indeed circular). This, and >>>> the fact that one may even want to search for it, would justify >>>> inclusion directly as a column in the biosequence table. >>>> >>>> Is_Taxon_Hidden is one of those attributes that BioSQL by design >>>> handles through attribute/value associations, that is, using >>>> ontology >>>> term associations that have a value (the term is the attribute >>>> name). >>>> >>>> However, there is no taxon_qualifier_value table in BioSQL, so in >>>> essence you are asking for adding that table. >>>> >>>> Does anybody else have ideas for taxon attributes for which this >>>> table may be used? 
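A minimal sketch of the attribute/value approach discussed above, assuming a hypothetical taxon_qualifier_value table shaped like the other *_qualifier_value association tables. The table, its columns, the stripped-down taxon and term tables, and the term name 'is_taxon_hidden' are assumptions made for illustration and are not part of the BioSQL schema. SQLite (through Python) is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the BioSQL taxon and term tables.
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER);
CREATE TABLE term  (term_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Hypothetical association table; NOT in the BioSQL schema.
CREATE TABLE taxon_qualifier_value (
    taxon_id INTEGER NOT NULL REFERENCES taxon(taxon_id),
    term_id  INTEGER NOT NULL REFERENCES term(term_id),
    value    TEXT,
    rank     INTEGER NOT NULL DEFAULT 0,
    UNIQUE (taxon_id, term_id, rank)
);
""")

# Toy lineage: 1 <- 2 <- 3 <- 4; here the root simply has no parent.
conn.executemany("INSERT INTO taxon VALUES (?, ?)",
                 [(1, None), (2, 1), (3, 2), (4, 3)])
conn.execute("INSERT INTO term VALUES (1, 'is_taxon_hidden')")
# Taxon 2 is flagged as hidden via an attribute/value association.
conn.execute("INSERT INTO taxon_qualifier_value VALUES (2, 1, '1', 0)")


def parent_of(conn, taxon_id):
    row = conn.execute("SELECT parent_taxon_id FROM taxon WHERE taxon_id = ?",
                       (taxon_id,)).fetchone()
    return row[0] if row else None


def is_hidden(conn, taxon_id):
    row = conn.execute(
        "SELECT 1 FROM taxon_qualifier_value q "
        "JOIN term t ON t.term_id = q.term_id "
        "WHERE q.taxon_id = ? AND t.name = 'is_taxon_hidden' AND q.value = '1'",
        (taxon_id,)).fetchone()
    return row is not None


def visible_ancestors(conn, taxon_id):
    """Walk up the parent chain, skipping taxa flagged as hidden."""
    result = []
    current = parent_of(conn, taxon_id)
    while current is not None:
        if not is_hidden(conn, current):
            result.append(current)
        current = parent_of(conn, current)
    return result
```

Calling visible_ancestors(conn, 4) walks 3, 2, 1 and drops the hidden taxon 2. The search still works with the weakly typed design, at the cost of a join against the term table for every lookup, which is the trade-off against a dedicated column.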
>>>> >>>> I don't really favor a proliferation of 'localized' versions of >>>> BioSQL - this tends to defeat the purpose both of the rationale >>>> behind a standardized persistence interface, as well as the >>>> design of >>>> the schema for ultimate extensibility through weak typing and >>>> the use >>>> of controlled vocabularies. >>>> >>>> Any thoughts to this end welcome. >>>> >>>> -hilmar >>>> >>>> On Jul 3, 2006, at 1:55 PM, David Scott wrote: >>>> >>>>> sure hilmar- >>>>> >>>>> in the genbank taxonomy file - nodes.dmp: >>>>> ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt >>>>> there is a field: >>>>> >>>>> GenBank hidden flag (1 or 0) -- 1 if name is suppressed >>>>> in GenBank entry lineage >>>>> >>>>> this field controls whether the level is included in the taxonomy >>>>> hierarchy when the genbank ORGANISM section is generated - but the >>>>> more general problem trying to be solved is: >>>>> o parse genbank entries >>>>> o store parsed entry in biosql >>>>> o pull parsed entry from biosql >>>>> o (re)create the genbank entry >>>>> o compare the recreated entry with the source document for >>>>> identity. well - ok - almost identical. >>>>> >>>>> there are several parameters missing from biosql to make this >>>>> possible. the general approach to a solution has been: >>>>> o alter the biosql table to add a new column (a sql ddl file) >>>>> o add a private get/set for the column in the biojavax object (a >>>>> java file) >>>>> o add the column to the biojavax hibernate o/r mapping (an xml >>>>> file) >>>>> >>>>> to help others that might have the same objective, and to >>>>> accomodate those that don't wish these nonstandard columns - >>>>> it is >>>>> planned to release the o/r mapping files with the additional >>>>> columns/fields commented out - these xml files along with the java >>>>> files are checked out with cvs. 
it was not clear what to do with >>>>> the ddl files - and it would be helpful to have them reviewed - no >>>>> matter what is done with them. >>>>> >>>>> thanks for helping me - i just assumed you were late in responding >>>>> because it is summer - and, well - you were in the the south of >>>>> france soaking up the sun. >>>>> >>>>> looking to you for suggestions- >>>>> david >>>>> >>>>> >>>>> Hilmar Lapp wrote: >>>>>> Hi David, sorry for dropping (or rather, not ever picking up) the >>>>>> ball on this ... got lost in inbox stack. >>>>>> >>>>>> The earlier consensus was if I recall correctly to include >>>>>> is_circular as a biosequence attribute in the 1.1 version. >>>>>> >>>>>> isTaxonHidden is new to me and I don't even understand what it >>>>>> would mean. Can you elaborate? >>>>>> >>>>>> -hilmar >>>>>> >>>>>> On Jun 21, 2006, at 11:19 AM, David Scott wrote: >>>>>> >>>>>>> biojavax is using hibernate to o/r map the biosql database to >>>>>>> biojavax >>>>>>> objects. biojavax is planning support in the biojavax objects >>>>>>> for >>>>>>> fields >>>>>>> not directly supported in the biosql database (e.g. isCircular, >>>>>>> isTaxonHidden). in order to conform to the current biosql >>>>>>> database, the >>>>>>> default mapping file from biosql to biojavax will comment out >>>>>>> the >>>>>>> unsupported fields (so the object fields will not be >>>>>>> initialized) >>>>>>> and >>>>>>> the objects will default an appropriate conforming value (e.g. >>>>>>> false for >>>>>>> isCircular and isTaxonHidden). for users wishing to localize >>>>>>> biojavax: >>>>>>> the user would uncomment the mapping file and alter the database >>>>>>> tables. >>>>>>> altering the database would require running ddl on the existing >>>>>>> database >>>>>>> to create the new table columns. what is the best way to review >>>>>>> and then >>>>>>> distribute the alter/create ddl for users to localize their >>>>>>> database? 
>>>>>>> _______________________________________________ >>>>>>> BioSQL-l mailing list >>>>>>> BioSQL-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>>>>>> >>>>>> >>>>>> --=========================================================== >>>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>>>> =========================================================== >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >>> -- >>> Richard Holland (BioMart Team) >>> EMBL-EBI >>> Wellcome Trust Genome Campus >>> Hinxton >>> Cambridge CB10 1SD >>> UNITED KINGDOM >>> Tel: +44-(0)1223-494416 >>> >> > -- > Richard Holland (BioMart Team) > EMBL-EBI > Wellcome Trust Genome Campus > Hinxton > Cambridge CB10 1SD > UNITED KINGDOM > Tel: +44-(0)1223-494416 > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From richard.holland at ebi.ac.uk Wed Jul 5 05:51:35 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Wed, 05 Jul 2006 10:51:35 +0100 Subject: [BioSQL-l] a biosql/biojavax localization question In-Reply-To: References: <44996380.6060300@autohandle.com> <44A95A2E.8000203@autohandle.com> <1152000782.3948.36.camel@texas.ebi.ac.uk> Message-ID: <1152093096.3948.82.camel@texas.ebi.ac.uk> I think you should create it as you are the only one at present who knows what is already planned and what is not! :) cheers, Richard On Wed, 2006-07-05 at 00:04 -0400, Hilmar Lapp wrote: > On Jul 4, 2006, at 4:13 AM, Richard Holland wrote: > > > Personally I'd like to see *_qualifier_value tables for all BioSQL > > tables that represents an entity of any kind, be it term, feature, > > location, sequence, taxon, or anything else. > > I can see that making sense. Basically what it would say is that > every entity in BioSQL is derivable, as opposed to final, in an OO > sense. 
> > In fact, there aren't many entities that don't have a qualifier_value > association table yet. Adding one for biodatabase would have been in > my book of 1.1 changes as I use it in SymAtlas already. > > > > > > > In the case of is_taxon_hidden, this is specific to an individual > > taxon, > > and I can see cases where it would be appropriate to search by it (for > > instance, pulling out all ancestors of a given taxon that are > > visible). > > So I think this should be an additional column. > > I would like to ask that a systematist. I have not seen it anywhere > else in a taxonomy other than NCBI's. I'm not convinced it's a good > idea to elevate NCBI's (or anybody else's) idiosyncrasies to columns > in the Bio* persistence interface. > > > > > By the way, is there a document somewhere detailing all the changes > > that > > are planned for 1.1? > > No, not yet. Good point though. Volunteers for starting one are > welcome ... :-) > > -hilmar > > > > > > cheers, > > Richard > > > > > > On Mon, 2006-07-03 at 14:07 -0400, Hilmar Lapp wrote: > >> Hi David, I wish I were in the south of France soaking up sun ... > >> although there is no shortage of sun (or heat for that matter, and > >> throw humidity in there too) where I am. > >> > >> Is_Circular is a general attribute that will apply to any sequence > >> (given the fact that many sequences are indeed circular). This, and > >> the fact that one may even want to search for it, would justify > >> inclusion directly as a column in the biosequence table. > >> > >> Is_Taxon_Hidden is one of those attributes that BioSQL by design > >> handles through attribute/value associations, that is, using ontology > >> term associations that have a value (the term is the attribute name). > >> > >> However, there is no taxon_qualifier_value table in BioSQL, so in > >> essence you are asking for adding that table. > >> > >> Does anybody else have ideas for taxon attributes for which this > >> table may be used? 
> >> > >> I don't really favor a proliferation of 'localized' versions of > >> BioSQL - this tends to defeat the purpose both of the rationale > >> behind a standardized persistence interface, as well as the design of > >> the schema for ultimate extensibility through weak typing and the use > >> of controlled vocabularies. > >> > >> Any thoughts to this end welcome. > >> > >> -hilmar > >> > >> On Jul 3, 2006, at 1:55 PM, David Scott wrote: > >> > >>> sure hilmar- > >>> > >>> in the genbank taxonomy file - nodes.dmp: > >>> ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt > >>> there is a field: > >>> > >>> GenBank hidden flag (1 or 0) -- 1 if name is suppressed > >>> in GenBank entry lineage > >>> > >>> this field controls whether the level is included in the taxonomy > >>> hierarchy when the genbank ORGANISM section is generated - but the > >>> more general problem trying to be solved is: > >>> o parse genbank entries > >>> o store parsed entry in biosql > >>> o pull parsed entry from biosql > >>> o (re)create the genbank entry > >>> o compare the recreated entry with the source document for > >>> identity. well - ok - almost identical. > >>> > >>> there are several parameters missing from biosql to make this > >>> possible. the general approach to a solution has been: > >>> o alter the biosql table to add a new column (a sql ddl file) > >>> o add a private get/set for the column in the biojavax object (a > >>> java file) > >>> o add the column to the biojavax hibernate o/r mapping (an xml file) > >>> > >>> to help others that might have the same objective, and to > >>> accomodate those that don't wish these nonstandard columns - it is > >>> planned to release the o/r mapping files with the additional > >>> columns/fields commented out - these xml files along with the java > >>> files are checked out with cvs. it was not clear what to do with > >>> the ddl files - and it would be helpful to have them reviewed - no > >>> matter what is done with them. 
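The first step in the list above (a DDL file that adds a column) could look roughly like the following. This is a sketch under assumptions: the column name is_circular follows the thread, but its type and default, the helper names, and the stripped-down biosequence table are illustrative only. SQLite is driven from Python purely to keep the example runnable; the DDL for MySQL, PostgreSQL, or Oracle would differ in detail.

```python
import sqlite3

# Hypothetical localization DDL; the column type and default are assumptions.
LOCALIZE_DDL = (
    "ALTER TABLE biosequence "
    "ADD COLUMN is_circular INTEGER NOT NULL DEFAULT 0"
)


def column_names(conn, table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def localize(conn):
    """Apply the DDL only if the column is missing, so reruns are harmless."""
    if "is_circular" not in column_names(conn, "biosequence"):
        conn.execute(LOCALIZE_DDL)


conn = sqlite3.connect(":memory:")
# Stand-in for the real biosequence table (most columns omitted).
conn.execute("CREATE TABLE biosequence (bioentry_id INTEGER PRIMARY KEY, seq TEXT)")
conn.execute("INSERT INTO biosequence (bioentry_id, seq) VALUES (1, 'ACGT')")

localize(conn)
localize(conn)  # second call is a no-op
```

Existing rows pick up the default (0, i.e. not circular), which matches the conforming default value described for the unlocalized mapping.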
> >>> > >>> thanks for helping me - i just assumed you were late in responding > >>> because it is summer - and, well - you were in the the south of > >>> france soaking up the sun. > >>> > >>> looking to you for suggestions- > >>> david > >>> > >>> > >>> Hilmar Lapp wrote: > >>>> Hi David, sorry for dropping (or rather, not ever picking up) the > >>>> ball on this ... got lost in inbox stack. > >>>> > >>>> The earlier consensus was if I recall correctly to include > >>>> is_circular as a biosequence attribute in the 1.1 version. > >>>> > >>>> isTaxonHidden is new to me and I don't even understand what it > >>>> would mean. Can you elaborate? > >>>> > >>>> -hilmar > >>>> > >>>> On Jun 21, 2006, at 11:19 AM, David Scott wrote: > >>>> > >>>>> biojavax is using hibernate to o/r map the biosql database to > >>>>> biojavax > >>>>> objects. biojavax is planning support in the biojavax objects for > >>>>> fields > >>>>> not directly supported in the biosql database (e.g. isCircular, > >>>>> isTaxonHidden). in order to conform to the current biosql > >>>>> database, the > >>>>> default mapping file from biosql to biojavax will comment out the > >>>>> unsupported fields (so the object fields will not be initialized) > >>>>> and > >>>>> the objects will default an appropriate conforming value (e.g. > >>>>> false for > >>>>> isCircular and isTaxonHidden). for users wishing to localize > >>>>> biojavax: > >>>>> the user would uncomment the mapping file and alter the database > >>>>> tables. > >>>>> altering the database would require running ddl on the existing > >>>>> database > >>>>> to create the new table columns. what is the best way to review > >>>>> and then > >>>>> distribute the alter/create ddl for users to localize their > >>>>> database? 
> >>>>> _______________________________________________ > >>>>> BioSQL-l mailing list > >>>>> BioSQL-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l > >>>>> > >>>> > >>>> --=========================================================== > >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > >>>> =========================================================== > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>> > >> > > -- > > Richard Holland (BioMart Team) > > EMBL-EBI > > Wellcome Trust Genome Campus > > Hinxton > > Cambridge CB10 1SD > > UNITED KINGDOM > > Tel: +44-(0)1223-494416 > > > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From pim.van.nierop at falw.vu.nl Wed Jul 5 09:53:39 2006 From: pim.van.nierop at falw.vu.nl (Pim van Nierop) Date: Wed, 05 Jul 2006 15:53:39 +0200 Subject: [BioSQL-l] Prolem with loading bioseqsql scheme Message-ID: <85343C76-6149-4439-B410-4D04B642D567@falw.vu.nl> Hello all, I have just started exploring BioSQL in combination with Perl scripting to run a local instance of GenBank on MySQL at my lab. I apologize in advance for my ignorance, as I do not know much about MySQL. I followed the instructions on the BioPerl wiki page on how to start using BioSQL with BioPerl. Unfortunately, I seem to get stuck when loading my newly created database named "bioseqdb" with the "biosqldb-mysql.sql" file. I use this command: > mysql -u root -p bioseqdb < c:\biosqldb-mysql.sql This generates the following error: ERROR 1005 (HY000) at line 39: Can't create table '.\bioseqdb\biodatabase.frm' (errno: 121) I looked on the internet for what ERROR 1005 with errno: 121 means. It seems to have something to do with foreign keys, but I have no clue how to proceed from here. Could someone please explain what I am doing wrong? Oh yeah, I use a Windows XP system.
All the best, Pim -- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- *-*-*-*-*- Pim van Nierop Department of Molecular and Cellular Neurobiology Faculty of Earth and Life Sciences Vrije Universiteit Amsterdam Tel. +31 (0)20 5987114 Fax. +31 (0)20 5987112 *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- *-*-*-*-*- _______________________________________________ Open-Bio-l mailing list Open-Bio-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/open-bio-l From hlapp at gmx.net Thu Jul 6 07:44:38 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Jul 2006 07:44:38 -0400 Subject: [BioSQL-l] [Open-bio-l] [Fwd: Prolem with loading bioseqsql scheme] In-Reply-To: <44ACD59C.3020604@falw.vu.nl> References: <44ACD59C.3020604@falw.vu.nl> Message-ID: Hi Pim, I forwarded your email to biosql-l at lists.open-bio.org, which is where the BioSQL discussions take place. I wanted to respond yesterday but didn't get to it. The page to subscribe to biosql-l is at http://obda.open-bio.org/mailman/listinfo/biosql-l -hilmar On Jul 6, 2006, at 5:19 AM, Pim van Nierop wrote: > I am resending this message because I sent it before my subscription to this > mailing list was confirmed. I am sorry if it's a double post. > > -------- Original Message -------- > Subject: Prolem with loading bioseqsql scheme > Date: Wed, 05 Jul 2006 15:53:39 +0200 > From: Pim van Nierop > To: open-bio-l at lists.open-bio.org > > > > Hello all, > > I have just started exploring BioSQL in combination with > Perl > scripting to run a local instance of GenBank on MySQL at my lab. I > apologize in advance for my ignorance, as I do not know much about > MySQL. > > I followed the instructions on the BioPerl wiki page on > how > to start using BioSQL with BioPerl. Unfortunately, I seem to get stuck > when loading my newly created database named "bioseqdb" with the > "biosqldb-mysql.sql" file.
> > I use this command: >> mysql -u root -p bioseqdb < c:\biosqldb-mysql.sql > > This generates the following error: > ERROR 1005 (HY000) at line 39: Can't create table > '.\bioseqdb\biodatabase.frm' (errno: 121) > > I looked on the internet for what ERROR 1005 with errno: 121 > means. > It seems to have something to do with foreign keys, but I have no clue > how to proceed from here. > > Could someone please explain what I am doing wrong? > > Oh yeah, I use a Windows XP system. > > All the best, > > Pim > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > *-*-*-*-*-*- > > Pim van Nierop > > Department of Molecular and Cellular Neurobiology > Faculty of Earth and Life Sciences > Vrije Universiteit > Amsterdam > > Tel. +31 (0)20 5987114 > Fax. +31 (0)20 5987112 > > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > *-*-*-*-*-*- > > > > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > *-*-*-*-*-*- > > Pim van Nierop > > Department of Molecular and Cellular Neurobiology > Faculty of Earth and Life Sciences > Vrije Universiteit > Amsterdam > > Tel. +31 (0)20 5987114 > Fax.
+31 (0)20 5987112 > > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > *-*-*-*-*-*- > > _______________________________________________ > Open-Bio-l mailing list > Open-Bio-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/open-bio-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From pim.van.nierop at falw.vu.nl Sat Jul 8 07:19:04 2006 From: pim.van.nierop at falw.vu.nl (Pim van Nierop) Date: Sat, 08 Jul 2006 13:19:04 +0200 Subject: [BioSQL-l] Prolem with loading bioseqsql scheme Message-ID: <44AF94A8.8030501@falw.vu.nl> Hello all, I have been experimenting myself a little and it turns out that the problem (InnoDB Error 1005 errno 121) occurs with mySQL 5.0, but not with mySQL 4.1. I will continue to use 4.1 to create a bioseq-database instead. I guess the 5.0 version is bugged. Greetz, Pim From mark.schreiber at novartis.com Sun Jul 9 23:03:10 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Mon, 10 Jul 2006 11:03:10 +0800 Subject: [BioSQL-l] null title and CRC Message-ID: Hi - We are having a problem in biojava parsing some genbank records that contain references with no title. These cannot have a CRC value which is required in BioSQL. If we make the title an empty string then we quickly get non-unique CRC numbers. What does BioPerl do in these cases? 
- Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 From hlapp at gmx.net Sun Jul 9 23:22:26 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 9 Jul 2006 23:22:26 -0400 Subject: [BioSQL-l] null title and CRC In-Reply-To: References: Message-ID: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net> The CRC for references uses the authors, title, and location attributes in Bioperl-db, and empty (or null) strings default to the string "". If title is empty and authors and location do not distinguish two references, then why do you want to have two rows for those references? Basically, they are identical for all intents and purposes, or are they not? -hilmar On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote: > Hi - > > We are having a problem in biojava parsing some genbank records that > contain references with no title. These cannot have a CRC value > which is > required in BioSQL. If we make the title an empty string then we > quickly > get non-unique CRC numbers. > > What does BioPerl do in these cases?
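To make the point about reference checksums concrete, here is a sketch of a CRC computed over authors, title, and location, with a placeholder substituted for any missing field. It is not Bioperl-db's implementation: zlib.crc32 stands in for whatever CRC the language bindings actually use, the function name is hypothetical, and the placeholder value is an assumption (the archived message shows only an empty pair of quotes where the default string is named).

```python
import zlib

# Assumed placeholder for missing fields; the real default may differ.
PLACEHOLDER = "<undef>"


def reference_crc(authors, title, location):
    """Checksum over the three identifying fields of a reference.

    None and the empty string are both replaced by PLACEHOLDER, so a
    missing title contributes the same bytes either way.
    """
    parts = [p if p else PLACEHOLDER for p in (authors, title, location)]
    return zlib.crc32("".join(parts).encode("utf-8"))


# A null title and an empty title yield the same checksum ...
same_a = reference_crc("Smith J.", None, "Unpublished")
same_b = reference_crc("Smith J.", "", "Unpublished")

# ... while untitled references still differ if the authors differ.
other = reference_crc("Smyth J.", None, "Unpublished")
```

Under this scheme two untitled references collide only when authors and location also match, which is the case the reply above argues should be a single row anyway.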
> > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mark.schreiber at novartis.com Thu Jul 13 01:23:18 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Thu, 13 Jul 2006 13:23:18 +0800 Subject: [BioSQL-l] Abstracts and Full Text on References Message-ID: Hi - As an enhancement for a future version of BioSQL it would be nice to have CLOB rows for abstract and full text (Full text might need to be a BLOB depending on format). Obviously they could both be null. Alternatively they could be in another table linked to Reference. I don't know if it could be done via the term relationship method?? Any thoughts? - Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 From hlapp at gmx.net Thu Jul 13 12:59:04 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 13 Jul 2006 12:59:04 -0400 Subject: [BioSQL-l] Abstracts and Full Text on References In-Reply-To: References: Message-ID: <21289F28-309E-4A81-B326-E939838A5820@gmx.net> Sounds reasonable to me. Attribute association wouldn't be desirable I think (it would only bloat and overload the value field). The only thing I'd be concerned about is accumulating stuff that is not supported by the language bindings ... i.e., bioperl doesn't support this, and so there isn't a way for bioperl-db to do so either. 
What are the plans for Biojava? Are any Biopython or Bioruby folks on this list? Any comments from those fronts? -hilmar On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote: > Hi - > > As an enhancement for a future version of BioSQL it would be nice > to have > CLOB rows for abstract and full text (Full text might need to be a > BLOB > depending on format). Obviously they could both be null. > > Alternatively they could be in another table linked to Reference. I > don't > know if it could be done via the term relationship method?? > > Any thoughts? > > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mark.schreiber at novartis.com Thu Jul 13 21:56:13 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Fri, 14 Jul 2006 09:56:13 +0800 Subject: [BioSQL-l] Abstracts and Full Text on References Message-ID: Hello - There are no specific plans for biojava although the Reference object could easily be modified to contain String getAbstract() void setAbstract(String abs) etc. I wonder if the full text of an article should be a byte[] or BLOB or a String/ CLOB. Are people more likely to want to store a PDF (usually more available) or a parsed String? - Mark Hilmar Lapp 07/14/2006 12:59 AM To: mark.schreiber at novartis.com cc: biosql-l at open-bio.org Subject: Re: [BioSQL-l] Abstracts and Full Text on References Sounds reasonable to me. 
Attribute association wouldn't be desirable I think (it would only bloat and overload the value field). The only thing I'd be concerned about is accumulating stuff that is not supported by the language bindings ... i.e., bioperl doesn't support this, and so there isn't a way for bioperl-db to do so either. What are the plans for Biojava? Are any Biopython or Bioruby folks on this list? Any comments from those fronts? -hilmar On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote: > Hi - > > As an enhancement for a future version of BioSQL it would be nice > to have > CLOB rows for abstract and full text (Full text might need to be a > BLOB > depending on format). Obviously they could both be null. > > Alternatively they could be in another table linked to Reference. I > don't > know if it could be done via the term relationship method?? > > Any thoughts? > > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Fri Jul 14 07:24:19 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 14 Jul 2006 07:24:19 -0400 Subject: [BioSQL-l] Abstracts and Full Text on References In-Reply-To: <1152864626.3943.61.camel@texas.ebi.ac.uk> References: <1152864626.3943.61.camel@texas.ebi.ac.uk> Message-ID: <748F3120-1FD3-4DF8-A0D7-EF9EE0414A14@gmx.net> Right. I like this. However, it also suggests to have an additional table. Who knows what other fields one will want to know for an abstract. 
Also, plenty of references will never have an abstract, e.g. automatic submissions, ontology term references etc. -hilmar On Jul 14, 2006, at 4:10 AM, Richard Holland wrote: > Make it a BLOB and add another column indicating the MIME type of the > BLOB. > > BLOB abstract > VARCHAR abstract_mime_type > > Then if you stored a PDF in it you could set abstract_mime_type to > 'application/x-pdf', or if it was plain text, you could set the > abstract_mime_type to 'text/plain'. > > cheers, > Richard > > On Fri, 2006-07-14 at 09:56 +0800, mark.schreiber at novartis.com wrote: >> Hello - >> >> There are no specific plans for biojava although the Reference object >> could easily be modified to contain >> >> String getAbstract() >> void setAbstract(String abs) >> etc. >> >> I wonder if the full text of an article should be a byte[] or BLOB >> or a >> String/ CLOB. Are people more likely to want to store a PDF >> (usually more >> available) or a parsed String? >> >> - Mark >> >> >> >> >> >> Hilmar Lapp >> 07/14/2006 12:59 AM >> >> >> To: mark.schreiber at novartis.com >> cc: biosql-l at open-bio.org >> Subject: Re: [BioSQL-l] Abstracts and Full Text on >> References >> >> >> Sounds reasonable to me. Attribute association wouldn't be desirable >> I think (it would only bloat and overload the value field). >> >> The only thing I'd be concerned about is accumulating stuff that is >> not supported by the language bindings ... i.e., bioperl doesn't >> support this, and so there isn't a way for bioperl-db to do so >> either. What are the plans for Biojava? >> >> Are any Biopython or Bioruby folks on this list? Any comments from >> those fronts? >> >> -hilmar >> >> On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote: >> >>> Hi - >>> >>> As an enhancement for a future version of BioSQL it would be nice >>> to have >>> CLOB rows for abstract and full text (Full text might need to be a >>> BLOB >>> depending on format). Obviously they could both be null. 
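The two suggestions above (a BLOB with a MIME type column, kept in an additional table) can be combined as in the sketch below. The table name reference_abstract, its columns, and the helper functions are hypothetical and not part of BioSQL; the reference table is reduced to two columns, and SQLite is used through Python only so that the example runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for the BioSQL reference table (most columns omitted).
CREATE TABLE reference (reference_id INTEGER PRIMARY KEY, title TEXT);
-- Hypothetical side table: at most one abstract per reference.
CREATE TABLE reference_abstract (
    reference_id       INTEGER PRIMARY KEY REFERENCES reference(reference_id),
    abstract           BLOB NOT NULL,
    abstract_mime_type TEXT NOT NULL
);
""")


def store_abstract(conn, reference_id, data, mime_type):
    """Store raw bytes together with the MIME type describing them."""
    conn.execute(
        "INSERT INTO reference_abstract VALUES (?, ?, ?)",
        (reference_id, data, mime_type),
    )


def load_abstract(conn, reference_id):
    """Return text for text/plain, raw bytes otherwise, None if absent."""
    row = conn.execute(
        "SELECT abstract, abstract_mime_type FROM reference_abstract "
        "WHERE reference_id = ?",
        (reference_id,),
    ).fetchone()
    if row is None:
        return None
    data, mime_type = row
    return data.decode("utf-8") if mime_type == "text/plain" else data


conn.executemany("INSERT INTO reference VALUES (?, ?)",
                 [(1, "A titled paper"), (2, "Direct submission")])
store_abstract(conn, 1, "An example abstract.".encode("utf-8"), "text/plain")
```

References without an abstract (automatic submissions, ontology term references) simply have no row in the side table, so the reference table itself gains no mostly-null columns.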
>>> >>> Alternatively they could be in another table linked to Reference. I >>> don't >>> know if it could be done via the term relationship method?? >>> >>> Any thoughts? >>> >>> - Mark >>> >>> Mark Schreiber >>> Research Investigator (Bioinformatics) >>> >>> Novartis Institute for Tropical Diseases (NITD) >>> 10 Biopolis Road >>> #05-01 Chromos >>> Singapore 138670 >>> www.nitd.novartis.com >>> >>> phone +65 6722 2973 >>> fax +65 6722 2910 >>> >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>> >> > -- > Richard Holland (BioMart Team) > EMBL-EBI > Wellcome Trust Genome Campus > Hinxton > Cambridge CB10 1SD > UNITED KINGDOM > Tel: +44-(0)1223-494416 > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From david at autohandle.com Fri Jul 14 13:48:50 2006 From: david at autohandle.com (David Scott) Date: Fri, 14 Jul 2006 10:48:50 -0700 Subject: [BioSQL-l] null title and CRC In-Reply-To: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net> References: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net> Message-ID: <44B7D902.6040804@autohandle.com> we are currently using "" in the crc calculation for the case where the title is empty (or null) - i can extend that for authors and location - what should we be storing the the table: "", empty, or null? thanks- david p.s. fog for sale: http://www.sfgate.com/liveviews/ Hilmar Lapp wrote: > The CRC for references uses the authors, title, and location > attributes in Bioperl-db, and empty (or null) strings default to the > string "". > > If title is empty and authors and location do not distinguish two > references, then why do you want to have two rows for those > references? Basically, there are identical for all intents and > purposes, or are they not? 
> > -hilmar > > On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote: > > >> Hi - >> >> We are having a problem in biojava parsing some genbank records that >> contain references with no title. These cannot have a CRC value >> which is >> required in BioSQL. If we make the title an empty string then we >> quickly >> get non-unique CRC numbers. >> >> What does BioPerl do in these cases? >> >> - Mark >> >> Mark Schreiber >> Research Investigator (Bioinformatics) >> >> Novartis Institute for Tropical Diseases (NITD) >> 10 Biopolis Road >> #05-01 Chromos >> Singapore 138670 >> www.nitd.novartis.com >> >> phone +65 6722 2973 >> fax +65 6722 2910 >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l >> >> > > From hlapp at gmx.net Fri Jul 14 14:31:44 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 14 Jul 2006 14:31:44 -0400 Subject: [BioSQL-l] null title and CRC In-Reply-To: <44B7D902.6040804@autohandle.com> References: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net> <44B7D902.6040804@autohandle.com> Message-ID: In the table you store the value of the attribute, not a default that substitutes for it in some calculation. I.e., either null or an empty string, depending on what the value is. (in Oracle an empty string is treated as null.) -hilmar On Jul 14, 2006, at 1:48 PM, David Scott wrote: > we are currently using "" in the crc calculation for the > case where the title is empty (or null) - i can extend that for > authors and location - what should we be storing the the table: > "", empty, or null? > > thanks- > david > > p.s. fog for sale: > http://www.sfgate.com/liveviews/ > > > Hilmar Lapp wrote: >> The CRC for references uses the authors, title, and location >> attributes in Bioperl-db, and empty (or null) strings default to the >> string "". 
>> >> If title is empty and authors and location do not distinguish two >> references, then why do you want to have two rows for those >> references? Basically, there are identical for all intents and >> purposes, or are they not? >> >> -hilmar >> >> On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote: >> >> >>> Hi - >>> >>> We are having a problem in biojava parsing some genbank records that >>> contain references with no title. These cannot have a CRC value >>> which is >>> required in BioSQL. If we make the title an empty string then we >>> quickly >>> get non-unique CRC numbers. >>> >>> What does BioPerl do in these cases? >>> >>> - Mark >>> >>> Mark Schreiber >>> Research Investigator (Bioinformatics) >>> >>> Novartis Institute for Tropical Diseases (NITD) >>> 10 Biopolis Road >>> #05-01 Chromos >>> Singapore 138670 >>> www.nitd.novartis.com >>> >>> phone +65 6722 2973 >>> fax +65 6722 2910 >>> >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>> >>> >> > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From david at autohandle.com Fri Jul 14 14:51:18 2006 From: david at autohandle.com (David Scott) Date: Fri, 14 Jul 2006 11:51:18 -0700 Subject: [BioSQL-l] null title and CRC In-Reply-To: References: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net> <44B7D902.6040804@autohandle.com> Message-ID: <44B7E7A6.9040300@autohandle.com> ok, then - in the case of genbank: i'm going to try to treat missing titles as null - store them in the object as null - and provide them to the hibernate o/r mapping as null - presumably they will go into the table as null. best- Hilmar Lapp wrote: > In the table you store the value of the attribute, not a default that > substitutes for it in some calculation. 
I.e., either null or an empty > string, depending on what the value is. (in Oracle an empty string is > treated as null.) > > -hilmar > On Jul 14, 2006, at 1:48 PM, David Scott wrote: > >> we are currently using "" in the crc calculation for the case >> where the title is empty (or null) - i can extend that for authors >> and location - what should we be storing the the table: "", >> empty, or null? >> >> thanks- >> david >> >> p.s. fog for sale: >> http://www.sfgate.com/liveviews/ >> >> >> Hilmar Lapp wrote: >>> The CRC for references uses the authors, title, and location >>> attributes in Bioperl-db, and empty (or null) strings default to the >>> string "". >>> >>> If title is empty and authors and location do not distinguish two >>> references, then why do you want to have two rows for those >>> references? Basically, there are identical for all intents and >>> purposes, or are they not? >>> >>> -hilmar >>> >>> On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote: >>> >>> >>>> Hi - >>>> >>>> We are having a problem in biojava parsing some genbank records that >>>> contain references with no title. These cannot have a CRC value >>>> which is >>>> required in BioSQL. If we make the title an empty string then we >>>> quickly >>>> get non-unique CRC numbers. >>>> >>>> What does BioPerl do in these cases? 
>>>> >>>> - Mark >>>> >>>> Mark Schreiber >>>> Research Investigator (Bioinformatics) >>>> >>>> Novartis Institute for Tropical Diseases (NITD) >>>> 10 Biopolis Road >>>> #05-01 Chromos >>>> Singapore 138670 >>>> www.nitd.novartis.com >>>> >>>> phone +65 6722 2973 >>>> fax +65 6722 2910 >>>> >>>> _______________________________________________ >>>> BioSQL-l mailing list >>>> BioSQL-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>>> >>>> >>> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > From richard.holland at ebi.ac.uk Thu Jul 13 04:14:55 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Thu, 13 Jul 2006 09:14:55 +0100 Subject: [BioSQL-l] Abstracts and Full Text on References In-Reply-To: References: Message-ID: <1152778496.3943.51.camel@texas.ebi.ac.uk> I'd like to enhance that request by asking for individual author records instead of a single string, and a flag indicating the type of publication - eg. journal, book, article, conference paper, etc. On Thu, 2006-07-13 at 13:23 +0800, mark.schreiber at novartis.com wrote: > Hi - > > As an enhancement for a future version of BioSQL it would be nice to have > CLOB rows for abstract and full text (Full text might need to be a BLOB > depending on format). Obviously they could both be null. > > Alternatively they could be in another table linked to Reference. I don't > know if it could be done via the term relationship method?? > > Any thoughts? 
> > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From richard.holland at ebi.ac.uk Fri Jul 14 04:10:25 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Fri, 14 Jul 2006 09:10:25 +0100 Subject: [BioSQL-l] Abstracts and Full Text on References In-Reply-To: References: Message-ID: <1152864626.3943.61.camel@texas.ebi.ac.uk> Make it a BLOB and add another column indicating the MIME type of the BLOB. BLOB abstract VARCHAR abstract_mime_type Then if you stored a PDF in it you could set abstract_mime_type to 'application/x-pdf', or if it was plain text, you could set the abstract_mime_type to 'text/plain'. cheers, Richard On Fri, 2006-07-14 at 09:56 +0800, mark.schreiber at novartis.com wrote: > Hello - > > There are no specific plans for biojava although the Reference object > could easily be modified to contain > > String getAbstract() > void setAbstract(String abs) > etc. > > I wonder if the full text of an article should be a byte[] or BLOB or a > String/ CLOB. Are people more likely to want to store a PDF (usually more > available) or a parsed String? > > - Mark > > > > > > Hilmar Lapp > 07/14/2006 12:59 AM > > > To: mark.schreiber at novartis.com > cc: biosql-l at open-bio.org > Subject: Re: [BioSQL-l] Abstracts and Full Text on References > > > Sounds reasonable to me. Attribute association wouldn't be desirable > I think (it would only bloat and overload the value field). 
> > The only thing I'd be concerned about is accumulating stuff that is > not supported by the language bindings ... i.e., bioperl doesn't > support this, and so there isn't a way for bioperl-db to do so > either. What are the plans for Biojava? > > Are any Biopython or Bioruby folks on this list? Any comments from > those fronts? > > -hilmar > > On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote: > > > Hi - > > > > As an enhancement for a future version of BioSQL it would be nice > > to have > > CLOB rows for abstract and full text (Full text might need to be a > > BLOB > > depending on format). Obviously they could both be null. > > > > Alternatively they could be in another table linked to Reference. I > > don't > > know if it could be done via the term relationship method?? > > > > Any thoughts? > > > > - Mark > > > > Mark Schreiber > > Research Investigator (Bioinformatics) > > > > Novartis Institute for Tropical Diseases (NITD) > > 10 Biopolis Road > > #05-01 Chromos > > Singapore 138670 > > www.nitd.novartis.com > > > > phone +65 6722 2973 > > fax +65 6722 2910 > > > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biosql-l > > > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From richard.holland at ebi.ac.uk Mon Jul 17 04:57:55 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Mon, 17 Jul 2006 09:57:55 +0100 Subject: [BioSQL-l] null title and CRC In-Reply-To: References: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net> <44B7D902.6040804@autohandle.com> Message-ID: <1153126675.3957.17.camel@texas.ebi.ac.uk> Sounds good. cheers, Richard On Fri, 2006-07-14 at 14:31 -0400, Hilmar Lapp wrote: > In the table you store the value of the attribute, not a default that > substitutes for it in some calculation. 
I.e., either null or an empty > string, depending on what the value is. (in Oracle an empty string is > treated as null.) > > > -hilmar > On Jul 14, 2006, at 1:48 PM, David Scott wrote: > > > we are currently using "" in the crc calculation for the case > > where the title is empty (or null) - i can extend that for authors > > and location - what should we be storing the the table: "", > > empty, or null? > > > > thanks- > > david > > > > p.s. fog for sale: > > http://www.sfgate.com/liveviews/ > > > > > > Hilmar Lapp wrote: > > > The CRC for references uses the authors, title, and location > > > attributes in Bioperl-db, and empty (or null) strings default to the > > > string "". > > > > > > If title is empty and authors and location do not distinguish two > > > references, then why do you want to have two rows for those > > > references? Basically, there are identical for all intents and > > > purposes, or are they not? > > > > > > -hilmar > > > > > > On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote: > > > > > > > > > > Hi - > > > > > > > > We are having a problem in biojava parsing some genbank records that > > > > contain references with no title. These cannot have a CRC value > > > > which is > > > > required in BioSQL. If we make the title an empty string then we > > > > quickly > > > > get non-unique CRC numbers. > > > > > > > > What does BioPerl do in these cases? 
> > > > > > > > - Mark > > > > > > > > Mark Schreiber > > > > Research Investigator (Bioinformatics) > > > > > > > > Novartis Institute for Tropical Diseases (NITD) > > > > 10 Biopolis Road > > > > #05-01 Chromos > > > > Singapore 138670 > > > > www.nitd.novartis.com > > > > > > > > phone +65 6722 2973 > > > > fax +65 6722 2910 > > > > > > > > _______________________________________________ > > > > BioSQL-l mailing list > > > > BioSQL-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/biosql-l > > > > > > > > > > > > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > > > > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From mark.schreiber at novartis.com Tue Jul 18 16:41:34 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Wed, 19 Jul 2006 04:41:34 +0800 Subject: [BioSQL-l] Abstracts and Full Text on References Message-ID: Another table is probably best. Is there a working version of BioSQL 1.1 this can be added to? - Mark Hilmar Lapp 07/14/2006 07:24 PM To: Richard Holland cc: Mark Schreiber , biosql-l at open-bio.org Subject: Re: [BioSQL-l] Abstracts and Full Text on References Right. I like this. However, it also suggests to have an additional table. Who knows what other fields one will want to know for an abstract. Also, plenty of references will never have an abstract, e.g. automatic submissions, ontology term references etc. -hilmar On Jul 14, 2006, at 4:10 AM, Richard Holland wrote: > Make it a BLOB and add another column indicating the MIME type of the > BLOB. 
> > BLOB abstract > VARCHAR abstract_mime_type > > Then if you stored a PDF in it you could set abstract_mime_type to > 'application/x-pdf', or if it was plain text, you could set the > abstract_mime_type to 'text/plain'. > > cheers, > Richard > > On Fri, 2006-07-14 at 09:56 +0800, mark.schreiber at novartis.com wrote: >> Hello - >> >> There are no specific plans for biojava although the Reference object >> could easily be modified to contain >> >> String getAbstract() >> void setAbstract(String abs) >> etc. >> >> I wonder if the full text of an article should be a byte[] or BLOB >> or a >> String/ CLOB. Are people more likely to want to store a PDF >> (usually more >> available) or a parsed String? >> >> - Mark >> >> >> >> >> >> Hilmar Lapp >> 07/14/2006 12:59 AM >> >> >> To: mark.schreiber at novartis.com >> cc: biosql-l at open-bio.org >> Subject: Re: [BioSQL-l] Abstracts and Full Text on >> References >> >> >> Sounds reasonable to me. Attribute association wouldn't be desirable >> I think (it would only bloat and overload the value field). >> >> The only thing I'd be concerned about is accumulating stuff that is >> not supported by the language bindings ... i.e., bioperl doesn't >> support this, and so there isn't a way for bioperl-db to do so >> either. What are the plans for Biojava? >> >> Are any Biopython or Bioruby folks on this list? Any comments from >> those fronts? >> >> -hilmar >> >> On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote: >> >>> Hi - >>> >>> As an enhancement for a future version of BioSQL it would be nice >>> to have >>> CLOB rows for abstract and full text (Full text might need to be a >>> BLOB >>> depending on format). Obviously they could both be null. >>> >>> Alternatively they could be in another table linked to Reference. I >>> don't >>> know if it could be done via the term relationship method?? >>> >>> Any thoughts? 
>>> >>> - Mark >>> >>> Mark Schreiber >>> Research Investigator (Bioinformatics) >>> >>> Novartis Institute for Tropical Diseases (NITD) >>> 10 Biopolis Road >>> #05-01 Chromos >>> Singapore 138670 >>> www.nitd.novartis.com >>> >>> phone +65 6722 2973 >>> fax +65 6722 2910 >>> >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>> >> > -- > Richard Holland (BioMart Team) > EMBL-EBI > Wellcome Trust Genome Campus > Hinxton > Cambridge CB10 1SD > UNITED KINGDOM > Tel: +44-(0)1223-494416 > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Tue Jul 18 16:50:25 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 18 Jul 2006 16:50:25 -0400 Subject: [BioSQL-l] Abstracts and Full Text on References In-Reply-To: References: Message-ID: <99FEA1E7-8540-46DE-8025-9F34D8026D0C@gmx.net> Yes and no. I was working on one at GNF. I'll have to create this in the repository. -hilmar On Jul 18, 2006, at 4:41 PM, mark.schreiber at novartis.com wrote: > Another table is probably best. > > Is there a working version of BioSQL 1.1 this can be added to? > > - Mark > > > > > > Hilmar Lapp > 07/14/2006 07:24 PM > > > To: Richard Holland > cc: Mark Schreiber , > biosql-l at open-bio.org > Subject: Re: [BioSQL-l] Abstracts and Full Text on > References > > > Right. I like this. However, it also suggests to have an additional > table. Who knows what other fields one will want to know for an > abstract. Also, plenty of references will never have an abstract, > e.g. automatic submissions, ontology term references etc. > > -hilmar > > On Jul 14, 2006, at 4:10 AM, Richard Holland wrote: > >> Make it a BLOB and add another column indicating the MIME type of the >> BLOB. 
>> >> BLOB abstract >> VARCHAR abstract_mime_type >> >> Then if you stored a PDF in it you could set abstract_mime_type to >> 'application/x-pdf', or if it was plain text, you could set the >> abstract_mime_type to 'text/plain'. >> >> cheers, >> Richard >> >> On Fri, 2006-07-14 at 09:56 +0800, mark.schreiber at novartis.com wrote: >>> Hello - >>> >>> There are no specific plans for biojava although the Reference >>> object >>> could easily be modified to contain >>> >>> String getAbstract() >>> void setAbstract(String abs) >>> etc. >>> >>> I wonder if the full text of an article should be a byte[] or BLOB >>> or a >>> String/ CLOB. Are people more likely to want to store a PDF >>> (usually more >>> available) or a parsed String? >>> >>> - Mark >>> >>> >>> >>> >>> >>> Hilmar Lapp >>> 07/14/2006 12:59 AM >>> >>> >>> To: mark.schreiber at novartis.com >>> cc: biosql-l at open-bio.org >>> Subject: Re: [BioSQL-l] Abstracts and Full Text on >>> References >>> >>> >>> Sounds reasonable to me. Attribute association wouldn't be desirable >>> I think (it would only bloat and overload the value field). >>> >>> The only thing I'd be concerned about is accumulating stuff that is >>> not supported by the language bindings ... i.e., bioperl doesn't >>> support this, and so there isn't a way for bioperl-db to do so >>> either. What are the plans for Biojava? >>> >>> Are any Biopython or Bioruby folks on this list? Any comments from >>> those fronts? >>> >>> -hilmar >>> >>> On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote: >>> >>>> Hi - >>>> >>>> As an enhancement for a future version of BioSQL it would be nice >>>> to have >>>> CLOB rows for abstract and full text (Full text might need to be a >>>> BLOB >>>> depending on format). Obviously they could both be null. >>>> >>>> Alternatively they could be in another table linked to Reference. I >>>> don't >>>> know if it could be done via the term relationship method?? >>>> >>>> Any thoughts? 
>>>> >>>> - Mark >>>> >>>> Mark Schreiber >>>> Research Investigator (Bioinformatics) >>>> >>>> Novartis Institute for Tropical Diseases (NITD) >>>> 10 Biopolis Road >>>> #05-01 Chromos >>>> Singapore 138670 >>>> www.nitd.novartis.com >>>> >>>> phone +65 6722 2973 >>>> fax +65 6722 2910 >>>> >>>> _______________________________________________ >>>> BioSQL-l mailing list >>>> BioSQL-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>>> >>> >> -- >> Richard Holland (BioMart Team) >> EMBL-EBI >> Wellcome Trust Genome Campus >> Hinxton >> Cambridge CB10 1SD >> UNITED KINGDOM >> Tel: +44-(0)1223-494416 >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Jul 2 13:20:53 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 2 Jul 2006 09:20:53 -0400 Subject: [BioSQL-l] BioSQL Schema problem In-Reply-To: <44A275E5.2040104@librophyt.com> References: <44A275E5.2040104@librophyt.com> Message-ID: <2F4506F2-84FC-412A-9BC5-8E3C92E086C8@gmx.net> The biosqldb-views-pg.sql is badly outdated I notice. Sorry about that. Are you sure you need it? (Most applications will not.) I probably shouldn't just delete but try to update it. The offending seqfeature_key table has long been removed from the schema and you can safely delete the view definition from the file, but there may be a few more errors given its age. I need to investigate the script's failure on inserting nodes - this is assuming that you put the file by hand in the right place. Apparently there is an alphanumerical value that gets parsed as the taxon id (which must be numeric indeed). 
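One way to test the guess about a non-numeric taxon id is to scan nodes.dmp before loading it. The following is a minimal sketch only, assuming the layout described in the NCBI taxdump readme (fields separated by tab, pipe, tab, with each line ending in tab, pipe). The helper names split_dmp_line and find_non_numeric_ids are made up for illustration and are not part of load_ncbi_taxonomy.pl, and the sample rows are invented.

```python
# Sketch only: report rows of a nodes.dmp-style file whose first field
# (the tax id) is not a plain integer. Helper names are hypothetical.

def split_dmp_line(line):
    """Split one taxdump line into its stripped fields."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the trailing field terminator
    return [field.strip() for field in line.split("\t|\t")]


def find_non_numeric_ids(lines):
    """Return (line_number, raw_id) for every row whose tax id is not all digits."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        raw_id = split_dmp_line(line)[0]
        if not raw_id.isdigit():
            bad.append((number, raw_id))
    return bad


# Invented sample rows; the third one carries a deliberately broken id.
sample = [
    "1\t|\t1\t|\tno rank\t|\t\t|\t8\t|\t0\t|\n",
    "2\t|\t131567\t|\tsuperkingdom\t|\t\t|\t0\t|\t0\t|\n",
    "abc\t|\t2\t|\tgenus\t|\t\t|\t0\t|\t1\t|\n",
]
print(find_non_numeric_ids(sample))
```

If the scan comes back empty for the real file, the data is probably fine and the type mismatch would more likely come from how the values are bound in the insert statement, which this sketch does not cover.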
--download is a switch and hence does not take any arguments; --download 0 does ask to download, which is why you see the error. I don't know why the download fails; maybe there's a problem with extended ftp mode (EPSV/EPRT commands), but I don't know offhand how you disable them in Net::FTP. -hilmar On Jun 28, 2006, at 8:28 AM, Samuel Thoraval wrote: > > Hello, > > I am new to biosql and I have 2 problems installing last CVS version > (*1.4.2.1*, /Sun Jun 16)/: > - running biosqldb-views-pg.sql after biosqldb-pg.sql gives errors, > the > first one being: > psql:biosqldb-views-pg.sql:6: ERROR: relation "seqfeature_key" > does not > exist > - running load_ncbi_taxonomy.pl with > ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz > (the script > download option set to 1 doesn't download anything) gives the > following > error : > ---------------------------------------------------------------------- > ------------------------------------------------------------------ > ./scripts/load_ncbi_taxonomy.pl --dbname bioseqdb --driver Pg --download 0 > gunzip: taxdata/taxdump.tar.gz: No such file or directory > tar: taxdump.tar: Cannot open: No such file or directory > tar: Error is not recoverable: exiting now > Loading NCBI taxon database in taxdata: > ... retrieving all taxon nodes in the database > ... reading in taxon nodes from nodes.dmp > ... insert / update / delete taxon nodes > failed to insert node (1;1;1;no rank;1;0): ERROR: column > "taxon_id" is > of type integer but expression is of type character varying > HINT: You will need to rewrite or cast the expression. > ---------------------------------------------------------------------- > ------------------------------------------------------------------ > > The schema expected from the biosqldb-views-pg.sql or taxonomy dump > file does not match the one in biosqldb-pg.sql. 
> > > Best regards, > > -- > Samuel Thoraval > LIBROPHYT, Bioinformatique > Centre de Cadarache > Bâtiment 185, DEVM > 13108 St Paul-Lez-Durance > France > Tél: +33 442 574 799 > Fax: +33 442 574 439 > e-mail : samuel.thoraval at librophyt.com > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Jul 2 17:44:21 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 2 Jul 2006 13:44:21 -0400 Subject: [BioSQL-l] Versioning of features In-Reply-To: References: Message-ID: <39FD8AB6-26F2-40B6-A3BC-42A42A42A06F@gmx.net> It should be straightforward. In essence you control it through the source type, which as you say is an ontology term. You can for instance include the software version in the source term. This is what I did for the BLAT-derived genome mappings in SymAtlas (which runs on top of BioSQL). This wouldn't even require you to 'obsolete' a previous source term. You'd only have to do that if you wanted to have the exact same name for the source term, and have the old and new 'version' terms in the same ontology. I probably wouldn't be much in favor of doing so because then you don't have an explicit version anywhere. However, of course, if you include it in the name, then two source types compared by name appear different even though they are effectively the same (e.g., same algorithm), just different versions. You can take care of that though by introducing 'parent' source (e.g. algorithm) terms that would have the versioned ones as children. Let me know if this doesn't help. -hilmar On Jun 30, 2006, at 6:16 PM, Sandie Peters wrote: > In the BioSQL v. 
1.0 schema overview, the author briefly mentions > the possibility of feature set versioning using "dated" source > ontology terms. Has anyone tried this or any other versioning > methods with seqfeatures in BioSQL? > > Thanks, > Sandie Peters > Vollum Institute/OHSU > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From darin.london at duke.edu Mon Jul 3 12:41:33 2006 From: darin.london at duke.edu (Darin London) Date: Mon, 03 Jul 2006 08:41:33 -0400 Subject: [BioSQL-l] Call For Birds of a Feather Suggestions Message-ID: <44A9107D.2050304@duke.edu> The BOSC organizing committee is currently seeking suggestions for Birds of a Feather meeting ideas. Birds of a Feather meetings are one of the more popular activities at BOSC, occurring at the end of each day's session. These are free-form meetings organized by the attendees themselves to discuss one or a few topics of interest in greater detail. BOFs have been formed to allow developers and users of individual OBF software to meet each other face-to-face to discuss the project, or to discuss completely new ideas, and even start new software development projects. These meetings offer a unique opportunity for individuals to explore more about the activities of the various Open Source Projects, and, in some cases, even take an active role in influencing the future of Open Source Software development. If you would like to create a BOF, just sign up for a wiki account, log in, and edit the BOSC 2006 Birds of a Feather page. 
From hlapp at gmx.net Mon Jul 3 17:04:48 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 3 Jul 2006 13:04:48 -0400 Subject: [BioSQL-l] a biosql/biojavax localization question In-Reply-To: <44996380.6060300@autohandle.com> References: <44996380.6060300@autohandle.com> Message-ID: Hi David, sorry for dropping (or rather, not ever picking up) the ball on this ... got lost in inbox stack. The earlier consensus was if I recall correctly to include is_circular as a biosequence attribute in the 1.1 version. isTaxonHidden is new to me and I don't even understand what it would mean. Can you elaborate? -hilmar On Jun 21, 2006, at 11:19 AM, David Scott wrote: > biojavax is using hibernate to o/r map the biosql database to biojavax > objects. biojavax is planning support in the biojavax objects for > fields > not directly supported in the biosql database (e.g. isCircular, > isTaxonHidden). in order to conform to the current biosql database, > the > default mapping file from biosql to biojavax will comment out the > unsupported fields (so the object fields will not be initialized) and > the objects will default an appropriate conforming value (e.g. > false for > isCircular and isTaxonHidden). for users wishing to localize biojavax: > the user would uncomment the mapping file and alter the database > tables. > altering the database would require running ddl on the existing > database > to create the new table columns. what is the best way to review and > then > distribute the alter/create ddl for users to localize their database? 
> _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Mon Jul 3 18:07:10 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 3 Jul 2006 14:07:10 -0400 Subject: [BioSQL-l] a biosql/biojavax localization question In-Reply-To: <44A95A2E.8000203@autohandle.com> References: <44996380.6060300@autohandle.com> <44A95A2E.8000203@autohandle.com> Message-ID: Hi David, I wish I were in the south of France soaking up sun ... although there is no shortage of sun (or heat for that matter, and throw humidity in there too) where I am. Is_Circular is a general attribute that will apply to any sequence (given the fact that many sequences are indeed circular). This, and the fact that one may even want to search for it, would justify inclusion directly as a column in the biosequence table. Is_Taxon_Hidden is one of those attributes that BioSQL by design handles through attribute/value associations, that is, using ontology term associations that have a value (the term is the attribute name). However, there is no taxon_qualifier_value table in BioSQL, so in essence you are asking for adding that table. Does anybody else have ideas for taxon attributes for which this table may be used? I don't really favor a proliferation of 'localized' versions of BioSQL - this tends to defeat the purpose both of the rationale behind a standardized persistence interface, as well as the design of the schema for ultimate extensibility through weak typing and the use of controlled vocabularies. Any thoughts to this end welcome. 
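To make the column option concrete, here is a minimal sketch of what the alter DDL and a search on the flag could look like. It runs against an in-memory SQLite database purely so that it is self-contained; the biosequence table is cut down to three columns, the DDL is an assumption for illustration and not reviewed BioSQL 1.1 DDL, and the function names are hypothetical.

```python
import sqlite3


def add_is_circular(conn):
    """Add the proposed flag to an existing biosequence table (hypothetical DDL)."""
    conn.execute(
        "ALTER TABLE biosequence "
        "ADD COLUMN is_circular INTEGER NOT NULL DEFAULT 0"
    )


def circular_entries(conn):
    """Search directly on the new column, which is the point of making it a column."""
    rows = conn.execute(
        "SELECT bioentry_id FROM biosequence "
        "WHERE is_circular = 1 ORDER BY bioentry_id"
    )
    return [row[0] for row in rows]


# Cut-down stand-in for an existing database that predates the column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE biosequence ("
    "bioentry_id INTEGER PRIMARY KEY, alphabet TEXT, seq TEXT)"
)
conn.executemany(
    "INSERT INTO biosequence (bioentry_id, alphabet, seq) VALUES (?, ?, ?)",
    [(1, "dna", "ACGT"), (2, "dna", "GGCC")],
)

add_is_circular(conn)  # existing rows pick up the default of 0
conn.execute("UPDATE biosequence SET is_circular = 1 WHERE bioentry_id = 2")
print(circular_entries(conn))
```

The default of 0 matches the "false" fallback described for the biojavax objects, so rows loaded before the column existed keep behaving as linear sequences.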
-hilmar On Jul 3, 2006, at 1:55 PM, David Scott wrote: > sure hilmar- > > in the genbank taxonomy file - nodes.dmp: > ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt > there is a field: > > GenBank hidden flag (1 or 0) -- 1 if name is suppressed > in GenBank entry lineage > > this field controls whether the level is included in the taxonomy > hierarchy when the genbank ORGANISM section is generated - but the > more general problem trying to be solved is: > o parse genbank entries > o store parsed entry in biosql > o pull parsed entry from biosql > o (re)create the genbank entry > o compare the recreated entry with the source document for > identity. well - ok - almost identical. > > there are several parameters missing from biosql to make this > possible. the general approach to a solution has been: > o alter the biosql table to add a new column (a sql ddl file) > o add a private get/set for the column in the biojavax object (a > java file) > o add the column to the biojavax hibernate o/r mapping (an xml file) > > to help others that might have the same objective, and to > accomodate those that don't wish these nonstandard columns - it is > planned to release the o/r mapping files with the additional > columns/fields commented out - these xml files along with the java > files are checked out with cvs. it was not clear what to do with > the ddl files - and it would be helpful to have them reviewed - no > matter what is done with them. > > thanks for helping me - i just assumed you were late in responding > because it is summer - and, well - you were in the the south of > france soaking up the sun. > > looking to you for suggestions- > david > > > Hilmar Lapp wrote: >> Hi David, sorry for dropping (or rather, not ever picking up) the >> ball on this ... got lost in inbox stack. >> >> The earlier consensus was if I recall correctly to include >> is_circular as a biosequence attribute in the 1.1 version. 
>> >> isTaxonHidden is new to me and I don't even understand what it >> would mean. Can you elaborate? >> >> -hilmar >> >> On Jun 21, 2006, at 11:19 AM, David Scott wrote: >> >>> biojavax is using hibernate to o/r map the biosql database to >>> biojavax >>> objects. biojavax is planning support in the biojavax objects for >>> fields >>> not directly supported in the biosql database (e.g. isCircular, >>> isTaxonHidden). in order to conform to the current biosql >>> database, the >>> default mapping file from biosql to biojavax will comment out the >>> unsupported fields (so the object fields will not be initialized) >>> and >>> the objects will default an appropriate conforming value (e.g. >>> false for >>> isCircular and isTaxonHidden). for users wishing to localize >>> biojavax: >>> the user would uncomment the mapping file and alter the database >>> tables. >>> altering the database would require running ddl on the existing >>> database >>> to create the new table columns. what is the best way to review >>> and then >>> distribute the alter/create ddl for users to localize their >>> database? 
>>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>> >> >> --=========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> >> >> > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From david at autohandle.com Mon Jul 3 17:55:58 2006 From: david at autohandle.com (David Scott) Date: Mon, 03 Jul 2006 10:55:58 -0700 Subject: [BioSQL-l] a biosql/biojavax localization question In-Reply-To: References: <44996380.6060300@autohandle.com> Message-ID: <44A95A2E.8000203@autohandle.com> sure hilmar- in the genbank taxonomy file - nodes.dmp: ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt there is a field: GenBank hidden flag (1 or 0) -- 1 if name is suppressed in GenBank entry lineage this field controls whether the level is included in the taxonomy hierarchy when the genbank ORGANISM section is generated - but the more general problem trying to be solved is: o parse genbank entries o store parsed entry in biosql o pull parsed entry from biosql o (re)create the genbank entry o compare the recreated entry with the source document for identity. well - ok - almost identical. there are several parameters missing from biosql to make this possible. 
the general approach to a solution has been:
 o alter the biosql table to add a new column (a sql ddl file)
 o add a private get/set for the column in the biojavax object (a java file)
 o add the column to the biojavax hibernate o/r mapping (an xml file)

to help others that might have the same objective, and to accommodate those that don't wish these nonstandard columns - it is planned to release the o/r mapping files with the additional columns/fields commented out - these xml files along with the java files are checked out with cvs. it was not clear what to do with the ddl files - and it would be helpful to have them reviewed - no matter what is done with them.

thanks for helping me - i just assumed you were late in responding because it is summer - and, well - you were in the south of france soaking up the sun.

looking to you for suggestions-
david

Hilmar Lapp wrote:
> Hi David, sorry for dropping (or rather, not ever picking up) the ball
> on this ... got lost in inbox stack.
>
> The earlier consensus was if I recall correctly to include is_circular
> as a biosequence attribute in the 1.1 version.
>
> isTaxonHidden is new to me and I don't even understand what it would
> mean. Can you elaborate?
>
> -hilmar
>
> On Jun 21, 2006, at 11:19 AM, David Scott wrote:
>
>> biojavax is using hibernate to o/r map the biosql database to biojavax
>> objects. biojavax is planning support in the biojavax objects for fields
>> not directly supported in the biosql database (e.g. isCircular,
>> isTaxonHidden). in order to conform to the current biosql database, the
>> default mapping file from biosql to biojavax will comment out the
>> unsupported fields (so the object fields will not be initialized) and
>> the objects will default an appropriate conforming value (e.g. false for
>> isCircular and isTaxonHidden). for users wishing to localize biojavax:
>> the user would uncomment the mapping file and alter the database tables.
>> altering the database would require running ddl on the existing database
>> to create the new table columns. what is the best way to review and then
>> distribute the alter/create ddl for users to localize their database?

From mark.schreiber at novartis.com Tue Jul 4 05:48:43 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 4 Jul 2006 13:48:43 +0800
Subject: [BioSQL-l] a biosql/biojavax localization question
Message-ID: 

>Is_Circular is a general attribute that will apply to any sequence
>(given the fact that many sequences are indeed circular). This, and
>the fact that one may even want to search for it, would justify
>inclusion directly as a column in the biosequence table.
>
>Is_Taxon_Hidden is one of those attributes that BioSQL by design
>handles through attribute/value associations, that is, using ontology
>term associations that have a value (the term is the attribute name).
>
>However, there is no taxon_qualifier_value table in BioSQL, so in
>essence you are asking for adding that table.
>
>Does anybody else have ideas for taxon attributes for which this
>table may be used?

A taxon_qualifier_value table would be potentially useful. One may want to have conflicting taxa (taxonomists never agree) that could be differentiated by use of a qualifier. The hidden attribute could also be one.
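
As a rough illustration of the attribute/value idea discussed above, here is a minimal sketch of what a taxon_qualifier_value table might look like. It uses Python's sqlite3 module with an in-memory database purely so that it is self-contained. The table layout, the term name genbank_hidden, the helper visible_taxon_ids and the taxon ids are all assumptions made for illustration; no such table exists in the BioSQL schema discussed in this thread, and the taxon and term tables are cut down to the columns the example needs.

```python
import sqlite3

# Hypothetical, simplified tables: only the columns needed for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (
    taxon_id        INTEGER PRIMARY KEY,
    ncbi_taxon_id   INTEGER UNIQUE,
    parent_taxon_id INTEGER
);
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
-- Assumed layout: one row per (taxon, attribute term, rank) with a value.
CREATE TABLE taxon_qualifier_value (
    taxon_id INTEGER NOT NULL REFERENCES taxon(taxon_id),
    term_id  INTEGER NOT NULL REFERENCES term(term_id),
    value    TEXT,
    rank     INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (taxon_id, term_id, rank)
);
""")

# Made-up taxa: a three-node chain in which the middle node is flagged hidden.
conn.executemany(
    "INSERT INTO taxon (taxon_id, ncbi_taxon_id, parent_taxon_id) VALUES (?, ?, ?)",
    [(1, 10, None), (2, 20, 1), (3, 30, 2)],
)
conn.execute("INSERT INTO term (term_id, name) VALUES (1, 'genbank_hidden')")
conn.execute(
    "INSERT INTO taxon_qualifier_value (taxon_id, term_id, value) VALUES (2, 1, '1')"
)


def visible_taxon_ids(connection):
    """Return ncbi_taxon_id values of taxa that carry no genbank_hidden = '1' qualifier."""
    rows = connection.execute("""
        SELECT t.ncbi_taxon_id
        FROM taxon t
        WHERE NOT EXISTS (
            SELECT 1
            FROM taxon_qualifier_value q
            JOIN term tm ON tm.term_id = q.term_id
            WHERE q.taxon_id = t.taxon_id
              AND tm.name = 'genbank_hidden'
              AND q.value = '1'
        )
        ORDER BY t.ncbi_taxon_id
    """).fetchall()
    return [row[0] for row in rows]


print(visible_taxon_ids(conn))  # [10, 30]
```

The attribute name lives in the term table, so a further taxon attribute would only need a new term row rather than a schema change. The cost is the extra join whenever the flag is used as a search criterion.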
>I don't really favor a proliferation of 'localized' versions of
>BioSQL - this tends to defeat the purpose both of the rationale
>behind a standardized persistence interface, as well as the design of
>the schema for ultimate extensibility through weak typing and the use
>of controlled vocabularies.
>
>Any thoughts to this end welcome.

I think that the best way to avoid localized versions might be to release a BioSQL 1.1 as soon as possible. The is_circular column has been on the todo list for a very long time. The above taxon_qualifier_value table would also be required to give more complete persistence of genbank data. Is there any reason why 1.1 cannot be released promptly?

I also wonder how likely a standardised persistence interface is when there is the possibility of using custom ontologies. Biojavax is much better at using the correct tables in BioSQL but we use our own ontology terms for all kinds of qualifiers. The way we persist data to BioSQL is undoubtedly closer to BioPerlDB than the old biojava mapping, but whenever ontology comes into it there are bound to be breaks. To be truly unified the two projects (and all the other bio*s) would need to use a common ontology. I guess what I am asking is: what do you mean by standardised persistence?

- Mark

From richard.holland at ebi.ac.uk Tue Jul 4 08:13:02 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 04 Jul 2006 09:13:02 +0100
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: 
References: <44996380.6060300@autohandle.com> <44A95A2E.8000203@autohandle.com>
Message-ID: <1152000782.3948.36.camel@texas.ebi.ac.uk>

Personally I'd like to see *_qualifier_value tables for all BioSQL tables that represent an entity of any kind, be it term, feature, location, sequence, taxon, or anything else.

In the case of is_taxon_hidden, this is specific to an individual taxon, and I can see cases where it would be appropriate to search by it (for instance, pulling out all ancestors of a given taxon that are visible). So I think this should be an additional column.

By the way, is there a document somewhere detailing all the changes that are planned for 1.1?

cheers,
Richard

On Mon, 2006-07-03 at 14:07 -0400, Hilmar Lapp wrote:
> Hi David, I wish I were in the south of France soaking up sun ...
> although there is no shortage of sun (or heat for that matter, and
> throw humidity in there too) where I am.
>
> Is_Circular is a general attribute that will apply to any sequence
> (given the fact that many sequences are indeed circular). This, and
> the fact that one may even want to search for it, would justify
> inclusion directly as a column in the biosequence table.
>
> Is_Taxon_Hidden is one of those attributes that BioSQL by design
> handles through attribute/value associations, that is, using ontology
> term associations that have a value (the term is the attribute name).
>
> However, there is no taxon_qualifier_value table in BioSQL, so in
> essence you are asking for adding that table.
>
> Does anybody else have ideas for taxon attributes for which this
> table may be used?
>
> I don't really favor a proliferation of 'localized' versions of
> BioSQL - this tends to defeat the purpose both of the rationale
> behind a standardized persistence interface, as well as the design of
> the schema for ultimate extensibility through weak typing and the use
> of controlled vocabularies.
>
> Any thoughts to this end welcome.

-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From hlapp at gmx.net Wed Jul 5 04:04:12 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 5 Jul 2006 00:04:12 -0400
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <1152000782.3948.36.camel@texas.ebi.ac.uk>
References: <44996380.6060300@autohandle.com> <44A95A2E.8000203@autohandle.com> <1152000782.3948.36.camel@texas.ebi.ac.uk>
Message-ID: 

On Jul 4, 2006, at 4:13 AM, Richard Holland wrote:

> Personally I'd like to see *_qualifier_value tables for all BioSQL
> tables that represents an entity of any kind, be it term, feature,
> location, sequence, taxon, or anything else.

I can see that making sense. Basically what it would say is that every entity in BioSQL is derivable, as opposed to final, in an OO sense.

In fact, there aren't many entities that don't have a qualifier_value association table yet. Adding one for biodatabase would have been in my book of 1.1 changes as I use it in SymAtlas already.

> In the case of is_taxon_hidden, this is specific to an individual
> taxon, and I can see cases where it would be appropriate to search by
> it (for instance, pulling out all ancestors of a given taxon that are
> visible). So I think this should be an additional column.

I would like to ask a systematist about that. I have not seen it anywhere else in a taxonomy other than NCBI's.
I'm not convinced it's a good idea to elevate NCBI's (or anybody else's) idiosyncrasies to columns in the Bio* persistence interface.

> By the way, is there a document somewhere detailing all the changes
> that are planned for 1.1?

No, not yet. Good point though. Volunteers for starting one are welcome ... :-)

-hilmar

> cheers,
> Richard
>
> On Mon, 2006-07-03 at 14:07 -0400, Hilmar Lapp wrote:
>> Hi David, I wish I were in the south of France soaking up sun ...
>> although there is no shortage of sun (or heat for that matter, and
>> throw humidity in there too) where I am.
>>
>> Is_Circular is a general attribute that will apply to any sequence
>> (given the fact that many sequences are indeed circular). This, and
>> the fact that one may even want to search for it, would justify
>> inclusion directly as a column in the biosequence table.
>>
>> Is_Taxon_Hidden is one of those attributes that BioSQL by design
>> handles through attribute/value associations, that is, using ontology
>> term associations that have a value (the term is the attribute name).
>>
>> However, there is no taxon_qualifier_value table in BioSQL, so in
>> essence you are asking for adding that table.
>>
>> Does anybody else have ideas for taxon attributes for which this
>> table may be used?
>>
>> I don't really favor a proliferation of 'localized' versions of
>> BioSQL - this tends to defeat the purpose both of the rationale
>> behind a standardized persistence interface, as well as the design of
>> the schema for ultimate extensibility through weak typing and the use
>> of controlled vocabularies.
>>
>> Any thoughts to this end welcome.

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Wed Jul 5 12:47:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 5 Jul 2006 08:47:05 -0400
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: <1152093096.3948.82.camel@texas.ebi.ac.uk>
References: <44996380.6060300@autohandle.com> <44A95A2E.8000203@autohandle.com> <1152000782.3948.36.camel@texas.ebi.ac.uk> <1152093096.3948.82.camel@texas.ebi.ac.uk>
Message-ID: 

Alright - but it was a nice try, no?

On Jul 5, 2006, at 5:51 AM, Richard Holland wrote:

> I think you should create it as you are the only one at present who
> knows what is already planned and what is not! :)
>
> cheers,
> Richard
>
> On Wed, 2006-07-05 at 00:04 -0400, Hilmar Lapp wrote:
>> On Jul 4, 2006, at 4:13 AM, Richard Holland wrote:
>>
>>> Personally I'd like to see *_qualifier_value tables for all BioSQL
>>> tables that represents an entity of any kind, be it term, feature,
>>> location, sequence, taxon, or anything else.
>>
>> I can see that making sense. Basically what it would say is that
>> every entity in BioSQL is derivable, as opposed to final, in an OO
>> sense.
>>
>> In fact, there aren't many entities that don't have a qualifier_value
>> association table yet.

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From richard.holland at ebi.ac.uk Wed Jul 5 09:51:35 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Wed, 05 Jul 2006 10:51:35 +0100
Subject: [BioSQL-l] a biosql/biojavax localization question
In-Reply-To: 
References: <44996380.6060300@autohandle.com> <44A95A2E.8000203@autohandle.com> <1152000782.3948.36.camel@texas.ebi.ac.uk>
Message-ID: <1152093096.3948.82.camel@texas.ebi.ac.uk>

I think you should create it as you are the only one at present who knows what is already planned and what is not! :)

cheers,
Richard

On Wed, 2006-07-05 at 00:04 -0400, Hilmar Lapp wrote:
> On Jul 4, 2006, at 4:13 AM, Richard Holland wrote:
>
> > Personally I'd like to see *_qualifier_value tables for all BioSQL
> > tables that represents an entity of any kind, be it term, feature,
> > location, sequence, taxon, or anything else.
>
> I can see that making sense. Basically what it would say is that
> every entity in BioSQL is derivable, as opposed to final, in an OO
> sense.
> >>>>> _______________________________________________
> >>>>> BioSQL-l mailing list
> >>>>> BioSQL-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
> >>>>>
> >>>>
> >>>> --===========================================================
> >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> >>>> ===========================================================
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> > --
> > Richard Holland (BioMart Team)
> > EMBL-EBI
> > Wellcome Trust Genome Campus
> > Hinxton
> > Cambridge CB10 1SD
> > UNITED KINGDOM
> > Tel: +44-(0)1223-494416
> >
>
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From pim.van.nierop at falw.vu.nl Wed Jul 5 13:53:39 2006
From: pim.van.nierop at falw.vu.nl (Pim van Nierop)
Date: Wed, 05 Jul 2006 15:53:39 +0200
Subject: [BioSQL-l] Prolem with loading bioseqsql scheme
Message-ID: <85343C76-6149-4439-B410-4D04B642D567@falw.vu.nl>

Hello all,

I have just started out exploring using bioSQL in combination with PERL scripting to run a local instance of GenBank on mySQL at my lab. I have to apologize for my ignorance beforehand, as I do not know much about mySQL.

I followed the instructions as provided on the BioPerl wiki page on how to start using bioSQL with bioPerl. Unfortunately, I seem to get stuck when loading my newly created database named "bioseqdb" with the "biosqldb-mysql.sql" file.

I use this command:
> mysql -u root -p bioseqdb < c:\biosqldb-mysql.sql

This generates the following error:
ERROR 1005 (HY000) at line 39: Can't create table '.\bioseqdb\biodatabase.frm' (errno: 121)

I looked up on the internet what the error code ERROR 1005 (errno: 121) means. It seems it has something to do with foreign keys, but I have no clue how to act from here.

Could someone please explain what I am doing wrong?

Oh yeah, I use a Windows XP system.

All the best,

Pim

--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

Pim van Nierop

Department of Molecular and Cellular Neurobiology
Faculty of Earth and Life Sciences
Vrije Universiteit
Amsterdam

Tel. +31 (0)20 5987114
Fax. +31 (0)20 5987112

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

_______________________________________________
Open-Bio-l mailing list
Open-Bio-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/open-bio-l

From hlapp at gmx.net Thu Jul 6 11:44:38 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Jul 2006 07:44:38 -0400
Subject: [BioSQL-l] [Open-bio-l] [Fwd: Prolem with loading bioseqsql scheme]
In-Reply-To: <44ACD59C.3020604@falw.vu.nl>
References: <44ACD59C.3020604@falw.vu.nl>
Message-ID:

Hi Pim,

I forwarded your email to biosql-l at lists.open-bio.org, which is where the BioSQL discussions take place. I wanted to respond yesterday but didn't get to it.

The page to subscribe to biosql-l is at http://obda.open-bio.org/mailman/listinfo/biosql-l

-hilmar

On Jul 6, 2006, at 5:19 AM, Pim van Nierop wrote:

> I am resending this message, as I sent it before my subscription to
> this mailing list was confirmed. I am sorry if it's a double post.
>
> -------- Original Message --------
> Subject: Prolem with loading bioseqsql scheme
> Date: Wed, 05 Jul 2006 15:53:39 +0200
> From: Pim van Nierop
> To: open-bio-l at lists.open-bio.org
>
>
>
> Hello all,
>
> I have just started out exploring using bioSQL in combination with
> PERL scripting to run a local instance of GenBank on mySQL at my lab.
> I have to apologize for my ignorance beforehand, as I do not know much
> about mySQL.
>
> I followed the instructions as provided on the BioPerl wiki page on
> how to start using bioSQL with bioPerl. Unfortunately, I seem to get
> stuck when loading my newly created database named "bioseqdb" with
> "biosqldb-mysql.sql" file.
> > I use this command: >> mysql -u root -p bioseqdb < c:\biosqldb-mysql.sql > > This generates the following error: > ERROR 1005 (HY000) at line 39: Can't create table > '.\bioseqdb\biodatabase.frm' (errno: 121) > > I looked on th einternet what the errorcode ERROR 1005 errno: 121 > means. > It seems it has something to do with foreign keys, but I have no clue > how to act from here. > > Could someone please explain what I am doing wrong? > > Oh yeah, I use a windows XP system. > > All the best, > > Pim > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > *-*-*-*-*-*- > > Pim van Nierop > > Department of Molecular and Cellular Neurobiology > Faculty of Earth and Life Sciences > Vrije Universiteit > Amsterdam > > Tel. +31 (0)20 5987114 > Fax. +31 (0)20 5987112 > > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > *-*-*-*-*-*- > > > > > -- > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > *-*-*-*-*-*- > > Pim van Nierop > > Department of Molecular and Cellular Neurobiology > Faculty of Earth and Life Sciences > Vrije Universiteit > Amsterdam > > Tel. +31 (0)20 5987114 > Fax. 
+31 (0)20 5987112 > > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- > *-*-*-*-*-*- > > _______________________________________________ > Open-Bio-l mailing list > Open-Bio-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/open-bio-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From pim.van.nierop at falw.vu.nl Sat Jul 8 11:19:04 2006 From: pim.van.nierop at falw.vu.nl (Pim van Nierop) Date: Sat, 08 Jul 2006 13:19:04 +0200 Subject: [BioSQL-l] Prolem with loading bioseqsql scheme Message-ID: <44AF94A8.8030501@falw.vu.nl> Hello all, I have been experimenting myself a little and it turns out that the problem (InnoDB Error 1005 errno 121) occurs with mySQL 5.0, but not with mySQL 4.1. I will continue to use 4.1 to create a bioseq-database instead. I guess the 5.0 version is bugged. Greetz, Pim From mark.schreiber at novartis.com Mon Jul 10 03:03:10 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Mon, 10 Jul 2006 11:03:10 +0800 Subject: [BioSQL-l] null title and CRC Message-ID: Hi - We are having a problem in biojava parsing some genbank records that contain references with no title. These cannot have a CRC value which is required in BioSQL. If we make the title an empty string then we quickly get non-unique CRC numbers. What does BioPerl do in these cases? 
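
The collision described above can be sketched as follows. This is only a minimal illustration in Python, not BioJava or Bioperl-db code: `zlib.crc32` stands in for whatever checksum the language bindings actually compute, and `reference_key`, the `"<none>"` placeholder, and the sample author and location strings are all hypothetical, chosen just for this example.

```python
import zlib

# Hypothetical placeholder substituted for missing fields (an assumption,
# not the value used by any BioSQL language binding).
PLACEHOLDER = "<none>"


def reference_key(authors, title, location):
    """Checksum over authors, title and location.

    None and empty strings are both replaced by PLACEHOLDER, so they
    contribute identically to the checksum.
    """
    fields = [value if value else PLACEHOLDER
              for value in (authors, title, location)]
    # Join with a NUL separator, which is unlikely to occur in the fields.
    payload = "\x00".join(fields).encode("utf-8")
    return zlib.crc32(payload)


# Two untitled references with the same authors and location collide ...
a = reference_key("Smith,J.", None, "Submitted (01-JAN-2006)")
b = reference_key("Smith,J.", "", "Submitted (01-JAN-2006)")
# ... while a different location yields a different key.
c = reference_key("Smith,J.", None, "Submitted (02-JAN-2006)")
print(a == b, a == c)  # prints: True False
```

Under these assumptions, a missing title only produces a duplicate key when authors and location are also the same, so any uniqueness problem comes from those two fields rather than from the title alone.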

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From hlapp at gmx.net Mon Jul 10 03:22:26 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 9 Jul 2006 23:22:26 -0400
Subject: [BioSQL-l] null title and CRC
In-Reply-To:
References:
Message-ID: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net>

The CRC for references uses the authors, title, and location attributes in Bioperl-db, and empty (or null) strings default to the string "".

If title is empty and authors and location do not distinguish two references, then why do you want to have two rows for those references? Basically, they are identical for all intents and purposes, or are they not?

-hilmar

On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote:

> Hi -
>
> We are having a problem in biojava parsing some genbank records that
> contain references with no title. These cannot have a CRC value
> which is required in BioSQL. If we make the title an empty string
> then we quickly get non-unique CRC numbers.
>
> What does BioPerl do in these cases?
> > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mark.schreiber at novartis.com Thu Jul 13 05:23:18 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Thu, 13 Jul 2006 13:23:18 +0800 Subject: [BioSQL-l] Abstracts and Full Text on References Message-ID: Hi - As an enhancement for a future version of BioSQL it would be nice to have CLOB rows for abstract and full text (Full text might need to be a BLOB depending on format). Obviously they could both be null. Alternatively they could be in another table linked to Reference. I don't know if it could be done via the term relationship method?? Any thoughts? - Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 From hlapp at gmx.net Thu Jul 13 16:59:04 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 13 Jul 2006 12:59:04 -0400 Subject: [BioSQL-l] Abstracts and Full Text on References In-Reply-To: References: Message-ID: <21289F28-309E-4A81-B326-E939838A5820@gmx.net> Sounds reasonable to me. Attribute association wouldn't be desirable I think (it would only bloat and overload the value field). The only thing I'd be concerned about is accumulating stuff that is not supported by the language bindings ... i.e., bioperl doesn't support this, and so there isn't a way for bioperl-db to do so either. 
What are the plans for Biojava? Are any Biopython or Bioruby folks on this list? Any comments from those fronts? -hilmar On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote: > Hi - > > As an enhancement for a future version of BioSQL it would be nice > to have > CLOB rows for abstract and full text (Full text might need to be a > BLOB > depending on format). Obviously they could both be null. > > Alternatively they could be in another table linked to Reference. I > don't > know if it could be done via the term relationship method?? > > Any thoughts? > > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mark.schreiber at novartis.com Fri Jul 14 01:56:13 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Fri, 14 Jul 2006 09:56:13 +0800 Subject: [BioSQL-l] Abstracts and Full Text on References Message-ID: Hello - There are no specific plans for biojava although the Reference object could easily be modified to contain String getAbstract() void setAbstract(String abs) etc. I wonder if the full text of an article should be a byte[] or BLOB or a String/ CLOB. Are people more likely to want to store a PDF (usually more available) or a parsed String? - Mark Hilmar Lapp 07/14/2006 12:59 AM To: mark.schreiber at novartis.com cc: biosql-l at open-bio.org Subject: Re: [BioSQL-l] Abstracts and Full Text on References Sounds reasonable to me. 
Attribute association wouldn't be desirable I think (it would only bloat and overload the value field). The only thing I'd be concerned about is accumulating stuff that is not supported by the language bindings ... i.e., bioperl doesn't support this, and so there isn't a way for bioperl-db to do so either. What are the plans for Biojava? Are any Biopython or Bioruby folks on this list? Any comments from those fronts? -hilmar On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote: > Hi - > > As an enhancement for a future version of BioSQL it would be nice > to have > CLOB rows for abstract and full text (Full text might need to be a > BLOB > depending on format). Obviously they could both be null. > > Alternatively they could be in another table linked to Reference. I > don't > know if it could be done via the term relationship method?? > > Any thoughts? > > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Fri Jul 14 11:24:19 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 14 Jul 2006 07:24:19 -0400 Subject: [BioSQL-l] Abstracts and Full Text on References In-Reply-To: <1152864626.3943.61.camel@texas.ebi.ac.uk> References: <1152864626.3943.61.camel@texas.ebi.ac.uk> Message-ID: <748F3120-1FD3-4DF8-A0D7-EF9EE0414A14@gmx.net> Right. I like this. However, it also suggests to have an additional table. Who knows what other fields one will want to know for an abstract. 
Also, plenty of references will never have an abstract, e.g. automatic submissions, ontology term references etc. -hilmar On Jul 14, 2006, at 4:10 AM, Richard Holland wrote: > Make it a BLOB and add another column indicating the MIME type of the > BLOB. > > BLOB abstract > VARCHAR abstract_mime_type > > Then if you stored a PDF in it you could set abstract_mime_type to > 'application/x-pdf', or if it was plain text, you could set the > abstract_mime_type to 'text/plain'. > > cheers, > Richard > > On Fri, 2006-07-14 at 09:56 +0800, mark.schreiber at novartis.com wrote: >> Hello - >> >> There are no specific plans for biojava although the Reference object >> could easily be modified to contain >> >> String getAbstract() >> void setAbstract(String abs) >> etc. >> >> I wonder if the full text of an article should be a byte[] or BLOB >> or a >> String/ CLOB. Are people more likely to want to store a PDF >> (usually more >> available) or a parsed String? >> >> - Mark >> >> >> >> >> >> Hilmar Lapp >> 07/14/2006 12:59 AM >> >> >> To: mark.schreiber at novartis.com >> cc: biosql-l at open-bio.org >> Subject: Re: [BioSQL-l] Abstracts and Full Text on >> References >> >> >> Sounds reasonable to me. Attribute association wouldn't be desirable >> I think (it would only bloat and overload the value field). >> >> The only thing I'd be concerned about is accumulating stuff that is >> not supported by the language bindings ... i.e., bioperl doesn't >> support this, and so there isn't a way for bioperl-db to do so >> either. What are the plans for Biojava? >> >> Are any Biopython or Bioruby folks on this list? Any comments from >> those fronts? >> >> -hilmar >> >> On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote: >> >>> Hi - >>> >>> As an enhancement for a future version of BioSQL it would be nice >>> to have >>> CLOB rows for abstract and full text (Full text might need to be a >>> BLOB >>> depending on format). Obviously they could both be null. 
>>>
>>> Alternatively they could be in another table linked to Reference. I
>>> don't know if it could be done via the term relationship method??
>>>
>>> Any thoughts?
>>>
>>> - Mark
>>>
>>> Mark Schreiber
>>> Research Investigator (Bioinformatics)
>>>
>>> Novartis Institute for Tropical Diseases (NITD)
>>> 10 Biopolis Road
>>> #05-01 Chromos
>>> Singapore 138670
>>> www.nitd.novartis.com
>>>
>>> phone +65 6722 2973
>>> fax +65 6722 2910
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>
>>
> --
> Richard Holland (BioMart Team)
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> UNITED KINGDOM
> Tel: +44-(0)1223-494416
>
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From david at autohandle.com Fri Jul 14 17:48:50 2006
From: david at autohandle.com (David Scott)
Date: Fri, 14 Jul 2006 10:48:50 -0700
Subject: [BioSQL-l] null title and CRC
In-Reply-To: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net>
References: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net>
Message-ID: <44B7D902.6040804@autohandle.com>

we are currently using "" in the crc calculation for the case where the title is empty (or null) - i can extend that for authors and location - what should we be storing in the table: "", empty, or null?

thanks-
david

p.s. fog for sale:
http://www.sfgate.com/liveviews/

Hilmar Lapp wrote:
> The CRC for references uses the authors, title, and location
> attributes in Bioperl-db, and empty (or null) strings default to the
> string "".
>
> If title is empty and authors and location do not distinguish two
> references, then why do you want to have two rows for those
> references? Basically, they are identical for all intents and
> purposes, or are they not?
> > -hilmar > > On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote: > > >> Hi - >> >> We are having a problem in biojava parsing some genbank records that >> contain references with no title. These cannot have a CRC value >> which is >> required in BioSQL. If we make the title an empty string then we >> quickly >> get non-unique CRC numbers. >> >> What does BioPerl do in these cases? >> >> - Mark >> >> Mark Schreiber >> Research Investigator (Bioinformatics) >> >> Novartis Institute for Tropical Diseases (NITD) >> 10 Biopolis Road >> #05-01 Chromos >> Singapore 138670 >> www.nitd.novartis.com >> >> phone +65 6722 2973 >> fax +65 6722 2910 >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l >> >> > > From hlapp at gmx.net Fri Jul 14 18:31:44 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 14 Jul 2006 14:31:44 -0400 Subject: [BioSQL-l] null title and CRC In-Reply-To: <44B7D902.6040804@autohandle.com> References: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net> <44B7D902.6040804@autohandle.com> Message-ID: In the table you store the value of the attribute, not a default that substitutes for it in some calculation. I.e., either null or an empty string, depending on what the value is. (in Oracle an empty string is treated as null.) -hilmar On Jul 14, 2006, at 1:48 PM, David Scott wrote: > we are currently using "" in the crc calculation for the > case where the title is empty (or null) - i can extend that for > authors and location - what should we be storing the the table: > "", empty, or null? > > thanks- > david > > p.s. fog for sale: > http://www.sfgate.com/liveviews/ > > > Hilmar Lapp wrote: >> The CRC for references uses the authors, title, and location >> attributes in Bioperl-db, and empty (or null) strings default to the >> string "". 
>> >> If title is empty and authors and location do not distinguish two >> references, then why do you want to have two rows for those >> references? Basically, there are identical for all intents and >> purposes, or are they not? >> >> -hilmar >> >> On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote: >> >> >>> Hi - >>> >>> We are having a problem in biojava parsing some genbank records that >>> contain references with no title. These cannot have a CRC value >>> which is >>> required in BioSQL. If we make the title an empty string then we >>> quickly >>> get non-unique CRC numbers. >>> >>> What does BioPerl do in these cases? >>> >>> - Mark >>> >>> Mark Schreiber >>> Research Investigator (Bioinformatics) >>> >>> Novartis Institute for Tropical Diseases (NITD) >>> 10 Biopolis Road >>> #05-01 Chromos >>> Singapore 138670 >>> www.nitd.novartis.com >>> >>> phone +65 6722 2973 >>> fax +65 6722 2910 >>> >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>> >>> >> > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From david at autohandle.com Fri Jul 14 18:51:18 2006 From: david at autohandle.com (David Scott) Date: Fri, 14 Jul 2006 11:51:18 -0700 Subject: [BioSQL-l] null title and CRC In-Reply-To: References: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net> <44B7D902.6040804@autohandle.com> Message-ID: <44B7E7A6.9040300@autohandle.com> ok, then - in the case of genbank: i'm going to try to treat missing titles as null - store them in the object as null - and provide them to the hibernate o/r mapping as null - presumably they will go into the table as null. best- Hilmar Lapp wrote: > In the table you store the value of the attribute, not a default that > substitutes for it in some calculation. 
I.e., either null or an empty > string, depending on what the value is. (in Oracle an empty string is > treated as null.) > > -hilmar > On Jul 14, 2006, at 1:48 PM, David Scott wrote: > >> we are currently using "" in the crc calculation for the case >> where the title is empty (or null) - i can extend that for authors >> and location - what should we be storing the the table: "", >> empty, or null? >> >> thanks- >> david >> >> p.s. fog for sale: >> http://www.sfgate.com/liveviews/ >> >> >> Hilmar Lapp wrote: >>> The CRC for references uses the authors, title, and location >>> attributes in Bioperl-db, and empty (or null) strings default to the >>> string "". >>> >>> If title is empty and authors and location do not distinguish two >>> references, then why do you want to have two rows for those >>> references? Basically, there are identical for all intents and >>> purposes, or are they not? >>> >>> -hilmar >>> >>> On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote: >>> >>> >>>> Hi - >>>> >>>> We are having a problem in biojava parsing some genbank records that >>>> contain references with no title. These cannot have a CRC value >>>> which is >>>> required in BioSQL. If we make the title an empty string then we >>>> quickly >>>> get non-unique CRC numbers. >>>> >>>> What does BioPerl do in these cases? 
>>>> >>>> - Mark >>>> >>>> Mark Schreiber >>>> Research Investigator (Bioinformatics) >>>> >>>> Novartis Institute for Tropical Diseases (NITD) >>>> 10 Biopolis Road >>>> #05-01 Chromos >>>> Singapore 138670 >>>> www.nitd.novartis.com >>>> >>>> phone +65 6722 2973 >>>> fax +65 6722 2910 >>>> >>>> _______________________________________________ >>>> BioSQL-l mailing list >>>> BioSQL-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>>> >>>> >>> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > From richard.holland at ebi.ac.uk Thu Jul 13 08:14:55 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Thu, 13 Jul 2006 09:14:55 +0100 Subject: [BioSQL-l] Abstracts and Full Text on References In-Reply-To: References: Message-ID: <1152778496.3943.51.camel@texas.ebi.ac.uk> I'd like to enhance that request by asking for individual author records instead of a single string, and a flag indicating the type of publication - eg. journal, book, article, conference paper, etc. On Thu, 2006-07-13 at 13:23 +0800, mark.schreiber at novartis.com wrote: > Hi - > > As an enhancement for a future version of BioSQL it would be nice to have > CLOB rows for abstract and full text (Full text might need to be a BLOB > depending on format). Obviously they could both be null. > > Alternatively they could be in another table linked to Reference. I don't > know if it could be done via the term relationship method?? > > Any thoughts? 
> > - Mark > > Mark Schreiber > Research Investigator (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From richard.holland at ebi.ac.uk Fri Jul 14 08:10:25 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Fri, 14 Jul 2006 09:10:25 +0100 Subject: [BioSQL-l] Abstracts and Full Text on References In-Reply-To: References: Message-ID: <1152864626.3943.61.camel@texas.ebi.ac.uk> Make it a BLOB and add another column indicating the MIME type of the BLOB. BLOB abstract VARCHAR abstract_mime_type Then if you stored a PDF in it you could set abstract_mime_type to 'application/x-pdf', or if it was plain text, you could set the abstract_mime_type to 'text/plain'. cheers, Richard On Fri, 2006-07-14 at 09:56 +0800, mark.schreiber at novartis.com wrote: > Hello - > > There are no specific plans for biojava although the Reference object > could easily be modified to contain > > String getAbstract() > void setAbstract(String abs) > etc. > > I wonder if the full text of an article should be a byte[] or BLOB or a > String/ CLOB. Are people more likely to want to store a PDF (usually more > available) or a parsed String? > > - Mark > > > > > > Hilmar Lapp > 07/14/2006 12:59 AM > > > To: mark.schreiber at novartis.com > cc: biosql-l at open-bio.org > Subject: Re: [BioSQL-l] Abstracts and Full Text on References > > > Sounds reasonable to me. Attribute association wouldn't be desirable > I think (it would only bloat and overload the value field). 
> > The only thing I'd be concerned about is accumulating stuff that is > not supported by the language bindings ... i.e., bioperl doesn't > support this, and so there isn't a way for bioperl-db to do so > either. What are the plans for Biojava? > > Are any Biopython or Bioruby folks on this list? Any comments from > those fronts? > > -hilmar > > On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote: > > > Hi - > > > > As an enhancement for a future version of BioSQL it would be nice > > to have > > CLOB rows for abstract and full text (Full text might need to be a > > BLOB > > depending on format). Obviously they could both be null. > > > > Alternatively they could be in another table linked to Reference. I > > don't > > know if it could be done via the term relationship method?? > > > > Any thoughts? > > > > - Mark > > > > Mark Schreiber > > Research Investigator (Bioinformatics) > > > > Novartis Institute for Tropical Diseases (NITD) > > 10 Biopolis Road > > #05-01 Chromos > > Singapore 138670 > > www.nitd.novartis.com > > > > phone +65 6722 2973 > > fax +65 6722 2910 > > > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biosql-l > > > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From richard.holland at ebi.ac.uk Mon Jul 17 08:57:55 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Mon, 17 Jul 2006 09:57:55 +0100 Subject: [BioSQL-l] null title and CRC In-Reply-To: References: <92EF66A4-68EF-4805-8A89-E26CCED80EF4@gmx.net> <44B7D902.6040804@autohandle.com> Message-ID: <1153126675.3957.17.camel@texas.ebi.ac.uk> Sounds good. cheers, Richard On Fri, 2006-07-14 at 14:31 -0400, Hilmar Lapp wrote: > In the table you store the value of the attribute, not a default that > substitutes for it in some calculation. 
I.e., either null or an empty > string, depending on what the value is. (in Oracle an empty string is > treated as null.) > > > -hilmar > On Jul 14, 2006, at 1:48 PM, David Scott wrote: > > > we are currently using "" in the crc calculation for the case > > where the title is empty (or null) - i can extend that for authors > > and location - what should we be storing the the table: "", > > empty, or null? > > > > thanks- > > david > > > > p.s. fog for sale: > > http://www.sfgate.com/liveviews/ > > > > > > Hilmar Lapp wrote: > > > The CRC for references uses the authors, title, and location > > > attributes in Bioperl-db, and empty (or null) strings default to the > > > string "". > > > > > > If title is empty and authors and location do not distinguish two > > > references, then why do you want to have two rows for those > > > references? Basically, there are identical for all intents and > > > purposes, or are they not? > > > > > > -hilmar > > > > > > On Jul 9, 2006, at 11:03 PM, mark.schreiber at novartis.com wrote: > > > > > > > > > > Hi - > > > > > > > > We are having a problem in biojava parsing some genbank records that > > > > contain references with no title. These cannot have a CRC value > > > > which is > > > > required in BioSQL. If we make the title an empty string then we > > > > quickly > > > > get non-unique CRC numbers. > > > > > > > > What does BioPerl do in these cases? 
> > > > > > > > - Mark > > > > > > > > Mark Schreiber > > > > Research Investigator (Bioinformatics) > > > > > > > > Novartis Institute for Tropical Diseases (NITD) > > > > 10 Biopolis Road > > > > #05-01 Chromos > > > > Singapore 138670 > > > > www.nitd.novartis.com > > > > > > > > phone +65 6722 2973 > > > > fax +65 6722 2910 > > > > > > > > _______________________________________________ > > > > BioSQL-l mailing list > > > > BioSQL-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/biosql-l > > > > > > > > > > > > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > > > > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From mark.schreiber at novartis.com Tue Jul 18 20:41:34 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Wed, 19 Jul 2006 04:41:34 +0800 Subject: [BioSQL-l] Abstracts and Full Text on References Message-ID: Another table is probably best. Is there a working version of BioSQL 1.1 this can be added to? - Mark Hilmar Lapp 07/14/2006 07:24 PM To: Richard Holland cc: Mark Schreiber , biosql-l at open-bio.org Subject: Re: [BioSQL-l] Abstracts and Full Text on References Right. I like this. However, it also suggests to have an additional table. Who knows what other fields one will want to know for an abstract. Also, plenty of references will never have an abstract, e.g. automatic submissions, ontology term references etc. -hilmar On Jul 14, 2006, at 4:10 AM, Richard Holland wrote: > Make it a BLOB and add another column indicating the MIME type of the > BLOB. 
> > BLOB abstract > VARCHAR abstract_mime_type > > Then if you stored a PDF in it you could set abstract_mime_type to > 'application/x-pdf', or if it was plain text, you could set the > abstract_mime_type to 'text/plain'. > > cheers, > Richard > > On Fri, 2006-07-14 at 09:56 +0800, mark.schreiber at novartis.com wrote: >> Hello - >> >> There are no specific plans for biojava although the Reference object >> could easily be modified to contain >> >> String getAbstract() >> void setAbstract(String abs) >> etc. >> >> I wonder if the full text of an article should be a byte[] or BLOB >> or a >> String/ CLOB. Are people more likely to want to store a PDF >> (usually more >> available) or a parsed String? >> >> - Mark >> >> >> >> >> >> Hilmar Lapp >> 07/14/2006 12:59 AM >> >> >> To: mark.schreiber at novartis.com >> cc: biosql-l at open-bio.org >> Subject: Re: [BioSQL-l] Abstracts and Full Text on >> References >> >> >> Sounds reasonable to me. Attribute association wouldn't be desirable >> I think (it would only bloat and overload the value field). >> >> The only thing I'd be concerned about is accumulating stuff that is >> not supported by the language bindings ... i.e., bioperl doesn't >> support this, and so there isn't a way for bioperl-db to do so >> either. What are the plans for Biojava? >> >> Are any Biopython or Bioruby folks on this list? Any comments from >> those fronts? >> >> -hilmar >> >> On Jul 13, 2006, at 1:23 AM, mark.schreiber at novartis.com wrote: >> >>> Hi - >>> >>> As an enhancement for a future version of BioSQL it would be nice >>> to have >>> CLOB rows for abstract and full text (Full text might need to be a >>> BLOB >>> depending on format). Obviously they could both be null. >>> >>> Alternatively they could be in another table linked to Reference. I >>> don't >>> know if it could be done via the term relationship method?? >>> >>> Any thoughts? 
>>>
>>> - Mark
>>>
>>> Mark Schreiber
>>> Research Investigator (Bioinformatics)
>>>
>>> Novartis Institute for Tropical Diseases (NITD)
>>> 10 Biopolis Road
>>> #05-01 Chromos
>>> Singapore 138670
>>> www.nitd.novartis.com
>>>
>>> phone +65 6722 2973
>>> fax +65 6722 2910
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>
>>
> --
> Richard Holland (BioMart Team)
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> UNITED KINGDOM
> Tel: +44-(0)1223-494416
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Tue Jul 18 20:50:25 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 18 Jul 2006 16:50:25 -0400
Subject: [BioSQL-l] Abstracts and Full Text on References
In-Reply-To:
References:
Message-ID: <99FEA1E7-8540-46DE-8025-9F34D8026D0C@gmx.net>

Yes and no. I was working on one at GNF. I'll have to create this in
the repository.

-hilmar

On Jul 18, 2006, at 4:41 PM, mark.schreiber at novartis.com wrote:

> Another table is probably best.
>
> Is there a working version of BioSQL 1.1 this can be added to?
>
> - Mark
>
>
> Hilmar Lapp
> 07/14/2006 07:24 PM
>
> To: Richard Holland
> cc: Mark Schreiber, biosql-l at open-bio.org
> Subject: Re: [BioSQL-l] Abstracts and Full Text on References
>
> Right. I like this. However, it also suggests to have an additional
> table. Who knows what other fields one will want to know for an
> abstract. Also, plenty of references will never have an abstract,
> e.g. automatic submissions, ontology term references etc.
>
> -hilmar
>
> On Jul 14, 2006, at 4:10 AM, Richard Holland wrote:
>
>> Make it a BLOB and add another column indicating the MIME type of the
>> BLOB.

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================
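
The option the thread settles on (a separate table linked to the reference, holding the abstract as a BLOB next to a MIME-type column) could look roughly like the sketch below. This is only an illustration, not an agreed BioSQL schema: the table name reference_abstract, its column layout, and the two-column stand-in for the reference table are assumptions, and SQLite is used purely so the example runs on its own.

```python
import sqlite3

# Hypothetical schema: `reference_abstract` and its columns are illustrative
# and not part of any released BioSQL version. The `reference` table is
# reduced to two columns for the sake of the sketch.
SCHEMA = """
CREATE TABLE reference (
    reference_id INTEGER PRIMARY KEY,
    title        TEXT
);
CREATE TABLE reference_abstract (
    reference_id       INTEGER PRIMARY KEY REFERENCES reference (reference_id),
    abstract           BLOB NOT NULL,
    abstract_mime_type VARCHAR(64) NOT NULL
);
"""


def store_abstract(conn, reference_id, payload, mime_type):
    """Attach an abstract (raw bytes plus its MIME type) to a reference."""
    conn.execute(
        "INSERT INTO reference_abstract VALUES (?, ?, ?)",
        (reference_id, payload, mime_type),
    )


def fetch_abstract(conn, reference_id):
    """Return (bytes, mime_type), or None if the reference has no abstract."""
    row = conn.execute(
        "SELECT abstract, abstract_mime_type FROM reference_abstract"
        " WHERE reference_id = ?",
        (reference_id,),
    ).fetchone()
    return None if row is None else (bytes(row[0]), row[1])


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO reference VALUES (1, 'A paper with an abstract')")
conn.execute("INSERT INTO reference VALUES (2, 'An automatic submission')")
store_abstract(conn, 1, "Plain-text abstract.".encode("utf-8"), "text/plain")

with_abstract = fetch_abstract(conn, 1)     # bytes plus MIME type
without_abstract = fetch_abstract(conn, 2)  # no row, so None
```

With the abstract in its own table, references that never have one (automatic submissions, ontology term references) simply have no row, and further per-abstract fields can be added later without touching the reference table.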