From cjfields at uiuc.edu Sun Sep 2 19:52:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 2 Sep 2007 18:52:40 -0500
Subject: [BioSQL-l] recursion issues with bioperl-db
Message-ID: <38D1DDAE-59DE-4A7B-B799-47E1DD506484@uiuc.edu>

I noticed some critical recursion issues with bioperl-db while working
on Bio::Ontology changes. This was using bioperl-live
(post-feature/annotation fixes). Bug report is here:

http://bugzilla.open-bio.org/show_bug.cgi?id=2355

It seems to be Bio::Taxon related; this is from 04swiss.t:

--------------------- WARNING ---------------------
MSG: recursion detected for Bio::Taxon object
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_process_child /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:681
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_create_persistent /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:630
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_process_child /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:692
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_create_persistent /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:630
...
/Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:587
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
STACK Bio::DB::BioSQL::PrimarySeqAdaptor::store_children /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:229
STACK Bio::DB::BioSQL::SeqAdaptor::store_children /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/SeqAdaptor.pm:217
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK Bio::DB::Persistent::PersistentObject::create /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK toplevel t/04swiss.t:36
---------------------------------------------------

I'm also seeing this with 13remove.t and 15.cluster.t, both of which
appear to recurse infinitely:

Deep recursion on subroutine "Bio::DB::BioSQL::BasePersistenceAdaptor::_create_persistent" at /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 587, line 1.
Deep recursion on subroutine "Bio::DB::BioSQL::BasePersistenceAdaptor::_process_child" at /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 630, line 1.

chris

From cjfields at uiuc.edu Sun Sep 2 21:40:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 2 Sep 2007 20:40:48 -0500
Subject: [BioSQL-l] [Bioperl-l] recursion issues with bioperl-db
In-Reply-To: <2E14450C-C135-42DD-A9DE-EB47EB80E6AC@uiuc.edu>
References: <2E14450C-C135-42DD-A9DE-EB47EB80E6AC@uiuc.edu>
Message-ID: <25CFD36D-D921-4F5F-BADF-D858A2FE76D4@uiuc.edu>

Okay, we can disregard the previous posts! Odd, but I started from
scratch and can't reproduce the issue; there may have been some
cross-talk between different bioperl installations on my laptop.
Anyway, everything passes now w/o recursion so I'll mark the bug as
invalid.
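
For readers unfamiliar with the "recursion detected" warning in the first post: a persistence layer that walks an object graph (a sequence, its species, the taxon's relatives, and so on) generally needs some guard against cycles, otherwise storing a child can re-enter storing its parent indefinitely. The Python sketch below illustrates that general idea only. It is not bioperl-db's code; `Node`, `store`, `CycleDetected` and the `in_progress` set are hypothetical names made up for this example.

```python
class CycleDetected(Exception):
    """Raised when an object is reached again while it is still being stored."""


class Node:
    """Hypothetical stand-in for a persistable object that owns child objects."""

    def __init__(self, name):
        self.name = name
        self.children = []


def store(node, stored, in_progress=None):
    """Store children first, then the node itself (depth-first).

    `stored` collects names in the order they would be written.
    `in_progress` holds the ids of objects currently on the call stack;
    meeting one of them again means the graph loops back on itself.
    """
    in_progress = set() if in_progress is None else in_progress
    if id(node) in in_progress:
        raise CycleDetected(f"recursion detected for {node.name} object")
    in_progress.add(id(node))
    try:
        for child in node.children:
            store(child, stored, in_progress)
        stored.append(node.name)
    finally:
        # Leaving this node: it is no longer on the call stack.
        in_progress.discard(id(node))
    return stored
```

With a chain such as seq -> species -> taxon the call returns normally; if the taxon also lists the species as a child, `store` raises `CycleDetected` instead of recursing until the interpreter gives up.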

chris

On Sep 2, 2007, at 6:57 PM, Chris Fields wrote:

> Apologies if you get this more than once; the first post appeared to
> get sent w/o a proper subject line. Posted this to biosql-l already
> but felt it needed posting here as well.
>
> I noticed some critical recursion issues with bioperl-db when working
> in Bio::Ontology changes. This was using bioperl-live (post-feature/
> annotation fixes). Bug report is here:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2355
>
> It seems to be Bio:Taxon related; this is from 03swiss.t:
>
> --------------------- WARNING ---------------------
> MSG: recursion detected for Bio::Taxon object
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_process_child
> /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:681
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_create_persistent
> /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:630
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_process_child
> /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:692
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_create_persistent
> /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:630
> ...
> /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:587
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
> STACK Bio::DB::BioSQL::PrimarySeqAdaptor::store_children
> /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:229
> STACK Bio::DB::BioSQL::SeqAdaptor::store_children
> /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/SeqAdaptor.pm:217
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
> STACK Bio::DB::Persistent::PersistentObject::create
> /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
> STACK toplevel t/04swiss.t:36
> ---------------------------------------------------
>
> Also, seeing this with 13remove.t and 15.cluster.t, both of which
> appear to infinitely recurse:
>
> Deep recursion on subroutine
> "Bio::DB::BioSQL::BasePersistenceAdaptor::_create_persistent" at
> /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm
> line 587, line 1.
> Deep recursion on subroutine
> "Bio::DB::BioSQL::BasePersistenceAdaptor::_process_child" at
> /Users/cjfields/src/core/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm
> line 630, line 1.
>
>
> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From paul.joseph.davis at gmail.com Tue Sep 11 10:17:39 2007
From: paul.joseph.davis at gmail.com (Paul Davis)
Date: Tue, 11 Sep 2007 10:17:39 -0400
Subject: [BioSQL-l] Description
Message-ID:

I've been going over the biosql schema and I was wondering if there
was a good place to read about examples of actual data that goes into
each table. Specifically, I'm a bit confused about which parts of a
genbank record go in which tables.

Thanks,
Paul Davis

From holland at ebi.ac.uk Tue Sep 11 10:54:50 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 11 Sep 2007 15:54:50 +0100
Subject: [BioSQL-l] Description
In-Reply-To:
References:
Message-ID: <46E6AC3A.5000203@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

There is no formal specification for what goes where in BioSQL, but you
can refer to the BioJava documentation for a good approximation of where
a GenBank file should end up. The BioJava objects share similar names to
the BioSQL tables and are mapped using Hibernate.

The most useful parts of the docs are probably:

http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank

and:

http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object-relational_mappings.

cheers,
Richard

Paul Davis wrote:
> I've been going over the biosql schema and I was wondering if there
> was a good place to read about examples of actual data that goes into
> each table. Specifically, I'm a bit confused about which parts of a
> genbank record go in which tables.
>
> Thanks,
> Paul Davis
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG5qw64C5LeMEKA/QRAiAPAJ41b3+cO7LQc1F4nAFrUWsVLwbl8wCgjFvd
Q8i8g2bUyB17L++fuSKXa+0=
=q8G2
-----END PGP SIGNATURE-----

From cjfields at uiuc.edu Tue Sep 11 11:10:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 11 Sep 2007 10:10:37 -0500
Subject: [BioSQL-l] Description
In-Reply-To: <46E6AC3A.5000203@ebi.ac.uk>
References: <46E6AC3A.5000203@ebi.ac.uk>
Message-ID: <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu>

Here's a question I couldn't find the answer to: should any
BioSQL-loaded data (via BioJava, BioPerl, etc) be expected to fully
round trip across any BioSQL-utilizing language? In other words, if I
use BioJava/Hibernate to load sequence data into a BioSQL database and
use BioPerl to work with the data, can one expect it to work?

My guess is no, as long as there is no formal specification...

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Tue Sep 11 11:38:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 11 Sep 2007 10:38:45 -0500
Subject: [BioSQL-l] Description
In-Reply-To: <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu>
References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu>
Message-ID: <4A365ABF-1C1E-46B0-A200-D2E3C0BC72B5@uiuc.edu>

D'oh, it's in the README!

....

This is the BioSQL distribution. BioSQL is a generic unifying schema
for storing sequences from different sources, for instance Genbank or
Swissprot.

BioSQL is meant to be a common data storage layer supported by all the
different Bio* projects, Bioperl, Biojava, Biopython, and Bioruby.
Entries stored through an application written in, say, Bioperl could
be retrieved by another written in Biojava.

chris

On Sep 11, 2007, at 10:10 AM, Chris Fields wrote:

> Here's a question I couldn't find the answer to: should any BioSQL-
> loaded data (via BioJava, BioPerl, etc) be expected to fully round
> trip across any BioSQL-utilizing language?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From barry.moore at genetics.utah.edu Tue Sep 11 11:49:45 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue, 11 Sep 2007 09:49:45 -0600
Subject: [BioSQL-l] Description
In-Reply-To: <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu>
References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu>
Message-ID: <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu>

Well, the schema is the formal specification as to what goes where, and
as long as your BioJava and BioPerl DB interfaces play by the rules of
the schema, then yes, you should be able to use both languages on the
same database. Of course the devil is in the details, and since I've
only worked with the BioPerl interface I don't know if that is in fact
reality right now. I think what Richard meant was that there is no
detailed human-readable documentation about which table and column each
bit of a GenBank record goes into.

Paul, I think you will find this document to be what you are looking
for - or at least as good as you'll get: go to
http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biosql-schema/doc/?cvsroot=biosql
and look for schema-overview.txt. There is also an ERD in PDF format
which can help you get your head around the schema. If you end up with
specific questions about what's where, send another e-mail or just load
some files and go exploring.

Barry

On Sep 11, 2007, at 9:10 AM, Chris Fields wrote:

> Here's a question I couldn't find the answer to: should any BioSQL-
> loaded data (via BioJava, BioPerl, etc) be expected to fully round
> trip across any BioSQL-utilizing language?

From cjfields at uiuc.edu Tue Sep 11 12:16:08 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 11 Sep 2007 11:16:08 -0500
Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description
In-Reply-To: <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu>
References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu>
Message-ID:

I think one area of possible headache will be TAXON/TAXON_NAME. For
instance, with BioPerl we kept running into genus/species parsing
problems (virus, bacterial names) when going from seqrecord->object.
Because of that we decided to greatly simplify Species parsing in
BioPerl so there isn't any 'guessing' as to genus/species names; you
get what's already there, nothing more. If one wants extra taxonomic
information then one must use NCBI Taxonomy somehow.

However, bioperl-db currently still splits into genus/species (it acts
like older BioPerl), which obviously clashes with current BioPerl
behavior. Not sure how the other Bio* projects store this data;
Richard?

There is a BioPerl bug filed on this:

http://bugzilla.open-bio.org/show_bug.cgi?id=2092

chris

On Sep 11, 2007, at 10:49 AM, Barry Moore wrote:

> Well, the schema is the formal specification as to what goes where
> and as long as your BioJava and BioPerl DB interface plays by the
> rules of the schema, then yes you should be able to use both
> languages on the same database. Of course the devil is in the
> details and since I've only worked with the BioPerl interface I
> don't know if that is in fact reality right now. I think what
> Richard meant was there is not detailed human documentation about
> where each bit of a GenBank record goes into what table and
> column.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From paul.joseph.davis at gmail.com Tue Sep 11 12:59:37 2007
From: paul.joseph.davis at gmail.com (Paul Davis)
Date: Tue, 11 Sep 2007 12:59:37 -0400
Subject: [BioSQL-l] Description
In-Reply-To: <4A365ABF-1C1E-46B0-A200-D2E3C0BC72B5@uiuc.edu>
References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> <4A365ABF-1C1E-46B0-A200-D2E3C0BC72B5@uiuc.edu>
Message-ID:

On 9/11/07, Chris Fields wrote:
> D'oh, it's in the README!
>
> ....
>
> This is the BioSQL distribution. BioSQL is a generic unifying schema
> for storing sequences from different sources, for instance Genbank or
> Swissprot.
>
> BioSQL is meant to be a common data storage layer supported by all the
> different Bio* projects, Bioperl, Biojava, Biopython, and Bioruby.
> Entries stored through an application written in, say, Bioperl could
> be retrieved by another written in Biojava.
>
> chris
>

I think this was the underlying idea behind standardizing the schema.
Granted, I haven't grokked all of the packages' DB access layers, but
given what I've seen in the BioPython interface, I'm guessing this is
more of a theoretical possibility than a standard practice.

From holland at ebi.ac.uk Wed Sep 12 03:32:42 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 12 Sep 2007 08:32:42 +0100
Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description
In-Reply-To:
References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu>
Message-ID: <46E7961A.8090709@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

We use the taxon table to store taxon information. See:
http://biojava.org/wiki/BioJava:BioJavaXDocs#NCBI_Taxonomy.

Each RichSequence object then gets an NCBITaxon object associated with
it using set/getTaxon().
For Genbank this is always parsed from the appropriate entry in the
feature table - the Organism and Species lines are ignored.

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG55YZ4C5LeMEKA/QRAg7wAJwPa7GXHKSdaYVHrk9a3JM8GhLIHwCeLRSq
jaQ6oAARv+oOpuaeBhNSA2U=
=xc8y
-----END PGP SIGNATURE-----

From hlapp at gmx.net Wed Sep 12 19:01:28 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 12 Sep 2007 19:01:28 -0400
Subject: [BioSQL-l] Description
In-Reply-To: <46E6AC3A.5000203@ebi.ac.uk>
References: <46E6AC3A.5000203@ebi.ac.uk>
Message-ID: <364A0795-3399-4F2C-A292-23BA5AA9F899@gmx.net>

On Sep 11, 2007, at 10:54 AM, Richard Holland wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> There is no formal specification for what goes where in BioSQL,

Indeed there isn't a formal specification in text. To understand this
it may be worth keeping in mind that the historic, and still in a sense
primary, use-case of BioSQL is to be the common persistence API for the
Bio* projects. Hence, what is relatively well defined is how to map a
Bio* object model (in particular, BioPerl's - and by now Biojava's -
object model) into BioSQL and back. Where a particular piece of a
GenBank file ends up in BioSQL would therefore depend on where it ends
up in the respective object model, strictly speaking.
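
A rough illustration of "it depends on where it ends up in the object model": the Python sketch below takes a tiny in-memory record and writes it into a heavily cut-down set of BioSQL-style tables in SQLite. The table names (biodatabase, bioentry, biosequence, taxon, taxon_name) are taken from the BioSQL schema, but the column subset shown here is simplified and should be checked against schema-overview.txt. The `record` dict, its field names, the accession and the `store_record` helper are all made up for this example and do not correspond to any toolkit's actual loader.

```python
import sqlite3

# Simplified subset of BioSQL-style tables (an assumption, not the full schema).
SCHEMA = """
CREATE TABLE biodatabase (biodatabase_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, ncbi_taxon_id INTEGER UNIQUE);
CREATE TABLE taxon_name (taxon_id INTEGER, name TEXT, name_class TEXT,
                         UNIQUE (taxon_id, name, name_class));
CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, biodatabase_id INTEGER,
                       taxon_id INTEGER, name TEXT, accession TEXT,
                       version INTEGER, description TEXT);
CREATE TABLE biosequence (bioentry_id INTEGER PRIMARY KEY, alphabet TEXT,
                          length INTEGER, seq TEXT);
"""


def store_record(conn, record, namespace):
    """Write one record (a plain dict standing in for a sequence object)."""
    cur = conn.cursor()
    # The namespace the entry belongs to.
    cur.execute("INSERT OR IGNORE INTO biodatabase (name) VALUES (?)", (namespace,))
    db_id = cur.execute(
        "SELECT biodatabase_id FROM biodatabase WHERE name = ?", (namespace,)
    ).fetchone()[0]
    # Organism information goes to taxon / taxon_name, not to bioentry itself.
    cur.execute(
        "INSERT OR IGNORE INTO taxon (ncbi_taxon_id) VALUES (?)",
        (record["ncbi_taxon_id"],),
    )
    taxon_id = cur.execute(
        "SELECT taxon_id FROM taxon WHERE ncbi_taxon_id = ?",
        (record["ncbi_taxon_id"],),
    ).fetchone()[0]
    cur.execute(
        "INSERT OR IGNORE INTO taxon_name VALUES (?, ?, 'scientific name')",
        (taxon_id, record["organism"]),
    )
    # Identifying fields and the description line go to bioentry.
    cur.execute(
        "INSERT INTO bioentry (biodatabase_id, taxon_id, name, accession, version,"
        " description) VALUES (?, ?, ?, ?, ?, ?)",
        (db_id, taxon_id, record["name"], record["accession"],
         record["version"], record["description"]),
    )
    bioentry_id = cur.lastrowid
    # The residues themselves go to biosequence, keyed by the bioentry.
    cur.execute(
        "INSERT INTO biosequence VALUES (?, ?, ?, ?)",
        (bioentry_id, record["alphabet"], len(record["seq"]), record["seq"]),
    )
    conn.commit()
    return bioentry_id


# Made-up example record; the accession and sequence are placeholders.
record = {
    "name": "TEST0001", "accession": "TEST0001", "version": 1,
    "description": "made-up example entry", "organism": "Escherichia coli",
    "ncbi_taxon_id": 562, "alphabet": "dna", "seq": "ATGC",
}
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
entry_id = store_record(conn, record, "demo")
```

The point of the sketch is that the decision "organism goes to taxon_name, description goes to bioentry" is made by the loader code, which is why two toolkits can obey the same schema and still disagree on details.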
Since this doesn't bode well for interoperability between the toolkits (which was one of the points of having BioSQL), Richard, Mark Schreiber, and I got together two years ago to reconcile BioPerl's and BioJava's ways of ingesting and representing a richly annotated sequence, which led to the RichSeq work being added to BioJava (correct me, Richard, if I'm confusing things). So in theory, BioPerl and BioJava should by now map a GenBank sequence to BioSQL in a very similar or ideally identical way, though I'm not sure this has ever been put to the test.

I'm not aware of a similar effort having been undertaken on the Biopython end, though I'd be more than happy to work with anyone from the Biopython community who is interested in resolving this. Given the recent Bio.SeqIO work there, this may be a good time to take this up.

-hilmar

> but you can refer to the BioJava documentation for a good
> approximation of where a GenBank file should end up. The BioJava
> objects share similar names to the BioSQL tables and are mapped
> using Hibernate.
>
> The most useful parts of the docs are probably:
>
> http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank
>
> and:
>
> http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object-relational_mappings.
>
> cheers,
> Richard

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Wed Sep 12 19:05:12 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 12 Sep 2007 19:05:12 -0400
Subject: [BioSQL-l] Description
In-Reply-To: <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu>
References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu>
Message-ID: <3E63A460-6C83-43E2-984C-39D7A25F7D74@gmx.net>

On Sep 11, 2007, at 11:10 AM, Chris Fields wrote:

> Here's a question I couldn't find the answer to: should any
> BioSQL-loaded data (via BioJava, BioPerl, etc) be expected to fully
> round trip across any BioSQL-utilizing language? In other words, if
> I use BioJava/Hibernate to load sequence data in to a BioSQL database
> and use BioPerl to work with the data, can one expect it to work?

In theory, yes. In practice, there hasn't been much effort put into writing tests and working out the kinks until this is really true. Minor differences are easily possible, but I'd be surprised if there were still huge incompatibilities between how BioJava and BioPerl store things, given the BioJavaX work of Mark and Richard. I don't know how big the differences would be for Biopython or BioRuby, though, or between those two.
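A rough sketch of the kind of test that would settle this, assuming each toolkit has loaded the same record into its own database: dump a table from both and report the rows that differ. Everything in it is illustrative rather than taken from any toolkit. The three-column bioentry table is a cut-down stand-in for the real one, make_db and table_diff are hypothetical helpers, the record is made up, and in-memory SQLite merely stands in for whatever RDBMS is actually in use.

```python
import sqlite3

# Cut-down stand-in for a single BioSQL table (assumption: the real
# bioentry table has more columns, and a real test would cover more tables).
SCHEMA = "CREATE TABLE bioentry (accession TEXT, name TEXT, description TEXT)"


def make_db(rows):
    """Create an in-memory database holding the given bioentry rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO bioentry VALUES (?, ?, ?)", rows)
    return conn


def table_diff(conn_a, conn_b, table, key):
    """Return {key value: (row_a, row_b)} for rows that differ between two DBs.

    A row missing from one side shows up as None on that side.
    """
    def load(conn):
        cur = conn.execute(f"SELECT * FROM {table}")
        cols = [d[0] for d in cur.description]
        pos = cols.index(key)
        return {row[pos]: dict(zip(cols, row)) for row in cur}

    a, b = load(conn_a), load(conn_b)
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }


# Pretend two loaders stored the same made-up record slightly differently
# (here: one kept the trailing period of the description, one did not).
db_a = make_db([("X00001", "TEST_SEQ", "hypothetical test protein")])
db_b = make_db([("X00001", "TEST_SEQ", "hypothetical test protein.")])

diff = table_diff(db_a, db_b, "bioentry", "accession")
for accession, (row_a, row_b) in diff.items():
    print(accession, row_a, row_b)
```

An empty result for every table would be the round-trip guarantee; anything else is a concrete list of kinks to work out.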
-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Wed Sep 12 19:15:42 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 12 Sep 2007 19:15:42 -0400
Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description
In-Reply-To:
References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu>
Message-ID:

The species/taxon handling shouldn't be a problem if you have the NCBI taxonID and have preloaded the NCBI taxonomy.

However, if it's a new species (i.e., the lookup of the NCBI taxonID in the taxon table fails), then bioperl-db tries to create the lineage based on what it finds in the species object.

As the bug report says, the issue can be fixed, but it also looks like the fix will break compatibility with earlier versions of BioPerl. I think at some point that's fine, but I was wondering whether that's the way it needs to be.

-hilmar

On Sep 11, 2007, at 12:16 PM, Chris Fields wrote:

> I think one area of possible headache will be TAXON/TAXON_NAME. For
> instance, with BioPerl we kept running into genus/species parsing
> problems (virus, bacterial names) when going from seqrecord->object.
> Due to that we decided to greatly simplify Species parsing in Bioperl
> so there isn't any 'guessing' as to genus/species names; you get
> what's already there, nothing more. If one wants extra taxonomic
> information then one must use NCBI Taxonomy somehow.
>
> However, currently bioperl-db still splits into genus/species (acts
> like older BioPerl), which obviously clashes with current Bioperl
> behavior. Not sure how the other Bio* store this data; Richard?
>
> There is a BioPerl bug filed on this:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2092
>
> chris

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From paul.joseph.davis at gmail.com Wed Sep 12 20:13:31 2007
From: paul.joseph.davis at gmail.com (Paul Davis)
Date: Wed, 12 Sep 2007 20:13:31 -0400
Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description
In-Reply-To:
References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu>

Message-ID:

I glanced through the bioperl cvs a bit but couldn't find the part where it tries to load a new taxonomy name. Does this go and try to rebuild the nested sets information, or basically leave any inserted taxonomic data (non-NCBI data) as nodes dangling outside the nested sets information?

Paul

On 9/12/07, Hilmar Lapp wrote:
> The species/taxon handling shouldn't be a problem if you have the
> NCBI taxonID and have preloaded the NCBI taxonomy.
>
> However, if it's a new species (i.e., the lookup of the NCBI taxonID
> in the taxon table fails), then bioperl-db tries to create the
> lineage based on what it finds in the species object.
>
> As the bug report says, the issue can be fixed, but it also looks
> like the fix will break compatibility with earlier versions of
> BioPerl. I think at some point that's fine, but I was wondering
> whether that's the way it needs to be.
>
> -hilmar

From hlapp at gmx.net Wed Sep 12 20:19:24 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 12 Sep 2007 20:19:24 -0400
Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description
In-Reply-To:
References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu>

Message-ID: <96E80B9C-2F46-43E0-80E4-30160AEDABD7@gmx.net>

The code is in bioperl-db (which is a sub-repository of bioperl, as is bioperl-live).

It makes no attempt at updating the nested-set values. That raises a good point - there is currently no script that would update that; the load_ncbi_taxonomy.pl script does recompute it, but will also want to load or update the NCBI taxonomy. It should be relatively easy to factor out the nested-set computing code into a separate stand-alone script.

-hilmar

On Sep 12, 2007, at 8:13 PM, Paul Davis wrote:

> I glanced through the bioperl cvs a bit but couldn't find the part
> where it tries to load a new taxonomy name. Does this go and try to
> rebuild the nested sets information, or basically leave any inserted
> taxonomic data (non-NCBI data) as nodes dangling outside the nested
> sets information?
>
> Paul

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From paul.joseph.davis at gmail.com Wed Sep 12 20:24:08 2007
From: paul.joseph.davis at gmail.com (Paul Davis)
Date: Wed, 12 Sep 2007 20:24:08 -0400
Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description
In-Reply-To: <96E80B9C-2F46-43E0-80E4-30160AEDABD7@gmx.net>
References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu>

<96E80B9C-2F46-43E0-80E4-30160AEDABD7@gmx.net>
Message-ID:

I was more wondering if there was an efficient way to recompute that information. As you seem to be confirming, I was fairly certain that updating those values would require recalculating all of them.

Paul

On 9/12/07, Hilmar Lapp wrote:
> The code is in bioperl-db (which is a sub-repository of bioperl, as
> is bioperl-live).
>
> It makes no attempt at updating the nested-set values. That raises a
> good point - there is currently no script that would update that; the
> load_ncbi_taxonomy.pl script does recompute it, but will also want to
> load or update the NCBI taxonomy. It should be relatively easy to
> factor out the nested-set computing code into a separate stand-alone
> script.
>
> -hilmar

From cjfields at uiuc.edu Wed Sep 12 21:42:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 12 Sep 2007 20:42:03 -0500
Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description
In-Reply-To:
References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu>

Message-ID: <9FF304F7-A318-487B-8346-1A4CA5E424B6@uiuc.edu>

In BioPerl versions up to 1.5.1, the Bio::Species class doesn't implement a specific interface, whereas in 1.5.2 it inherits from the new Bio::Taxon (and all methods are reimplemented to work with Bio::Taxon methods). According to Sendu, the long-term plan was to eventually deprecate Bio::Species and just use Bio::Taxon, with none of the 'guessing' of genus/species that always borked seqrecord parsing. That 'guessing' is essentially what is going on in SpeciesAdaptor now (Sendu's suggestion of 'old behavior', which triggered the exception in the bug report).

I'll try to look into it in a few weeks when I have some more time; there are a number of bioperl-db bugs in bugzilla that need sorting through. My thought is still to use a transition module (TaxonAdaptor) which would eventually replace SpeciesAdaptor once Bio::Species is no more.

chris

On Sep 12, 2007, at 6:15 PM, Hilmar Lapp wrote:

> The species/taxon handling shouldn't be a problem if you have the
> NCBI taxonID and have preloaded the NCBI taxonomy.
>
> However, if it's a new species (i.e., the lookup of the NCBI
> taxonID in the taxon table fails), then bioperl-db tries to create
> the lineage based on what it finds in the species object.
>
> As the bug report says, the issue can be fixed, but it also looks
> like the fix will break compatibility with earlier versions of
> BioPerl. I think at some point that's fine, but I was wondering
> whether that's the way it needs to be.
>
> -hilmar
>
> On Sep 11, 2007, at 12:16 PM, Chris Fields wrote:
>
>> I think one area of possible headache will be TAXON/TAXON_NAME. For
>> instance, with BioPerl we kept running into genus/species parsing
>> problems (virus, bacterial names) when going from seqrecord->object.
>> Due to that we decided to greatly simplify Species parsing in Bioperl
>> so there isn't any 'guessing' as to genus/species names; you get
>> what's already there, nothing more.
If one wants extra taxonomic >> information then one must use NCBI Taxonomy somehow. >> >> However, currently bioperl-db still splits into genus/species (acts >> like older BioPerl), which obviously clashes with current Bioperl >> behavior. Not sure how the other Bio* store this data; Richard? >> >> There is a BioPerl bug filed on this: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2092 >> >> chris >> >> On Sep 11, 2007, at 10:49 AM, Barry Moore wrote: >> >>> Well, the schema is the formal specification as to what goes where >>> and as long as your BioJava and BioPerl DB interface plays by the >>> rules of the schema, then yes you should be able to use both >>> languages on the same database. Of course the devil is in the >>> details and since I've only worked with the BioPerl interface I >>> don't know if that is in fact reality right now. I think what >>> Richard meant was there is not detailed human documentation about >>> where each bit of a GenBank record goes into what table and >>> column. Paul, I think you will find this document to be what you >>> are looking for - or at least as good as you'll get: go to http:// >>> cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biosql-schema/doc/? >>> cvsroot=biosql and look for schema-overview.txt. There is also a >>> ERD in pdf format which can help you get your head around the >>> schema. If you end up with specific questions about what's where, >>> send another e-mail or just load some files and go exploring. >>> >>> Barry >>> >>> On Sep 11, 2007, at 9:10 AM, Chris Fields wrote: >>> >>>> Here's a question I couldn't find the answer to: should any BioSQL- >>>> loaded data (via BioJava, BioPerl, etc) be expected to fully round >>>> trip across any BioSQL-utilizing language? In other words, if I >>>> use >>>> BioJava/Hibernate to load sequence data in to a BioSQL database and >>>> use BioPerl to work with the data, can one expect it to work? >>>> >>>> My guess is no, as long as there is no formal specification... 
>>>> >>>> chris >>>> >>>> On Sep 11, 2007, at 9:54 AM, Richard Holland wrote: >>>> >>>>> -----BEGIN PGP SIGNED MESSAGE----- >>>>> Hash: SHA1 >>>>> >>>>> There is no formal specification for what goes where in BioSQL, >>>>> but >>>>> you >>>>> can refer to the BioJava documentation for a good approximation of >>>>> where >>>>> a GenBank file should end up. The BioJava objects share similar >>>>> names to >>>>> the BioSQL tables and are mapped using Hibernate. >>>>> >>>>> The most useful parts of the docs are probably: >>>>> >>>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank >>>>> >>>>> and: >>>>> >>>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object- >>>>> relational_mappings. >>>>> >>>>> cheers, >>>>> Richard >>>>> >>>>> Paul Davis wrote: >>>>>> I've been going over the biosql schema and I was wondering if >>>>>> there >>>>>> was a good place to read about examples of actual data that goes >>>>>> into >>>>>> each table. Specifically, I'm a bit confused about which parts >>>>>> of a >>>>>> genbank record go in which tables. >>>>>> >>>>>> Thanks, >>>>>> Paul Davis >>>>>> _______________________________________________ >>>>>> BioSQL-l mailing list >>>>>> BioSQL-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>>>>> >>>>> -----BEGIN PGP SIGNATURE----- >>>>> Version: GnuPG v1.4.2.2 (GNU/Linux) >>>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org >>>>> >>>>> iD8DBQFG5qw64C5LeMEKA/QRAiAPAJ41b3+cO7LQc1F4nAFrUWsVLwbl8wCgjFvd >>>>> Q8i8g2bUyB17L++fuSKXa+0= >>>>> =q8G2 >>>>> -----END PGP SIGNATURE----- >>>>> _______________________________________________ >>>>> BioSQL-l mailing list >>>>> BioSQL-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>>> >>>> Christopher Fields >>>> Postdoctoral Researcher >>>> Lab of Dr. 
Robert Switzer >>>> Dept of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> _______________________________________________ >>>> BioSQL-l mailing list >>>> BioSQL-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>> >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Thu Sep 13 10:37:44 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 13 Sep 2007 10:37:44 -0400 Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description In-Reply-To: References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu>

<96E80B9C-2F46-43E0-80E4-30160AEDABD7@gmx.net> Message-ID: You could theoretically save some time by not updating the values that wouldn't change (e.g., all left siblings and their descendants remain unchanged). But on average, assuming that the new node is most likely a leaf node and that the tree is balanced, you would still have to update about half of all nodes, namely all nodes "to the right" (i.e., all siblings and their descendants to the right, all ancestors and their siblings to the right and the descendants of those siblings). -hilmar On Sep 12, 2007, at 8:24 PM, Paul Davis wrote: > I was more wondering if there was an efficient way to recompute that > information. As you seem to be confirming, I was fairly certain that to > update those values would require recalculating all values. > > Paul -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Sep 30 18:24:55 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 30 Sep 2007 18:24:55 -0400 Subject: [BioSQL-l] license Message-ID: I realized that BioSQL is licensed under "the same terms as Perl itself", and then references the Perl Artistic License. First of all, Perl has changed its licensing terms to allow the GPL as an alternative, and the Artistic License for Perl will be upgraded to v2.0. Aside from all that, I'm not sure that it makes all that much sense to couple the license terms to those of Perl. Maybe a more technology-neutral license would be more appropriate, such as the GPL alone, LGPL, or simply MIT (or new BSD) license. Or just the Artistic Licence v2.0? 
LGPL: http://www.opensource.org/licenses/lgpl-license.php MIT: http://www.opensource.org/licenses/mit-license.php BSD: http://www.opensource.org/licenses/bsd-license.php Artistic 2.0: http://www.opensource.org/licenses/artistic- license-2.0.php No action is probably not an option (b/c issues with Artistic v1.0 and changes in Perl licensing). Any thoughts, opinions? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sun Sep 30 19:15:30 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 30 Sep 2007 18:15:30 -0500 Subject: [BioSQL-l] license In-Reply-To: References: Message-ID: <780BB4F4-343D-488F-A27D-D5D36014A3DF@uiuc.edu> BioPerl distros just changed to specifically allow Artistic and GPL. I think Artistic v2 kicks in when Perl 5.10 or Perl6 is released, but I'm not sure. For BioSQL I think any of the specific licenses you mention (GPL, LGPL, BSD, Artistic 2) would be fine. I'm a fan of GPL myself. chris On Sep 30, 2007, at 5:24 PM, Hilmar Lapp wrote: > I realized that BioSQL is licensed under "the same terms as Perl > itself", and then references the Perl Artistic License. > > First of all, Perl has changed its licensing terms to allow the GPL > as an alternative, and the Artistic License for Perl will be upgraded > to v2.0. > > Aside from all that, I'm not sure that it makes all that much sense > to couple the license terms to those of Perl. Maybe a more technology- > neutral license would be more appropriate, such as the GPL alone, > LGPL, or simply MIT (or new BSD) license. Or just the Artistic > Licence v2.0? 
> > LGPL: http://www.opensource.org/licenses/lgpl-license.php > MIT: http://www.opensource.org/licenses/mit-license.php > BSD: http://www.opensource.org/licenses/bsd-license.php > Artistic 2.0: http://www.opensource.org/licenses/artistic- > license-2.0.php > > No action is probably not an option (b/c issues with Artistic v1.0 > and changes in Perl licensing). Any thoughts, opinions? > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From david at autohandle.com Sun Sep 30 22:05:15 2007 From: david at autohandle.com (David Scott) Date: Sun, 30 Sep 2007 19:05:15 -0700 Subject: [BioSQL-l] license In-Reply-To: References: Message-ID: <470055DB.5080304@autohandle.com> is any kind of approval needed from the biosql authors to change the license? Hilmar Lapp wrote: > I realized that BioSQL is licensed under "the same terms as Perl > itself", and then references the Perl Artistic License. > > First of all, Perl has changed its licensing terms to allow the GPL > as an alternative, and the Artistic License for Perl will be upgraded > to v2.0. > > Aside from all that, I'm not sure that it makes all that much sense > to couple the license terms to those of Perl. Maybe a more technology- > neutral license would be more appropriate, such as the GPL alone, > LGPL, or simply MIT (or new BSD) license. Or just the Artistic > Licence v2.0? 
> > LGPL: http://www.opensource.org/licenses/lgpl-license.php > MIT: http://www.opensource.org/licenses/mit-license.php > BSD: http://www.opensource.org/licenses/bsd-license.php > Artistic 2.0: http://www.opensource.org/licenses/artistic- > license-2.0.php > > No action is probably not an option (b/c issues with Artistic v1.0 > and changes in Perl licensing). Any thoughts, opinions? > > -hilmar >
From paul.joseph.davis at gmail.com Tue Sep 11 14:17:39 2007 From: paul.joseph.davis at gmail.com (Paul Davis) Date: Tue, 11 Sep 2007 10:17:39 -0400 Subject: [BioSQL-l] Description Message-ID: I've been going over the biosql schema and I was wondering if there was a good place to read about examples of actual data that goes into each table. Specifically, I'm a bit confused about which parts of a genbank record go in which tables. Thanks, Paul Davis From holland at ebi.ac.uk Tue Sep 11 14:54:50 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Tue, 11 Sep 2007 15:54:50 +0100 Subject: [BioSQL-l] Description In-Reply-To: References: Message-ID: <46E6AC3A.5000203@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 There is no formal specification for what goes where in BioSQL, but you can refer to the BioJava documentation for a good approximation of where a GenBank file should end up. The BioJava objects share similar names to the BioSQL tables and are mapped using Hibernate. The most useful parts of the docs are probably: http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank and: http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object-relational_mappings. cheers, Richard Paul Davis wrote: > I've been going over the biosql schema and I was wondering if there > was a good place to read about examples of actual data that goes into > each table. Specifically, I'm a bit confused about which parts of a > genbank record go in which tables. 
> > Thanks, > Paul Davis > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFG5qw64C5LeMEKA/QRAiAPAJ41b3+cO7LQc1F4nAFrUWsVLwbl8wCgjFvd Q8i8g2bUyB17L++fuSKXa+0= =q8G2 -----END PGP SIGNATURE----- From cjfields at uiuc.edu Tue Sep 11 15:10:37 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 11 Sep 2007 10:10:37 -0500 Subject: [BioSQL-l] Description In-Reply-To: <46E6AC3A.5000203@ebi.ac.uk> References: <46E6AC3A.5000203@ebi.ac.uk> Message-ID: <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> Here's a question I couldn't find the answer to: should any BioSQL- loaded data (via BioJava, BioPerl, etc) be expected to fully round trip across any BioSQL-utilizing language? In other words, if I use BioJava/Hibernate to load sequence data in to a BioSQL database and use BioPerl to work with the data, can one expect it to work? My guess is no, as long as there is no formal specification... chris On Sep 11, 2007, at 9:54 AM, Richard Holland wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > There is no formal specification for what goes where in BioSQL, but > you > can refer to the BioJava documentation for a good approximation of > where > a GenBank file should end up. The BioJava objects share similar > names to > the BioSQL tables and are mapped using Hibernate. > > The most useful parts of the docs are probably: > > http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank > > and: > > http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object- > relational_mappings. > > cheers, > Richard > > Paul Davis wrote: >> I've been going over the biosql schema and I was wondering if there >> was a good place to read about examples of actual data that goes into >> each table. 
Specifically, I'm a bit confused about which parts of a >> genbank record go in which tables. >> >> Thanks, >> Paul Davis >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l >> > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFG5qw64C5LeMEKA/QRAiAPAJ41b3+cO7LQc1F4nAFrUWsVLwbl8wCgjFvd > Q8i8g2bUyB17L++fuSKXa+0= > =q8G2 > -----END PGP SIGNATURE----- > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Sep 11 15:38:45 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 11 Sep 2007 10:38:45 -0500 Subject: [BioSQL-l] Description In-Reply-To: <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> Message-ID: <4A365ABF-1C1E-46B0-A200-D2E3C0BC72B5@uiuc.edu> D'oh, it's in the README! .... This is the BioSQL distribution. BioSQL is a generic unifying schema for storing sequences from different sources, for instance Genbank or Swissprot. BioSQL is meant to be a common data storage layer supported by all the different Bio* projects, Bioperl, Biojava, Biopython, and Bioruby. Entries stored through an application written in, say, Bioperl could be retrieved by another written in Biojava. chris On Sep 11, 2007, at 10:10 AM, Chris Fields wrote: > Here's a question I couldn't find the answer to: should any BioSQL- > loaded data (via BioJava, BioPerl, etc) be expected to fully round > trip across any BioSQL-utilizing language? 
In other words, if I use > BioJava/Hibernate to load sequence data in to a BioSQL database and > use BioPerl to work with the data, can one expect it to work? > > My guess is no, as long as there is no formal specification... > > chris > > On Sep 11, 2007, at 9:54 AM, Richard Holland wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> There is no formal specification for what goes where in BioSQL, but >> you >> can refer to the BioJava documentation for a good approximation of >> where >> a GenBank file should end up. The BioJava objects share similar >> names to >> the BioSQL tables and are mapped using Hibernate. >> >> The most useful parts of the docs are probably: >> >> http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank >> >> and: >> >> http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object- >> relational_mappings. >> >> cheers, >> Richard >> >> Paul Davis wrote: >>> I've been going over the biosql schema and I was wondering if there >>> was a good place to read about examples of actual data that goes >>> into >>> each table. Specifically, I'm a bit confused about which parts of a >>> genbank record go in which tables. >>> >>> Thanks, >>> Paul Davis >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>> >> -----BEGIN PGP SIGNATURE----- >> Version: GnuPG v1.4.2.2 (GNU/Linux) >> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org >> >> iD8DBQFG5qw64C5LeMEKA/QRAiAPAJ41b3+cO7LQc1F4nAFrUWsVLwbl8wCgjFvd >> Q8i8g2bUyB17L++fuSKXa+0= >> =q8G2 >> -----END PGP SIGNATURE----- >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From barry.moore at genetics.utah.edu Tue Sep 11 15:49:45 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Tue, 11 Sep 2007 09:49:45 -0600 Subject: [BioSQL-l] Description In-Reply-To: <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> Message-ID: <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu> Well, the schema is the formal specification as to what goes where and as long as your BioJava and BioPerl DB interface plays by the rules of the schema, then yes you should be able to use both languages on the same database. Of course the devil is in the details and since I've only worked with the BioPerl interface I don't know if that is in fact reality right now. I think what Richard meant was there is not detailed human documentation about where each bit of a GenBank record goes into what table and column. Paul, I think you will find this document to be what you are looking for - or at least as good as you'll get: go to http://cvs.open-bio.org/cgi- bin/viewcvs/viewcvs.cgi/biosql-schema/doc/?cvsroot=biosql and look for schema-overview.txt. There is also a ERD in pdf format which can help you get your head around the schema. If you end up with specific questions about what's where, send another e-mail or just load some files and go exploring. Barry On Sep 11, 2007, at 9:10 AM, Chris Fields wrote: > Here's a question I couldn't find the answer to: should any BioSQL- > loaded data (via BioJava, BioPerl, etc) be expected to fully round > trip across any BioSQL-utilizing language? 
In other words, if I use > BioJava/Hibernate to load sequence data in to a BioSQL database and > use BioPerl to work with the data, can one expect it to work? > > My guess is no, as long as there is no formal specification... > > chris > > On Sep 11, 2007, at 9:54 AM, Richard Holland wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> There is no formal specification for what goes where in BioSQL, but >> you >> can refer to the BioJava documentation for a good approximation of >> where >> a GenBank file should end up. The BioJava objects share similar >> names to >> the BioSQL tables and are mapped using Hibernate. >> >> The most useful parts of the docs are probably: >> >> http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank >> >> and: >> >> http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object- >> relational_mappings. >> >> cheers, >> Richard >> >> Paul Davis wrote: >>> I've been going over the biosql schema and I was wondering if there >>> was a good place to read about examples of actual data that goes >>> into >>> each table. Specifically, I'm a bit confused about which parts of a >>> genbank record go in which tables. >>> >>> Thanks, >>> Paul Davis >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>> >> -----BEGIN PGP SIGNATURE----- >> Version: GnuPG v1.4.2.2 (GNU/Linux) >> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org >> >> iD8DBQFG5qw64C5LeMEKA/QRAiAPAJ41b3+cO7LQc1F4nAFrUWsVLwbl8wCgjFvd >> Q8i8g2bUyB17L++fuSKXa+0= >> =q8G2 >> -----END PGP SIGNATURE----- >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From cjfields at uiuc.edu Tue Sep 11 16:16:08 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 11 Sep 2007 11:16:08 -0500 Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description In-Reply-To: <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu> References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu> Message-ID: I think one area of possible headache will be TAXON/TAXON_NAME. For instance, with BioPerl we kept running into genus/species parsing problems (virus, bacterial names) when going from seqrecord->object. Due to that we decided to greatly simplify Species parsing in Bioperl so there isn't any 'guessing' as to genus/species names; you get what's already there, nothing more. If one wants extra taxonomic information then one must use NCBI Taxonomy somehow. However, currently bioperl-db still splits into genus/species (acts like older BioPerl), which obviously clashes with current Bioperl behavior. Not sure how the other Bio* store this data; Richard? There is a BioPerl bug filed on this: http://bugzilla.open-bio.org/show_bug.cgi?id=2092 chris On Sep 11, 2007, at 10:49 AM, Barry Moore wrote: > Well, the schema is the formal specification as to what goes where > and as long as your BioJava and BioPerl DB interface plays by the > rules of the schema, then yes you should be able to use both > languages on the same database. Of course the devil is in the > details and since I've only worked with the BioPerl interface I > don't know if that is in fact reality right now. I think what > Richard meant was there is not detailed human documentation about > where each bit of a GenBank record goes into what table and > column. 
Paul, I think you will find this document to be what you > are looking for - or at least as good as you'll get: go to http:// > cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biosql-schema/doc/? > cvsroot=biosql and look for schema-overview.txt. There is also a > ERD in pdf format which can help you get your head around the > schema. If you end up with specific questions about what's where, > send another e-mail or just load some files and go exploring. > > Barry > > On Sep 11, 2007, at 9:10 AM, Chris Fields wrote: > >> Here's a question I couldn't find the answer to: should any BioSQL- >> loaded data (via BioJava, BioPerl, etc) be expected to fully round >> trip across any BioSQL-utilizing language? In other words, if I use >> BioJava/Hibernate to load sequence data in to a BioSQL database and >> use BioPerl to work with the data, can one expect it to work? >> >> My guess is no, as long as there is no formal specification... >> >> chris >> >> On Sep 11, 2007, at 9:54 AM, Richard Holland wrote: >> >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >>> >>> There is no formal specification for what goes where in BioSQL, but >>> you >>> can refer to the BioJava documentation for a good approximation of >>> where >>> a GenBank file should end up. The BioJava objects share similar >>> names to >>> the BioSQL tables and are mapped using Hibernate. >>> >>> The most useful parts of the docs are probably: >>> >>> http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank >>> >>> and: >>> >>> http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object- >>> relational_mappings. >>> >>> cheers, >>> Richard >>> >>> Paul Davis wrote: >>>> I've been going over the biosql schema and I was wondering if there >>>> was a good place to read about examples of actual data that goes >>>> into >>>> each table. Specifically, I'm a bit confused about which parts of a >>>> genbank record go in which tables. 
>>>> >>>> Thanks, >>>> Paul Davis >>>> _______________________________________________ >>>> BioSQL-l mailing list >>>> BioSQL-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>>> >>> -----BEGIN PGP SIGNATURE----- >>> Version: GnuPG v1.4.2.2 (GNU/Linux) >>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org >>> >>> iD8DBQFG5qw64C5LeMEKA/QRAiAPAJ41b3+cO7LQc1F4nAFrUWsVLwbl8wCgjFvd >>> Q8i8g2bUyB17L++fuSKXa+0= >>> =q8G2 >>> -----END PGP SIGNATURE----- >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From paul.joseph.davis at gmail.com Tue Sep 11 16:59:37 2007 From: paul.joseph.davis at gmail.com (Paul Davis) Date: Tue, 11 Sep 2007 12:59:37 -0400 Subject: [BioSQL-l] Description In-Reply-To: <4A365ABF-1C1E-46B0-A200-D2E3C0BC72B5@uiuc.edu> References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> <4A365ABF-1C1E-46B0-A200-D2E3C0BC72B5@uiuc.edu> Message-ID: On 9/11/07, Chris Fields wrote: > D'oh, it's in the README! > > .... > > This is the BioSQL distribution. BioSQL is a generic unifying schema > for storing sequences from different sources, for instance Genbank or > Swissprot. > > BioSQL is meant to be a common data storage layer supported by all the > different Bio* projects, Bioperl, Biojava, Biopython, and Bioruby. > Entries stored through an application written in, say, Bioperl could > be retrieved by another written in Biojava. 
> > chris > I think this was the underlying idea behind standardizing the schema. Granted, I haven't grokked all of the package's DB access layers, but given what I've seen in the BioPython interface, I'm guessing this is more of a theoretical possibility than a standard practice. > On Sep 11, 2007, at 10:10 AM, Chris Fields wrote: > > > Here's a question I couldn't find the answer to: should any BioSQL- > > loaded data (via BioJava, BioPerl, etc) be expected to fully round > > trip across any BioSQL-utilizing language? In other words, if I use > > BioJava/Hibernate to load sequence data in to a BioSQL database and > > use BioPerl to work with the data, can one expect it to work? > > > > My guess is no, as long as there is no formal specification... > > > > chris > > > > On Sep 11, 2007, at 9:54 AM, Richard Holland wrote: > > > >> -----BEGIN PGP SIGNED MESSAGE----- > >> Hash: SHA1 > >> > >> There is no formal specification for what goes where in BioSQL, but > >> you > >> can refer to the BioJava documentation for a good approximation of > >> where > >> a GenBank file should end up. The BioJava objects share similar > >> names to > >> the BioSQL tables and are mapped using Hibernate. > >> > >> The most useful parts of the docs are probably: > >> > >> http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank > >> > >> and: > >> > >> http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object- > >> relational_mappings. > >> > >> cheers, > >> Richard > >> > >> Paul Davis wrote: > >>> I've been going over the biosql schema and I was wondering if there > >>> was a good place to read about examples of actual data that goes > >>> into > >>> each table. Specifically, I'm a bit confused about which parts of a > >>> genbank record go in which tables.
> >>> > >>> Thanks, > >>> Paul Davis > >>> > >> -----BEGIN PGP SIGNATURE----- > >> Version: GnuPG v1.4.2.2 (GNU/Linux) > >> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > >> > >> iD8DBQFG5qw64C5LeMEKA/QRAiAPAJ41b3+cO7LQc1F4nAFrUWsVLwbl8wCgjFvd > >> Q8i8g2bUyB17L++fuSKXa+0= > >> =q8G2 > >> -----END PGP SIGNATURE----- > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > From holland at ebi.ac.uk Wed Sep 12 07:32:42 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Wed, 12 Sep 2007 08:32:42 +0100 Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description In-Reply-To: References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu> Message-ID: <46E7961A.8090709@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 We use the taxon table to store taxon information. See: http://biojava.org/wiki/BioJava:BioJavaXDocs#NCBI_Taxonomy. Each RichSequence object then gets an NCBITaxon object associated with it using set/getTaxon().
For Genbank this is always parsed from the appropriate entry in the feature table - the Organism and Species lines are ignored. cheers, Richard Chris Fields wrote: > I think one area of possible headache will be TAXON/TAXON_NAME. For > instance, with BioPerl we kept running into genus/species parsing > problems (virus, bacterial names) when going from seqrecord->object. > Due to that we decided to greatly simplify Species parsing in Bioperl so > there isn't any 'guessing' as to genus/species names; you get what's > already there, nothing more. If one wants extra taxonomic information > then one must use NCBI Taxonomy somehow. > > However, currently bioperl-db still splits into genus/species (acts like > older BioPerl), which obviously clashes with current Bioperl behavior. > Not sure how the other Bio* store this data; Richard? > > There is a BioPerl bug filed on this: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2092 > > chris > > On Sep 11, 2007, at 10:49 AM, Barry Moore wrote: > >> Well, the schema is the formal specification as to what goes where and >> as long as your BioJava and BioPerl DB interface plays by the rules of >> the schema, then yes you should be able to use both languages on the >> same database. Of course the devil is in the details and since I've >> only worked with the BioPerl interface I don't know if that is in fact >> reality right now. I think what Richard meant was there is not >> detailed human documentation about where each bit of a GenBank record >> goes into what table and column. Paul, I think you will find this >> document to be what you are looking for - or at least as good as >> you'll get: go to >> http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biosql-schema/doc/?cvsroot=biosql >> and look for schema-overview.txt. There is also a ERD in pdf format >> which can help you get your head around the schema. 
If you end up >> with specific questions about what's where, send another e-mail or >> just load some files and go exploring. >> >> Barry >> >> On Sep 11, 2007, at 9:10 AM, Chris Fields wrote: >> >>> Here's a question I couldn't find the answer to: should any BioSQL- >>> loaded data (via BioJava, BioPerl, etc) be expected to fully round >>> trip across any BioSQL-utilizing language? In other words, if I use >>> BioJava/Hibernate to load sequence data in to a BioSQL database and >>> use BioPerl to work with the data, can one expect it to work? >>> >>> My guess is no, as long as there is no formal specification... >>> >>> chris >>> >>> On Sep 11, 2007, at 9:54 AM, Richard Holland wrote: >>> > There is no formal specification for what goes where in BioSQL, but > you > can refer to the BioJava documentation for a good approximation of > where > a GenBank file should end up. The BioJava objects share similar > names to > the BioSQL tables and are mapped using Hibernate. > > The most useful parts of the docs are probably: > > http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank > > and: > > http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object- > relational_mappings. > > cheers, > Richard > > Paul Davis wrote: >>>>>> I've been going over the biosql schema and I was wondering if there >>>>>> was a good place to read about examples of actual data that goes into >>>>>> each table. Specifically, I'm a bit confused about which parts of a >>>>>> genbank record go in which tables. >>>>>> >>>>>> Thanks, >>>>>> Paul Davis >>>>>> _______________________________________________ >>>>>> BioSQL-l mailing list >>>>>> BioSQL-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>>>>> _______________________________________________ BioSQL-l mailing list BioSQL-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biosql-l >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. 
Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >> > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFG55YZ4C5LeMEKA/QRAg7wAJwPa7GXHKSdaYVHrk9a3JM8GhLIHwCeLRSq jaQ6oAARv+oOpuaeBhNSA2U= =xc8y -----END PGP SIGNATURE----- From hlapp at gmx.net Wed Sep 12 23:01:28 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 12 Sep 2007 19:01:28 -0400 Subject: [BioSQL-l] Description In-Reply-To: <46E6AC3A.5000203@ebi.ac.uk> References: <46E6AC3A.5000203@ebi.ac.uk> Message-ID: <364A0795-3399-4F2C-A292-23BA5AA9F899@gmx.net> On Sep 11, 2007, at 10:54 AM, Richard Holland wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > There is no formal specification for what goes where in BioSQL, Indeed there isn't a formal specification in text. To understand this it may be worth keeping in mind that the historic, and still in a sense primary, use-case of BioSQL is to be the common persistence API for the Bio* projects. Hence, what is relatively well defined is how to map a Bio* object model (in particular, BioPerl's - and meanwhile Biojava's - object model) into BioSQL and back. Where a particular piece of a GenBank file ends up in BioSQL would therefore depend on where it ends up in the respective object model, strictly speaking. 
Since this doesn't bode well for interoperability between the toolkits (which was one of the points of having BioSQL), Richard, Mark Schreiber, and I got together 2 years ago to reconcile BioPerl's and Biojava's way of ingesting and representing a richly annotated sequence, leading to the RichSeq work being added to Biojava (correct me, Richard, if I'm confusing things). So in theory, at least, BioPerl and Biojava should by now map a GenBank sequence to BioSQL in a very similar or ideally identical way, though I'm not sure this has ever been put to the test. I'm not aware of a similar effort having been undertaken on the Biopython side, though I'd be more than happy to work with anyone from the Biopython community who is interested in resolving this. Given the recent Bio.SeqIO work there, this may be a good time to take this up. -hilmar > but you can refer to the BioJava documentation for a good > approximation of where > a GenBank file should end up. The BioJava objects share similar > names to > the BioSQL tables and are mapped using Hibernate. > > The most useful parts of the docs are probably: > > http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank > > and: > > http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object- > relational_mappings.
> > cheers, > Richard -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed Sep 12 23:05:12 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 12 Sep 2007 19:05:12 -0400 Subject: [BioSQL-l] Description In-Reply-To: <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> Message-ID: <3E63A460-6C83-43E2-984C-39D7A25F7D74@gmx.net> On Sep 11, 2007, at 11:10 AM, Chris Fields wrote: > Here's a question I couldn't find the answer to: should any BioSQL- > loaded data (via BioJava, BioPerl, etc) be expected to fully round > trip across any BioSQL-utilizing language? In other words, if I use > BioJava/Hibernate to load sequence data in to a BioSQL database and > use BioPerl to work with the data, can one expect it to work? In theory, yes. In practice, there hasn't been a great effort at writing tests and working out the kinks until this is really true. Though minor differences are easily possible, I'd be surprised if there are still huge incompatibilities between how Biojava and BioPerl store things, given the biojavax work of Mark and Richard. I don't know how big the differences would be for Biopython or BioRuby, though, or between those two.
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed Sep 12 23:15:42 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 12 Sep 2007 19:15:42 -0400 Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description In-Reply-To: References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu> Message-ID: The species/taxon handling shouldn't be a problem if you have the NCBI taxonID and have preloaded the NCBI taxonomy. However, if it's a new species (i.e., the lookup of the NCBI taxonID in the taxon table fails), then bioperl-db tries to create the lineage based on what it finds in the species object. As the bug report says, the issue can be fixed, but it also looks like the fix will break compatibility with earlier versions of BioPerl. I think at some point that's fine, but I was wondering whether that's the way it needs to be. -hilmar On Sep 11, 2007, at 12:16 PM, Chris Fields wrote: > I think one area of possible headache will be TAXON/TAXON_NAME. For > instance, with BioPerl we kept running into genus/species parsing > problems (virus, bacterial names) when going from seqrecord->object. > Due to that we decided to greatly simplify Species parsing in Bioperl > so there isn't any 'guessing' as to genus/species names; you get > what's already there, nothing more. If one wants extra taxonomic > information then one must use NCBI Taxonomy somehow. > > However, currently bioperl-db still splits into genus/species (acts > like older BioPerl), which obviously clashes with current Bioperl > behavior. Not sure how the other Bio* store this data; Richard? 
> > There is a BioPerl bug filed on this: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2092 > > chris > > On Sep 11, 2007, at 10:49 AM, Barry Moore wrote: > >> Well, the schema is the formal specification as to what goes where >> and as long as your BioJava and BioPerl DB interface plays by the >> rules of the schema, then yes you should be able to use both >> languages on the same database. Of course the devil is in the >> details and since I've only worked with the BioPerl interface I >> don't know if that is in fact reality right now. I think what >> Richard meant was there is not detailed human documentation about >> where each bit of a GenBank record goes into what table and >> column. Paul, I think you will find this document to be what you >> are looking for - or at least as good as you'll get: go to http:// >> cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biosql-schema/doc/? >> cvsroot=biosql and look for schema-overview.txt. There is also a >> ERD in pdf format which can help you get your head around the >> schema. If you end up with specific questions about what's where, >> send another e-mail or just load some files and go exploring. >> >> Barry >> >> On Sep 11, 2007, at 9:10 AM, Chris Fields wrote: >> >>> Here's a question I couldn't find the answer to: should any BioSQL- >>> loaded data (via BioJava, BioPerl, etc) be expected to fully round >>> trip across any BioSQL-utilizing language? In other words, if I use >>> BioJava/Hibernate to load sequence data in to a BioSQL database and >>> use BioPerl to work with the data, can one expect it to work? >>> >>> My guess is no, as long as there is no formal specification... >>> >>> chris >>> >>> On Sep 11, 2007, at 9:54 AM, Richard Holland wrote: >>> >>>> -----BEGIN PGP SIGNED MESSAGE----- >>>> Hash: SHA1 >>>> >>>> There is no formal specification for what goes where in BioSQL, but >>>> you >>>> can refer to the BioJava documentation for a good approximation of >>>> where >>>> a GenBank file should end up. 
The BioJava objects share similar >>>> names to >>>> the BioSQL tables and are mapped using Hibernate. >>>> >>>> The most useful parts of the docs are probably: >>>> >>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank >>>> >>>> and: >>>> >>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object- >>>> relational_mappings. >>>> >>>> cheers, >>>> Richard >>>> >>>> Paul Davis wrote: >>>>> I've been going over the biosql schema and I was wondering if >>>>> there >>>>> was a good place to read about examples of actual data that goes >>>>> into >>>>> each table. Specifically, I'm a bit confused about which parts >>>>> of a >>>>> genbank record go in which tables. >>>>> >>>>> Thanks, >>>>> Paul Davis >>>>> _______________________________________________ >>>>> BioSQL-l mailing list >>>>> BioSQL-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>>>> >>>> -----BEGIN PGP SIGNATURE----- >>>> Version: GnuPG v1.4.2.2 (GNU/Linux) >>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org >>>> >>>> iD8DBQFG5qw64C5LeMEKA/QRAiAPAJ41b3+cO7LQc1F4nAFrUWsVLwbl8wCgjFvd >>>> Q8i8g2bUyB17L++fuSKXa+0= >>>> =q8G2 >>>> -----END PGP SIGNATURE----- >>>> _______________________________________________ >>>> BioSQL-l mailing list >>>> BioSQL-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From paul.joseph.davis at gmail.com Thu Sep 13 00:13:31 2007 From: paul.joseph.davis at gmail.com (Paul Davis) Date: Wed, 12 Sep 2007 20:13:31 -0400 Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description In-Reply-To: References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu>

Message-ID: I glanced through the bioperl cvs a bit but couldn't find the part where it tries to load a new taxonomy name. Does this go and try to rebuild the nested sets information, or basically leave any inserted taxonomic data (non-NCBI data) as nodes dangling outside the nested sets information? Paul On 9/12/07, Hilmar Lapp wrote: > The species/taxon handling shouldn't be a problem if you have the > NCBI taxonID and have preloaded the NCBI taxonomy. > > However, if it's a new species (i.e., the lookup of the NCBI taxonID > in the taxon table fails), then bioperl-db tries to create the > lineage based on what it finds in the species object. > > As the bug report says, the issue can be fixed, but it also looks > like the fix will break compatibility with earlier versions of > BioPerl. I think at some point that's fine, but I was wondering > whether that's the way it needs to be. > > -hilmar > > On Sep 11, 2007, at 12:16 PM, Chris Fields wrote: > > > I think one area of possible headache will be TAXON/TAXON_NAME. For > > instance, with BioPerl we kept running into genus/species parsing > > problems (virus, bacterial names) when going from seqrecord->object. > > Due to that we decided to greatly simplify Species parsing in Bioperl > > so there isn't any 'guessing' as to genus/species names; you get > > what's already there, nothing more. If one wants extra taxonomic > > information then one must use NCBI Taxonomy somehow. > > > > However, currently bioperl-db still splits into genus/species (acts > > like older BioPerl), which obviously clashes with current Bioperl > > behavior. Not sure how the other Bio* store this data; Richard? 
> > > > There is a BioPerl bug filed on this: > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=2092 > > > > chris > > > > On Sep 11, 2007, at 10:49 AM, Barry Moore wrote: > > > >> Well, the schema is the formal specification as to what goes where > >> and as long as your BioJava and BioPerl DB interface plays by the > >> rules of the schema, then yes you should be able to use both > >> languages on the same database. Of course the devil is in the > >> details and since I've only worked with the BioPerl interface I > >> don't know if that is in fact reality right now. I think what > >> Richard meant was there is not detailed human documentation about > >> where each bit of a GenBank record goes into what table and > >> column. Paul, I think you will find this document to be what you > >> are looking for - or at least as good as you'll get: go to http:// > >> cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biosql-schema/doc/? > >> cvsroot=biosql and look for schema-overview.txt. There is also a > >> ERD in pdf format which can help you get your head around the > >> schema. If you end up with specific questions about what's where, > >> send another e-mail or just load some files and go exploring. > >> > >> Barry > >> > >> On Sep 11, 2007, at 9:10 AM, Chris Fields wrote: > >> > >>> Here's a question I couldn't find the answer to: should any BioSQL- > >>> loaded data (via BioJava, BioPerl, etc) be expected to fully round > >>> trip across any BioSQL-utilizing language? In other words, if I use > >>> BioJava/Hibernate to load sequence data in to a BioSQL database and > >>> use BioPerl to work with the data, can one expect it to work? > >>> > >>> My guess is no, as long as there is no formal specification... 
> >>> > >>> chris > >>> > >>> On Sep 11, 2007, at 9:54 AM, Richard Holland wrote: > >>> > >>>> -----BEGIN PGP SIGNED MESSAGE----- > >>>> Hash: SHA1 > >>>> > >>>> There is no formal specification for what goes where in BioSQL, but > >>>> you > >>>> can refer to the BioJava documentation for a good approximation of > >>>> where > >>>> a GenBank file should end up. The BioJava objects share similar > >>>> names to > >>>> the BioSQL tables and are mapped using Hibernate. > >>>> > >>>> The most useful parts of the docs are probably: > >>>> > >>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank > >>>> > >>>> and: > >>>> > >>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object- > >>>> relational_mappings. > >>>> > >>>> cheers, > >>>> Richard > >>>> > >>>> Paul Davis wrote: > >>>>> I've been going over the biosql schema and I was wondering if > >>>>> there > >>>>> was a good place to read about examples of actual data that goes > >>>>> into > >>>>> each table. Specifically, I'm a bit confused about which parts > >>>>> of a > >>>>> genbank record go in which tables. > >>>>> > >>>>> Thanks, > >>>>> Paul Davis > >>>>> _______________________________________________ > >>>>> BioSQL-l mailing list > >>>>> BioSQL-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l > >>>>> > >>>> -----BEGIN PGP SIGNATURE----- > >>>> Version: GnuPG v1.4.2.2 (GNU/Linux) > >>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > >>>> > >>>> iD8DBQFG5qw64C5LeMEKA/QRAiAPAJ41b3+cO7LQc1F4nAFrUWsVLwbl8wCgjFvd > >>>> Q8i8g2bUyB17L++fuSKXa+0= > >>>> =q8G2 > >>>> -----END PGP SIGNATURE----- > >>>> _______________________________________________ > >>>> BioSQL-l mailing list > >>>> BioSQL-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/biosql-l > >>> > >>> Christopher Fields > >>> Postdoctoral Researcher > >>> Lab of Dr. 
Robert Switzer > >>> Dept of Biochemistry > >>> University of Illinois Urbana-Champaign > >>> > >>> > >>> > >>> _______________________________________________ > >>> BioSQL-l mailing list > >>> BioSQL-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/biosql-l > >> > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biosql-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > From hlapp at gmx.net Thu Sep 13 00:19:24 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 12 Sep 2007 20:19:24 -0400 Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description In-Reply-To: References: <46E6AC3A.5000203@ebi.ac.uk> <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu> <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu>

Message-ID: <96E80B9C-2F46-43E0-80E4-30160AEDABD7@gmx.net> The code is in bioperl-db (which is a sub-repository of bioperl, as is bioperl-live). It makes no attempt at updating the nested-set values. That raises a good point - there is currently no script that would update that; the load_ncbi_taxonomy.pl script does recompute it, but will also want to load or update the NCBI taxonomy. It should be relatively easy to factor out the nested-set computing code into a separate stand-alone script. -hilmar On Sep 12, 2007, at 8:13 PM, Paul Davis wrote: > I glanced through the bioperl cvs a bit but couldn't find the part > where it tries to load a new taxonomy name. Does this go and try to > rebuild the nested sets information, or basically leave any inserted > taxonomic data (non-NCBI data) as nodes dangling outside the nested > sets information? > > Paul > > On 9/12/07, Hilmar Lapp wrote: >> The species/taxon handling shouldn't be a problem if you have the >> NCBI taxonID and have preloaded the NCBI taxonomy. >> >> However, if it's a new species (i.e., the lookup of the NCBI taxonID >> in the taxon table fails), then bioperl-db tries to create the >> lineage based on what it finds in the species object. >> >> As the bug report says, the issue can be fixed, but it also looks >> like the fix will break compatibility with earlier versions of >> BioPerl. I think at some point that's fine, but I was wondering >> whether that's the way it needs to be. >> >> -hilmar >> >> On Sep 11, 2007, at 12:16 PM, Chris Fields wrote: >> >>> I think one area of possible headache will be TAXON/TAXON_NAME. For >>> instance, with BioPerl we kept running into genus/species parsing >>> problems (virus, bacterial names) when going from seqrecord->object. >>> Due to that we decided to greatly simplify Species parsing in >>> Bioperl >>> so there isn't any 'guessing' as to genus/species names; you get >>> what's already there, nothing more. 
If one wants extra taxonomic >>> information then one must use NCBI Taxonomy somehow. >>> >>> However, currently bioperl-db still splits into genus/species (acts >>> like older BioPerl), which obviously clashes with current Bioperl >>> behavior. Not sure how the other Bio* store this data; Richard? >>> >>> There is a BioPerl bug filed on this: >>> >>> http://bugzilla.open-bio.org/show_bug.cgi?id=2092 >>> >>> chris >>> >>> On Sep 11, 2007, at 10:49 AM, Barry Moore wrote: >>> >>>> Well, the schema is the formal specification as to what goes where >>>> and as long as your BioJava and BioPerl DB interface plays by the >>>> rules of the schema, then yes you should be able to use both >>>> languages on the same database. Of course the devil is in the >>>> details and since I've only worked with the BioPerl interface I >>>> don't know if that is in fact reality right now. I think what >>>> Richard meant was there is not detailed human documentation about >>>> where each bit of a GenBank record goes into what table and >>>> column. Paul, I think you will find this document to be what you >>>> are looking for - or at least as good as you'll get: go to http:// >>>> cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biosql-schema/doc/? >>>> cvsroot=biosql and look for schema-overview.txt. There is also a >>>> ERD in pdf format which can help you get your head around the >>>> schema. If you end up with specific questions about what's where, >>>> send another e-mail or just load some files and go exploring. >>>> >>>> Barry >>>> >>>> On Sep 11, 2007, at 9:10 AM, Chris Fields wrote: >>>> >>>>> Here's a question I couldn't find the answer to: should any >>>>> BioSQL- >>>>> loaded data (via BioJava, BioPerl, etc) be expected to fully round >>>>> trip across any BioSQL-utilizing language? In other words, if >>>>> I use >>>>> BioJava/Hibernate to load sequence data in to a BioSQL database >>>>> and >>>>> use BioPerl to work with the data, can one expect it to work? 
>>>>>
>>>>> My guess is no, as long as there is no formal specification...
>>>>>
>>>>> chris
>>>>>
>>>>> On Sep 11, 2007, at 9:54 AM, Richard Holland wrote:
>>>>>
>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>> Hash: SHA1
>>>>>>
>>>>>> There is no formal specification for what goes where in BioSQL,
>>>>>> but you can refer to the BioJava documentation for a good
>>>>>> approximation of where a GenBank file should end up. The BioJava
>>>>>> objects share similar names to the BioSQL tables and are mapped
>>>>>> using Hibernate.
>>>>>>
>>>>>> The most useful parts of the docs are probably:
>>>>>>
>>>>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank
>>>>>>
>>>>>> and:
>>>>>>
>>>>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object-relational_mappings.
>>>>>>
>>>>>> cheers,
>>>>>> Richard
>>>>>>
>>>>>> Paul Davis wrote:
>>>>>>> I've been going over the biosql schema and I was wondering if
>>>>>>> there was a good place to read about examples of actual data
>>>>>>> that goes into each table. Specifically, I'm a bit confused
>>>>>>> about which parts of a genbank record go in which tables.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Paul Davis
>>>>>>> _______________________________________________
>>>>>>> BioSQL-l mailing list
>>>>>>> BioSQL-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>>>>>
>>>>>> -----BEGIN PGP SIGNATURE-----
>>>>>> Version: GnuPG v1.4.2.2 (GNU/Linux)
>>>>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>>>>>
>>>>>> iD8DBQFG5qw64C5LeMEKA/QRAiAPAJ41b3+cO7LQc1F4nAFrUWsVLwbl8wCgjFvd
>>>>>> Q8i8g2bUyB17L++fuSKXa+0=
>>>>>> =q8G2
>>>>>> -----END PGP SIGNATURE-----
>>>>>> _______________________________________________
>>>>>> BioSQL-l mailing list
>>>>>> BioSQL-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>>>
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher
>>>>> Lab of Dr. Robert Switzer
>>>>> Dept of Biochemistry
>>>>> University of Illinois Urbana-Champaign
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> BioSQL-l mailing list
>>>>> BioSQL-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>>
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From paul.joseph.davis at gmail.com Thu Sep 13 00:24:08 2007
From: paul.joseph.davis at gmail.com (Paul Davis)
Date: Wed, 12 Sep 2007 20:24:08 -0400
Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description
In-Reply-To: <96E80B9C-2F46-43E0-80E4-30160AEDABD7@gmx.net>
References: <46E6AC3A.5000203@ebi.ac.uk>
    <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu>
    <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu>
    <96E80B9C-2F46-43E0-80E4-30160AEDABD7@gmx.net>
Message-ID:

I was wondering more whether there is an efficient way to recompute that
information. As you seem to be confirming, I was fairly certain that
updating those values would require recalculating all of them.

Paul

On 9/12/07, Hilmar Lapp wrote:
> The code is in bioperl-db (which is a sub-repository of bioperl, as
> is bioperl-live).
>
> It makes no attempt at updating the nested-set values. That raises a
> good point - there is currently no script that would update that; the
> load_ncbi_taxonomy.pl script does recompute it, but will also want to
> load or update the NCBI taxonomy. It should be relatively easy to
> factor out the nested-set computing code into a separate stand-alone
> script.
>
> -hilmar
>
> On Sep 12, 2007, at 8:13 PM, Paul Davis wrote:
>
> > I glanced through the bioperl cvs a bit but couldn't find the part
> > where it tries to load a new taxonomy name. Does this go and try to
> > rebuild the nested sets information, or basically leave any inserted
> > taxonomic data (non-NCBI data) as nodes dangling outside the nested
> > sets information?
> >
> > Paul
> >
> > On 9/12/07, Hilmar Lapp wrote:
> >> The species/taxon handling shouldn't be a problem if you have the
> >> NCBI taxonID and have preloaded the NCBI taxonomy.
> >>
> >> However, if it's a new species (i.e., the lookup of the NCBI taxonID
> >> in the taxon table fails), then bioperl-db tries to create the
> >> lineage based on what it finds in the species object.
> >>
> >> As the bug report says, the issue can be fixed, but it also looks
> >> like the fix will break compatibility with earlier versions of
> >> BioPerl. I think at some point that's fine, but I was wondering
> >> whether that's the way it needs to be.
> >>
> >> -hilmar
> >>
> >> On Sep 11, 2007, at 12:16 PM, Chris Fields wrote:
> >>
> >>> I think one area of possible headache will be TAXON/TAXON_NAME. For
> >>> instance, with BioPerl we kept running into genus/species parsing
> >>> problems (virus, bacterial names) when going from seqrecord->object.
> >>> Due to that we decided to greatly simplify Species parsing in
> >>> Bioperl so there isn't any 'guessing' as to genus/species names;
> >>> you get what's already there, nothing more. If one wants extra
> >>> taxonomic information then one must use NCBI Taxonomy somehow.
> >>>
> >>> However, currently bioperl-db still splits into genus/species (acts
> >>> like older BioPerl), which obviously clashes with current Bioperl
> >>> behavior. Not sure how the other Bio* store this data; Richard?
> >>>
> >>> There is a BioPerl bug filed on this:
> >>>
> >>> http://bugzilla.open-bio.org/show_bug.cgi?id=2092
> >>>
> >>> chris
> >>>
> >>> On Sep 11, 2007, at 10:49 AM, Barry Moore wrote:
> >>>
> >>>> Well, the schema is the formal specification as to what goes where
> >>>> and as long as your BioJava and BioPerl DB interface plays by the
> >>>> rules of the schema, then yes you should be able to use both
> >>>> languages on the same database. Of course the devil is in the
> >>>> details and since I've only worked with the BioPerl interface I
> >>>> don't know if that is in fact reality right now. I think what
> >>>> Richard meant was there is not detailed human documentation about
> >>>> where each bit of a GenBank record goes into what table and
> >>>> column. Paul, I think you will find this document to be what you
> >>>> are looking for - or at least as good as you'll get: go to
> >>>> http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biosql-schema/doc/?cvsroot=biosql
> >>>> and look for schema-overview.txt. There is also a
> >>>> ERD in pdf format which can help you get your head around the
> >>>> schema. If you end up with specific questions about what's where,
> >>>> send another e-mail or just load some files and go exploring.
> >>>>
> >>>> Barry
> >>>>
> >>>> On Sep 11, 2007, at 9:10 AM, Chris Fields wrote:
> >>>>
> >>>>> Here's a question I couldn't find the answer to: should any
> >>>>> BioSQL-loaded data (via BioJava, BioPerl, etc) be expected to
> >>>>> fully round trip across any BioSQL-utilizing language? In other
> >>>>> words, if I use BioJava/Hibernate to load sequence data in to a
> >>>>> BioSQL database and use BioPerl to work with the data, can one
> >>>>> expect it to work?
> >>>>>
> >>>>> My guess is no, as long as there is no formal specification...
> >>>>>
> >>>>> chris
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

From cjfields at uiuc.edu Thu Sep 13 01:42:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 12 Sep 2007 20:42:03 -0500
Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description
In-Reply-To:
References: <46E6AC3A.5000203@ebi.ac.uk>
    <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu>
    <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu>
Message-ID: <9FF304F7-A318-487B-8346-1A4CA5E424B6@uiuc.edu>

In BioPerl versions up to 1.5.1, the Bio::Species class doesn't
implement a specific interface, whereas in 1.5.2 it inherits from the
new Bio::Taxon (and all methods are reimplemented to work with
Bio::Taxon methods). According to Sendu, the long-term plan was to
eventually deprecate Bio::Species and just use Bio::Taxon, with no
'guessing' of the genus/species, which always borked seqrecord parsing.

That 'guessing' is essentially what is going on with SpeciesAdaptor
now (Sendu's suggestion of 'old behavior', which triggered the
exception in the bug report). I'll try to look into it in a few weeks
when I have some more time; there are a number of bioperl-db bugs in
bugzilla that need sorting through.

My thought is still to use a transition module (TaxonAdaptor) which
would eventually replace SpeciesAdaptor once Bio::Species is no more.

chris

On Sep 12, 2007, at 6:15 PM, Hilmar Lapp wrote:

> The species/taxon handling shouldn't be a problem if you have the
> NCBI taxonID and have preloaded the NCBI taxonomy.
>
> However, if it's a new species (i.e., the lookup of the NCBI
> taxonID in the taxon table fails), then bioperl-db tries to create
> the lineage based on what it finds in the species object.
>
> As the bug report says, the issue can be fixed, but it also looks
> like the fix will break compatibility with earlier versions of
> BioPerl. I think at some point that's fine, but I was wondering
> whether that's the way it needs to be.
>
> -hilmar
>
> On Sep 11, 2007, at 12:16 PM, Chris Fields wrote:
>
>> I think one area of possible headache will be TAXON/TAXON_NAME. For
>> instance, with BioPerl we kept running into genus/species parsing
>> problems (virus, bacterial names) when going from seqrecord->object.
>> Due to that we decided to greatly simplify Species parsing in Bioperl
>> so there isn't any 'guessing' as to genus/species names; you get
>> what's already there, nothing more.
>> If one wants extra taxonomic
>> information then one must use NCBI Taxonomy somehow.
>>
>> However, currently bioperl-db still splits into genus/species (acts
>> like older BioPerl), which obviously clashes with current Bioperl
>> behavior. Not sure how the other Bio* store this data; Richard?
>>
>> There is a BioPerl bug filed on this:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2092
>>
>> chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From hlapp at gmx.net Thu Sep 13 14:37:44 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 13 Sep 2007 10:37:44 -0400
Subject: [BioSQL-l] TAXON,TAXON_NAME, was Re: Description
In-Reply-To:
References: <46E6AC3A.5000203@ebi.ac.uk>
    <0F048B18-F029-4176-AD9E-3795BB020B2D@uiuc.edu>
    <542AD8D9-E7ED-41E8-AEA7-1426536EA6A8@genetics.utah.edu>