From hlapp at gmx.net Mon Feb 2 10:58:03 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 2 Feb 2009 10:58:03 -0500
Subject: [BioSQL-l] Web front-ends to BioSQL
In-Reply-To: <93b45ca50901310303t37905e8ak3819c05f4b94c287@mail.gmail.com>
References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> <49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu> <4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> <99475964-CFB3-4A27-8024-8A14876533E0@illinois.edu> <93b45ca50901310303t37905e8ak3819c05f4b94c287@mail.gmail.com>
Message-ID:

I agree, but I also think one of the questions that deserves significant thought is in which ways and to what extent such a web-service API needs to be different from DAS (or DAS/2, which allows write-back).

The DAS model is a bit different from BioSQL's in that it doesn't distinguish between sequences and sequence features. But I'm not sure that alone suffices to motivate a completely different API.

-hilmar

On Jan 31, 2009, at 6:03 AM, Mark Schreiber wrote:

> Hi -
>
> My feeling is that the diversity of languages and frameworks within
> languages would mean that a generic web front end to BioSQL will and
> should never materialize. What would be a lot more sensible is a
> generic API in the form of a webservice or collection of webservices
> that could be used by (theoretically) any web frame work to generate a
> website.
>
> User preferences and requirements will be far too diverse for a
> generic web front end.
>
> - Mark
>
> On 1/31/09, Chris Fields wrote:
>> Another article (as pointed out by Heikki on bioperl-l):
>>
>> http://www.heise-online.co.uk/open/Healthcheck-Perl-The-Perl-Future--/features/112388/0
>>
>> The last section is all on MVC-oriented frameworks.
>> >> chris >> >> On Jan 30, 2009, at 1:57 PM, Gudmundur A. Thorisson wrote: >> >>> We use Catalyst MVC framework for our project (http://www.hgvbaseg2p.org >>> ). Very good stuff, we combine it with the DBIx::Class ORM and >>> Template Toolkit as the templating engine. Totally recommended. >>> >>> >>> Mummi >>> >>> On 30 Jan 2009, at 19:45, Chris Fields wrote: >>>>> >>>> >>>> Perl web application framework: Catalyst and Jifty (have not tried >>>> them myself). RoR gets a lot of press, but I understand the RoR >>>> devs tend not to listen to the core ruby devs and (as a >>>> consequence) had recently run into issues with the 1.8.7 ruby >>>> release, detailed by the always-entertaining chromatic here: >>>> >>>> http://use.perl.org/~chromatic/journal/37125 >>>> >>>> chris >>>> >>>>> My $0.02, and I'd be keen so see what comes out of this. If >>>>> there's something I can do to tip the balance towards something >>>>> tangible happening, let me know. >>>>> >>>>> -hilmar >>>> _______________________________________________ >>>> BioSQL-l mailing list >>>> BioSQL-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>> >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l >> > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From lpritc at scri.ac.uk Tue Feb 3 04:50:12 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Tue, 03 Feb 2009 09:50:12 +0000 Subject: [BioSQL-l] Web 
front-ends to BioSQL
In-Reply-To:
Message-ID:

On 30/01/2009 19:57, "Gudmundur A. Thorisson" wrote:

> We use Catalyst MVC framework for our project
> (http://www.hgvbaseg2p.org). Very good stuff, we combine it with the
> DBIx::Class ORM and Template Toolkit as the templating engine. Totally
> recommended.

Just my twopenn'orth...

We're using a mixture of Biopython, Turbogears and SQLAlchemy here to provide a web interface to BioSQL. We have a specific use-case (viewing comparative genomic data) which is read-only, for now.

I agree with Mark and Hilmar in general - a consistent, generic API could benefit, and make easier, a wide range of possible web-based interactions with the database. I think that this would be particularly useful for webservice-based db updates, as an interface that's well-tested by the community is potentially more robust than a home-rolled solution.

There are cases (such as ours) where additional tables have been grafted onto BioSQL for local use, and where such an API would likely also need to be extended for consistency. For this and other reasons people may still be motivated to use one of the several other options available on their own projects. This doesn't negate the worth of a common web API, though.

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
From markjschreiber at gmail.com Tue Feb 3 06:24:34 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 3 Feb 2009 19:24:34 +0800
Subject: [BioSQL-l] Web front-ends to BioSQL
In-Reply-To:
References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> <4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> <99475964-CFB3-4A27-8024-8A14876533E0@illinois.edu> <93b45ca50901310303t37905e8ak3819c05f4b94c287@mail.gmail.com>
Message-ID: <93b45ca50902030324k1bb4891elc47edd4a765a805a@mail.gmail.com>

I'm not against a DAS API to BioSQL, but one strong point for a webservices API is the number of generic programming tools and workflow tools that can consume webservices.

Maybe the DAS API could be a wrapper to a webservices API? Could this be done by intercepting the DAS calls and reformatting them as webservice calls?
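To make the interception idea a little more concrete, here is a minimal sketch in plain Perl of the translation step only. It assumes the DAS/1 style of features request (.../das/<source>/features?segment=<id>:<start>,<stop>). The operation name getFeatures, the layout of the returned hash and the example URL are all hypothetical, since no BioSQL webservice API has been defined in this thread.

```perl
use strict;
use warnings;

# Translate a DAS-style 'features' request URL into the parameters of a
# HYPOTHETICAL BioSQL web-service call. Returns undef if the URL is not a
# DAS features request.
sub das_to_ws_call {
    my ($url) = @_;

    # Pull out the data source name and the query string.
    my ($dsn, $query) = $url =~ m{/das/([^/]+)/features\?(.*)$}
        or return;

    my @segments;
    for my $pair (split /[;&]/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $value && $key eq 'segment';

        # A segment is either 'ID' or 'ID:start,stop'.
        my ($ref, $start, $stop) = $value =~ /^([^:]+)(?::(\d+),(\d+))?$/
            or next;
        push @segments, { accession => $ref, start => $start, end => $stop };
    }

    return {
        operation => 'getFeatures',   # hypothetical operation name
        namespace => $dsn,            # assumes DAS source maps to a BioSQL namespace
        segments  => \@segments,
    };
}

# Example request (hypothetical server and accession).
my $call = das_to_ws_call(
    'http://example.org/das/biosql/features?segment=P12345:1,100');

for my $seg (@{ $call->{segments} }) {
    printf "%s %s %s %d-%d\n",
        $call->{operation}, $call->{namespace},
        $seg->{accession}, $seg->{start}, $seg->{end};
}
```

Whether a DAS source should map onto a BioSQL namespace, as assumed here, is exactly the kind of modelling question that would need to be settled first.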
- Mark On Mon, Feb 2, 2009 at 11:58 PM, Hilmar Lapp wrote: > I agree, but I also think one of the questions that should be devoted > significant thought to is in which ways and to what extend such a > web-service API needs to be or has to be different from DAS (or DAS/2, which > allows write-back). > > The DAS model is a bit different from BioSQL's in that it doesn't > distinguish between sequences and sequence features. But I'm not that alone > suffices to motivate a completely different API. > > -hilmar > > > On Jan 31, 2009, at 6:03 AM, Mark Schreiber wrote: > > Hi - >> >> My feeling is that the diversity of languages and frameworks within >> languages would mean that a generic web front end to BioSQL will and >> should never materialize. What would be a lot more sensible is a >> generic API in the form of a webservice or collection of webservices >> that could be used by (theoretically) any web frame work to generate a >> website. >> >> User preferences and requirements will be far too diverse for a >> generic web front end. >> >> - Mark >> >> On 1/31/09, Chris Fields wrote: >> >>> Another article (as pointed out by Heikki on bioperl-l): >>> >>> >>> http://www.heise-online.co.uk/open/Healthcheck-Perl-The-Perl-Future--/features/112388/0 >>> >>> The last section is all on MVC-oriented frameworks. >>> >>> chris >>> >>> On Jan 30, 2009, at 1:57 PM, Gudmundur A. Thorisson wrote: >>> >>> We use Catalyst MVC framework for our project ( >>>> http://www.hgvbaseg2p.org >>>> ). Very good stuff, we combine it with the DBIx::Class ORM and >>>> Template Toolkit as the templating engine. Totally recommended. >>>> >>>> >>>> Mummi >>>> >>>> On 30 Jan 2009, at 19:45, Chris Fields wrote: >>>> >>>>> >>>>>> >>>>> Perl web application framework: Catalyst and Jifty (have not tried >>>>> them myself). 
RoR gets a lot of press, but I understand the RoR >>>>> devs tend not to listen to the core ruby devs and (as a >>>>> consequence) had recently run into issues with the 1.8.7 ruby >>>>> release, detailed by the always-entertaining chromatic here: >>>>> >>>>> http://use.perl.org/~chromatic/journal/37125 >>>>> >>>>> chris >>>>> >>>>> My $0.02, and I'd be keen so see what comes out of this. If >>>>>> there's something I can do to tip the balance towards something >>>>>> tangible happening, let me know. >>>>>> >>>>>> -hilmar >>>>>> >>>>> _______________________________________________ >>>>> BioSQL-l mailing list >>>>> BioSQL-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>>>> >>>> >>>> _______________________________________________ >>>> BioSQL-l mailing list >>>> BioSQL-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>>> >>> >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>> >>> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > From michael.watson at bbsrc.ac.uk Wed Feb 4 06:01:36 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Wed, 4 Feb 2009 11:01:36 -0000 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <93b45ca50901310303t37905e8ak3819c05f4b94c287@mail.gmail.com> References: 
<8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> <49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu> <4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> <99475964-CFB3-4A27-8024-8A14876533E0@illinois.edu> <93b45ca50901310303t37905e8ak3819c05f4b94c287@mail.gmail.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9507E27183@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

I think the conversation is splitting in two, which is great!

First of all, I can see a need for an API, and web-services are a very interesting way forward, especially as they can be used in many different systems. I'd certainly like to see something develop along those lines.

However, I would like to continue the front-end conversation. A web-services API isn't a front-end, it's a means to a front-end, and I disagree that there is not enough commonality to develop a web-based front-end. There are a huge number of groups who want to manage a sequence collection: they want to be able to search that collection, list and browse it, export sequences as EMBL/GenBank, and import them from EMBL/GenBank. Now, if someone were to write import.php, export.php, search.php and browse.php on top of a BioSQL database, I think that would be an incredibly powerful app.

Mick

-----Original Message-----
From: biosql-l-bounces at lists.open-bio.org [mailto:biosql-l-bounces at lists.open-bio.org] On Behalf Of Mark Schreiber
Sent: 31 January 2009 11:04
To: Chris Fields
Cc: biosql-l at lists.open-bio.org
Subject: Re: [BioSQL-l] Web front-ends to BioSQL

Hi -

My feeling is that the diversity of languages and frameworks within languages would mean that a generic web front end to BioSQL will and should never materialize.
What would be a lot more sensible is a generic API in the form of a webservice or collection of webservices that could be used by (theoretically) any web frame work to generate a website. User preferences and requirements will be far too diverse for a generic web front end. - Mark On 1/31/09, Chris Fields wrote: > Another article (as pointed out by Heikki on bioperl-l): > > http://www.heise-online.co.uk/open/Healthcheck-Perl-The-Perl-Future--/fe atures/112388/0 > > The last section is all on MVC-oriented frameworks. > > chris > > On Jan 30, 2009, at 1:57 PM, Gudmundur A. Thorisson wrote: > >> We use Catalyst MVC framework for our project (http://www.hgvbaseg2p.org >> ). Very good stuff, we combine it with the DBIx::Class ORM and >> Template Toolkit as the templating engine. Totally recommended. >> >> >> Mummi >> >> On 30 Jan 2009, at 19:45, Chris Fields wrote: >>>> >>> >>> Perl web application framework: Catalyst and Jifty (have not tried >>> them myself). RoR gets a lot of press, but I understand the RoR >>> devs tend not to listen to the core ruby devs and (as a >>> consequence) had recently run into issues with the 1.8.7 ruby >>> release, detailed by the always-entertaining chromatic here: >>> >>> http://use.perl.org/~chromatic/journal/37125 >>> >>> chris >>> >>>> My $0.02, and I'd be keen so see what comes out of this. If >>>> there's something I can do to tip the balance towards something >>>> tangible happening, let me know. 
>>>>
>>>> -hilmar

From markjschreiber at gmail.com Wed Feb 4 06:33:19 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 4 Feb 2009 19:33:19 +0800
Subject: [BioSQL-l] Web front-ends to BioSQL
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9507E27183@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> <4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> <99475964-CFB3-4A27-8024-8A14876533E0@illinois.edu> <93b45ca50901310303t37905e8ak3819c05f4b94c287@mail.gmail.com> <8975119BCD0AC5419D61A9CF1A923E9507E27183@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <93b45ca50902040333r3cb95bd8xe242504af3c7587a@mail.gmail.com>

Interestingly, a few years back Richard Holland and I tried to publish a web front end to BioSQL that used Java, Struts and some bits of BioJava. It got shot down by the reviewers on the basis that there were already too many of these kinds of things, like GMOD etc. Unfortunately no-one agreed with the idea that a front end to BioSQL was the unique part.
Not that one or two opinionated reviewers (possibly reading this list) should be enough to put you off. I think they missed the point. - Mark On Wed, Feb 4, 2009 at 7:01 PM, michael watson (IAH-C) < michael.watson at bbsrc.ac.uk> wrote: > Hi > > I think the conversation is splitting in two, which is great! > > First of all, I can see a need for an API, and web-services are a very > interesting way forward, especially as they can be used in many > different systems. I'd certainly like to see something develop along > those lines. > > However, I would like to continue the front-end conversation. A > web-services API isn't a front-end, it's a means to a front-end, and I > disagree that there is not enough commonality to develop a web-based > front-end. There are a huge number of groups who want to manage a > sequence collection, they want to be able to search that sequence > collection, list and browse it, export them as EMBL/GenBank, import from > EMBL/GenBank. Now, if someone was to write import.php, export.php, > search.php and browse.php - well, on top of a BioSQL database, I think > that would be an incredibly powerful app. > > Mick > > -----Original Message----- > From: biosql-l-bounces at lists.open-bio.org > [mailto:biosql-l-bounces at lists.open-bio.org] On Behalf Of Mark Schreiber > Sent: 31 January 2009 11:04 > To: Chris Fields > Cc: biosql-l at lists.open-bio.org > Subject: Re: [BioSQL-l] Web front-ends to BioSQL > > Hi - > > My feeling is that the diversity of languages and frameworks within > languages would mean that a generic web front end to BioSQL will and > should never materialize. What would be a lot more sensible is a > generic API in the form of a webservice or collection of webservices > that could be used by (theoretically) any web frame work to generate a > website. > > User preferences and requirements will be far too diverse for a > generic web front end. 
> > - Mark > > On 1/31/09, Chris Fields wrote: > > Another article (as pointed out by Heikki on bioperl-l): > > > > > http://www.heise-online.co.uk/open/Healthcheck-Perl-The-Perl-Future--/fe > atures/112388/0 > > > > The last section is all on MVC-oriented frameworks. > > > > chris > > > > On Jan 30, 2009, at 1:57 PM, Gudmundur A. Thorisson wrote: > > > >> We use Catalyst MVC framework for our project > (http://www.hgvbaseg2p.org > >> ). Very good stuff, we combine it with the DBIx::Class ORM and > >> Template Toolkit as the templating engine. Totally recommended. > >> > >> > >> Mummi > >> > >> On 30 Jan 2009, at 19:45, Chris Fields wrote: > >>>> > >>> > >>> Perl web application framework: Catalyst and Jifty (have not tried > >>> them myself). RoR gets a lot of press, but I understand the RoR > >>> devs tend not to listen to the core ruby devs and (as a > >>> consequence) had recently run into issues with the 1.8.7 ruby > >>> release, detailed by the always-entertaining chromatic here: > >>> > >>> http://use.perl.org/~chromatic/journal/37125 > >>> > >>> chris > >>> > >>>> My $0.02, and I'd be keen so see what comes out of this. If > >>>> there's something I can do to tip the balance towards something > >>>> tangible happening, let me know. 
> >>>>
> >>>> -hilmar
>

From jimp at compbio.dundee.ac.uk Tue Feb 10 14:14:01 2009
From: jimp at compbio.dundee.ac.uk (James Procter)
Date: Tue, 10 Feb 2009 19:14:01 +0000
Subject: [BioSQL-l] adding non-positional features via bioperl to a biosql database
Message-ID: <4991D1F9.30806@compbio.dundee.ac.uk>

Hi. Apologies if this is not the right place to post Bioperl/BioSQL issues; feel free to tell me where to go after you've read the following:

I have been using a sequence pipeline to add non-positional features to sequences whilst uploading to a BioSQL database. A fragment of the code I tried to use is below:

sub process_seq {
    my ($self, $seq) = @_;
    my ($dbid, $id) = extract_dbid($seq);
    my $tags = {'label'=>"".$dbid."_$id",
                'notes'=>["".$dbid.":$id"]};
    # start/end of '0' is intended to mark the feature as non-positional
    my $feat = Bio::SeqFeature::Generic->new(
                   -start=>'0',-end=>'0',
                   -primary_tag => 'dbref',
                   -tag=>$tags,
                   -strand => 0,
                   -source_tag => 'ATB');
    $seq->add_SeqFeature($feat);
    $seq->version('1');
    $seq->alphabet('protein');
    return $seq;
}

When I use this, the sequences are uploaded fine, and they have the correct non-positional features when I look at the tables, and when I access the database via Biojava.
However, when I try to dump any of the features with Bioperl I get the following warning:

--------------------- WARNING ---------------------
MSG: Calling end without a defined start position
---------------------------------------------------

And if I try to add any more features to the sequence and then store the updated object I get the following exception in addition to the above warning.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: number of slots must equal the number of values
STACK: Error::throw
STACK: Bio::Root::Root::throw /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/Root/Root.pm:357
STACK: Bio::DB::BioSQL::BaseDriver::update_object /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/BioSQL/BaseDriver.pm:1075
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:269
STACK: Bio::DB::Persistent::PersistentObject::store /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/Persistent/PersistentObject.pm:271
STACK: Bio::DB::BioSQL::SeqFeatureAdaptor::store_children /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/BioSQL/SeqFeatureAdaptor.pm:278

Can someone help me out here? It seems that bioperl doesn't like features with a start/end of '0' - in which case, how do I create non-positional sequence features in a way that bioperl likes?

I'm using a nightly build from December 2008 - but there have been (afaict) no patches to the biosql or Feature::Generic which would fix this behaviour.

thanks.
jim

--
-------------------------------------------------------------------
J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.
From hlapp at gmx.net Tue Feb 10 15:41:02 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 10 Feb 2009 15:41:02 -0500
Subject: [BioSQL-l] adding non-positional features via bioperl to a biosql database
In-Reply-To: <4991D1F9.30806@compbio.dundee.ac.uk>
References: <4991D1F9.30806@compbio.dundee.ac.uk>
Message-ID: <5D0999B6-F67D-4C33-9A96-B0216135FC8D@gmx.net>

Hi James,

BioPerl (and BioSQL) use 1-based coordinates for features, so coordinate 0 risks being treated as undefined, though not consistently so.

If what you want is non-positional features, have you tried not specifying the positional attributes at all?

Furthermore, in BioPerl and BioSQL lingo a feature really is an annotation that has a position. So, whereas that's not strictly enforced I think (and may in fact be different in Biojava), did you conclude that natively non-positional sequence annotation using one of the Bio::Annotation classes (and adding it through $seq->annotation->add_Annotation()) wouldn't work for your purposes?

-hilmar

On Feb 10, 2009, at 2:14 PM, James Procter wrote:

>
> Hi. Apologies if this is not the right place to post Bioperl/BioSQL
> issues, feel free to tell me where to go, after you've read the
> following:
>
> I have been using a sequence pipeline to add in non-positional features
> to sequences whilst uploading to a bioSQL database. A fragment of the
> code I tried to use is below:
>
> sub process_seq {
>     my ($self, $seq) = @_;
>     my ($dbid, $id) = extract_dbid($seq);
>     my $tags = {'label'=>"".$dbid."_$id",
>                 'notes'=>["".$dbid.":$id"]};
>     my $feat = Bio::SeqFeature::Generic->new(
>                    -start=>'0',-end=>'0',
>                    -primary_tag => 'dbref',
>                    -tag=>$tags,
>                    -strand => 0,
>                    -source_tag => 'ATB');
>     $seq->add_SeqFeature($feat);
>     $seq->version('1');
>     $seq->alphabet('protein');
>     return $seq;
> }
>
> When I use this, the sequences are uploaded fine, and they have the
> correct non-positional features when I look at the tables, and when I
> access the database via Biojava.
However, when I try to dump any of > the > features with Bioperl I get the following warning : > > --------------------- WARNING --------------------- > MSG: Calling end without a defined start position > --------------------------------------------------- > > And if I try to add any more features to the sequence and then store > the > updated object I get the following exception in addition to the above > warning. > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: number of slots must equal the number of values > STACK: Error::throw > STACK: Bio::Root::Root::throw > /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/Root/Root.pm:357 > STACK: Bio::DB::BioSQL::BaseDriver::update_object > /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/BioSQL/ > BaseDriver.pm:1075 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store > /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:269 > STACK: Bio::DB::Persistent::PersistentObject::store > /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/Persistent/ > PersistentObject.pm:271 > STACK: Bio::DB::BioSQL::SeqFeatureAdaptor::store_children > /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/BioSQL/ > SeqFeatureAdaptor.pm:278 > > Can someone help me out here ? It seems that bioperl doesn't like > features with a start/end of '0' - in which case, how do I create > non-positional sequence features in a way that bioperl likes ? > > I'm using a nightly build from December 2008 - but there have been > (afaict) no patches to the biosql or Feature::Generic which would fix > this behaviour. > > thanks. > jim > > > -- > ------------------------------------------------------------------- > J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group > Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk > The University of Dundee is a Scottish Registered Charity, No. > SC015096. 
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From jimp at compbio.dundee.ac.uk Tue Feb 10 18:25:14 2009
From: jimp at compbio.dundee.ac.uk (James Procter)
Date: Tue, 10 Feb 2009 23:25:14 +0000
Subject: [BioSQL-l] adding non-positional features via bioperl to a biosql database
In-Reply-To: <5D0999B6-F67D-4C33-9A96-B0216135FC8D@gmx.net>
References: <4991D1F9.30806@compbio.dundee.ac.uk> <5D0999B6-F67D-4C33-9A96-B0216135FC8D@gmx.net>
Message-ID: <49920CDA.40601@compbio.dundee.ac.uk>

Thanks for the reply, Hilmar.

Hilmar Lapp wrote:
> BioPerl (and BioSQL) use 1-based coordinates for features, so coordinate
> 0 risks being treated as undefined, but just not consistently so.

erm. yes. But therein lies the problem. There's a convention in GFF (and DAS) that '0' corresponds to '.' and both mean 'non-positional'. Bioperl's GFF output module actually follows that convention (i.e. a start of '0' leads to a '.' in the start column of the GFF feature). BioSQL and BioJava also do this just fine, but whilst BioPerl allows features with a '0' start position to be persisted, it then cannot work with the feature after it's been recovered from the DB. This looks like a bug to me.

> If what you want is non-positional features, have you tried not
> specifying the positional attributes at all?

Yep. No positional attributes raises an error on store, which is what I'd expect.

> Furthermore, in BioPerl and BioSQL lingo a feature really is an
> annotation that has a position.
So, whereas that's not strictly enforced > I think (and may in fact be different in Biojava), did you conclude that > natively non-positional sequence annotation using one of the > Bio::Annotation classes (and adding it through > $seq->annotation->add_Annotation()) wouldn't work for your purposes? I'm actually trying to find that happy common ground where existing biojava and bioperl binding interpretations of BioSQL meet. I tried adding annotation directly to the sequence object, but couldn't work out where they appeared in Biojava. I then discovered that start=end='0' did what I wanted, and stopped without checking that I could then add more features afterwards. Seems like I should have tried harder :^/ If the above seems like a bug, then I'm happy to raise one. I'd like to see this fixed/cleared up before the next bioperl release if possible. I'll also try and make a start on that ontology map page that we discussed on list last December. Cheers. Jim. From hlapp at gmx.net Tue Feb 10 18:45:40 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 10 Feb 2009 18:45:40 -0500 Subject: [BioSQL-l] adding non-positional features via bioperl to a biosql database In-Reply-To: <49920CDA.40601@compbio.dundee.ac.uk> References: <4991D1F9.30806@compbio.dundee.ac.uk> <5D0999B6-F67D-4C33-9A96-B0216135FC8D@gmx.net> <49920CDA.40601@compbio.dundee.ac.uk> Message-ID: On Feb 10, 2009, at 6:25 PM, James Procter wrote: > BioSQL and BioJava also do this just fine, but whilst BioPerl allows > features with > a '0' start position to be persisted, it then cannot work with the > feature after it's been recovered from the DB. This looks like a bug > to me. I think the fact that it doesn't raise an error speaks much more to the fact that 1-based coordinates aren't fully enforced than that a coordinate of 0 is fully supported. If the coordinate is undef you probably get the same translation to '.' 
in GFF, and the fact that you also get that for 0 is probably simply due to 0 and undef both evaluating to false in a Perl if clause.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Tue Feb 10 18:47:06 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 10 Feb 2009 18:47:06 -0500
Subject: [BioSQL-l] adding non-positional features via bioperl to a biosql database
In-Reply-To: <49920CDA.40601@compbio.dundee.ac.uk>
References: <4991D1F9.30806@compbio.dundee.ac.uk> <5D0999B6-F67D-4C33-9A96-B0216135FC8D@gmx.net> <49920CDA.40601@compbio.dundee.ac.uk>
Message-ID: <53AAB3D4-AD43-42E3-AB74-8DC760019BC9@gmx.net>

On Feb 10, 2009, at 6:25 PM, James Procter wrote:

> I tried adding annotation directly to the sequence object, but
> couldn't work out where they appeared in Biojava.

I think this is where the bug is more likely, in that this is something we will all agree should work but apparently doesn't.
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jimp at compbio.dundee.ac.uk Wed Feb 11 04:03:04 2009 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Wed, 11 Feb 2009 09:03:04 +0000 Subject: [BioSQL-l] adding non-positional features via bioperl to a biosql database In-Reply-To: <53AAB3D4-AD43-42E3-AB74-8DC760019BC9@gmx.net> References: <4991D1F9.30806@compbio.dundee.ac.uk> <5D0999B6-F67D-4C33-9A96-B0216135FC8D@gmx.net> <49920CDA.40601@compbio.dundee.ac.uk> <53AAB3D4-AD43-42E3-AB74-8DC760019BC9@gmx.net> Message-ID: <49929448.5050301@compbio.dundee.ac.uk> Hilmar Lapp wrote: > > On Feb 10, 2009, at 6:25 PM, James Procter wrote: > >> I tried adding annotation directly to the sequence object, but >> couldn't work out >> where they appeared in Biojava. > > I think this is where the bug is more likely, in that this is something > that I think we will all agree that it should work but apparently doesn't. Yes. I would have used that method if it appeared to work with the existing biojava codebase for retrieving annotation from BioSQL and serving it over DAS. I have to check here, but I think that the other reason I didn't use the annotation object (in the current bioperl 1.6 state) was that it does not support all the attributes needed for this - and that was probably why it was not used in the biojava implementation, either. I'll verify this and post again later. Jim. -- ------------------------------------------------------------------- J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk The University of Dundee is a Scottish Registered Charity, No. SC015096. 
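Editor's sketch: the pitfall Hilmar describes above (a coordinate of 0 and an undefined coordinate both evaluating to false) is not specific to Perl; Python's 0 and None behave the same way in a bare truth test. The snippet below is not BioPerl's GFF writer. The helpers `gff_coord_truthy` and `gff_coord_explicit` are hypothetical names used only to show how the two cases get conflated, and how an explicit check might keep them apart, assuming 1-based coordinates.

```python
from typing import Optional


def gff_coord_truthy(pos: Optional[int]) -> str:
    """Format a coordinate using a bare truth test (the problematic pattern)."""
    # 0 and None are both falsy, so both are silently written as '.'
    return str(pos) if pos else "."


def gff_coord_explicit(pos: Optional[int]) -> str:
    """Format a coordinate, treating only a missing value as '.'."""
    if pos is None:
        return "."
    if pos < 1:
        # Assumption: coordinates are 1-based, so 0 is rejected, not hidden.
        raise ValueError(f"coordinate must be >= 1, got {pos}")
    return str(pos)


if __name__ == "__main__":
    print(gff_coord_truthy(None))    # missing coordinate
    print(gff_coord_truthy(0))       # invalid coordinate, indistinguishable from missing
    print(gff_coord_explicit(None))  # missing coordinate
    print(gff_coord_explicit(12))    # ordinary 1-based coordinate
```

With the explicit variant, a start of 0 fails loudly when the feature is formatted instead of being persisted and only causing trouble on retrieval.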
From jimp at compbio.dundee.ac.uk Wed Feb 11 10:22:32 2009 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Wed, 11 Feb 2009 15:22:32 +0000 Subject: [BioSQL-l] adding non-positional features via bioperl to a biosql database In-Reply-To: References: <4991D1F9.30806@compbio.dundee.ac.uk> <5D0999B6-F67D-4C33-9A96-B0216135FC8D@gmx.net> <49920CDA.40601@compbio.dundee.ac.uk> <49929318.4000700@compbio.dundee.ac.uk> Message-ID: <4992ED38.4030602@compbio.dundee.ac.uk> More on this - see below for off list discussion : Hilmar Lapp wrote: > > On Feb 11, 2009, at 3:58 AM, James Procter wrote: > >> Hilmar Lapp wrote: >>> I think the fact that it doesn't raise an error speaks much more to the >>> fact that 1-based coordinates aren't fully enforced than that a >>> coordinate of 0 is fully supported. If the coordinate is undef you >>> probably get the same translation to '.' in GFF, and the fact that you >>> get that also for 0 is probably simply due to 0 and undef both >>> evaluating to false in Perl if in an if clause. >> The latter is a specific bug and should really be a testcase. Bioperl-DB >> should not allow a feature with invalid start/end to be persisted - >> because it makes that sequence feature inaccessible, and throws fatal >> errors when any additional information is persisted on the sequence. > > > I see your reasoning, and I agree that there is a bug here somewhere, > and that there should be a test case. I guess the way I see it is that > Bioperl-db is an ORM. It doesn't judge Bioperl - it assumes that if you > give it a BioPerl object to persist that BioPerl allowed you to create > then either you or BioPerl or both knew what they were doing and will > just try to obey the order to persist the object. So I think Bioperl-db > is the wrong place to put semantic validity checks on BioPerl objects. yes. The problem is earlier on - bioperl's generic feature module allowing a feature to be created with start==0. 
> > That said, maybe BioSQL shouldn't permit entries that don't comply with > the model definition, as in BioSQL aiming to be the Bio* interoperable > reference model. So maybe what we should do is put a constraint on > location that prohibits the start and end to be zero. I agree. I even had a discussion today with a colleague about ensuring a schema is sufficiently specified that different user groups do not use it in incompatible ways. I think it's important that the schema is sufficiently constrained... at least, when everyone can agree on the constraint! > Not sure why you thought this isn't relevant to BioSQL anymore :-) Heh - I thought I was going into the realms of bioperl/db test cases, rather than purely biosql relevant discussion. Jim From rjalves at igc.gulbenkian.pt Wed Feb 11 13:15:16 2009 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Wed, 11 Feb 2009 18:15:16 +0000 Subject: [BioSQL-l] BioSQL - MySQL index usage Message-ID: <499315B4.6020001@igc.gulbenkian.pt> Greetings everyone, I have been using BioSQL as the current platform to explore taxonomy and I must say I'm quite happy with it. Recently however the complexity/amount of queries we are using is raising an additional constraint, time. After looking into ways of optimizing the queries I noticed that in most cases the left_value, right_value (UNIQUE) indexes are not used even though they exist. I did some quick tests and these were the results: mysql> reset query cache; select * from taxon where left_value < 97224 AND right_value > 97225; (...) 7 rows in set (0.31 sec) mysql> reset query cache; select * from taxon FORCE INDEX (left_value, right_value) where left_value < 97224 AND right_value > 97225; (...) 7 rows in set (0.15 sec) So my question is, does anyone know a faster way to achieve the same result? Also if anyone knows how to make MySQL use the index without the explicit FORCE INDEX syntax I would be happy to give it a try. 
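Editor's sketch: for readers unfamiliar with the nested-set encoding behind the query above, the following shows why `left_value < L AND right_value > R` returns the ancestors of the node whose interval is (L, R). It runs against an in-memory SQLite database rather than MySQL, uses a reduced, hypothetical subset of the taxon columns with made-up toy rows, and says nothing about which index either engine's optimizer would choose.

```python
import sqlite3

# Toy stand-in for the BioSQL taxon table: only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE taxon (
        taxon_id    INTEGER PRIMARY KEY,
        node_rank   TEXT,
        left_value  INTEGER UNIQUE,
        right_value INTEGER UNIQUE
    )
    """
)

# Made-up tree: 1 is the root; 2 and 5 are its children; 3 and 4 sit under 2.
TOY_TAXA = [
    (1, "root", 1, 10),
    (2, "genus", 2, 7),
    (3, "species", 3, 4),
    (4, "species", 5, 6),
    (5, "genus", 8, 9),
]
conn.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?)", TOY_TAXA)


def ancestors(connection, left_value, right_value):
    """Return taxon_ids whose interval strictly encloses (left_value, right_value)."""
    cursor = connection.execute(
        "SELECT taxon_id FROM taxon "
        "WHERE left_value < ? AND right_value > ? "
        "ORDER BY left_value",
        (left_value, right_value),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    # Ancestors of taxon 3, whose interval is (3, 4): the root, then taxon 2.
    print(ancestors(conn, 3, 4))
```

On a real MySQL instance, prefixing the SELECT with EXPLAIN shows which index, if any, the optimizer picked, which is one way to compare the plain and FORCE INDEX variants beyond wall-clock time.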
Thanks, Renato From giles.weaver at googlemail.com Thu Feb 12 04:47:00 2009 From: giles.weaver at googlemail.com (Giles Weaver) Date: Thu, 12 Feb 2009 09:47:00 +0000 Subject: [BioSQL-l] Fwd: BioSQL - MySQL index usage In-Reply-To: <1d06cd5d0902120141r7d9b79d1rf8f0d4f1cc842175@mail.gmail.com> References: <499315B4.6020001@igc.gulbenkian.pt> <1d06cd5d0902120141r7d9b79d1rf8f0d4f1cc842175@mail.gmail.com> Message-ID: <1d06cd5d0902120147g7fbaaf27h94d610b30e85ffc0@mail.gmail.com> I've also found the performance of queries across the taxon and taxon_name tables to be inadequate on occasion. My solutions were: a) to remove all taxa outside of my area of interest from the database, and remove non scientific names from taxon_name b) to create a temporary table (join of taxon and taxon_name) of my taxa of interest and query against that instead of the taxon and taxon_name tables. I can't remember which of these was more effective, performance wise, but obviously neither is an ideal solution. 2009/2/11 Renato Alves Greetings everyone, > > I have been using BioSQL as the current platform to explore taxonomy and I > must say I'm quite happy with it. > > Recently however the complexity/ammount of queries we are using is raising > an additional constraint, time. > > After looking into way of optimizing the queries I noticed that in most > cases the left_value, right_value (UNIQUE) indexes are not used even though > they exist. > > I did some quick tests and these were the results: > > mysql> reset query cache; select * from taxon where left_value < 97224 AND > right_value > 97225; > (...) > 7 rows in set (0.31 sec) > > mysql> reset query cache; select * from taxon FORCE INDEX (left_value, > right_value) where left_value < 97224 AND right_value > 97225; > (...) > 7 rows in set (0.15 sec) > > So my question is, does anyone know a faster way to achieve the same > result? 
> > Also if anyone knows how to make MySQL use the index without the explicit > FORCE INDEX syntax I would be happy to give it a try. > > Thanks, > Renato > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > From biopython at maubp.freeserve.co.uk Wed Feb 18 09:27:55 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Feb 2009 14:27:55 +0000 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database? Message-ID: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> Hi everyone, Do any of the Bio* projects currently let you import a GFF3 (or even a GFF2) file into a BioSQL database? Looking at some of the examples on http://www.sequenceontology.org/gff3.shtml this looks possible. I assume each GFF file normally describes features on a single plasmid/chromosome, meaning a single bioentry table entry. I would expect each GFF feature to become a seqfeature table entry (with a location table entry for each line describing its location), and the main sequence (if present in the GFF file), would be a biosequence table entry. So far this isn't too complicated. The GFF3 documentation gives some example of "parent" or rather "part-of" relationships between features (e.g. an exon which is part of three parent mRNA features). Perhaps three entries in the seqfeature_relationship table could record this. Also, GFF3 files seem to be very organised with regards ontologies - something we have touched on before on this mailing list. My reason for asking regards adding GFF parsing to Biopython. Biopython has a parsing framework (Bio.SeqIO) which turns various file formats (e.g. GenBank) into objects (SeqRecord objects, with optional SeqFeature objects), which we can map onto the BioSQL tables. 
If we manage to integrate GFF parsing into Biopython's Bio.SeqIO framework (non-trivial), then Biopython would as a consequence be able to load a GFF file into BioSQL. If any of the other Bio* projects can already import GFF files into BioSQL, I'd like Biopython to load the data into the database in the same way. Essentially this would give a recipe for how we should model the GFF data in our objects in order to achieve this intra-project BioSQL compatibility. Peter From biopython at maubp.freeserve.co.uk Wed Feb 18 10:31:26 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Feb 2009 15:31:26 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <20D70002-B512-4EA0-8755-1CF00310ADC6@gmx.net> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com> <20D70002-B512-4EA0-8755-1CF00310ADC6@gmx.net> Message-ID: <320fb6e00902180731w50a7704asc095b866842bcad3@mail.gmail.com> On Fri, Nov 28, 2008 at 6:46 PM, Hilmar Lapp wrote: > > On Nov 28, 2008, at 1:41 PM, Peter wrote: >>> >>> It's part of the changes planned for the next release indeed. >> >> By next release, do you mean BioSQL v1.0.2 or v1.1.0 here? > > That would be 1.0.2. Otherwise there would be no need to worry about > backward compatibility (as 1.1x won't be by definition). > > -hilmar Hi Hilmar, A few months ago we talked about BioSQL v1.0.2 adding explicit composite primary keys to tables like taxon_name which currently have only a uniqueness constraint. 
Nothing has shown up on SVN, and I'm not sure how you track plans for the next release so I just filed this issue on bugzilla: http://bugzilla.open-bio.org/show_bug.cgi?id=2765 Thanks, Peter From crackeur at comcast.net Wed Feb 18 22:21:22 2009 From: crackeur at comcast.net (crackeur at comcast.net) Date: Thu, 19 Feb 2009 03:21:22 +0000 (UTC) Subject: [BioSQL-l] [ANN] VTD-XML 2.5 In-Reply-To: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> Message-ID: <822284529.1630891235013682870.JavaMail.root@sz0167a.emeryville.ca.mail.comcast.net> VTD-XML 2.5 is now released. Please go to https://sourceforge.net/project/showfiles.php?group_id=110612&package_id=120172&release_id=661376 to download the latest version. Changes from Version 2.4 (2/2009) * Added separate VTD indexing generating and loading (see http://vtd-xml.sf.net/persistence.html for further info) * Integrated extended VTD supporting 256 GB doc (In Java only). * Added duplicateNav() for replicating multiple VTDNav instances sharing XML, VTD and LC buffer (available in Java and C#). * Various bug fixes and enhancements From hlapp at gmx.net Wed Feb 18 23:52:26 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 18 Feb 2009 23:52:26 -0500 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <320fb6e00902180731w50a7704asc095b866842bcad3@mail.gmail.com> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com> <20D70002-B512-4EA0-8755-1CF00310ADC6@gmx.net> <320fb6e00902180731w50a7704asc095b866842bcad3@mail.gmail.com> Message-ID: <8426858D-A04B-456B-920A-BFB9851C6486@gmx.net> Thanks, that's great, appreciate it! The plans for the next release should otherwise be on http://www.biosql.org/wiki/Enhancement_Requests Feel free to add to that as well. 
I'm hoping to take on a release over the next 2-3 weeks but am completely swamped right now. If it happens, you'll hear about that shortly. -hilmar On Feb 18, 2009, at 10:31 AM, Peter wrote: > On Fri, Nov 28, 2008 at 6:46 PM, Hilmar Lapp wrote: >> >> On Nov 28, 2008, at 1:41 PM, Peter wrote: >>>> >>>> It's part of the changes planned for the next release indeed. >>> >>> By next release, do you mean BioSQL v1.0.2 or v1.1.0 here? >> >> That would be 1.0.2. Otherwise there would be no need to worry about >> backward compatibility (as 1.1x won't be by definition). >> >> -hilmar > > > Hi Hilmar, > > A few months ago we talked about BioSQL v1.0.2 adding explicit > composite primary keys to tables like taxon_name which currently have > only a uniqueness constraint. Nothing has shown up on SVN, and I'm > not sure how you track plans for the next release so I just filed this > issue on bugzilla: > http://bugzilla.open-bio.org/show_bug.cgi?id=2765 > > Thanks, > > Peter -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From chapmanb at 50mail.com Sun Feb 22 16:18:06 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Sun, 22 Feb 2009 16:18:06 -0500 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database? In-Reply-To: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> References: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> Message-ID: <20090222211806.GA58076@kunkel> Hi Peter; > Do any of the Bio* projects currently let you import a GFF3 (or even a > GFF2) file into a BioSQL database? Normalizing the GFF and standard SeqIO representations is a great idea. I use BioSQL quite a bit, and it would be nice to be able to output GFF formatted files directly from bioentries. To get more familiar with BioPerl GFF mappings, I took a look at how GenBank files get converted to GFF files with BioPerl. 
Generally, most things map as you'd expect but a few items are left behind. I wrote up the details on the current mappings, along with some proposals for expanding them, here: http://bcbio.wordpress.com/2009/02/22/exploring-bioperl-genbank-to-gff-mapping/ I think for the Biopython mapping we could try and follow what BioPerl does where it makes good sense, and introduce the other items in a way that is consistent and could be followed by other projects. Hope this helps move things forward, Brad > > Looking at some of the examples on > http://www.sequenceontology.org/gff3.shtml this looks possible. I > assume each GFF file normally describes features on a single > plasmid/chromosome, meaning a single bioentry table entry. I would > expect each GFF feature to become a seqfeature table entry (with a > location table entry for each line describing its location), and the > main sequence (if present in the GFF file), would be a biosequence > table entry. So far this isn't too complicated. The GFF3 > documentation gives some example of "parent" or rather "part-of" > relationships between features (e.g. an exon which is part of three > parent mRNA features). Perhaps three entries in the > seqfeature_relationship table could record this. > > Also, GFF3 files seem to be very organised with regards ontologies - > something we have touched on before on this mailing list. > > My reason for asking regards adding GFF parsing to Biopython. > Biopython has a parsing framework (Bio.SeqIO) which turns various file > formats (e.g. GenBank) into objects (SeqRecord objects, with optional > SeqFeature objects), which we can map onto the BioSQL tables. If we > manage to integrate GFF parsing into Biopython's Bio.SeqIO framework > (non-trivial), then Biopython would as a consequence be able to load a > GFF file into BioSQL. If any of the other Bio* projects can already > import GFF files into BioSQL, I'd like Biopython to load the data into > the database in the same way. 
Essentially this would give a recipe > for how we should model the GFF data in our objects in order to > achieve this intra-project BioSQL compatibility. > > Peter > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From cjfields at illinois.edu Sun Feb 22 18:18:54 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 22 Feb 2009 17:18:54 -0600 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database? In-Reply-To: <20090222211806.GA58076@kunkel> References: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> <20090222211806.GA58076@kunkel> Message-ID: <6AB45A7E-B78C-4E2E-B701-3C8704799A24@illinois.edu> Re: bioperl; we're planning on refactoring several bits in BioPerl for consistency. http://www.bioperl.org/wiki/GFF_Refactor The problem is there are several different methods to parse and generate GFF strings, some only partially implemented. I would like to coalesce around a central mode of generating such output, or at least have a way to validate such data. Another issue is that the typical bioperl seqfeature comes flattened (non-hierarchal) and untyped (no check against SO) by default when parsed from a GenBank file. The bp_genbank2gff3.pl attempts to rectify this, but I think an integrated optional way of generating unflattened (hierarchal) typed feature data within the SeqIO parsers would be better. There is a simple way we could implement this, just need time to work it in. chris On Feb 22, 2009, at 3:18 PM, Brad Chapman wrote: > Hi Peter; > >> Do any of the Bio* projects currently let you import a GFF3 (or >> even a >> GFF2) file into a BioSQL database? > > Normalizing the GFF and standard SeqIO representations is a great > idea. I use BioSQL quite a bit, and it would be nice to be able to > output GFF formatted files directly from bioentries. 
> > To get more familiar with BioPerl GFF mappings, I took a look at how > GenBank files get converted to GFF files with BioPerl. Generally, > most things map as you'd expect but a few items are left behind. I > wrote up the details on the current mappings, along with some > proposals for expanding them, here: > > http://bcbio.wordpress.com/2009/02/22/exploring-bioperl-genbank-to-gff-mapping/ > > I think for the Biopython mapping we could try and follow what > BioPerl does where it makes good sense, and introduce the other > items in a way that is consistent and could be followed by other > projects. > > Hope this helps move things forward, > Brad > > > >> >> Looking at some of the examples on >> http://www.sequenceontology.org/gff3.shtml this looks possible. I >> assume each GFF file normally describes features on a single >> plasmid/chromosome, meaning a single bioentry table entry. I would >> expect each GFF feature to become a seqfeature table entry (with a >> location table entry for each line describing its location), and the >> main sequence (if present in the GFF file), would be a biosequence >> table entry. So far this isn't too complicated. The GFF3 >> documentation gives some example of "parent" or rather "part-of" >> relationships between features (e.g. an exon which is part of three >> parent mRNA features). Perhaps three entries in the >> seqfeature_relationship table could record this. >> >> Also, GFF3 files seem to be very organised with regards ontologies - >> something we have touched on before on this mailing list. >> >> My reason for asking regards adding GFF parsing to Biopython. >> Biopython has a parsing framework (Bio.SeqIO) which turns various >> file >> formats (e.g. GenBank) into objects (SeqRecord objects, with optional >> SeqFeature objects), which we can map onto the BioSQL tables. 
If we >> manage to integrate GFF parsing into Biopython's Bio.SeqIO framework >> (non-trivial), then Biopython would as a consequence be able to >> load a >> GFF file into BioSQL. If any of the other Bio* projects can already >> import GFF files into BioSQL, I'd like Biopython to load the data >> into >> the database in the same way. Essentially this would give a recipe >> for how we should model the GFF data in our objects in order to >> achieve this intra-project BioSQL compatibility. >> >> Peter >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From biopython at maubp.freeserve.co.uk Mon Feb 23 05:23:07 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Feb 2009 10:23:07 +0000 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database? In-Reply-To: <6AB45A7E-B78C-4E2E-B701-3C8704799A24@illinois.edu> References: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> <20090222211806.GA58076@kunkel> <6AB45A7E-B78C-4E2E-B701-3C8704799A24@illinois.edu> Message-ID: <320fb6e00902230223o6aaf489cj77dd31028906dd0d@mail.gmail.com> >>> Do any of the Bio* projects currently let you import a GFF3 (or even a >>> GFF2) file into a BioSQL database? >> >> To get more familiar with BioPerl GFF mappings, I took a look at how >> GenBank files get converted to GFF files with BioPerl. Generally, >> most things map as you'd expect but a few items are left behind. I >> wrote up the details on the current mappings, along with some >> proposals for expanding them, here: >> >> >> http://bcbio.wordpress.com/2009/02/22/exploring-bioperl-genbank-to-gff-mapping/ That looks interesting Brad. You seem to be focusing on the top-level record annotation. 
From my point of view, the feature annotation (or feature qualifiers as Biopython calls them) is more interesting. Brad wrote: >> Normalizing the GFF and standard SeqIO representations is a great >> idea. I use BioSQL quite a bit, and it would be nice to be able to >> output GFF formatted files directly from bioentries. I was thinking of just loading GFF files with Biopython's SeqIO to start with (and thus GFF into BioSQL). In order to dump a BioSQL entry into a GFF file we'd need to have Biopython's SeqIO be able to write GFF files. I'm not sure if we can do that if the parsing is lossy - it is the relationships between features that strike me as the most work (storing these as simple strings may be good enough). On Sun, Feb 22, 2009 at 11:18 PM, Chris Fields wrote: > > Re: bioperl; we're planning on refactoring several bits in BioPerl for > consistency. > > http://www.bioperl.org/wiki/GFF_Refactor > > The problem is there are several different methods to parse and generate GFF > strings, some only partially implemented. I would like to coalesce around a > central mode of generating such output, or at least have a way to validate > such data. > > Another issue is that the typical bioperl seqfeature comes flattened > (non-hierarchal) and untyped (no check against SO) by default when parsed > from a GenBank file. The bp_genbank2gff3.pl attempts to rectify this, but I > think an integrated optional way of generating unflattened (hierarchal) > typed feature data within the SeqIO parsers would be better. There is a > simple way we could implement this, just need time to work it in. Something we've touched on before on this mailing list is loading GenBank files into BioSQL while checking them against an ontology (rather than the current ad-hoc ontology where new terms (even spelling errors) get recorded as new ontology terms). That seems to be a related point. 
If I understand correctly from Chris and Brad's posts, with BioPerl you could do GFF file to GenBank file to BioSQL, but not directly? Peter From jimp at compbio.dundee.ac.uk Mon Feb 23 09:06:23 2009 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Mon, 23 Feb 2009 14:06:23 +0000 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database? In-Reply-To: <320fb6e00902230223o6aaf489cj77dd31028906dd0d@mail.gmail.com> References: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> <20090222211806.GA58076@kunkel> <6AB45A7E-B78C-4E2E-B701-3C8704799A24@illinois.edu> <320fb6e00902230223o6aaf489cj77dd31028906dd0d@mail.gmail.com> Message-ID: <49A2AD5F.5030700@compbio.dundee.ac.uk> Hello all. I'd like to stick my oar in here and say that regularising the GFF3<>BioSQL mapping across Bio* is an essential step towards round-tripping between the different language bindings, and also towards portable export of bioentry annotation via DAS (which has its roots in GFF, and is isomorphic for most practical purposes). Some of you might remember that back in Dec'08 I made the promise that I'd try and document the DAS<>BioSQL mapping. I've finally started to make good on this and have added a preliminary GFF section to the BioSQL wiki page describing Annotation Mapping conventions : http://www.biosql.org/wiki/Annotation_Mapping#GFF3 It's very rough and ready, probably biased and not quite correct at the moment. It does, however, provide a central home for the information that Brad has put up on his blog, and the various other notes that have been made in this thread. Please add and amend as appropriate. Jim From biopython at maubp.freeserve.co.uk Mon Feb 23 09:27:38 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Feb 2009 14:27:38 +0000 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database? 
In-Reply-To: <49A2AD5F.5030700@compbio.dundee.ac.uk> References: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> <20090222211806.GA58076@kunkel> <6AB45A7E-B78C-4E2E-B701-3C8704799A24@illinois.edu> <320fb6e00902230223o6aaf489cj77dd31028906dd0d@mail.gmail.com> <49A2AD5F.5030700@compbio.dundee.ac.uk> Message-ID: <320fb6e00902230627i374035co9fe955b9e5dd0f4@mail.gmail.com> > Hello all. > > I'd like to stick my oar in here and say that regularising the > GFF3<>BioSQL mapping accross Bio* is an essential step towards > round-tripping between the different language bindings, ... Absolutely - that's why I started this thread. On a related note, there is probably still room for improvement in the GenBank<>BioSQL mapping across the Bio* projects, where the "gold standard" also needs documenting. > ... and also towards portable export of bioentry annotation via DAS > (which has its roots in GFF, and is isomorphic for most practical > purposes). > > Some of you might remember that back in Dec'08 I made the promise that > I'd try and document the DAS<>BioSQL mapping. I've finally started to > make good on this and have added a preliminary GFF section to the BioSQL > wiki page describing Annotation Mapping conventions : > http://www.biosql.org/wiki/Annotation_Mapping#GFF3 > > Its very rough and ready, probably biased and not quite correct at the > moment. It does, however, provide a central home for the information > that Brad has put up on his blog, and the various other notes that have > been made in this thread. Please add and amend as appropriate. Is this based on the current behaviour of BioPerl code - or just what you think would be sensible? Peter From hlapp at gmx.net Mon Feb 23 10:07:09 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 23 Feb 2009 10:07:09 -0500 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database? 
In-Reply-To: <49A2AD5F.5030700@compbio.dundee.ac.uk> References: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> <20090222211806.GA58076@kunkel> <6AB45A7E-B78C-4E2E-B701-3C8704799A24@illinois.edu> <320fb6e00902230223o6aaf489cj77dd31028906dd0d@mail.gmail.com> <49A2AD5F.5030700@compbio.dundee.ac.uk> Message-ID: <69DD7713-7351-4275-B96D-2C0B4198DDF5@gmx.net> On Feb 23, 2009, at 9:06 AM, James Procter wrote: > I've finally started to make good on this and have added a > preliminary GFF section to the BioSQL wiki page describing > Annotation Mapping conventions : > http://www.biosql.org/wiki/Annotation_Mapping#GFF3 Awesome! Thanks so much James for putting this in. I'll look it over shortly. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jimp at compbio.dundee.ac.uk Mon Feb 23 10:10:35 2009 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Mon, 23 Feb 2009 15:10:35 +0000 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database? In-Reply-To: <320fb6e00902230627i374035co9fe955b9e5dd0f4@mail.gmail.com> References: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> <20090222211806.GA58076@kunkel> <6AB45A7E-B78C-4E2E-B701-3C8704799A24@illinois.edu> <320fb6e00902230223o6aaf489cj77dd31028906dd0d@mail.gmail.com> <49A2AD5F.5030700@compbio.dundee.ac.uk> <320fb6e00902230627i374035co9fe955b9e5dd0f4@mail.gmail.com> Message-ID: <49A2BC6B.3080809@compbio.dundee.ac.uk> Hi Peter. Peter wrote: >> I'd like to stick my oar in here and say that regularising the >> GFF3<>BioSQL mapping accross Bio* is an essential step towards >> round-tripping between the different language bindings, ... > > Absolutely - that's why I started this thread. 
On a related note, > there is probably still room for improvement in the GenBank<>BioSQL > mapping across the Bio* projects, where the "gold standard" also needs > documenting. Definitely. An awful lot of effort has been expended on standardisation in the last 4-5 years, so it shouldn't be a tough job for someone with experience of the GenBank annotation model. >> Its very rough and ready, probably biased and not quite correct at the >> moment. It does, however, provide a central home for the information >> that Brad has put up on his blog, and the various other notes that have >> been made in this thread. Please add and amend as appropriate. > > Is this based on the current behaviour of BioPerl code - or just what > you think would be sensible? Current behaviour, where possible, with notes where current behaviour seems broken/crazy, based on my mishmash of experience over the last couple of months with nightly builds of the 1.6 code. You'll also notice that I've tried to stick to the well defined basics, leaving the detail to be filled in by the experts. The aim of the page is to allow some consensus regarding each aspect of mapping the GFF spec to be reached, and thence enable us to verify or note where each language binding differs from that consensus. I'm interested in this because I'm exploring the DAS<>BioSQL mapping in a situation where I need to import via BioPerl, and retrieve features via BioJava. I'll add in my own notes for this special case (based on biojavax bindings and an extensively patched version of Dazzle's BioSQL datasource) over the next few weeks. I'll also try to encourage some input from other DAS developers at the next DAS Developers meeting in March. Jim. 
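Editor's sketch: the mapping Peter outlined at the start of this thread (one seqfeature entry per GFF3 line, one location entry for its coordinates, and one relationship entry per Parent value) can be illustrated without any Bio* library. Nothing below is the agreed Bio* convention or any project's actual loader. The dictionary keys, the `part_of` term name, and the subject/object direction are assumptions made for illustration, and the two records are toy data in the style of the GFF3 specification's examples.

```python
from urllib.parse import unquote

# GFF3 strand symbols mapped to a signed integer; anything else becomes 0.
STRAND = {"+": 1, "-": -1}


def parse_attributes(column):
    """Split the ninth GFF3 column into {tag: [values]}."""
    attrs = {}
    if column in ("", "."):
        return attrs
    for pair in column.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Multiple values are comma-separated; reserved characters are %-escaped.
        attrs[unquote(key)] = [unquote(v) for v in value.split(",")]
    return attrs


def gff3_to_rows(lines):
    """Turn GFF3 feature lines into seqfeature, location and relationship rows."""
    seqfeatures, locations, relationships = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        seqid, source, ftype, start, end, _score, strand, _phase, attr_col = (
            line.split("\t")
        )
        attrs = parse_attributes(attr_col)
        feature_id = attrs.get("ID", [None])[0]
        index = len(seqfeatures)
        seqfeatures.append(
            {"bioentry": seqid, "type": ftype, "source": source, "id": feature_id}
        )
        locations.append(
            {
                "seqfeature": index,
                "start": int(start),
                "end": int(end),
                "strand": STRAND.get(strand, 0),
            }
        )
        # One relationship row per parent (direction is an assumption).
        for parent in attrs.get("Parent", []):
            relationships.append(
                {"subject": feature_id, "object": parent, "term": "part_of"}
            )
    return seqfeatures, locations, relationships


EXAMPLE = [
    "##gff-version 3",
    "ctg123\t.\tmRNA\t1050\t9000\t.\t+\t.\tID=mRNA00001;Parent=gene00001",
    "ctg123\t.\texon\t5000\t5500\t.\t+\t.\t"
    "ID=exon00004;Parent=mRNA00001,mRNA00002,mRNA00003",
]

if __name__ == "__main__":
    feats, locs, rels = gff3_to_rows(EXAMPLE)
    for rel in rels:
        print(rel)
```

The exon with three parents yields three relationship rows, matching the "three entries in the seqfeature_relationship table" suggestion earlier in the thread. Whether Parent values should be stored this way or as plain qualifier strings is exactly the open question the wiki page is meant to settle.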
This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). ______________________________________________________________________ From markjschreiber at gmail.com Tue Feb 3 11:24:34 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 3 Feb 2009 19:24:34 +0800 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> <4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> <99475964-CFB3-4A27-8024-8A14876533E0@illinois.edu> <93b45ca50901310303t37905e8ak3819c05f4b94c287@mail.gmail.com> Message-ID: <93b45ca50902030324k1bb4891elc47edd4a765a805a@mail.gmail.com> I'm not against a DAS API to BioSQL but one strong point for a webservices API is the number of generic programing tools and workflow tools that can consume webservices. Maybe the DAS API could be a wrapper to a webservices API? Could this be done by intercepting the DAS calls and reformatting them as webservice calls? 
- Mark On Mon, Feb 2, 2009 at 11:58 PM, Hilmar Lapp wrote: > I agree, but I also think one of the questions that should be devoted > significant thought to is in which ways and to what extend such a > web-service API needs to be or has to be different from DAS (or DAS/2, which > allows write-back). > > The DAS model is a bit different from BioSQL's in that it doesn't > distinguish between sequences and sequence features. But I'm not that alone > suffices to motivate a completely different API. > > -hilmar > > > On Jan 31, 2009, at 6:03 AM, Mark Schreiber wrote: > > Hi - >> >> My feeling is that the diversity of languages and frameworks within >> languages would mean that a generic web front end to BioSQL will and >> should never materialize. What would be a lot more sensible is a >> generic API in the form of a webservice or collection of webservices >> that could be used by (theoretically) any web frame work to generate a >> website. >> >> User preferences and requirements will be far too diverse for a >> generic web front end. >> >> - Mark >> >> On 1/31/09, Chris Fields wrote: >> >>> Another article (as pointed out by Heikki on bioperl-l): >>> >>> >>> http://www.heise-online.co.uk/open/Healthcheck-Perl-The-Perl-Future--/features/112388/0 >>> >>> The last section is all on MVC-oriented frameworks. >>> >>> chris >>> >>> On Jan 30, 2009, at 1:57 PM, Gudmundur A. Thorisson wrote: >>> >>> We use Catalyst MVC framework for our project ( >>>> http://www.hgvbaseg2p.org >>>> ). Very good stuff, we combine it with the DBIx::Class ORM and >>>> Template Toolkit as the templating engine. Totally recommended. >>>> >>>> >>>> Mummi >>>> >>>> On 30 Jan 2009, at 19:45, Chris Fields wrote: >>>> >>>>> >>>>>> >>>>> Perl web application framework: Catalyst and Jifty (have not tried >>>>> them myself). 
RoR gets a lot of press, but I understand the RoR >>>>> devs tend not to listen to the core ruby devs and (as a >>>>> consequence) had recently run into issues with the 1.8.7 ruby >>>>> release, detailed by the always-entertaining chromatic here: >>>>> >>>>> http://use.perl.org/~chromatic/journal/37125 >>>>> >>>>> chris >>>>> >>>>> My $0.02, and I'd be keen so see what comes out of this. If >>>>>> there's something I can do to tip the balance towards something >>>>>> tangible happening, let me know. >>>>>> >>>>>> -hilmar >>>>>> >>>>> _______________________________________________ >>>>> BioSQL-l mailing list >>>>> BioSQL-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>>>> >>>> >>>> _______________________________________________ >>>> BioSQL-l mailing list >>>> BioSQL-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>>> >>> >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>> >>> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > From michael.watson at bbsrc.ac.uk Wed Feb 4 11:01:36 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Wed, 4 Feb 2009 11:01:36 -0000 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <93b45ca50901310303t37905e8ak3819c05f4b94c287@mail.gmail.com> References: 
<8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk><49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu><4981EAEC.4070508@compbio.dundee.ac.uk><982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu><8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk><903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net><5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu><99475964-CFB3-4A27-8024-8A14876533E0@illinois.edu> <93b45ca50901310303t37905e8ak3819c05f4b94c287@mail.gmail.com> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9507E27183@iahce2ksrv1.iah.bbsrc.ac.uk> Hi I think the conversation is splitting in two, which is great! First of all, I can see a need for an API, and web-services are a very interesting way forward, especially as they can be used in many different systems. I'd certainly like to see something develop along those lines. However, I would like to continue the front-end conversation. A web-services API isn't a front-end, it's a means to a front-end, and I disagree that there is not enough commonality to develop a web-based front-end. There are a huge number of groups who want to manage a sequence collection, they want to be able to search that sequence collection, list and browse it, export them as EMBL/GenBank, import from EMBL/GenBank. Now, if someone was to write import.php, export.php, search.php and browse.php - well, on top of a BioSQL database, I think that would be an incredibly powerful app. Mick -----Original Message----- From: biosql-l-bounces at lists.open-bio.org [mailto:biosql-l-bounces at lists.open-bio.org] On Behalf Of Mark Schreiber Sent: 31 January 2009 11:04 To: Chris Fields Cc: biosql-l at lists.open-bio.org Subject: Re: [BioSQL-l] Web front-ends to BioSQL Hi - My feeling is that the diversity of languages and frameworks within languages would mean that a generic web front end to BioSQL will and should never materialize. 
What would be a lot more sensible is a generic API in the form of a webservice or collection of webservices that could be used by (theoretically) any web frame work to generate a website. User preferences and requirements will be far too diverse for a generic web front end. - Mark On 1/31/09, Chris Fields wrote: > Another article (as pointed out by Heikki on bioperl-l): > > http://www.heise-online.co.uk/open/Healthcheck-Perl-The-Perl-Future--/fe atures/112388/0 > > The last section is all on MVC-oriented frameworks. > > chris > > On Jan 30, 2009, at 1:57 PM, Gudmundur A. Thorisson wrote: > >> We use Catalyst MVC framework for our project (http://www.hgvbaseg2p.org >> ). Very good stuff, we combine it with the DBIx::Class ORM and >> Template Toolkit as the templating engine. Totally recommended. >> >> >> Mummi >> >> On 30 Jan 2009, at 19:45, Chris Fields wrote: >>>> >>> >>> Perl web application framework: Catalyst and Jifty (have not tried >>> them myself). RoR gets a lot of press, but I understand the RoR >>> devs tend not to listen to the core ruby devs and (as a >>> consequence) had recently run into issues with the 1.8.7 ruby >>> release, detailed by the always-entertaining chromatic here: >>> >>> http://use.perl.org/~chromatic/journal/37125 >>> >>> chris >>> >>>> My $0.02, and I'd be keen so see what comes out of this. If >>>> there's something I can do to tip the balance towards something >>>> tangible happening, let me know. 
>>>> >>>> -hilmar >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > _______________________________________________ BioSQL-l mailing list BioSQL-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biosql-l From markjschreiber at gmail.com Wed Feb 4 11:33:19 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Wed, 4 Feb 2009 19:33:19 +0800 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9507E27183@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> <4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> <99475964-CFB3-4A27-8024-8A14876533E0@illinois.edu> <93b45ca50901310303t37905e8ak3819c05f4b94c287@mail.gmail.com> <8975119BCD0AC5419D61A9CF1A923E9507E27183@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <93b45ca50902040333r3cb95bd8xe242504af3c7587a@mail.gmail.com> Interestingly a few years back Richard Holland and I tried to publish a web front to BioSQL that used Java, Struts and some bits of BioJava. It got shot down by the reviewers on the basis that there was already too many of these kinds of things like GMOD etc. Unfortunately no-one agreed with the idea that a front end to BioSQL was the unique part. 
Not that one or two opinionated reviewers (possibly reading this list) should be enough to put you off. I think they missed the point. - Mark On Wed, Feb 4, 2009 at 7:01 PM, michael watson (IAH-C) < michael.watson at bbsrc.ac.uk> wrote: > Hi > > I think the conversation is splitting in two, which is great! > > First of all, I can see a need for an API, and web-services are a very > interesting way forward, especially as they can be used in many > different systems. I'd certainly like to see something develop along > those lines. > > However, I would like to continue the front-end conversation. A > web-services API isn't a front-end, it's a means to a front-end, and I > disagree that there is not enough commonality to develop a web-based > front-end. There are a huge number of groups who want to manage a > sequence collection, they want to be able to search that sequence > collection, list and browse it, export them as EMBL/GenBank, import from > EMBL/GenBank. Now, if someone was to write import.php, export.php, > search.php and browse.php - well, on top of a BioSQL database, I think > that would be an incredibly powerful app. > > Mick > > -----Original Message----- > From: biosql-l-bounces at lists.open-bio.org > [mailto:biosql-l-bounces at lists.open-bio.org] On Behalf Of Mark Schreiber > Sent: 31 January 2009 11:04 > To: Chris Fields > Cc: biosql-l at lists.open-bio.org > Subject: Re: [BioSQL-l] Web front-ends to BioSQL > > Hi - > > My feeling is that the diversity of languages and frameworks within > languages would mean that a generic web front end to BioSQL will and > should never materialize. What would be a lot more sensible is a > generic API in the form of a webservice or collection of webservices > that could be used by (theoretically) any web frame work to generate a > website. > > User preferences and requirements will be far too diverse for a > generic web front end. 
> > - Mark > > On 1/31/09, Chris Fields wrote: > > Another article (as pointed out by Heikki on bioperl-l): > > > > > http://www.heise-online.co.uk/open/Healthcheck-Perl-The-Perl-Future--/fe > atures/112388/0 > > > > The last section is all on MVC-oriented frameworks. > > > > chris > > > > On Jan 30, 2009, at 1:57 PM, Gudmundur A. Thorisson wrote: > > > >> We use Catalyst MVC framework for our project > (http://www.hgvbaseg2p.org > >> ). Very good stuff, we combine it with the DBIx::Class ORM and > >> Template Toolkit as the templating engine. Totally recommended. > >> > >> > >> Mummi > >> > >> On 30 Jan 2009, at 19:45, Chris Fields wrote: > >>>> > >>> > >>> Perl web application framework: Catalyst and Jifty (have not tried > >>> them myself). RoR gets a lot of press, but I understand the RoR > >>> devs tend not to listen to the core ruby devs and (as a > >>> consequence) had recently run into issues with the 1.8.7 ruby > >>> release, detailed by the always-entertaining chromatic here: > >>> > >>> http://use.perl.org/~chromatic/journal/37125 > >>> > >>> chris > >>> > >>>> My $0.02, and I'd be keen so see what comes out of this. If > >>>> there's something I can do to tip the balance towards something > >>>> tangible happening, let me know. 
> >>>> > >>>> -hilmar > >>> _______________________________________________ > >>> BioSQL-l mailing list > >>> BioSQL-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/biosql-l > >> > >> _______________________________________________ > >> BioSQL-l mailing list > >> BioSQL-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biosql-l > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > From jimp at compbio.dundee.ac.uk Tue Feb 10 19:14:01 2009 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Tue, 10 Feb 2009 19:14:01 +0000 Subject: [BioSQL-l] adding non-positional features via bioperl to a biosql database Message-ID: <4991D1F9.30806@compbio.dundee.ac.uk> Hi. Apologies if this is not the right place to post Bioperl/BioSQL issues, feel free to tell me where to go, after you've read the following: I have been using a sequence pipeline to add in non-positional features to sequences whilst uploading to a bioSQL database. A fragment of the code I tried to use is below:

    sub process_seq {
        my ($self, $seq) = @_;
        my ($dbid, $id) = extract_dbid($seq);
        my $tags = {'label'=>"".$dbid."_$id",
                    'notes'=>["".$dbid.":$id"]};
        # start/end of '0' is intended to mark the feature as non-positional
        my $feat = Bio::SeqFeature::Generic->new(
                       -start=>'0',-end=>'0',
                       -primary_tag => 'dbref',
                       -tag=>$tags,
                       -strand => 0,
                       -source_tag => 'ATB');
        $seq->add_SeqFeature($feat);
        $seq->version('1');
        $seq->alphabet('protein');
        return $seq;
    }

When I use this, the sequences are uploaded fine, and they have the correct non-positional features when I look at the tables, and when I access the database via Biojava. 
However, when I try to dump any of the features with Bioperl I get the following warning : --------------------- WARNING --------------------- MSG: Calling end without a defined start position --------------------------------------------------- And if I try to add any more features to the sequence and then store the updated object I get the following exception in addition to the above warning. ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: number of slots must equal the number of values STACK: Error::throw STACK: Bio::Root::Root::throw /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/Root/Root.pm:357 STACK: Bio::DB::BioSQL::BaseDriver::update_object /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/BioSQL/BaseDriver.pm:1075 STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:269 STACK: Bio::DB::Persistent::PersistentObject::store /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/Persistent/PersistentObject.pm:271 STACK: Bio::DB::BioSQL::SeqFeatureAdaptor::store_children /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/BioSQL/SeqFeatureAdaptor.pm:278 Can someone help me out here ? It seems that bioperl doesn't like features with a start/end of '0' - in which case, how do I create non-positional sequence features in a way that bioperl likes ? I'm using a nightly build from December 2008 - but there have been (afaict) no patches to the biosql or Feature::Generic which would fix this behaviour. thanks. jim -- ------------------------------------------------------------------- J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk The University of Dundee is a Scottish Registered Charity, No. SC015096. 
From hlapp at gmx.net Tue Feb 10 20:41:02 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 10 Feb 2009 15:41:02 -0500 Subject: [BioSQL-l] adding non-positional features via bioperl to a biosql database In-Reply-To: <4991D1F9.30806@compbio.dundee.ac.uk> References: <4991D1F9.30806@compbio.dundee.ac.uk> Message-ID: <5D0999B6-F67D-4C33-9A96-B0216135FC8D@gmx.net> Hi James, BioPerl (and BioSQL) use 1-based coordinates for features, so coordinate 0 risks being treated as undefined, but just not consistently so. If what you want is non-positional features, have you tried not specifying the positional attributes at all? Furthermore, in BioPerl and BioSQL lingo a feature really is an annotation that has a position. So, whereas that's not strictly enforced I think (and may in fact be different in Biojava), did you conclude that natively non-positional sequence annotation using one of the Bio::Annotation classes (and adding it through $seq->annotation->add_Annotation()) wouldn't work for your purposes? -hilmar On Feb 10, 2009, at 2:14 PM, James Procter wrote: > > Hi. Apologies if this is not the right place to post Bioperl/BioSQL > issues, feel free to tell me where to go, after you've read the > following: > > I have been using a sequence pipeline to add in non-positional > features > to sequences whilst uploading to a bioSQL database. A fragment of the > code I tried to use is below: > > sub process_seq { > my ($self, $seq) = @_; > my ($dbid, $id) = extract_dbid($seq); > my $tags = {'label'=>"".$dbid."_$id", > 'notes'=>["".$dbid.":$id"]}; > my $feat = Bio::SeqFeature::Generic->new( > -start=>'0',-end=>'0', > -primary_tag => > 'dbref', > -tag=>$tags, > -strand => 0, > -source_tag => 'ATB'); > $seq->add_SeqFeature($feat); > $seq->version('1') > $seq->alphabet('protein') > return $seq; > } > > When I use this, the sequences are uploaded fine, and they have the > correct non-positional features when I look at the tables, and when I > access the database via Biojava. 
However, when I try to dump any of > the > features with Bioperl I get the following warning : > > --------------------- WARNING --------------------- > MSG: Calling end without a defined start position > --------------------------------------------------- > > And if I try to add any more features to the sequence and then store > the > updated object I get the following exception in addition to the above > warning. > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: number of slots must equal the number of values > STACK: Error::throw > STACK: Bio::Root::Root::throw > /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/Root/Root.pm:357 > STACK: Bio::DB::BioSQL::BaseDriver::update_object > /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/BioSQL/ > BaseDriver.pm:1075 > STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store > /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:269 > STACK: Bio::DB::Persistent::PersistentObject::store > /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/Persistent/ > PersistentObject.pm:271 > STACK: Bio::DB::BioSQL::SeqFeatureAdaptor::store_children > /gpfs/gjb_lab/ws-dev1/servers/lib/perl/lib/perl5/Bio/DB/BioSQL/ > SeqFeatureAdaptor.pm:278 > > Can someone help me out here ? It seems that bioperl doesn't like > features with a start/end of '0' - in which case, how do I create > non-positional sequence features in a way that bioperl likes ? > > I'm using a nightly build from December 2008 - but there have been > (afaict) no patches to the biosql or Feature::Generic which would fix > this behaviour. > > thanks. > jim > > > -- > ------------------------------------------------------------------- > J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group > Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk > The University of Dundee is a Scottish Registered Charity, No. > SC015096. 
> _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jimp at compbio.dundee.ac.uk Tue Feb 10 23:25:14 2009 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Tue, 10 Feb 2009 23:25:14 +0000 Subject: [BioSQL-l] adding non-positional features via bioperl to a biosql database In-Reply-To: <5D0999B6-F67D-4C33-9A96-B0216135FC8D@gmx.net> References: <4991D1F9.30806@compbio.dundee.ac.uk> <5D0999B6-F67D-4C33-9A96-B0216135FC8D@gmx.net> Message-ID: <49920CDA.40601@compbio.dundee.ac.uk> Thanks for the reply, Hilmar. Hilmar Lapp wrote: > BioPerl (and BioSQL) use 1-based coordinates for features, so coordinate > 0 risks being treated as undefined, but just not consistently so. Erm, yes. But there lies the problem. There's a convention in GFF (and DAS) that '0' corresponds to '.' and both mean 'non-positional'; BioPerl's GFF output module actually follows that convention (i.e. a start of '0' leads to a '.' in the start column of the GFF feature). BioSQL and BioJava also do this just fine, but whilst BioPerl allows features with a '0' start position to be persisted, it then cannot work with the feature after it's been recovered from the DB. This looks like a bug to me. > If what you want is non-positional features, have you tried not > specifying the positional attributes at all? Yep. No positional attributes raises an error on store, which is what I'd expect. > Furthermore, in BioPerl and BioSQL lingo a feature really is an > annotation that has a position. 
So, whereas that's not strictly enforced > I think (and may in fact be different in Biojava), did you conclude that > natively non-positional sequence annotation using one of the > Bio::Annotation classes (and adding it through > $seq->annotation->add_Annotation()) wouldn't work for your purposes? I'm actually trying to find that happy common ground where existing biojava and bioperl binding interpretations of BioSQL meet. I tried adding annotation directly to the sequence object, but couldn't work out where they appeared in Biojava. I then discovered that start=end='0' did what I wanted, and stopped without checking that I could then add more features afterwards. Seems like I should have tried harder :^/ If the above seems like a bug, then I'm happy to raise one. I'd like to see this fixed/cleared up before the next bioperl release if possible. I'll also try and make a start on that ontology map page that we discussed on list last December. Cheers. Jim. From hlapp at gmx.net Tue Feb 10 23:45:40 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 10 Feb 2009 18:45:40 -0500 Subject: [BioSQL-l] adding non-positional features via bioperl to a biosql database In-Reply-To: <49920CDA.40601@compbio.dundee.ac.uk> References: <4991D1F9.30806@compbio.dundee.ac.uk> <5D0999B6-F67D-4C33-9A96-B0216135FC8D@gmx.net> <49920CDA.40601@compbio.dundee.ac.uk> Message-ID: On Feb 10, 2009, at 6:25 PM, James Procter wrote: > BioSQL and BioJava also do this just fine, but whilst BioPerl allows > features with > a '0' start position to be persisted, it then cannot work with the > feature after it's been recovered from the DB. This looks like a bug > to me. I think the fact that it doesn't raise an error speaks much more to the fact that 1-based coordinates aren't fully enforced than that a coordinate of 0 is fully supported. If the coordinate is undef you probably get the same translation to '.' 
in GFF, and the fact that you get that also for 0 is probably simply due to 0 and undef both evaluating to false in Perl when tested in an if clause. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Tue Feb 10 23:47:06 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 10 Feb 2009 18:47:06 -0500 Subject: [BioSQL-l] adding non-positional features via bioperl to a biosql database In-Reply-To: <49920CDA.40601@compbio.dundee.ac.uk> References: <4991D1F9.30806@compbio.dundee.ac.uk> <5D0999B6-F67D-4C33-9A96-B0216135FC8D@gmx.net> <49920CDA.40601@compbio.dundee.ac.uk> Message-ID: <53AAB3D4-AD43-42E3-AB74-8DC760019BC9@gmx.net> On Feb 10, 2009, at 6:25 PM, James Procter wrote: > I tried adding annotation directly to the sequence object, but > couldn't work out > where they appeared in Biojava. I think this is where the bug is more likely, in that this is something that I think we will all agree should work but apparently doesn't. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jimp at compbio.dundee.ac.uk Wed Feb 11 09:03:04 2009 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Wed, 11 Feb 2009 09:03:04 +0000 Subject: [BioSQL-l] adding non-positional features via bioperl to a biosql database In-Reply-To: <53AAB3D4-AD43-42E3-AB74-8DC760019BC9@gmx.net> References: <4991D1F9.30806@compbio.dundee.ac.uk> <5D0999B6-F67D-4C33-9A96-B0216135FC8D@gmx.net> <49920CDA.40601@compbio.dundee.ac.uk> <53AAB3D4-AD43-42E3-AB74-8DC760019BC9@gmx.net> Message-ID: <49929448.5050301@compbio.dundee.ac.uk> Hilmar Lapp wrote: > > On Feb 10, 2009, at 6:25 PM, James Procter wrote: > >> I tried adding annotation directly to the sequence object, but >> couldn't work out >> where they appeared in Biojava. > > I think this is where the bug is more likely, in that this is something > that I think we will all agree that it should work but apparently doesn't. Yes. I would have used that method if it appeared to work with the existing biojava codebase for retrieving annotation from BioSQL and serving it over DAS. I have to check here, but I think that the other reason I didn't use the annotation object (in the current bioperl 1.6 state) was that it does not support all the attributes needed for this - and that was probably why it was not used in the biojava implementation, either. I'll verify this and post again later. Jim. -- ------------------------------------------------------------------- J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk The University of Dundee is a Scottish Registered Charity, No. SC015096. 
From jimp at compbio.dundee.ac.uk Wed Feb 11 15:22:32 2009 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Wed, 11 Feb 2009 15:22:32 +0000 Subject: [BioSQL-l] adding non-positional features via bioperl to a biosql database In-Reply-To: References: <4991D1F9.30806@compbio.dundee.ac.uk> <5D0999B6-F67D-4C33-9A96-B0216135FC8D@gmx.net> <49920CDA.40601@compbio.dundee.ac.uk> <49929318.4000700@compbio.dundee.ac.uk> Message-ID: <4992ED38.4030602@compbio.dundee.ac.uk> More on this - see below for off list discussion : Hilmar Lapp wrote: > > On Feb 11, 2009, at 3:58 AM, James Procter wrote: > >> Hilmar Lapp wrote: >>> I think the fact that it doesn't raise an error speaks much more to the >>> fact that 1-based coordinates aren't fully enforced than that a >>> coordinate of 0 is fully supported. If the coordinate is undef you >>> probably get the same translation to '.' in GFF, and the fact that you >>> get that also for 0 is probably simply due to 0 and undef both >>> evaluating to false in Perl if in an if clause. >> The latter is a specific bug and should really be a testcase. Bioperl-DB >> should not allow a feature with invalid start/end to be persisted - >> because it makes that sequence feature inaccessible, and throws fatal >> errors when any additional information is persisted on the sequence. > > > I see your reasoning, and I agree that there is a bug here somewhere, > and that there should be a test case. I guess the way I see it is that > Bioperl-db is an ORM. It doesn't judge Bioperl - it assumes that if you > give it a BioPerl object to persist that BioPerl allowed you to create > then either you or BioPerl or both knew what they were doing and will > just try to obey the order to persist the object. So I think Bioperl-db > is the wrong place to put semantic validity checks on BioPerl objects. yes. The problem is earlier on - bioperl's generic feature module allowing a feature to be created with start==0. 
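The truthiness pitfall described above is not specific to Perl. As a minimal sketch in Python (the function names are hypothetical and are not part of BioPerl, Bioperl-db or Biopython), this shows how a truthiness test treats a coordinate of 0 like a missing one, and how an explicit check could reject it when the feature is created rather than when it is persisted:

```python
from typing import Optional


def format_coord_truthy(start: Optional[int]) -> str:
    """Truthiness test: 0 and None are both falsy, so both become '.'."""
    return str(start) if start else "."


def format_coord_strict(start: Optional[int]) -> str:
    """Only a missing coordinate becomes '.'; values below 1 are rejected."""
    if start is None:
        return "."
    if start < 1:
        # Assumption: coordinates are 1-based, as discussed in this thread.
        raise ValueError(f"coordinates are 1-based, got {start}")
    return str(start)
```

With the first function a start of 0 silently turns into '.', which is the behaviour reported for the GFF output; the second raises instead.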
> > That said, maybe BioSQL shouldn't permit entries that don't comply with > the model definition, as in BioSQL aiming to be the Bio* interoperable > reference model. So maybe what we should do is put a constraint on > location that prohibits the start and end to be zero. I agree. I even had a discussion today with a colleague about ensuring a schema is sufficiently specified that different user groups do not use it in incompatible ways. I think it's important that the schema is sufficiently constrained... at least, when everyone can agree on the constraint! > Not sure why you thought this isn't relevant to BioSQL anymore :-) Heh - I thought I was going into the realms of bioperl/db test cases, rather than purely biosql relevant discussion. Jim From rjalves at igc.gulbenkian.pt Wed Feb 11 18:15:16 2009 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Wed, 11 Feb 2009 18:15:16 +0000 Subject: [BioSQL-l] BioSQL - MySQL index usage Message-ID: <499315B4.6020001@igc.gulbenkian.pt> Greetings everyone, I have been using BioSQL as the current platform to explore taxonomy and I must say I'm quite happy with it. Recently, however, the complexity/amount of queries we are using is raising an additional constraint, time. After looking into ways of optimizing the queries I noticed that in most cases the left_value, right_value (UNIQUE) indexes are not used even though they exist. I did some quick tests and these were the results: mysql> reset query cache; select * from taxon where left_value < 97224 AND right_value > 97225; (...) 7 rows in set (0.31 sec) mysql> reset query cache; select * from taxon FORCE INDEX (left_value, right_value) where left_value < 97224 AND right_value > 97225; (...) 7 rows in set (0.15 sec) So my question is, does anyone know a faster way to achieve the same result? Also if anyone knows how to make MySQL use the index without the explicit FORCE INDEX syntax I would be happy to give it a try.
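For readers unfamiliar with the nested-set encoding, the left_value/right_value query above returns the ancestors of a node. Here is a minimal sketch of what it computes, using an in-memory SQLite table in place of MySQL. The five-row tree is made up, the table is a cut-down stand-in for the real taxon table, and the helper names are hypothetical. The second function gets the same lineage by walking parent_taxon_id one primary-key lookup at a time; whether that is faster on a real MySQL instance has not been tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for the BioSQL taxon table (toy data, not NCBI taxonomy).
conn.execute(
    "CREATE TABLE taxon ("
    " taxon_id INTEGER PRIMARY KEY,"
    " parent_taxon_id INTEGER,"
    " left_value INTEGER UNIQUE,"
    " right_value INTEGER UNIQUE)"
)
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?)",
    [
        (1, None, 1, 10),  # root
        (2, 1, 2, 7),      # child of root
        (3, 2, 3, 4),      # leaf under 2
        (4, 2, 5, 6),      # leaf under 2
        (5, 1, 8, 9),      # second child of root
    ],
)


def ancestors_nested_set(conn, taxon_id):
    """Ancestors via the nested-set condition used in the query above."""
    left, right = conn.execute(
        "SELECT left_value, right_value FROM taxon WHERE taxon_id = ?",
        (taxon_id,),
    ).fetchone()
    rows = conn.execute(
        "SELECT taxon_id FROM taxon"
        " WHERE left_value < ? AND right_value > ?"
        " ORDER BY left_value",
        (left, right),
    )
    return [r[0] for r in rows]


def ancestors_parent_walk(conn, taxon_id):
    """Same lineage, obtained by following parent_taxon_id up to the root."""
    lineage = []
    current = taxon_id
    while True:
        row = conn.execute(
            "SELECT parent_taxon_id FROM taxon WHERE taxon_id = ?",
            (current,),
        ).fetchone()
        # Stop at the root; also guard against a root that points to itself.
        if row is None or row[0] is None or row[0] == current:
            break
        lineage.append(row[0])
        current = row[0]
    return lineage[::-1]  # root first, to match the nested-set ordering
```

Both functions return the same list for any node of the toy tree, so the parent walk could serve as a cross-check when experimenting with index hints.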
Thanks, Renato From giles.weaver at googlemail.com Thu Feb 12 09:47:00 2009 From: giles.weaver at googlemail.com (Giles Weaver) Date: Thu, 12 Feb 2009 09:47:00 +0000 Subject: [BioSQL-l] Fwd: BioSQL - MySQL index usage In-Reply-To: <1d06cd5d0902120141r7d9b79d1rf8f0d4f1cc842175@mail.gmail.com> References: <499315B4.6020001@igc.gulbenkian.pt> <1d06cd5d0902120141r7d9b79d1rf8f0d4f1cc842175@mail.gmail.com> Message-ID: <1d06cd5d0902120147g7fbaaf27h94d610b30e85ffc0@mail.gmail.com> I've also found the performance of queries across the taxon and taxon_name tables to be inadequate on occasion. My solutions were: a) to remove all taxa outside of my area of interest from the database, and remove non scientific names from taxon_name b) to create a temporary table (join of taxon and taxon_name) of my taxa of interest and query against that instead of the taxon and taxon_name tables. I can't remember which of these was more effective, performance wise, but obviously neither is an ideal solution. 2009/2/11 Renato Alves Greetings everyone, > > I have been using BioSQL as the current platform to explore taxonomy and I > must say I'm quite happy with it. > > Recently however the complexity/ammount of queries we are using is raising > an additional constraint, time. > > After looking into way of optimizing the queries I noticed that in most > cases the left_value, right_value (UNIQUE) indexes are not used even though > they exist. > > I did some quick tests and these were the results: > > mysql> reset query cache; select * from taxon where left_value < 97224 AND > right_value > 97225; > (...) > 7 rows in set (0.31 sec) > > mysql> reset query cache; select * from taxon FORCE INDEX (left_value, > right_value) where left_value < 97224 AND right_value > 97225; > (...) > 7 rows in set (0.15 sec) > > So my question is, does anyone know a faster way to achieve the same > result? 
> > Also if anyone knows how to make MySQL use the index without the explicit > FORCE INDEX syntax I would be happy to give it a try. > > Thanks, > Renato > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > From biopython at maubp.freeserve.co.uk Wed Feb 18 14:27:55 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Feb 2009 14:27:55 +0000 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database? Message-ID: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> Hi everyone, Do any of the Bio* projects currently let you import a GFF3 (or even a GFF2) file into a BioSQL database? Looking at some of the examples on http://www.sequenceontology.org/gff3.shtml this looks possible. I assume each GFF file normally describes features on a single plasmid/chromosome, meaning a single bioentry table entry. I would expect each GFF feature to become a seqfeature table entry (with a location table entry for each line describing its location), and the main sequence (if present in the GFF file), would be a biosequence table entry. So far this isn't too complicated. The GFF3 documentation gives some example of "parent" or rather "part-of" relationships between features (e.g. an exon which is part of three parent mRNA features). Perhaps three entries in the seqfeature_relationship table could record this. Also, GFF3 files seem to be very organised with regards ontologies - something we have touched on before on this mailing list. My reason for asking regards adding GFF parsing to Biopython. Biopython has a parsing framework (Bio.SeqIO) which turns various file formats (e.g. GenBank) into objects (SeqRecord objects, with optional SeqFeature objects), which we can map onto the BioSQL tables. 
If we manage to integrate GFF parsing into Biopython's Bio.SeqIO framework (non-trivial), then Biopython would as a consequence be able to load a GFF file into BioSQL. If any of the other Bio* projects can already import GFF files into BioSQL, I'd like Biopython to load the data into the database in the same way. Essentially this would give a recipe for how we should model the GFF data in our objects in order to achieve this intra-project BioSQL compatibility. Peter From biopython at maubp.freeserve.co.uk Wed Feb 18 15:31:26 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Feb 2009 15:31:26 +0000 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <20D70002-B512-4EA0-8755-1CF00310ADC6@gmx.net> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com> <20D70002-B512-4EA0-8755-1CF00310ADC6@gmx.net> Message-ID: <320fb6e00902180731w50a7704asc095b866842bcad3@mail.gmail.com> On Fri, Nov 28, 2008 at 6:46 PM, Hilmar Lapp wrote: > > On Nov 28, 2008, at 1:41 PM, Peter wrote: >>> >>> It's part of the changes planned for the next release indeed. >> >> By next release, do you mean BioSQL v1.0.2 or v1.1.0 here? > > That would be 1.0.2. Otherwise there would be no need to worry about > backward compatibility (as 1.1x won't be by definition). > > -hilmar Hi Hilmar, A few months ago we talked about BioSQL v1.0.2 adding explicit composite primary keys to tables like taxon_name which currently have only a uniqueness constraint. 
Nothing has shown up on SVN, and I'm not sure how you track plans for the next release so I just filed this issue on bugzilla: http://bugzilla.open-bio.org/show_bug.cgi?id=2765 Thanks, Peter From crackeur at comcast.net Thu Feb 19 03:21:22 2009 From: crackeur at comcast.net (crackeur at comcast.net) Date: Thu, 19 Feb 2009 03:21:22 +0000 (UTC) Subject: [BioSQL-l] [ANN] VTD-XML 2.5 In-Reply-To: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> Message-ID: <822284529.1630891235013682870.JavaMail.root@sz0167a.emeryville.ca.mail.comcast.net> VTD-XML 2.5 is now released. Please go to https://sourceforge.net/project/showfiles.php?group_id=110612&package_id=120172&release_id=661376 to download the latest version. Changes from Version 2.4 (2/2009) * Added separate VTD indexing generating and loading (see http://vtd-xml.sf.net/persistence.html for further info) * Integrated extended VTD supporting 256 GB doc (in Java only). * Added duplicateNav() for replicating multiple VTDNav instances sharing XML, VTD and LC buffer (available in Java and C#). * Various bug fixes and enhancements From hlapp at gmx.net Thu Feb 19 04:52:26 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 18 Feb 2009 23:52:26 -0500 Subject: [BioSQL-l] Python ORM mapping for BioSQL In-Reply-To: <320fb6e00902180731w50a7704asc095b866842bcad3@mail.gmail.com> References: <20081125211622.GE83220@sobchak.mgh.harvard.edu> <320fb6e00811261037l407cccf1q98220c8e09df4bab@mail.gmail.com> <161E9681-9B2B-4F10-A36E-086534E4F257@gmx.net> <320fb6e00811280243i5d7354b6g5a7f93d42363c9d@mail.gmail.com> <320fb6e00811281041y2b60867en8366b01df286e92b@mail.gmail.com> <20D70002-B512-4EA0-8755-1CF00310ADC6@gmx.net> <320fb6e00902180731w50a7704asc095b866842bcad3@mail.gmail.com> Message-ID: <8426858D-A04B-456B-920A-BFB9851C6486@gmx.net> Thanks, that's great, appreciate it! The plans for the next release should otherwise be on http://www.biosql.org/wiki/Enhancement_Requests Feel free to add to that as well.
I'm hoping to take on a release over the next 2-3 weeks but am completely swamped right now. If it happens, you'll hear about that shortly. -hilmar On Feb 18, 2009, at 10:31 AM, Peter wrote: > On Fri, Nov 28, 2008 at 6:46 PM, Hilmar Lapp wrote: >> >> On Nov 28, 2008, at 1:41 PM, Peter wrote: >>>> >>>> It's part of the changes planned for the next release indeed. >>> >>> By next release, do you mean BioSQL v1.0.2 or v1.1.0 here? >> >> That would be 1.0.2. Otherwise there would be no need to worry about >> backward compatibility (as 1.1x won't be by definition). >> >> -hilmar > > > Hi Hilmar, > > A few months ago we talked about BioSQL v1.0.2 adding explicit > composite primary keys to tables like taxon_name which currently have > only a uniqueness constraint. Nothing has shown up on SVN, and I''m > not sure how you track plans for the next release so I just filed this > issue on bugzilla: > http://bugzilla.open-bio.org/show_bug.cgi?id=2765 > > Thanks, > > Peter -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From chapmanb at 50mail.com Sun Feb 22 21:18:06 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Sun, 22 Feb 2009 16:18:06 -0500 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database? In-Reply-To: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> References: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> Message-ID: <20090222211806.GA58076@kunkel> Hi Peter; > Do any of the Bio* projects currently let you import a GFF3 (or even a > GFF2) file into a BioSQL database? Normalizing the GFF and standard SeqIO representations is a great idea. I use BioSQL quite a bit, and it would be nice to be able to output GFF formatted files directly from bioentries. To get more familiar with BioPerl GFF mappings, I took a look at how GenBank files get converted to GFF files with BioPerl. 
Generally, most things map as you'd expect but a few items are left behind. I wrote up the details on the current mappings, along with some proposals for expanding them, here: http://bcbio.wordpress.com/2009/02/22/exploring-bioperl-genbank-to-gff-mapping/ I think for the Biopython mapping we could try and follow what BioPerl does where it makes good sense, and introduce the other items in a way that is consistent and could be followed by other projects. Hope this helps move things forward, Brad > > Looking at some of the examples on > http://www.sequenceontology.org/gff3.shtml this looks possible. I > assume each GFF file normally describes features on a single > plasmid/chromosome, meaning a single bioentry table entry. I would > expect each GFF feature to become a seqfeature table entry (with a > location table entry for each line describing its location), and the > main sequence (if present in the GFF file), would be a biosequence > table entry. So far this isn't too complicated. The GFF3 > documentation gives some example of "parent" or rather "part-of" > relationships between features (e.g. an exon which is part of three > parent mRNA features). Perhaps three entries in the > seqfeature_relationship table could record this. > > Also, GFF3 files seem to be very organised with regards ontologies - > something we have touched on before on this mailing list. > > My reason for asking regards adding GFF parsing to Biopython. > Biopython has a parsing framework (Bio.SeqIO) which turns various file > formats (e.g. GenBank) into objects (SeqRecord objects, with optional > SeqFeature objects), which we can map onto the BioSQL tables. If we > manage to integrate GFF parsing into Biopython's Bio.SeqIO framework > (non-trivial), then Biopython would as a consequence be able to load a > GFF file into BioSQL. If any of the other Bio* projects can already > import GFF files into BioSQL, I'd like Biopython to load the data into > the database in the same way. 
Essentially this would give a recipe > for how we should model the GFF data in our objects in order to > achieve this intra-project BioSQL compatibility. > > Peter > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From cjfields at illinois.edu Sun Feb 22 23:18:54 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 22 Feb 2009 17:18:54 -0600 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database? In-Reply-To: <20090222211806.GA58076@kunkel> References: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> <20090222211806.GA58076@kunkel> Message-ID: <6AB45A7E-B78C-4E2E-B701-3C8704799A24@illinois.edu> Re: bioperl; we're planning on refactoring several bits in BioPerl for consistency. http://www.bioperl.org/wiki/GFF_Refactor The problem is there are several different methods to parse and generate GFF strings, some only partially implemented. I would like to coalesce around a central mode of generating such output, or at least have a way to validate such data. Another issue is that the typical bioperl seqfeature comes flattened (non-hierarchal) and untyped (no check against SO) by default when parsed from a GenBank file. The bp_genbank2gff3.pl attempts to rectify this, but I think an integrated optional way of generating unflattened (hierarchal) typed feature data within the SeqIO parsers would be better. There is a simple way we could implement this, just need time to work it in. chris On Feb 22, 2009, at 3:18 PM, Brad Chapman wrote: > Hi Peter; > >> Do any of the Bio* projects currently let you import a GFF3 (or >> even a >> GFF2) file into a BioSQL database? > > Normalizing the GFF and standard SeqIO representations is a great > idea. I use BioSQL quite a bit, and it would be nice to be able to > output GFF formatted files directly from bioentries. 
> > To get more familiar with BioPerl GFF mappings, I took a look at how > GenBank files get converted to GFF files with BioPerl. Generally, > most things map as you'd expect but a few items are left behind. I > wrote up the details on the current mappings, along with some > proposals for expanding them, here: > > http://bcbio.wordpress.com/2009/02/22/exploring-bioperl-genbank-to-gff-mapping/ > > I think for the Biopython mapping we could try and follow what > BioPerl does where it makes good sense, and introduce the other > items in a way that is consistent and could be followed by other > projects. > > Hope this helps move things forward, > Brad > > > >> >> Looking at some of the examples on >> http://www.sequenceontology.org/gff3.shtml this looks possible. I >> assume each GFF file normally describes features on a single >> plasmid/chromosome, meaning a single bioentry table entry. I would >> expect each GFF feature to become a seqfeature table entry (with a >> location table entry for each line describing its location), and the >> main sequence (if present in the GFF file), would be a biosequence >> table entry. So far this isn't too complicated. The GFF3 >> documentation gives some example of "parent" or rather "part-of" >> relationships between features (e.g. an exon which is part of three >> parent mRNA features). Perhaps three entries in the >> seqfeature_relationship table could record this. >> >> Also, GFF3 files seem to be very organised with regards ontologies - >> something we have touched on before on this mailing list. >> >> My reason for asking regards adding GFF parsing to Biopython. >> Biopython has a parsing framework (Bio.SeqIO) which turns various >> file >> formats (e.g. GenBank) into objects (SeqRecord objects, with optional >> SeqFeature objects), which we can map onto the BioSQL tables. 
If we >> manage to integrate GFF parsing into Biopython's Bio.SeqIO framework >> (non-trivial), then Biopython would as a consequence be able to >> load a >> GFF file into BioSQL. If any of the other Bio* projects can already >> import GFF files into BioSQL, I'd like Biopython to load the data >> into >> the database in the same way. Essentially this would give a recipe >> for how we should model the GFF data in our objects in order to >> achieve this intra-project BioSQL compatibility. >> >> Peter >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From biopython at maubp.freeserve.co.uk Mon Feb 23 10:23:07 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Feb 2009 10:23:07 +0000 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database? In-Reply-To: <6AB45A7E-B78C-4E2E-B701-3C8704799A24@illinois.edu> References: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> <20090222211806.GA58076@kunkel> <6AB45A7E-B78C-4E2E-B701-3C8704799A24@illinois.edu> Message-ID: <320fb6e00902230223o6aaf489cj77dd31028906dd0d@mail.gmail.com> >>> Do any of the Bio* projects currently let you import a GFF3 (or even a >>> GFF2) file into a BioSQL database? >> >> To get more familiar with BioPerl GFF mappings, I took a look at how >> GenBank files get converted to GFF files with BioPerl. Generally, >> most things map as you'd expect but a few items are left behind. I >> wrote up the details on the current mappings, along with some >> proposals for expanding them, here: >> >> >> http://bcbio.wordpress.com/2009/02/22/exploring-bioperl-genbank-to-gff-mapping/ That looks interesting Brad. You seem to be focusing on the top-level record annotation. 
From my point of view, the feature annotation (or feature qualifiers as Biopython calls them) is more interesting. Brad wrote: >> Normalizing the GFF and standard SeqIO representations is a great >> idea. I use BioSQL quite a bit, and it would be nice to be able to >> output GFF formatted files directly from bioentries. I was thinking of just loading GFF files with Biopython's SeqIO to start with (and thus GFF into BioSQL). In order to dump a BioSQL entry into a GFF file we'd need to have Biopython's SeqIO be able to write GFF files. I'm not sure if we can do that if the parsing is lossy - it is the relationships between features that strike me as the most work (storing these as simple strings may be good enough). On Sun, Feb 22, 2009 at 11:18 PM, Chris Fields wrote: > > Re: bioperl; we're planning on refactoring several bits in BioPerl for > consistency. > > http://www.bioperl.org/wiki/GFF_Refactor > > The problem is there are several different methods to parse and generate GFF > strings, some only partially implemented. I would like to coalesce around a > central mode of generating such output, or at least have a way to validate > such data. > > Another issue is that the typical bioperl seqfeature comes flattened > (non-hierarchal) and untyped (no check against SO) by default when parsed > from a GenBank file. The bp_genbank2gff3.pl attempts to rectify this, but I > think an integrated optional way of generating unflattened (hierarchal) > typed feature data within the SeqIO parsers would be better. There is a > simple way we could implement this, just need time to work it in. Something we've touched on before on this mailing list is loading GenBank files into BioSQL while checking them against an ontology (rather than the current ad-hoc ontology where new terms (even spelling errors) get recorded as new ontology terms). That seems to be a related point.
If I understand correctly from Chris and Brad's posts, with BioPerl you could do GFF file to GenBank file to BioSQL, but not directly? Peter From jimp at compbio.dundee.ac.uk Mon Feb 23 14:06:23 2009 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Mon, 23 Feb 2009 14:06:23 +0000 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database? In-Reply-To: <320fb6e00902230223o6aaf489cj77dd31028906dd0d@mail.gmail.com> References: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> <20090222211806.GA58076@kunkel> <6AB45A7E-B78C-4E2E-B701-3C8704799A24@illinois.edu> <320fb6e00902230223o6aaf489cj77dd31028906dd0d@mail.gmail.com> Message-ID: <49A2AD5F.5030700@compbio.dundee.ac.uk> Hello all. I'd like to stick my oar in here and say that regularising the GFF3<>BioSQL mapping across Bio* is an essential step towards round-tripping between the different language bindings, and also towards portable export of bioentry annotation via DAS (which has its roots in GFF, and is isomorphic for most practical purposes). Some of you might remember that back in Dec'08 I made the promise that I'd try and document the DAS<>BioSQL mapping. I've finally started to make good on this and have added a preliminary GFF section to the BioSQL wiki page describing Annotation Mapping conventions: http://www.biosql.org/wiki/Annotation_Mapping#GFF3 It's very rough and ready, probably biased and not quite correct at the moment. It does, however, provide a central home for the information that Brad has put up on his blog, and the various other notes that have been made in this thread. Please add and amend as appropriate. Jim From biopython at maubp.freeserve.co.uk Mon Feb 23 14:27:38 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Feb 2009 14:27:38 +0000 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database?
In-Reply-To: <49A2AD5F.5030700@compbio.dundee.ac.uk> References: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> <20090222211806.GA58076@kunkel> <6AB45A7E-B78C-4E2E-B701-3C8704799A24@illinois.edu> <320fb6e00902230223o6aaf489cj77dd31028906dd0d@mail.gmail.com> <49A2AD5F.5030700@compbio.dundee.ac.uk> Message-ID: <320fb6e00902230627i374035co9fe955b9e5dd0f4@mail.gmail.com> > Hello all. > > I'd like to stick my oar in here and say that regularising the > GFF3<>BioSQL mapping accross Bio* is an essential step towards > round-tripping between the different language bindings, ... Absolutely - that's why I started this thread. On a related note, there is probably still room for improvement in the GenBank<>BioSQL mapping across the Bio* projects, where the "gold standard" also needs documenting. > ... and also towards portable export of bioentry annotation via DAS > (which has its roots in GFF, and is isomorphic for most practical > purposes). > > Some of you might remember that back in Dec'08 I made the promise that > I'd try and document the DAS<>BioSQL mapping. I've finally started to > make good on this and have added a preliminary GFF section to the BioSQL > wiki page describing Annotation Mapping conventions : > http://www.biosql.org/wiki/Annotation_Mapping#GFF3 > > Its very rough and ready, probably biased and not quite correct at the > moment. It does, however, provide a central home for the information > that Brad has put up on his blog, and the various other notes that have > been made in this thread. Please add and amend as appropriate. Is this based on the current behaviour of BioPerl code - or just what you think would be sensible? Peter From hlapp at gmx.net Mon Feb 23 15:07:09 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 23 Feb 2009 10:07:09 -0500 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database? 
In-Reply-To: <49A2AD5F.5030700@compbio.dundee.ac.uk> References: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> <20090222211806.GA58076@kunkel> <6AB45A7E-B78C-4E2E-B701-3C8704799A24@illinois.edu> <320fb6e00902230223o6aaf489cj77dd31028906dd0d@mail.gmail.com> <49A2AD5F.5030700@compbio.dundee.ac.uk> Message-ID: <69DD7713-7351-4275-B96D-2C0B4198DDF5@gmx.net> On Feb 23, 2009, at 9:06 AM, James Procter wrote: > I've finally started to make good on this and have added a > preliminary GFF section to the BioSQL wiki page describing > Annotation Mapping conventions : > http://www.biosql.org/wiki/Annotation_Mapping#GFF3 Awesome! Thanks so much James for putting this in. I'll look it over shortly. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jimp at compbio.dundee.ac.uk Mon Feb 23 15:10:35 2009 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Mon, 23 Feb 2009 15:10:35 +0000 Subject: [BioSQL-l] Importing GFF3 files into a BioSQL database? In-Reply-To: <320fb6e00902230627i374035co9fe955b9e5dd0f4@mail.gmail.com> References: <320fb6e00902180627g5d60ee93p348f88fd6b924ada@mail.gmail.com> <20090222211806.GA58076@kunkel> <6AB45A7E-B78C-4E2E-B701-3C8704799A24@illinois.edu> <320fb6e00902230223o6aaf489cj77dd31028906dd0d@mail.gmail.com> <49A2AD5F.5030700@compbio.dundee.ac.uk> <320fb6e00902230627i374035co9fe955b9e5dd0f4@mail.gmail.com> Message-ID: <49A2BC6B.3080809@compbio.dundee.ac.uk> Hi Peter. Peter wrote: >> I'd like to stick my oar in here and say that regularising the >> GFF3<>BioSQL mapping accross Bio* is an essential step towards >> round-tripping between the different language bindings, ... > > Absolutely - that's why I started this thread. 
On a related note, > there is probably still room for improvement in the GenBank<>BioSQL > mapping across the Bio* projects, where the "gold standard" also needs > documenting. Definitely. An awful lot of effort has been expended on standardisation in the last 4-5 years, so it shouldn't be a tough job for someone with experience of the GenBank annotation model. >> Its very rough and ready, probably biased and not quite correct at the >> moment. It does, however, provide a central home for the information >> that Brad has put up on his blog, and the various other notes that have >> been made in this thread. Please add and amend as appropriate. > > Is this based on the current behaviour of BioPerl code - or just what > you think would be sensible? Current behaviour, where possible, with notes where current behaviour seems broken/crazy, based on my mishmash of experience over the last couple of months with nightly builds of the 1.6 code. You'll also notice that I've tried to stick to the well defined basics, leaving the detail to be filled in by the experts. The aim of the page is to allow some consensus regarding each aspect of mapping the GFF spec to be reached, and thence enable us to verify or note where each language binding differs from that consensus. I'm interested in this because I'm exploring the DAS<>BioSQL mapping in a situation where I need to import via BioPerl, and retrieve features via BioJava. I'll add in my own notes for this special case (based on biojavax bindings and an extensively patched version of Dazzle's BioSQL datasource) over the next few weeks. I'll also try to encourage some input from other DAS developers at the next DAS Developers meeting in March. Jim.
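As a rough illustration of the mapping proposed at the start of this thread (one seqfeature row per GFF feature, one location row per line, and one seqfeature_relationship row per Parent), here is a minimal Python sketch. It is not taken from BioPerl, BioJava or Biopython. The function name, the dictionary layout and the "part_of" label are assumptions made for illustration only, and the key names are only loosely modelled on the BioSQL tables.

```python
from urllib.parse import unquote

# Assumed strand encoding: 1 / -1, with 0 for unstranded or unknown.
STRAND = {"+": 1, "-": -1, ".": 0, "?": 0}


def parse_gff3_line(line):
    """Split one GFF3 feature line into BioSQL-like pieces (hypothetical layout)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols

    # Column 9 holds key=value pairs separated by ';'; values may be comma lists.
    attributes = {}
    if attrs != ".":
        for pair in attrs.split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            attributes[unquote(key)] = [unquote(v) for v in value.split(",")]

    feature_id = attributes.get("ID", [None])[0]
    return {
        "bioentry": unquote(seqid),
        "seqfeature": {"id": feature_id, "type": ftype, "source": source},
        "location": {
            "start_pos": int(start),
            "end_pos": int(end),
            "strand": STRAND[strand],
        },
        # One (parent, relation, child) triple per Parent value.
        "relationships": [
            (parent, "part_of", feature_id)
            for parent in attributes.get("Parent", [])
        ],
    }


# Made-up line in the style of the GFF3 examples: an exon with three parents.
record = parse_gff3_line(
    "ctg123\t.\texon\t1300\t1500\t.\t+\t.\tID=exon1;Parent=mRNA1,mRNA2,mRNA3"
)
```

An exon shared by three mRNAs yields three relationship triples, matching the three seqfeature_relationship entries suggested earlier in the thread. Score, phase and the remaining attributes are ignored here and would need their own agreed mapping.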