From gwu at molbio.mgh.harvard.edu Tue Jan 27 17:39:57 2009 From: gwu at molbio.mgh.harvard.edu (gwu) Date: Tue, 27 Jan 2009 17:39:57 -0500 Subject: [BioSQL-l] Genbank loading time Message-ID: <497F8D3D.5060907@molbio.mgh.harvard.edu> Hi Everyone, I recently visited the BioWarehouse web site and the document shows loading the whole Genbank into their database takes the data loader 68 hours for MySQL, and 27.5 hours for Oracle. So I wonder if there is a similar test done with BioSQL? Gang Wu From holland at eaglegenomics.com Tue Jan 27 17:57:59 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 27 Jan 2009 22:57:59 +0000 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <497F8D3D.5060907@molbio.mgh.harvard.edu> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> Message-ID: <497F9177.7040309@eaglegenomics.com> It would depend on the toolkit you use. BioWarehouse is a complete API, whereas BioSQL is just a schema and the way in which it is populated (and therefore how long that takes) depends on your toolkit. Currently I'm aware of loaders existing for BioJava, BioPerl, and possibly also BioPython. However each of them load the same data in subtly different ways, so can't be directly compared in terms of which one is faster than the other. I vaguely remember seeing some performance figures for the BioJava/Genbank/BioSQL combination somewhere, but it's been a while! I'm not sure where they were documented though - I certainly haven't got them written down anywhere. Mark Schreiber might know as he definitely did some testing of this - Mark, can you remember what the figures were for BioJava? As for BioPerl/BioPython/etc. I expect their respective project authors will respond to this thread accordingly with the figures from their own domains! 
cheers, Richard gwu wrote: > Hi Everyone, > > I recently visited the BioWarehouse web site and the document shows > loading the whole Genbank into their database takes the data loader 68 > hours for MySQL, and 27.5 hours for Oracle. So I wonder if there is a > similar test done with BioSQL? > > Gang Wu > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From hlapp at gmx.net Wed Jan 28 00:09:04 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 28 Jan 2009 00:09:04 -0500 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <497F9177.7040309@eaglegenomics.com> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> Message-ID: <72E5157F-02BC-40F6-A59D-3E887A5207C8@gmx.net> The loader for BioPerl is load_seqdatabase.pl, which is part of bioperl-db. With machines current as of 3-4 years ago, I saw upload speeds of between 5 and 15 sequences per second for richly annotated sequences (human/mouse RefSeqs). If you are talking about all of GenBank, the far majority of that will be ESTs and sequencing reads (do you really want to load those?), which are typically sparsely annotated if at all, and so should be faster. mRNA and cDNA sequences will be more in the above range. I have never loaded all of GenBank into a database (and I'm not sure why anyone would want to do this) and so don't have a comparison figure for the total for that. Finally, several instances of load_seqdatabase.pl can be nicely run in parallel on multi-core machines. -hilmar On Jan 27, 2009, at 5:57 PM, Richard Holland wrote: > It would depend on the toolkit you use. 
BioWarehouse is a complete > API, > whereas BioSQL is just a schema and the way in which it is populated > (and therefore how long that takes) depends on your toolkit. > > Currently I'm aware of loaders existing for BioJava, BioPerl, and > possibly also BioPython. However each of them load the same data in > subtly different ways, so can't be directly compared in terms of which > one is faster than the other. > > I vaguely remember seeing some performance figures for the > BioJava/Genbank/BioSQL combination somewhere, but it's been a while! > I'm > not sure where they were documented though - I certainly haven't got > them written down anywhere. Mark Schreiber might know as he definitely > did some testing of this - Mark, can you remember what the figures > were > for BioJava? > > As for BioPerl/BioPython/etc. I expect their respective project > authors > will respond to this thread accordingly with the figures from their > own > domains! > > cheers, > Richard > > gwu wrote: >> Hi Everyone, >> >> I recently visited the BioWarehouse web site and the document shows >> loading the whole Genbank into their database takes the data loader >> 68 >> hours for MySQL, and 27.5 hours for Oracle. So I wonder if there is a >> similar test done with BioSQL? 
>> >> Gang Wu >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l >> > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Wed Jan 28 06:50:50 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 28 Jan 2009 11:50:50 +0000 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <497F9177.7040309@eaglegenomics.com> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> Message-ID: <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> On Tue, Jan 27, 2009 at 10:57 PM, Richard Holland wrote: > > As for BioPerl/BioPython/etc. I expect their respective project authors > will respond to this thread accordingly with the figures from their own > domains! I can tell you importing GenBank files into BioSQL with Biopython is faster than with BioPerl, sometimes several times faster, but this will depend on the nature of the files (e.g. genomes versus ESTs). http://lists.open-bio.org/pipermail/biosql-l/2008-August/001320.html http://lists.open-bio.org/pipermail/biopython-dev/2008-April/003625.html I don't have any BioJava comparison figures. In any case, as Richard points out, there will be slight differences in exactly how the different Bio* tools parse and store the data. I've never tried to import the whole of GenBank, so I don't have any numbers for you there. 
Peter (Biopython) From biopython at maubp.freeserve.co.uk Wed Jan 28 11:40:55 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 28 Jan 2009 16:40:55 +0000 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> Message-ID: <320fb6e00901280840q796bf5cawf085ad3a7c18bbdd@mail.gmail.com> On Wed, Jan 28, 2009 at 4:29 PM, Chris Fields wrote: > > I don't think sequence loading via load_seqdatabase.pl uses BioPerl. If one > uses BioPerl and bioperl-db the following can explain at least some of the > reason why loading is slow: > http://www.bioperl.org/wiki/Why_BioPerl_is_slow > We also go through the extra hand-wringing with Bio::Species objects > (something I don't think the other Bio* worry about). Looking at the source code for the load_seqdatabase.pl script included with bioperl-db, my impression is it uses Bio::DB::BioDB to talk to the database, and Bio::SeqIO to parse the input sequence files (in this case, Bio::SeqIO::genbank is used). See: http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-db/trunk/scripts/biosql/load_seqdatabase.pl > Regardless, it's not an easy problem to work around. There are such things > as Moose, and Perl6 is now in alpha... 
I'll take your word for it - I'm in no position to improve anyone's Perl code ;) Peter From cjfields at illinois.edu Wed Jan 28 11:29:50 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 28 Jan 2009 10:29:50 -0600 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> Message-ID: <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> On Jan 28, 2009, at 5:50 AM, Peter wrote: > On Tue, Jan 27, 2009 at 10:57 PM, Richard Holland wrote: >> >> As for BioPerl/BioPython/etc. I expect their respective project >> authors >> will respond to this thread accordingly with the figures from their >> own >> domains! > > I can tell you importing GenBank files into BioSQL with Biopython is > faster than BioPerl, sometimes several times faster, but this will > depend on the nature of the files (e.g. genomes versus ESTs). > http://lists.open-bio.org/pipermail/biosql-l/2008-August/001320.html > http://lists.open-bio.org/pipermail/biopython-dev/2008-April/003625.html I don't think sequence loading via load_seqdatabase.pl uses BioPerl. If one uses BioPerl and bioperl-db the following can explain at least some of the reason why loading is slow: http://www.bioperl.org/wiki/Why_BioPerl_is_slow We also go through the extra hand-wringing with Bio::Species objects (something I don't think the other Bio* worry about). Regardless, it's not an easy problem to work around. There are such things as Moose, and Perl6 is now in alpha... chris > I don't have any BioJava comparison figures. In any case, as Richard > points out, there will be slight differences in the different Bio* > tools how exactly how the data is parsed and stored. > > I've never tries to import the whole of GenBank, so I don't have any > numbers for you there. 
> > Peter > (Biopython) From cjfields at illinois.edu Wed Jan 28 11:53:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 28 Jan 2009 10:53:49 -0600 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <320fb6e00901280840q796bf5cawf085ad3a7c18bbdd@mail.gmail.com> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> <320fb6e00901280840q796bf5cawf085ad3a7c18bbdd@mail.gmail.com> Message-ID: <37CEB7ED-ECD6-4186-BF84-72B704B3A5E8@illinois.edu> On Jan 28, 2009, at 10:40 AM, Peter wrote: > On Wed, Jan 28, 2009 at 4:29 PM, Chris Fields > wrote: >> >> I don't think sequence loading via load_seqdatabase.pl uses >> BioPerl. If one >> uses BioPerl and bioperl-db the following can explain at least some >> of the >> reason why loading is slow: >> http://www.bioperl.org/wiki/Why_BioPerl_is_slow >> We also go through the extra hand-wringing with Bio::Species objects >> (something I don't think the other Bio* worry about). > > Looking at the source code for the load_seqdatabase.pl script included > with bioperl-db, my impression is it uses Bio::DB::BioDB to talk to > the database, and Bio::SeqIO to parse the input sequence files (in > this case, Bio::SeqIO::genbank is used). See: > > http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-db/trunk/scripts/biosql/load_seqdatabase.pl My bad, I'm thinking of the taxonomy loader (need more coffee). I'm wondering, though, whether it would be feasible to have a direct loader for the most common database formats (GenBank/EMBL/Swiss), something similar to the taxonomy loader that doesn't rely on any specific Bio* package. >> Regardless, it's not an easy problem to work around. There are >> such things >> as Moose, and Perl6 is now in alpha... 
> > I'll take your word for it - I'm in no position to improve anyone's > Perl code ;) > > Peter Well, the problem lies with perl5's welded-on OO which isn't easy to work around, particularly inheritance issues. Supposedly Moose helps speed things up a bit; it doesn't hurt that it is based somewhat on perl6's Objects: http://feather.perl6.nl/syn/S12.html chris From hlapp at gmx.net Wed Jan 28 12:06:01 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 28 Jan 2009 12:06:01 -0500 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> Message-ID: <0BD2B914-3E57-4266-AE4E-EA8B2F1DD307@gmx.net> On Jan 28, 2009, at 11:29 AM, Chris Fields wrote: > I don't think sequence loading via load_seqdatabase.pl uses BioPerl. It does, actually. All the input parsing is done by BioPerl. Bioperl-db only does the persistence, and the script itself handles all the command line options, opens files, yadda yadda ... 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Wed Jan 28 12:17:57 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 28 Jan 2009 17:17:57 +0000 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <37CEB7ED-ECD6-4186-BF84-72B704B3A5E8@illinois.edu> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> <320fb6e00901280840q796bf5cawf085ad3a7c18bbdd@mail.gmail.com> <37CEB7ED-ECD6-4186-BF84-72B704B3A5E8@illinois.edu> Message-ID: <320fb6e00901280917q42c39590jf54e0144c0e6bc28@mail.gmail.com> On 1/28/09, Chris Fields wrote: > > My bad, I'm thinking of the taxonomy loader (need more coffee). I'm > wondering, though, whether it would be feasible to have a direct loader for > the most common database formats (GenBank/EMBL/Swiss), something > similar to the taxonomy loader that doesn't rely on any specific Bio* package. > You could re-invent the wheel, and write yet another GenBank/EMBL/Swiss parser in standalone perl for use within load_seqdatabase.pl but I really don't see any point to this. Reusing the BioPerl parser seems most sensible, especially given that bioperl-db is an extension to bioperl in the first place - and the BioPerl parsers already exist and are well tested. 
Peter From cjfields at illinois.edu Wed Jan 28 12:47:20 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 28 Jan 2009 11:47:20 -0600 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <320fb6e00901280917q42c39590jf54e0144c0e6bc28@mail.gmail.com> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> <320fb6e00901280840q796bf5cawf085ad3a7c18bbdd@mail.gmail.com> <37CEB7ED-ECD6-4186-BF84-72B704B3A5E8@illinois.edu> <320fb6e00901280917q42c39590jf54e0144c0e6bc28@mail.gmail.com> Message-ID: On Jan 28, 2009, at 11:17 AM, Peter wrote: > On 1/28/09, Chris Fields wrote: >> >> My bad, I'm thinking of the taxonomy loader (need more coffee). I'm >> wondering, though, whether it would be feasible to have a direct >> loader for >> the most common database formats (GenBank/EMBL/Swiss), something >> similar to the taxonomy loader that doesn't rely on any specific >> Bio* package. >> > > You could re-invent the wheel, and write yet another > GenBank/EMBL/Swiss parser in standalone perl for use within > load_seqdatabase.pl but I really don't see any point to this. Reusing > the BioPerl parser seems most sensible, especially given that > bioperl-db is an extension to bioperl in the first place - and the > BioPerl parsers already exist and are well tested. > > Peter My point is, instead of first mapping record data to a specific object/ class then mapping the object data to the database, bypass the object completely and generically map relevant data directly in the database according to the BioSQL schema. If anything this may force some consistency between the various Bio* languages. 
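A minimal sketch of the "bypass the object" idea described above, written in Python with only the standard library. It is not an existing BioSQL tool. The two tables are a hypothetical, cut-down subset of BioSQL's bioentry and biosequence tables, the function names (iter_flat_records, load_flat) are made up for illustration, and the parser reads only single-line LOCUS, DEFINITION, ACCESSION and ORIGIN fields, so it is nowhere near a complete GenBank parser.

```python
import sqlite3

# Hypothetical, simplified subset of two BioSQL tables. The real schema has
# more columns and constraints; this is an assumption made for illustration.
SCHEMA = """
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    accession   TEXT NOT NULL,
    description TEXT
);
CREATE TABLE biosequence (
    bioentry_id INTEGER PRIMARY KEY REFERENCES bioentry(bioentry_id),
    length      INTEGER,
    seq         TEXT
);
"""


def iter_flat_records(lines):
    """Yield (name, accession, description, sequence) tuples.

    Works on plain strings only; no sequence or feature objects are built.
    """
    name = accession = description = None
    seq_parts = []
    in_origin = False
    for line in lines:
        if line.startswith("LOCUS"):
            name = line.split()[1]
        elif line.startswith("DEFINITION"):
            description = line[len("DEFINITION"):].strip()
        elif line.startswith("ACCESSION"):
            accession = line.split()[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End of record: emit it and reset the per-record state.
            yield name, accession, description, "".join(seq_parts)
            name = accession = description = None
            seq_parts = []
            in_origin = False
        elif in_origin:
            # Sequence lines look like "        1 atgc atgc"; drop the offset.
            seq_parts.extend(line.split()[1:])


def load_flat(conn, lines):
    """Insert each record straight into the tables; return how many were loaded."""
    count = 0
    for name, accession, description, seq in iter_flat_records(lines):
        cur = conn.execute(
            "INSERT INTO bioentry (name, accession, description) VALUES (?, ?, ?)",
            (name, accession, description),
        )
        conn.execute(
            "INSERT INTO biosequence (bioentry_id, length, seq) VALUES (?, ?, ?)",
            (cur.lastrowid, len(seq), seq),
        )
        count += 1
    conn.commit()
    return count


# A made-up record, not real GenBank data.
SAMPLE = """\
LOCUS       DEMO0001                  12 bp    DNA     linear   SYN 01-JAN-2009
DEFINITION  Made-up demo record.
ACCESSION   DEMO0001
ORIGIN
        1 atgcatgcat gc
//
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
loaded = load_flat(conn, SAMPLE.splitlines())
```

The point of the sketch is only the shape of the data flow: fields go from the flat file to table rows without an intermediate object layer. A real loader would also have to handle features, annotations, taxonomy and multi-line fields.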
chris From biopython at maubp.freeserve.co.uk Wed Jan 28 13:18:03 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 28 Jan 2009 18:18:03 +0000 Subject: [BioSQL-l] Genbank loading time In-Reply-To: References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> <320fb6e00901280840q796bf5cawf085ad3a7c18bbdd@mail.gmail.com> <37CEB7ED-ECD6-4186-BF84-72B704B3A5E8@illinois.edu> <320fb6e00901280917q42c39590jf54e0144c0e6bc28@mail.gmail.com> Message-ID: <320fb6e00901281018t3148af9exda473c101c15bcc8@mail.gmail.com> >> You could re-invent the wheel, and write yet another >> GenBank/EMBL/Swiss parser in standalone perl for use within >> load_seqdatabase.pl but I really don't see any point to this. Reusing >> the BioPerl parser seems most sensible, especially given that >> bioperl-db is an extension to bioperl in the first place - and the >> BioPerl parsers already exist and are well tested. >> >> Peter > > My point is, instead of first mapping record data to a specific object/class > then mapping the object data to the database, bypass the object completely > and generically map relevant data directly in the database according to the > BioSQL schema. > > If anything this may force some consistency between the various Bio* > languages. > > chris Ah - so rather than using BioPerl/Biopython/BioJava to import your sequence files into a BioSQL database, you'd like BioSQL to come with its own script that does the job? It would "solve" any inconsistencies for getting files of data into the database if this were the only sanctioned way to add records to the database. However, there are a number of downsides - in addition to the considerable extra effort needed to write and support another set of parsers just for BioSQL (without reusing BioPerl/Biopython/BioJava). 
What about BioPerl/Biopython/BioJava users who have sequence-record objects in memory that they want to store in the database? These could have been loaded from GenBank files originally and then manipulated (e.g. by adding crude extra annotation from a BLAST run). How would they get them into the database - write them to a GenBank file and then invoke the project-neutral script provided by BioSQL? I think each project needs its own ORM bindings, both for loading data into the database and for retrieving it. Inconsistencies in how each project ends up storing sequence files (e.g. GenBank files) can be ironed out gradually. [Perhaps I have read more into your comment than you intended - if I have got the wrong end of the stick, please clarify - thanks] Still, a project-neutral script bundled with BioSQL (not depending on any of BioPerl/Biopython/BioJava) for importing a GenBank file into a database could serve as a "reference implementation" (the role I currently assign to BioPerl's load_seqdatabase.pl). And if this proves faster than load_seqdatabase.pl, that's a nice bonus. 
Peter From cjfields at illinois.edu Wed Jan 28 13:57:25 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 28 Jan 2009 12:57:25 -0600 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <320fb6e00901281018t3148af9exda473c101c15bcc8@mail.gmail.com> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> <320fb6e00901280840q796bf5cawf085ad3a7c18bbdd@mail.gmail.com> <37CEB7ED-ECD6-4186-BF84-72B704B3A5E8@illinois.edu> <320fb6e00901280917q42c39590jf54e0144c0e6bc28@mail.gmail.com> <320fb6e00901281018t3148af9exda473c101c15bcc8@mail.gmail.com> Message-ID: <770D510F-C6EA-455E-B017-766587E1B23F@illinois.edu> On Jan 28, 2009, at 12:18 PM, Peter wrote: >>> You could re-invent the wheel, and write yet another >>> GenBank/EMBL/Swiss parser in standalone perl for use within >>> load_seqdatabase.pl but I really don't see any point to this. >>> Reusing >>> the BioPerl parser seems most sensible, especially given that >>> bioperl-db is an extension to bioperl in the first place - and the >>> BioPerl parsers already exist and are well tested. >>> >>> Peter >> >> My point is, instead of first mapping record data to a specific >> object/class >> then mapping the object data to the database, bypass the object >> completely >> and generically map relevant data directly in the database >> according to the >> BioSQL schema. >> >> If anything this may force some consistency between the various Bio* >> languages. >> >> chris > > Ah - so rather than using BioPerl/Biopython/BioJava to import your > sequence files into a BioSQL database, you'd like BioSQL to come with > its own script that does the job? It would "solve" any > inconsistencies for getting files of data into the database if this > where the only sanctioned way to add records to the database. 
> However, there are a number of downsides - in addition to the > considerable extra effort needed to write and support another set of > parsers just for BioSQL (without reusing BioPerl/Biopython/BioJava). > > What about BioPerl/Biopython/BioJava users who have sequence-record > objects in memory they want to record in the database? These could > have been loaded from GenBank files originally and then manipulated > (e.g. adding additional crude annotation from running BLAST). How > would they get them into the database - write them to a GenBank file > and then invoke the project neutral BioSQL provided script? No, one would use the same adaptors as before (bioperl-db for BioPerl, for instance). > I think each project needs their own ORM bindings for both loading > data into and from the database. Improving any inconsistencies in how > each ends up storing sequence files (e.g. GenBank files) can be worked > on gradually. > > [Perhaps I have read more into your comment than you intended - if I > have got the wrong end of the stick, please clarify - thanks] > > Still, a project neutral BioSQL bundled script (not depending on any > of BioPerl/Biopython/BioJava) for importing a GenBank file into a > database could serve as a "reference implementation" (the role I > currently assign to BioPerl's load_seqdatabase.pl). And if this > proves faster than load_seqdatabase.pl that's a nice bonus. > > Peter That's what I'm thinking, essentially; something that is Bio*-neutral that can be tested against. And it should be faster at least from the standpoint of not having to generate tons of objects. It's icing if it evolves past the point of a simple reference implementation into something that is useful as a fast BioSQL loader. 
chris From cjfields at illinois.edu Thu Jan 29 08:37:31 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 29 Jan 2009 07:37:31 -0600 Subject: [BioSQL-l] [Bioperl-l] [ANNOUNCEMENT] Alpha 1.6 releases of BioPerl-db In-Reply-To: References: Message-ID: That one may be database-dependent; it passes for mysql 5.1.26-rc. What is your db (mysql, Pg, oracle) and version? Hilmar, any ideas? chris On Jan 29, 2009, at 6:28 AM, Johann PELLET wrote: > Dear Chris, > > I have the following error on my Mac machine: (BioPerl 1.6, BioPerl- > run > 1.6) when I try to install Bioperl-db ( biosql-1.0.1): > > t/01dbadaptor.....1/23 > # Failed test in t/01dbadaptor.t at line 44. > # got: undef > # expected: '' > # Looks like you failed 1 test of 23. > t/01dbadaptor..... Dubious, test returned 1 (wstat 256, 0x100) > Failed 1/23 subtests > t/02species.......ok > t/03simpleseq.....ok > t/04swiss.........ok > t/05seqfeature....ok > t/06comment.......ok > t/07dblink........ok > t/08genbank.......ok > t/09fuzzy2........5/23 > # Failed (TODO) test in t/09fuzzy2.t at line 64. > # got: undef > # expected: 'Q9QYG8' > t/09fuzzy2........ok > t/10ensembl.......ok > t/11locuslink.....ok > t/12ontology......ok > t/13remove........ok > t/14query.........ok > t/15cluster.......ok > t/16obda..........ok > > Test Summary Report > ------------------- > t/01dbadaptor (Wstat: 256 Tests: 23 Failed: 1) > Failed test: 16 > Non-zero exit status: 1 > Files=16, Tests=1479, 15 wallclock secs ( 0.27 usr 0.10 sys + 11.15 > cusr 1.11 csys = 12.63 CPU) > Result: FAIL > Failed 1/16 test programs. 1/1479 subtests failed. 
> > -- -- > > Johann Pellet > IE Bioinformatique > INSERM U851, I-MAP CERVI > 21, Avenue Tony Garnier > 69365 Lyon cedex 07 France > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From michael.watson at bbsrc.ac.uk Thu Jan 29 09:41:05 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 29 Jan 2009 14:41:05 -0000 Subject: [BioSQL-l] Web front-ends to BioSQL Message-ID: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> Hi I am thinking about a project involving storage of sequences in a relational DB and of course thought of BioSQL - but I wondered if anyone has written a very quick and simple front end to the database (submission and searching) in something like CGI, mod_perl or PHP? Thanks Mick Head of Informatics Institute for Animal Health Compton Berks RG20 7NN 01635 578411 http://www.iah.ac.uk/research/bioinformatics/bioinf.shtml The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. 
From cjfields at illinois.edu Thu Jan 29 09:54:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 29 Jan 2009 08:54:46 -0600 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu> Gbrowse, maybe? There is a BioSQL plugin for it (Bio::DB::Das::BioSQL): http://gmod.org/wiki/GBrowse#About_Databases chris On Jan 29, 2009, at 8:41 AM, michael watson (IAH-C) wrote: > Hi > > I am thinking about a project involving storage of sequences in a > relational DB and of course thought of BioSQL - but I wondered if > anyone > has written a very quick and simple front end to the database > (submission and searching) in something like CGI, mod_perl or PHP? > > Thanks > Mick > > Head of Informatics > Institute for Animal Health > Compton > Berks > RG20 7NN > 01635 578411 > > http://www.iah.ac.uk/research/bioinformatics/bioinf.shtml > > The information contained in this message may be confidential or > legally > privileged and is intended solely for the addressee. > If you have received this message in error please delete it & notify > the > originator immediately. > Unauthorised use, disclosure, copying or alteration of this message is > forbidden & may be unlawful. > The contents of this e-mail are the views of the sender and do not > necessarily represent the views of the Institute. > This email and associated attachments has been checked locally for > viruses but we can accept no responsibility once it has left our > systems. > Communications on Institute computers are monitored to secure the > effective operation of the systems and for other lawful purposes. 
> > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From holland at eaglegenomics.com Thu Jan 29 11:10:42 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 29 Jan 2009 16:10:42 +0000 Subject: [BioSQL-l] Eagle Genomics is hiring Message-ID: <4981D502.1000905@eaglegenomics.com> Hi all, Apologies if this is inappropriate for the list, but I thought it would be a good way to reach the kind of people we're looking for. Richard ===== Senior Bioinformatics Software Developer Eagle Genomics Ltd., Cambridge, UK http://www.eaglegenomics.com/ We are a young and exciting bioinformatics company looking to revolutionise the way in which industry and academia work together. We are based at the heart of Europe's largest biotech cluster in Cambridge, UK. As we expand our client base, we're looking to build a talented and committed team of experts. We are currently looking for a software developer to work on a wide range of complex projects, and who is happy to work face-to-face with our customers. Ideally you will have had substantial prior experience working in a life science company or research institute, however we will also consider graduates with a track record in bioinformatics. In addition to your superb technical skills, you will also: * have the ability to quickly translate scientific problems into real software solutions, * be able to put technical concepts into simple language for end users to understand, * be able to pick up new skills and techniques in record time, * work well in a collaborative team environment, * be creative, innovative, and forward-thinking. 
You will have hands-on experience in some of the following: * Java, * Perl, * SQL query design, * Relational database schema design, * Open-source bioinformatics toolkits such as BioJava, BioPerl, BioSQL, etc., * Ensembl, * BioMart, * DAS, * Taverna, * Oracle Life Sciences Platform, * Oracle database administration, * MySQL database administration, * VMware virtual machines, * Grid computing and parallelisation. The preferred candidate will be able to work from our offices in Cambridge, but we would also consider telecommuting arrangements. We offer a competitive salary and a range of company benefits. To apply, please send your CV and cover letter as PDF documents to jobs at eaglegenomics.com. If you have any questions about the position or would like to discuss it further before applying, please use the same email address. We are only able to offer positions to EEA citizens and permanent residents, or Tier 1 migrants under the new UK points-based immigration scheme. Individual contracting arrangements could be considered but we will prefer those candidates who can work with us as employees. No agencies please. -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From jimp at compbio.dundee.ac.uk Thu Jan 29 12:44:12 2009 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Thu, 29 Jan 2009 17:44:12 +0000 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> <49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu> Message-ID: <4981EAEC.4070508@compbio.dundee.ac.uk> Chris Fields wrote: > Gbrowse, maybe? There is a BioSQL plugin for it (Bio::DB::Das::BioSQL): > > http://gmod.org/wiki/GBrowse#About_Databases I'm also in the market for a quick and easy front end - from what I've heard from a colleague, GBrowse can be tricky to install. 
Also - for my application we'd like to easily gather sets of proteins and then explore their annotation. This is a little out of the scope of GBrowse. I think there might be a niche needing filling here - would anyone be interested in pooling code/resources? Jim. -- ------------------------------------------------------------------- J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk The University of Dundee is a Scottish Registered Charity, No. SC015096. From raoul.bonnal at itb.cnr.it Thu Jan 29 10:06:37 2009 From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal) Date: Thu, 29 Jan 2009 16:06:37 +0100 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <200901291606.37472.raoul.bonnal@itb.cnr.it> On Thursday 29 January 2009 15:41:05 michael watson (IAH-C) wrote: > Hi > > I am thinking about a project involving storage of sequences in a > relational DB and of course thought of BioSQL - but I wondered if anyone > has written a very quick and simple front end to the database > (submission and searching) in something like CGI, mod_perl or PHP? I did some tests with ActiveRecord + Rails, and DataMapper + Merb, using Ruby. With those ORMs the difficulty is that the schema doesn't follow their naming conventions. -- Ra From gthorisson at gmail.com Thu Jan 29 13:29:08 2009 From: gthorisson at gmail.com (Gudmundur A. Thorisson) Date: Thu, 29 Jan 2009 18:29:08 +0000 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <4981EAEC.4070508@compbio.dundee.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> <49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu> <4981EAEC.4070508@compbio.dundee.ac.uk> Message-ID: <50326857-0614-4B43-909A-466403669E52@gmail.com> Jim.
If a Java web-app would be acceptable as the platform for this, there is something called Molgenis developed by a group in the Netherlands that we are collaborating with. It's a Java-based code-generation framework used by several mouse genomics groups for microarray data and the like, and we are considering it for use in our project: http://molgenis.sourceforge.net We were thinking of mixing this in with BioSQL/BioJava for certain management & curation tasks. Here are a couple of papers if you care to have a closer look: Smedley et al. Solutions for data integration in functional genomics: a critical assessment and case study. Brief Bioinformatics (2008) vol. 9 (6) pp. 532-44 Swertz et al. Beyond standardization: dynamic software infrastructures for systems biology. Nat Rev Genet (2007) vol. 8 (3) pp. 235-43 Best regards, Mummi, Leicester ----------------------------------------------------------- Gudmundur A. Thorisson, PhD student, Brookes lab Department of Genetics University of Leicester University Road Leicester, LE1 7RH, UK E-mail: gthorisson at gmail.com Tel: +44 (0)116 229 7273 On 29 Jan 2009, at 17:44, James Procter wrote: > > Chris Fields wrote: >> Gbrowse, maybe? There is a BioSQL plugin for it >> (Bio::DB::Das::BioSQL): >> >> http://gmod.org/wiki/GBrowse#About_Databases > I'm also in the market for a quick and easy front end - from what I've > heard from a colleague, GBrowse can be tricky to install. Also - for > my > application we'd like to easily gather sets of proteins and then > explore > their annotation. This is a little out of the scope of GBrowse. > > I think there might be a niche needing filling here - would anyone be > interested in pooling code/resources ? > > Jim. > > -- > ------------------------------------------------------------------- > J. B.
Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group > Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk > The University of Dundee is a Scottish Registered Charity, No. > SC015096. > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From cjfields at illinois.edu Thu Jan 29 13:45:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 29 Jan 2009 12:45:05 -0600 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <4981EAEC.4070508@compbio.dundee.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> <49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu> <4981EAEC.4070508@compbio.dundee.ac.uk> Message-ID: <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> On Jan 29, 2009, at 11:44 AM, James Procter wrote: > > Chris Fields wrote: >> Gbrowse, maybe? There is a BioSQL plugin for it >> (Bio::DB::Das::BioSQL): >> >> http://gmod.org/wiki/GBrowse#About_Databases > I'm also in the market for a quick and easy front end - from what I've > heard from a colleague, GBrowse can be tricky to install. Also - for > my > application we'd like to easily gather sets of proteins and then > explore > their annotation. This is a little out of the scope of GBrowse. I don't find Gbrowse itself tricky as much as getting BioPerl installed. One can use Gbrowse for what you want but there are probably better resources (Ensembl, maybe). chris > I think there might be a niche needing filling here - would anyone be > interested in pooling code/resources ? > > Jim. > > -- > ------------------------------------------------------------------- > J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group > Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk > The University of Dundee is a Scottish Registered Charity, No. > SC015096. 
> _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From mark.schreiber at novartis.com Thu Jan 29 21:51:34 2009 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Fri, 30 Jan 2009 10:51:34 +0800 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> Message-ID: Hi - I have generated, partly automatically and partly by hand, an EJB 3 binding to BioSQL that can be used with JPA. Notably this uses the new EJB model, not the nasty old one, so it is very easy to use. As all EJBs are now plain old Java beans it is also very easy to use these objects in web services and JSP pages (maybe PHP too??). Also, because EJBs and JPA are now more flexible, you don't need a full Java app container (JBOSS, Glassfish) but can instead use them in standalone programs, although with a container you do get other benefits such as transaction control, security and load balancing for free. Also, if you do use a web interface the web front end will probably be in Tomcat, and you can use this as a light container for talking to the BioSQL entity beans. If you think there will be more than a few users I would probably advocate using Glassfish or a similar app server because there are many advantages that outweigh the relatively small overhead. The EJB binding is not part of BioJava but is a candidate for inclusion in BioJava3. I can provide you with code if you are interested. I would also be keen to see this get some use. Best regards, - Mark biosql-l-bounces at lists.open-bio.org wrote on 01/30/2009 02:45:05 AM: > > On Jan 29, 2009, at 11:44 AM, James Procter wrote: > > > > > Chris Fields wrote: > >> Gbrowse, maybe?
There is a BioSQL plugin for it > >> (Bio::DB::Das::BioSQL): > >> > >> http://gmod.org/wiki/GBrowse#About_Databases > > I'm also in the market for a quick and easy front end - from what I've > > heard from a colleague, GBrowse can be tricky to install. Also - for > > my > > application we'd like to easily gather sets of proteins and then > > explore > > their annotation. This is a little out of the scope of GBrowse. > > I don't find Gbrowse itself tricky as much as getting BioPerl > installed. One can use Gbrowse for what you want but there are > probably better resources (Ensembl, maybe). > > chris > > > I think there might be a niche needing filling here - would anyone be > > interested in pooling code/resources ? > > > > Jim. > > > > -- > > ------------------------------------------------------------------- > > J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group > > Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk > > The University of Dundee is a Scottish Registered Charity, No. > > SC015096. > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biosql-l > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l
From michael.watson at bbsrc.ac.uk Fri Jan 30 06:03:12 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Fri, 30 Jan 2009 11:03:12 -0000 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk><49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu><4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> Dear All Thank you for the responses. I think it is clear there is a need - all over the World there are groups of various sizes who try to collate and curate sequences for their organism of choice, from fish virus databases with 200 records, to flu databases with many thousands. I'm in contact with a tiny percentage of these groups, and there is a clear need for: - common DB schema (tick, we can use BioSQL) - Web app for: - submitting new sequences - curating and editing sequences - comparing sequences - align, draw trees etc - showing sequences on maps (i.e. location of sample) - submitting sequences to GenBank - retrieving sequences from GenBank With all of the Bio* projects, this shouldn't be too hard to do, but as ever it needs bodies to do it... I took a quick look at Galaxy but that isn't really what was needed. Thanks again Mick -----Original Message----- From: biosql-l-bounces at lists.open-bio.org [mailto:biosql-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields Sent: 29 January 2009 18:45 To: James Procter Cc: biosql-l at lists.open-bio.org Subject: Re: [BioSQL-l] Web front-ends to BioSQL On Jan 29, 2009, at 11:44 AM, James Procter wrote: > > Chris Fields wrote: >> Gbrowse, maybe?
There is a BioSQL plugin for it >> (Bio::DB::Das::BioSQL): >> >> http://gmod.org/wiki/GBrowse#About_Databases > I'm also in the market for a quick and easy front end - from what I've > heard from a colleague, GBrowse can be tricky to install. Also - for > my > application we'd like to easily gather sets of proteins and then > explore > their annotation. This is a little out of the scope of GBrowse. I don't find Gbrowse itself tricky as much as getting BioPerl installed. One can use Gbrowse for what you want but there are probably better resources (Ensembl, maybe). chris > I think there might be a niche needing filling here - would anyone be > interested in pooling code/resources ? > > Jim. > > -- > ------------------------------------------------------------------- > J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group > Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk > The University of Dundee is a Scottish Registered Charity, No. > SC015096. > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l _______________________________________________ BioSQL-l mailing list BioSQL-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biosql-l From hlapp at gmx.net Fri Jan 30 10:23:24 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 30 Jan 2009 10:23:24 -0500 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk><49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu><4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> Having such a webapp would be pretty cool, and I agree with the argument below that there 
are numerous small groups or individuals with this need. (we have some ourselves here ...) One word of caution: I think the place to look for lessons is the infamous GMOD gene page and standard web front-end, which has been labored on in various incarnations for more than half a decade without producing a compelling and broadly adopted result. People's needs and technology obsessions vary from place to place. One possibly hugely complicating factor for the GMOD web front-end was that the target audience were model organism websites, which themselves have a large and diverse stakeholder community, so flexibility and configurability became overriding requirements, resulting in bloat of code stacks and features. My personal take is that for this to be broadly useful, the primary target audience should probably be programmers, or programming-savvy scientists, who can extend and customize a core application at will. In other words, much in line with the philosophy behind the Bio* libraries. Other than that, keep it simple so I don't have to learn yet another language (namely your templating or clever XML configuration scheme) to extend it. I sat next to Mark when he generated a bare-bones BioSQL-binding in EJB literally in minutes, which I thought was cool. People rave about Ruby and RoR too for ease of getting started. By far most people out there will be familiar with Perl, but I'm not sure what the web application framework would be there that would put me at ease. In the end what may count more than anything else is critical mass even if it's not everyone's darling language. My $0.02, and I'd be keen to see what comes out of this. If there's something I can do to tip the balance towards something tangible happening, let me know. -hilmar On Jan 30, 2009, at 6:03 AM, michael watson (IAH-C) wrote: > Dear All > > Thank you for the responses.
I think it is clear there is a need - > all > over the World there are groups of various sizes who try to collate > and > curate sequences for their organism of choice, from fish virus > databases > with 200 records, to flu databases with many thousands. I'm in > contact > with a tiny percentage of these groups, and there is a clear need for: > > - common DB schema (tick, we can use BioSQL) > - Web app for: > - submitting new sequences > - curating and editing sequences > - comparing sequences - align, draw trees etc > - showing sequences on maps (i.e. location of sample) > - submitting sequences to GenBank > - retrieving sequences from GenBank > > With all of the Bio* projects, this shouldn't be too hard to do, but > as > ever it needs bodies to do it... I took a quick look at Galaxy but > that > isn't really what was needed. > > Thanks again > > Mick > > -----Original Message----- > From: biosql-l-bounces at lists.open-bio.org > [mailto:biosql-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: 29 January 2009 18:45 > To: James Procter > Cc: biosql-l at lists.open-bio.org > Subject: Re: [BioSQL-l] Web front-ends to BioSQL > > > On Jan 29, 2009, at 11:44 AM, James Procter wrote: > >> >> Chris Fields wrote: >>> Gbrowse, maybe? There is a BioSQL plugin for it >>> (Bio::DB::Das::BioSQL): >>> >>> http://gmod.org/wiki/GBrowse#About_Databases >> I'm also in the market for a quick and easy front end - from what >> I've >> heard from a colleague, GBrowse can be tricky to install. Also - for >> my >> application we'd like to easily gather sets of proteins and then >> explore >> their annotation. This is a little out of the scope of GBrowse. > > I don't find Gbrowse itself tricky as much as getting BioPerl > installed. One can use Gbrowse for what you want but there are > probably better resources (Ensembl, maybe). > > chris > >> I think there might be a niche needing filling here - would anyone be >> interested in pooling code/resources ? >> >> Jim. 
>> >> -- >> ------------------------------------------------------------------- >> J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group >> Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk >> The University of Dundee is a Scottish Registered Charity, No. >> SC015096. >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Fri Jan 30 14:45:30 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 30 Jan 2009 13:45:30 -0600 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk><49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu><4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> Message-ID: <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> On Jan 30, 2009, at 9:23 AM, Hilmar Lapp wrote: > Having such a webapp would be pretty cool, and I agree with the > argument below that there are numerous small groups or individuals > with this need. (we have some ourselves here ...) 
> > One word of caution as to where to look for lessons I think is the > infamous GMOD gene page and standard web front-end, which has been > labored on in various incarnations for more than half a decade, > without producing a compelling and broadly adopted result. People's > needs and technology obsessions vary from place to place. > > One possibly hugely complicating factor for the GMOD web front-end > was that the target audience were model organism websites, which > themselves have a large and diverse stakeholder community, so > flexibility and configurability became overriding requirements > resulting in bloat of code stacks and features. > > My personal take is that for this to be broadly useful, the primary > target audience should probably be programmers, or programming-savvy > scientists, who can extend and customize a core application at will. > In other words, much in line with the philosophy behind the Bio* > libraries. > > Other than that, keep it simple so I don't have to learn yet another > (namely your templating or clever XML configuration scheme) language > to extend it. I sat next to Mark when he generated a bare-bones > BioSQL-binding in EJB literally in minutes, which I thought was > cool. People rave about Ruby and RoR too as for ease of getting > started. By far the most people out there will be familiar with > Perl, but I'm not sure what the web application framework would be > there that would put me at ease. In the end what may count more than > anything else is critical mass even if it's not everyone's darling > language. Perl web application framework: Catalyst and Jifty (have not tried them myself). RoR gets a lot of press, but I understand the RoR devs tend not to listen to the core ruby devs and (as a consequence) had recently run into issues with the 1.8.7 ruby release, detailed by the always-entertaining chromatic here: http://use.perl.org/~chromatic/journal/37125 chris > My $0.02, and I'd be keen so see what comes out of this. 
If there's > something I can do to tip the balance towards something tangible > happening, let me know. > > -hilmar From gthorisson at gmail.com Fri Jan 30 14:57:42 2009 From: gthorisson at gmail.com (Gudmundur A. Thorisson) Date: Fri, 30 Jan 2009 19:57:42 +0000 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk><49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu><4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> Message-ID: We use the Catalyst MVC framework for our project (http://www.hgvbaseg2p.org). Very good stuff; we combine it with the DBIx::Class ORM and Template Toolkit as the templating engine. Totally recommended. Mummi On 30 Jan 2009, at 19:45, Chris Fields wrote: >> > > Perl web application framework: Catalyst and Jifty (have not tried > them myself). RoR gets a lot of press, but I understand the RoR > devs tend not to listen to the core ruby devs and (as a consequence) > had recently run into issues with the 1.8.7 ruby release, detailed > by the always-entertaining chromatic here: > > http://use.perl.org/~chromatic/journal/37125 > > chris > >> My $0.02, and I'd be keen so see what comes out of this. If there's >> something I can do to tip the balance towards something tangible >> happening, let me know.
>> >> -hilmar > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From cjfields at illinois.edu Fri Jan 30 15:08:11 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 30 Jan 2009 14:08:11 -0600 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk><49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu><4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> Message-ID: <99475964-CFB3-4A27-8024-8A14876533E0@illinois.edu> Another article (as pointed out by Heikki on bioperl-l): http://www.heise-online.co.uk/open/Healthcheck-Perl-The-Perl-Future--/features/112388/0 The last section is all on MVC-oriented frameworks. chris On Jan 30, 2009, at 1:57 PM, Gudmundur A. Thorisson wrote: > We use Catalyst MVC framework for our project (http://www.hgvbaseg2p.org > ). Very good stuff, we combine it with the DBIx::Class ORM and > Template Toolkit as the templating engine. Totally recommended. > > > Mummi > > On 30 Jan 2009, at 19:45, Chris Fields wrote: >>> >> >> Perl web application framework: Catalyst and Jifty (have not tried >> them myself). RoR gets a lot of press, but I understand the RoR >> devs tend not to listen to the core ruby devs and (as a >> consequence) had recently run into issues with the 1.8.7 ruby >> release, detailed by the always-entertaining chromatic here: >> >> http://use.perl.org/~chromatic/journal/37125 >> >> chris >> >>> My $0.02, and I'd be keen so see what comes out of this. If >>> there's something I can do to tip the balance towards something >>> tangible happening, let me know. 
>>> >>> -hilmar >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From markjschreiber at gmail.com Sat Jan 31 06:03:53 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Sat, 31 Jan 2009 19:03:53 +0800 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <99475964-CFB3-4A27-8024-8A14876533E0@illinois.edu> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> <49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu> <4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> <99475964-CFB3-4A27-8024-8A14876533E0@illinois.edu> Message-ID: <93b45ca50901310303t37905e8ak3819c05f4b94c287@mail.gmail.com> Hi - My feeling is that the diversity of languages, and of frameworks within languages, means that a generic web front end to BioSQL will never, and should never, materialize. What would be a lot more sensible is a generic API in the form of a webservice or collection of webservices that could be used by (theoretically) any web framework to generate a website. User preferences and requirements will be far too diverse for a generic web front end. - Mark On 1/31/09, Chris Fields wrote: > Another article (as pointed out by Heikki on bioperl-l): > > http://www.heise-online.co.uk/open/Healthcheck-Perl-The-Perl-Future--/features/112388/0 > > The last section is all on MVC-oriented frameworks. > > chris > > On Jan 30, 2009, at 1:57 PM, Gudmundur A. Thorisson wrote: > >> We use Catalyst MVC framework for our project (http://www.hgvbaseg2p.org >> ).
Very good stuff, we combine it with the DBIx::Class ORM and >> Template Toolkit as the templating engine. Totally recommended. >> >> >> Mummi >> >> On 30 Jan 2009, at 19:45, Chris Fields wrote: >>>> >>> >>> Perl web application framework: Catalyst and Jifty (have not tried >>> them myself). RoR gets a lot of press, but I understand the RoR >>> devs tend not to listen to the core ruby devs and (as a >>> consequence) had recently run into issues with the 1.8.7 ruby >>> release, detailed by the always-entertaining chromatic here: >>> >>> http://use.perl.org/~chromatic/journal/37125 >>> >>> chris >>> >>>> My $0.02, and I'd be keen so see what comes out of this. If >>>> there's something I can do to tip the balance towards something >>>> tangible happening, let me know. >>>> >>>> -hilmar >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > From gwu at molbio.mgh.harvard.edu Tue Jan 27 22:39:57 2009 From: gwu at molbio.mgh.harvard.edu (gwu) Date: Tue, 27 Jan 2009 17:39:57 -0500 Subject: [BioSQL-l] Genbank loading time Message-ID: <497F8D3D.5060907@molbio.mgh.harvard.edu> Hi Everyone, I recently visited the BioWarehouse web site and the document shows loading the whole Genbank into their database takes the data loader 68 hours for MySQL, and 27.5 hours for Oracle. So I wonder if there is a similar test done with BioSQL? 
Gang Wu From holland at eaglegenomics.com Tue Jan 27 22:57:59 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 27 Jan 2009 22:57:59 +0000 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <497F8D3D.5060907@molbio.mgh.harvard.edu> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> Message-ID: <497F9177.7040309@eaglegenomics.com> It would depend on the toolkit you use. BioWarehouse is a complete API, whereas BioSQL is just a schema and the way in which it is populated (and therefore how long that takes) depends on your toolkit. Currently I'm aware of loaders existing for BioJava, BioPerl, and possibly also BioPython. However each of them load the same data in subtly different ways, so can't be directly compared in terms of which one is faster than the other. I vaguely remember seeing some performance figures for the BioJava/Genbank/BioSQL combination somewhere, but it's been a while! I'm not sure where they were documented though - I certainly haven't got them written down anywhere. Mark Schreiber might know as he definitely did some testing of this - Mark, can you remember what the figures were for BioJava? As for BioPerl/BioPython/etc. I expect their respective project authors will respond to this thread accordingly with the figures from their own domains! cheers, Richard gwu wrote: > Hi Everyone, > > I recently visited the BioWarehouse web site and the document shows > loading the whole Genbank into their database takes the data loader 68 > hours for MySQL, and 27.5 hours for Oracle. So I wonder if there is a > similar test done with BioSQL? 
> > Gang Wu > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From hlapp at gmx.net Wed Jan 28 05:09:04 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 28 Jan 2009 00:09:04 -0500 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <497F9177.7040309@eaglegenomics.com> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> Message-ID: <72E5157F-02BC-40F6-A59D-3E887A5207C8@gmx.net> The loader for BioPerl is load_seqdatabase.pl, which is part of bioperl-db. With machines current as of 3-4 years ago, I saw upload speeds of between 5 and 15 sequences per second for richly annotated sequences (human/mouse RefSeqs). If you are talking about all of GenBank, the far majority of that will be ESTs and sequencing reads (do you really want to load those?), which are typically sparsely annotated if at all, and so should be faster. mRNA and cDNA sequences will be more in the above range. I have never loaded all of GenBank into a database (and I'm not sure why anyone would want to do this) and so don't have a comparison figure for the total for that. Finally, several instances of load_seqdatabase.pl can be nicely run in parallel on multi-core machines. -hilmar On Jan 27, 2009, at 5:57 PM, Richard Holland wrote: > It would depend on the toolkit you use. BioWarehouse is a complete > API, > whereas BioSQL is just a schema and the way in which it is populated > (and therefore how long that takes) depends on your toolkit. > > Currently I'm aware of loaders existing for BioJava, BioPerl, and > possibly also BioPython. However each of them load the same data in > subtly different ways, so can't be directly compared in terms of which > one is faster than the other. 
> > I vaguely remember seeing some performance figures for the > BioJava/Genbank/BioSQL combination somewhere, but it's been a while! > I'm > not sure where they were documented though - I certainly haven't got > them written down anywhere. Mark Schreiber might know as he definitely > did some testing of this - Mark, can you remember what the figures > were > for BioJava? > > As for BioPerl/BioPython/etc. I expect their respective project > authors > will respond to this thread accordingly with the figures from their > own > domains! > > cheers, > Richard > > gwu wrote: >> Hi Everyone, >> >> I recently visited the BioWarehouse web site and the document shows >> loading the whole Genbank into their database takes the data loader >> 68 >> hours for MySQL, and 27.5 hours for Oracle. So I wonder if there is a >> similar test done with BioSQL? >> >> Gang Wu >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l >> > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Wed Jan 28 11:50:50 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 28 Jan 2009 11:50:50 +0000 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <497F9177.7040309@eaglegenomics.com> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> Message-ID: <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> On Tue, Jan 27, 2009 at 10:57 PM, Richard Holland wrote: 
>
> As for BioPerl/BioPython/etc. I expect their respective project authors
> will respond to this thread accordingly with the figures from their own
> domains!

I can tell you importing GenBank files into BioSQL with Biopython is faster than with BioPerl, sometimes several times faster, but this will depend on the nature of the files (e.g. genomes versus ESTs).

http://lists.open-bio.org/pipermail/biosql-l/2008-August/001320.html
http://lists.open-bio.org/pipermail/biopython-dev/2008-April/003625.html

I don't have any BioJava comparison figures. In any case, as Richard points out, there will be slight differences in exactly how the different Bio* tools parse and store the data.

I've never tried to import the whole of GenBank, so I don't have any numbers for you there.

Peter
(Biopython)

From biopython at maubp.freeserve.co.uk Wed Jan 28 16:40:55 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 28 Jan 2009 16:40:55 +0000
Subject: [BioSQL-l] Genbank loading time
In-Reply-To: <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu>
References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu>
Message-ID: <320fb6e00901280840q796bf5cawf085ad3a7c18bbdd@mail.gmail.com>

On Wed, Jan 28, 2009 at 4:29 PM, Chris Fields wrote:
>
> I don't think sequence loading via load_seqdatabase.pl uses BioPerl. If one
> uses BioPerl and bioperl-db the following can explain at least some of the
> reason why loading is slow:
> http://www.bioperl.org/wiki/Why_BioPerl_is_slow
> We also go through the extra hand-wringing with Bio::Species objects
> (something I don't think the other Bio* worry about).
Looking at the source code for the load_seqdatabase.pl script included with bioperl-db, my impression is it uses Bio::DB::BioDB to talk to the database, and Bio::SeqIO to parse the input sequence files (in this case, Bio::SeqIO::genbank is used). See: http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-db/trunk/scripts/biosql/load_seqdatabase.pl > Regardless, it's not an easy problem to work around. There are such things > as Moose, and Perl6 is now in alpha... I'll take your word for it - I'm in no position to improve anyone's Perl code ;) Peter From cjfields at illinois.edu Wed Jan 28 16:29:50 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 28 Jan 2009 10:29:50 -0600 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> Message-ID: <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> On Jan 28, 2009, at 5:50 AM, Peter wrote: > On Tue, Jan 27, 2009 at 10:57 PM, Richard Holland wrote: >> >> As for BioPerl/BioPython/etc. I expect their respective project >> authors >> will respond to this thread accordingly with the figures from their >> own >> domains! > > I can tell you importing GenBank files into BioSQL with Biopython is > faster than BioPerl, sometimes several times faster, but this will > depend on the nature of the files (e.g. genomes versus ESTs). > http://lists.open-bio.org/pipermail/biosql-l/2008-August/001320.html > http://lists.open-bio.org/pipermail/biopython-dev/2008-April/003625.html I don't think sequence loading via load_seqdatabase.pl uses BioPerl. 
If one uses BioPerl and bioperl-db the following can explain at least some of the reason why loading is slow: http://www.bioperl.org/wiki/Why_BioPerl_is_slow We also go through the extra hand-wringing with Bio::Species objects (something I don't think the other Bio* worry about). Regardless, it's not an easy problem to work around. There are such things as Moose, and Perl6 is now in alpha... chris > I don't have any BioJava comparison figures. In any case, as Richard > points out, there will be slight differences in the different Bio* > tools how exactly how the data is parsed and stored. > > I've never tries to import the whole of GenBank, so I don't have any > numbers for you there. > > Peter > (Biopython) From cjfields at illinois.edu Wed Jan 28 16:53:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 28 Jan 2009 10:53:49 -0600 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <320fb6e00901280840q796bf5cawf085ad3a7c18bbdd@mail.gmail.com> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> <320fb6e00901280840q796bf5cawf085ad3a7c18bbdd@mail.gmail.com> Message-ID: <37CEB7ED-ECD6-4186-BF84-72B704B3A5E8@illinois.edu> On Jan 28, 2009, at 10:40 AM, Peter wrote: > On Wed, Jan 28, 2009 at 4:29 PM, Chris Fields > wrote: >> >> I don't think sequence loading via load_seqdatabase.pl uses >> BioPerl. If one >> uses BioPerl and bioperl-db the following can explain at least some >> of the >> reason why loading is slow: >> http://www.bioperl.org/wiki/Why_BioPerl_is_slow >> We also go through the extra hand-wringing with Bio::Species objects >> (something I don't think the other Bio* worry about). 
> > Looking at the source code for the load_seqdatabase.pl script included > with bioperl-db, my impression is it uses Bio::DB::BioDB to talk to > the database, and Bio::SeqIO to parse the input sequence files (in > this case, Bio::SeqIO::genbank is used). See: > > http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-db/trunk/scripts/biosql/load_seqdatabase.pl My bad, I'm thinking of the taxonomy loader (need more coffee). I'm wondering, though, whether it would be feasible to have a direct loader for the most common database formats (GenBank/EMBL/Swiss), something similar to the taxonomy loader that doesn't rely on any specific Bio* package. >> Regardless, it's not an easy problem to work around. There are >> such things >> as Moose, and Perl6 is now in alpha... > > I'll take your word for it - I'm in no position to improve anyone's > Perl code ;) > > Peter Well, the problem lies with perl5's welded-on OO which isn't easy to work around, particularly inheritance issues. Supposedly Moose helps speed things up a bit; it doesn't hurt that it is based somewhat on perl6's Objects: http://feather.perl6.nl/syn/S12.html chris From hlapp at gmx.net Wed Jan 28 17:06:01 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 28 Jan 2009 12:06:01 -0500 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> Message-ID: <0BD2B914-3E57-4266-AE4E-EA8B2F1DD307@gmx.net> On Jan 28, 2009, at 11:29 AM, Chris Fields wrote: > I don't think sequence loading via load_seqdatabase.pl uses BioPerl. It does, actually. All the input parsing is done by BioPerl. Bioperl- db only does the persistence, and the script itself handles all the command line options, opens files, yadda yadda ... 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Wed Jan 28 17:17:57 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 28 Jan 2009 17:17:57 +0000 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <37CEB7ED-ECD6-4186-BF84-72B704B3A5E8@illinois.edu> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> <320fb6e00901280840q796bf5cawf085ad3a7c18bbdd@mail.gmail.com> <37CEB7ED-ECD6-4186-BF84-72B704B3A5E8@illinois.edu> Message-ID: <320fb6e00901280917q42c39590jf54e0144c0e6bc28@mail.gmail.com> On 1/28/09, Chris Fields wrote: > > My bad, I'm thinking of the taxonomy loader (need more coffee). I'm > wondering, though, whether it would be feasible to have a direct loader for > the most common database formats (GenBank/EMBL/Swiss), something > similar to the taxonomy loader that doesn't rely on any specific Bio* package. > You could re-invent the wheel, and write yet another GenBank/EMBL/Swiss parser in standalone perl for use within load_seqdatabase.pl but I really don't see any point to this. Reusing the BioPerl parser seems most sensible, especially given that bioperl-db is an extension to bioperl in the first place - and the BioPerl parsers already exist and are well tested. 
Peter From cjfields at illinois.edu Wed Jan 28 17:47:20 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 28 Jan 2009 11:47:20 -0600 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <320fb6e00901280917q42c39590jf54e0144c0e6bc28@mail.gmail.com> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> <320fb6e00901280840q796bf5cawf085ad3a7c18bbdd@mail.gmail.com> <37CEB7ED-ECD6-4186-BF84-72B704B3A5E8@illinois.edu> <320fb6e00901280917q42c39590jf54e0144c0e6bc28@mail.gmail.com> Message-ID: On Jan 28, 2009, at 11:17 AM, Peter wrote: > On 1/28/09, Chris Fields wrote: >> >> My bad, I'm thinking of the taxonomy loader (need more coffee). I'm >> wondering, though, whether it would be feasible to have a direct >> loader for >> the most common database formats (GenBank/EMBL/Swiss), something >> similar to the taxonomy loader that doesn't rely on any specific >> Bio* package. >> > > You could re-invent the wheel, and write yet another > GenBank/EMBL/Swiss parser in standalone perl for use within > load_seqdatabase.pl but I really don't see any point to this. Reusing > the BioPerl parser seems most sensible, especially given that > bioperl-db is an extension to bioperl in the first place - and the > BioPerl parsers already exist and are well tested. > > Peter My point is, instead of first mapping record data to a specific object/ class then mapping the object data to the database, bypass the object completely and generically map relevant data directly in the database according to the BioSQL schema. If anything this may force some consistency between the various Bio* languages. 
chris From biopython at maubp.freeserve.co.uk Wed Jan 28 18:18:03 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 28 Jan 2009 18:18:03 +0000 Subject: [BioSQL-l] Genbank loading time In-Reply-To: References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> <320fb6e00901280840q796bf5cawf085ad3a7c18bbdd@mail.gmail.com> <37CEB7ED-ECD6-4186-BF84-72B704B3A5E8@illinois.edu> <320fb6e00901280917q42c39590jf54e0144c0e6bc28@mail.gmail.com> Message-ID: <320fb6e00901281018t3148af9exda473c101c15bcc8@mail.gmail.com> >> You could re-invent the wheel, and write yet another >> GenBank/EMBL/Swiss parser in standalone perl for use within >> load_seqdatabase.pl but I really don't see any point to this. Reusing >> the BioPerl parser seems most sensible, especially given that >> bioperl-db is an extension to bioperl in the first place - and the >> BioPerl parsers already exist and are well tested. >> >> Peter > > My point is, instead of first mapping record data to a specific object/class > then mapping the object data to the database, bypass the object completely > and generically map relevant data directly in the database according to the > BioSQL schema. > > If anything this may force some consistency between the various Bio* > languages. > > chris Ah - so rather than using BioPerl/Biopython/BioJava to import your sequence files into a BioSQL database, you'd like BioSQL to come with its own script that does the job? It would "solve" any inconsistencies for getting files of data into the database if this where the only sanctioned way to add records to the database. However, there are a number of downsides - in addition to the considerable extra effort needed to write and support another set of parsers just for BioSQL (without reusing BioPerl/Biopython/BioJava). 
What about BioPerl/Biopython/BioJava users who have sequence-record objects in memory they want to record in the database? These could have been loaded from GenBank files originally and then manipulated (e.g. adding additional crude annotation from running BLAST). How would they get them into the database - write them to a GenBank file and then invoke the project neutral BioSQL provided script? I think each project needs their own ORM bindings for both loading data into and from the database. Improving any inconsistencies in how each ends up storing sequence files (e.g. GenBank files) can be worked on gradually. [Perhaps I have read more into your comment than you intended - if I have got the wrong end of the stick, please clarify - thanks] Still, a project neutral BioSQL bundled script (not depending on any of BioPerl/Biopython/BioJava) for importing a GenBank file into a database could serve as a "reference implementation" (the role I currently assign to BioPerl's load_seqdatabase.pl). And if this proves faster than load_seqdatabase.pl that's a nice bonus. 
Peter From cjfields at illinois.edu Wed Jan 28 18:57:25 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 28 Jan 2009 12:57:25 -0600 Subject: [BioSQL-l] Genbank loading time In-Reply-To: <320fb6e00901281018t3148af9exda473c101c15bcc8@mail.gmail.com> References: <497F8D3D.5060907@molbio.mgh.harvard.edu> <497F9177.7040309@eaglegenomics.com> <320fb6e00901280350g363aa41ai7edc8181c606e26e@mail.gmail.com> <556F8B66-D407-46C1-A4AF-79469D9814FA@illinois.edu> <320fb6e00901280840q796bf5cawf085ad3a7c18bbdd@mail.gmail.com> <37CEB7ED-ECD6-4186-BF84-72B704B3A5E8@illinois.edu> <320fb6e00901280917q42c39590jf54e0144c0e6bc28@mail.gmail.com> <320fb6e00901281018t3148af9exda473c101c15bcc8@mail.gmail.com> Message-ID: <770D510F-C6EA-455E-B017-766587E1B23F@illinois.edu> On Jan 28, 2009, at 12:18 PM, Peter wrote: >>> You could re-invent the wheel, and write yet another >>> GenBank/EMBL/Swiss parser in standalone perl for use within >>> load_seqdatabase.pl but I really don't see any point to this. >>> Reusing >>> the BioPerl parser seems most sensible, especially given that >>> bioperl-db is an extension to bioperl in the first place - and the >>> BioPerl parsers already exist and are well tested. >>> >>> Peter >> >> My point is, instead of first mapping record data to a specific >> object/class >> then mapping the object data to the database, bypass the object >> completely >> and generically map relevant data directly in the database >> according to the >> BioSQL schema. >> >> If anything this may force some consistency between the various Bio* >> languages. >> >> chris > > Ah - so rather than using BioPerl/Biopython/BioJava to import your > sequence files into a BioSQL database, you'd like BioSQL to come with > its own script that does the job? It would "solve" any > inconsistencies for getting files of data into the database if this > where the only sanctioned way to add records to the database. 
> However, there are a number of downsides - in addition to the > considerable extra effort needed to write and support another set of > parsers just for BioSQL (without reusing BioPerl/Biopython/BioJava). > > What about BioPerl/Biopython/BioJava users who have sequence-record > objects in memory they want to record in the database? These could > have been loaded from GenBank files originally and then manipulated > (e.g. adding additional crude annotation from running BLAST). How > would they get them into the database - write them to a GenBank file > and then invoke the project neutral BioSQL provided script? No, one would use the same adaptors as before (bioperl-db for BioPerl, for instance). > I think each project needs their own ORM bindings for both loading > data into and from the database. Improving any inconsistencies in how > each ends up storing sequence files (e.g. GenBank files) can be worked > on gradually. > > [Perhaps I have read more into your comment than you intended - if I > have got the wrong end of the stick, please clarify - thanks] > > Still, a project neutral BioSQL bundled script (not depending on any > of BioPerl/Biopython/BioJava) for importing a GenBank file into a > database could serve as a "reference implementation" (the role I > currently assign to BioPerl's load_seqdatabase.pl). And if this > proves faster than load_seqdatabase.pl that's a nice bonus. > > Peter That's what I'm thinking, essentially; something that is Bio*-neutral that can be tested against. And it should be faster at least from the standpoint of not having to generate tons of objects. It's icing if it evolves past the point of a simple reference implementation into something that is useful as a fast BioSQL loader. 
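To make the "bypass the object completely" idea concrete, here is a minimal sketch, assuming a lot and not taken from any Bio* project. The three tables are a cut-down approximation of BioSQL's biodatabase, bioentry and biosequence tables (the real schema has more columns, constraints and many more tables), the parser only understands a handful of GenBank header lines, and the names `iter_records`, `load` and `SAMPLE` are made up for illustration. It uses an in-memory SQLite database purely so the example is self-contained.

```python
import sqlite3

# Cut-down approximation of three BioSQL tables (an assumption for this
# sketch); the real DDL has more columns, constraints and related tables.
SCHEMA = """
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase(biodatabase_id),
    name TEXT NOT NULL,
    accession TEXT NOT NULL,
    description TEXT,
    version INTEGER NOT NULL
);
CREATE TABLE biosequence (
    bioentry_id INTEGER PRIMARY KEY REFERENCES bioentry(bioentry_id),
    length INTEGER,
    alphabet TEXT,
    seq TEXT
);
"""


def iter_records(lines):
    """Yield one plain dict per GenBank-style record.

    Only LOCUS, single-line DEFINITION, ACCESSION, VERSION and the sequence
    are read; features and other annotation are ignored in this sketch.
    """
    rec, seq_parts, in_origin = {}, [], False
    for line in lines:
        if line.startswith("//"):
            # End of record: emit it and reset the state.
            rec["seq"] = "".join(seq_parts).upper()
            yield rec
            rec, seq_parts, in_origin = {}, [], False
        elif in_origin:
            # Sequence lines look like "        1 atgcatgcat gc".
            seq_parts.extend(line.split()[1:])
        elif line.startswith("LOCUS"):
            rec["name"] = line.split()[1]
        elif line.startswith("DEFINITION"):
            rec["description"] = line[len("DEFINITION"):].strip()
        elif line.startswith("ACCESSION"):
            rec["accession"] = line.split()[1]
        elif line.startswith("VERSION"):
            acc, _, ver = line.split()[1].partition(".")
            rec["accession"], rec["version"] = acc, int(ver or 0)
        elif line.startswith("ORIGIN"):
            in_origin = True


def load(conn, namespace, lines):
    """Insert parsed fields straight into the tables; return the record count."""
    cur = conn.cursor()
    cur.execute("INSERT OR IGNORE INTO biodatabase (name) VALUES (?)", (namespace,))
    cur.execute("SELECT biodatabase_id FROM biodatabase WHERE name = ?", (namespace,))
    db_id = cur.fetchone()[0]
    count = 0
    for rec in iter_records(lines):
        cur.execute(
            "INSERT INTO bioentry"
            " (biodatabase_id, name, accession, description, version)"
            " VALUES (?, ?, ?, ?, ?)",
            (db_id, rec["name"], rec["accession"],
             rec.get("description"), rec.get("version", 0)),
        )
        # cur.lastrowid is the bioentry_id just created. The alphabet is
        # hard-coded to "dna" here, which a real loader would have to detect.
        cur.execute(
            "INSERT INTO biosequence (bioentry_id, length, alphabet, seq)"
            " VALUES (?, ?, ?, ?)",
            (cur.lastrowid, len(rec["seq"]), "dna", rec["seq"]),
        )
        count += 1
    conn.commit()
    return count


# A made-up record, for illustration only.
SAMPLE = """\
LOCUS       TEST0001                  12 bp    DNA     linear   SYN 01-JAN-2009
DEFINITION  Made-up record for illustration only.
ACCESSION   TEST0001
VERSION     TEST0001.1
ORIGIN
        1 atgcatgcat gc
//
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
loaded = load(conn, "demo", SAMPLE.splitlines())
```

The only per-record state is a small dict, so no sequence or feature objects are built. Whether that actually beats an object-based loader on real GenBank data would need measuring, and the hard part (features, locations, annotation, taxonomy) is exactly what this sketch leaves out.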
chris

From cjfields at illinois.edu Thu Jan 29 13:37:31 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 29 Jan 2009 07:37:31 -0600
Subject: [BioSQL-l] [Bioperl-l] [ANNOUNCEMENT] Alpha 1.6 releases of BioPerl-db
In-Reply-To:
References:
Message-ID:

That one may be database-dependent; it passes for mysql 5.1.26-rc. What is your db (mysql, Pg, oracle) and version?

Hilmar, any ideas?

chris

On Jan 29, 2009, at 6:28 AM, Johann PELLET wrote:

> Dear Chris,
>
> I have the following error on my Mac machine: (BioPerl 1.6, BioPerl-run
> 1.6) when I try to install Bioperl-db ( biosql-1.0.1):
>
> t/01dbadaptor.....1/23
> # Failed test in t/01dbadaptor.t at line 44.
> # got: undef
> # expected: ''
> # Looks like you failed 1 test of 23.
> t/01dbadaptor..... Dubious, test returned 1 (wstat 256, 0x100)
> Failed 1/23 subtests
> t/02species.......ok
> t/03simpleseq.....ok
> t/04swiss.........ok
> t/05seqfeature....ok
> t/06comment.......ok
> t/07dblink........ok
> t/08genbank.......ok
> t/09fuzzy2........5/23
> # Failed (TODO) test in t/09fuzzy2.t at line 64.
> # got: undef
> # expected: 'Q9QYG8'
> t/09fuzzy2........ok
> t/10ensembl.......ok
> t/11locuslink.....ok
> t/12ontology......ok
> t/13remove........ok
> t/14query.........ok
> t/15cluster.......ok
> t/16obda..........ok
>
> Test Summary Report
> -------------------
> t/01dbadaptor (Wstat: 256 Tests: 23 Failed: 1)
> Failed test: 16
> Non-zero exit status: 1
> Files=16, Tests=1479, 15 wallclock secs ( 0.27 usr 0.10 sys + 11.15 cusr 1.11 csys = 12.63 CPU)
> Result: FAIL
> Failed 1/16 test programs. 1/1479 subtests failed.
> > -- -- > > Johann Pellet > IE Bioinformatique > INSERM U851, I-MAP CERVI > 21, Avenue Tony Garnier > 69365 Lyon cedex 07 France > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From michael.watson at bbsrc.ac.uk Thu Jan 29 14:41:05 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 29 Jan 2009 14:41:05 -0000 Subject: [BioSQL-l] Web front-ends to BioSQL Message-ID: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> Hi I am thinking about a project involving storage of sequences in a relational DB and of course thought of BioSQL - but I wondered if anyone has written a very quick and simple front end to the database (submission and searching) in something like CGI, mod_perl or PHP? Thanks Mick Head of Informatics Institute for Animal Health Compton Berks RG20 7NN 01635 578411 http://www.iah.ac.uk/research/bioinformatics/bioinf.shtml The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. 
From cjfields at illinois.edu Thu Jan 29 14:54:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 29 Jan 2009 08:54:46 -0600 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu> Gbrowse, maybe? There is a BioSQL plugin for it (Bio::DB::Das::BioSQL): http://gmod.org/wiki/GBrowse#About_Databases chris On Jan 29, 2009, at 8:41 AM, michael watson (IAH-C) wrote: > Hi > > I am thinking about a project involving storage of sequences in a > relational DB and of course thought of BioSQL - but I wondered if > anyone > has written a very quick and simple front end to the database > (submission and searching) in something like CGI, mod_perl or PHP? > > Thanks > Mick > > Head of Informatics > Institute for Animal Health > Compton > Berks > RG20 7NN > 01635 578411 > > http://www.iah.ac.uk/research/bioinformatics/bioinf.shtml > > The information contained in this message may be confidential or > legally > privileged and is intended solely for the addressee. > If you have received this message in error please delete it & notify > the > originator immediately. > Unauthorised use, disclosure, copying or alteration of this message is > forbidden & may be unlawful. > The contents of this e-mail are the views of the sender and do not > necessarily represent the views of the Institute. > This email and associated attachments has been checked locally for > viruses but we can accept no responsibility once it has left our > systems. > Communications on Institute computers are monitored to secure the > effective operation of the systems and for other lawful purposes. 
> > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From holland at eaglegenomics.com Thu Jan 29 16:10:42 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 29 Jan 2009 16:10:42 +0000 Subject: [BioSQL-l] Eagle Genomics is hiring Message-ID: <4981D502.1000905@eaglegenomics.com> Hi all, Apologies if this is inappropriate for the list, but I thought it would be a good way to reach the kind of people we're looking for. Richard ===== Senior Bioinformatics Software Developer Eagle Genomics Ltd., Cambridge, UK http://www.eaglegenomics.com/ We are a young and exciting bioinformatics company looking to revolutionise the way in which industry and academia work together. We are based at the heart of Europe's largest biotech cluster in Cambridge, UK. As we expand our client base, we're looking to build a talented and committed team of experts. We are currently looking for a software developer to work on a wide range of complex projects, and who is happy to work face-to-face with our customers. Ideally you will have had substantial prior experience working in a life science company or research institute, however we will also consider graduates with a track record in bioinformatics. In addition to your superb technical skills, you will also: * have the ability to quickly translate scientific problems into real software solutions, * be able to put technical concepts into simple language for end users to understand, * be able to pick up new skills and techniques in record time, * work well in a collaborative team environment, * be creative, innovative, and forward-thinking. 
You will have hands-on experience in some of the following: * Java, * Perl, * SQL query design, * Relational database schema design, * Open-source bioinformatics toolkits such as BioJava, BioPerl, BioSQL, etc., * Ensembl, * BioMart, * DAS, * Taverna, * Oracle Life Sciences Platform, * Oracle database administration, * MySQL database administration, * VMware virtual machines, * Grid computing and parallelisation. The preferred candidate will be able to work from our offices in Cambridge, but we would also consider telecommuting arrangements. We offer a competitive salary and a range of company benefits. To apply, please send your CV and cover letter as PDF documents to jobs at eaglegenomics.com. If you have any questions about the position or would like to discuss it further before applying, please use the same email address. We are only able to offer positions to EEA citizens and permanent residents, or Tier 1 migrants under the new UK points-based immigration scheme. Individual contracting arrangements could be considered but we will prefer those candidates who can work with us as employees. No agencies please. -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From jimp at compbio.dundee.ac.uk Thu Jan 29 17:44:12 2009 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Thu, 29 Jan 2009 17:44:12 +0000 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> <49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu> Message-ID: <4981EAEC.4070508@compbio.dundee.ac.uk> Chris Fields wrote: > Gbrowse, maybe? There is a BioSQL plugin for it (Bio::DB::Das::BioSQL): > > http://gmod.org/wiki/GBrowse#About_Databases I'm also in the market for a quick and easy front end - from what I've heard from a colleague, GBrowse can be tricky to install. 
Also - for my application we'd like to easily gather sets of proteins and then explore their annotation. This is a little out of the scope of GBrowse.

I think there might be a niche needing filling here - would anyone be interested in pooling code/resources?

Jim.

--
-------------------------------------------------------------------
J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.

From raoul.bonnal at itb.cnr.it Thu Jan 29 15:06:37 2009
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Thu, 29 Jan 2009 16:06:37 +0100
Subject: [BioSQL-l] Web front-ends to BioSQL
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <200901291606.37472.raoul.bonnal@itb.cnr.it>

On Thursday 29 January 2009 15:41:05, michael watson (IAH-C) wrote:
> Hi
>
> I am thinking about a project involving storage of sequences in a
> relational DB and of course thought of BioSQL - but I wondered if anyone
> has written a very quick and simple front end to the database
> (submission and searching) in something like CGI, mod_perl or PHP?

I did some tests with ActiveRecord + Rails, and DataMapper + Merb, using Ruby. With those ORMs the difficulty is that the schema doesn't follow their naming conventions.

-- Ra

From gthorisson at gmail.com Thu Jan 29 18:29:08 2009
From: gthorisson at gmail.com (Gudmundur A. Thorisson)
Date: Thu, 29 Jan 2009 18:29:08 +0000
Subject: [BioSQL-l] Web front-ends to BioSQL
In-Reply-To: <4981EAEC.4070508@compbio.dundee.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> <49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu> <4981EAEC.4070508@compbio.dundee.ac.uk>
Message-ID: <50326857-0614-4B43-909A-466403669E52@gmail.com>

Jim.
If a Java web-app would be acceptable as the platform for this, there is something called Molgenis developed by a group in the Netherlands that we are collaborating with. It's a Java-based code- generation framework used by several mouse genomics groups for microarray data and the like, and is under consideration by ourselves for use in our project: http://molgenis.sourceforge.net We were thinking of mixing this in with BioSQL/BioJava for certain management & curation tasks. Here's a couple of papers if you care to have a closer look: Smedley et al. Solutions for data integration in functional genomics: a critical assessment and case study. Brief Bioinformatics (2008) vol. 9 (6) pp. 532-44 Swertz et al. Beyond standardization: dynamic software infrastructures for systems biology. Nat Rev Genet (2007) vol. 8 (3) pp. 235-43 Best regards , Mummi, Leicester ----------------------------------------------------------- Gudmundur A. Thorisson, PhD student, Brookes lab Department of Genetics University of Leicester University Road Leicester, LE1 7RH, UK E-mail: gthorisson at gmail.com Tel: +44 (0)116 229 7273 On 29 Jan 2009, at 17:44, James Procter wrote: > > Chris Fields wrote: >> Gbrowse, maybe? There is a BioSQL plugin for it >> (Bio::DB::Das::BioSQL): >> >> http://gmod.org/wiki/GBrowse#About_Databases > I'm also in the market for a quick and easy front end - from what I've > heard from a colleague, GBrowse can be tricky to install. Also - for > my > application we'd like to easily gather sets of proteins and then > explore > their annotation. This is a little out of the scope of GBrowse. > > I think there might be a niche needing filling here - would anyone be > interested in pooling code/resources ? > > Jim. > > -- > ------------------------------------------------------------------- > J. B. 
Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group > Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk > The University of Dundee is a Scottish Registered Charity, No. > SC015096. > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From cjfields at illinois.edu Thu Jan 29 18:45:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 29 Jan 2009 12:45:05 -0600 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <4981EAEC.4070508@compbio.dundee.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> <49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu> <4981EAEC.4070508@compbio.dundee.ac.uk> Message-ID: <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> On Jan 29, 2009, at 11:44 AM, James Procter wrote: > > Chris Fields wrote: >> Gbrowse, maybe? There is a BioSQL plugin for it >> (Bio::DB::Das::BioSQL): >> >> http://gmod.org/wiki/GBrowse#About_Databases > I'm also in the market for a quick and easy front end - from what I've > heard from a colleague, GBrowse can be tricky to install. Also - for > my > application we'd like to easily gather sets of proteins and then > explore > their annotation. This is a little out of the scope of GBrowse. I don't find Gbrowse itself tricky as much as getting BioPerl installed. One can use Gbrowse for what you want but there are probably better resources (Ensembl, maybe). chris > I think there might be a niche needing filling here - would anyone be > interested in pooling code/resources ? > > Jim. > > -- > ------------------------------------------------------------------- > J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group > Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk > The University of Dundee is a Scottish Registered Charity, No. > SC015096. 
> _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From mark.schreiber at novartis.com Fri Jan 30 02:51:34 2009 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Fri, 30 Jan 2009 10:51:34 +0800 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> Message-ID: Hi - I have partly auto and partly manually generated an EJB 3 binding to BioSQL that can be used with JPA. Notably this uses the new EJB model not the nasty old one so it is very easy to use. As all EJB's are now plain old java beans it is also very easy to use these objects in web services and JSP pages (maybe PHP too??). Also, because the EJB's and JPA is now more flexible you don't need a full java app container (JBOSS, Glassfish) but can instead use them in standalone programs although with a container you do get other benefits of transaction control/ security/ load balance etc for free. Also if you do use a web interface the web front end will probably be in Tomcat and you can use this as a light container for talking to the biosql entity beans. If you think there will be more than a few users I would probably advocate using Glassfish or similar app server because there are many advantages that out weigh the relatively small overhead. The EJB binding is not part of BioJava but is a candiate for inclusion in BioJava3. I can provide you with code if you are interested. I would also be keen to see this get some use. Best regards, - Mark biosql-l-bounces at lists.open-bio.org wrote on 01/30/2009 02:45:05 AM: > > On Jan 29, 2009, at 11:44 AM, James Procter wrote: > > > > > Chris Fields wrote: > >> Gbrowse, maybe? 
There is a BioSQL plugin for it > >> (Bio::DB::Das::BioSQL): > >> > >> http://gmod.org/wiki/GBrowse#About_Databases > > I'm also in the market for a quick and easy front end - from what I've > > heard from a colleague, GBrowse can be tricky to install. Also - for > > my > > application we'd like to easily gather sets of proteins and then > > explore > > their annotation. This is a little out of the scope of GBrowse. > > I don't find Gbrowse itself tricky as much as getting BioPerl > installed. One can use Gbrowse for what you want but there are > probably better resources (Ensembl, maybe). > > chris > > > I think there might be a niche needing filling here - would anyone be > > interested in pooling code/resources ? > > > > Jim. > > > > -- > > ------------------------------------------------------------------- > > J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group > > Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk > > The University of Dundee is a Scottish Registered Charity, No. > > SC015096. > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biosql-l > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l
From michael.watson at bbsrc.ac.uk Fri Jan 30 11:03:12 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Fri, 30 Jan 2009 11:03:12 -0000 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk><49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu><4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> Dear All Thank you for the responses. I think it is clear there is a need - all over the World there are groups of various sizes who try to collate and curate sequences for their organism of choice, from fish virus databases with 200 records, to flu databases with many thousands. I'm in contact with a tiny percentage of these groups, and there is a clear need for: - common DB schema (tick, we can use BioSQL) - Web app for: - submitting new sequences - curating and editing sequences - comparing sequences - align, draw trees etc - showing sequences on maps (i.e. location of sample) - submitting sequences to GenBank - retrieving sequences from GenBank With all of the Bio* projects, this shouldn't be too hard to do, but as ever it needs bodies to do it... I took a quick look at Galaxy but that isn't really what was needed. Thanks again Mick -----Original Message----- From: biosql-l-bounces at lists.open-bio.org [mailto:biosql-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields Sent: 29 January 2009 18:45 To: James Procter Cc: biosql-l at lists.open-bio.org Subject: Re: [BioSQL-l] Web front-ends to BioSQL On Jan 29, 2009, at 11:44 AM, James Procter wrote: > > Chris Fields wrote: >> Gbrowse, maybe?
There is a BioSQL plugin for it >> (Bio::DB::Das::BioSQL): >> >> http://gmod.org/wiki/GBrowse#About_Databases > I'm also in the market for a quick and easy front end - from what I've > heard from a colleague, GBrowse can be tricky to install. Also - for > my > application we'd like to easily gather sets of proteins and then > explore > their annotation. This is a little out of the scope of GBrowse. I don't find Gbrowse itself tricky as much as getting BioPerl installed. One can use Gbrowse for what you want but there are probably better resources (Ensembl, maybe). chris > I think there might be a niche needing filling here - would anyone be > interested in pooling code/resources ? > > Jim. > > -- > ------------------------------------------------------------------- > J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group > Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk > The University of Dundee is a Scottish Registered Charity, No. > SC015096. > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l _______________________________________________ BioSQL-l mailing list BioSQL-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biosql-l From hlapp at gmx.net Fri Jan 30 15:23:24 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 30 Jan 2009 10:23:24 -0500 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk><49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu><4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> Having such a webapp would be pretty cool, and I agree with the argument below that there 
are numerous small groups or individuals with this need. (We have some ourselves here ...) One word of caution: a good place to look for lessons, I think, is the infamous GMOD gene page and standard web front-end, which has been labored on in various incarnations for more than half a decade without producing a compelling and broadly adopted result. People's needs and technology obsessions vary from place to place. One possibly hugely complicating factor for the GMOD web front-end was that the target audience was model organism websites, which themselves have a large and diverse stakeholder community, so flexibility and configurability became overriding requirements, resulting in bloat of code stacks and features. My personal take is that for this to be broadly useful, the primary target audience should probably be programmers, or programming-savvy scientists, who can extend and customize a core application at will. In other words, much in line with the philosophy behind the Bio* libraries. Other than that, keep it simple so I don't have to learn yet another language (namely your templating or clever XML configuration scheme) to extend it. I sat next to Mark when he generated a bare-bones BioSQL binding in EJB literally in minutes, which I thought was cool. People rave about Ruby and RoR too for ease of getting started. By far most people out there will be familiar with Perl, but I'm not sure which web application framework there would put me at ease. In the end what may count more than anything else is critical mass, even if it's not everyone's darling language. My $0.02, and I'd be keen to see what comes out of this. If there's something I can do to tip the balance towards something tangible happening, let me know. -hilmar On Jan 30, 2009, at 6:03 AM, michael watson (IAH-C) wrote: > Dear All > > Thank you for the responses.
I think it is clear there is a need - > all > over the World there are groups of various sizes who try to collate > and > curate sequences for their organism of choice, from fish virus > databases > with 200 records, to flu databases with many thousands. I'm in > contact > with a tiny percentage of these groups, and there is a clear need for: > > - common DB schema (tick, we can use BioSQL) > - Web app for: > - submitting new sequences > - curating and editing sequences > - comparing sequences - align, draw trees etc > - showing sequences on maps (i.e. location of sample) > - submitting sequences to GenBank > - retrieving sequences from GenBank > > With all of the Bio* projects, this shouldn't be too hard to do, but > as > ever it needs bodies to do it... I took a quick look at Galaxy but > that > isn't really what was needed. > > Thanks again > > Mick > > -----Original Message----- > From: biosql-l-bounces at lists.open-bio.org > [mailto:biosql-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: 29 January 2009 18:45 > To: James Procter > Cc: biosql-l at lists.open-bio.org > Subject: Re: [BioSQL-l] Web front-ends to BioSQL > > > On Jan 29, 2009, at 11:44 AM, James Procter wrote: > >> >> Chris Fields wrote: >>> Gbrowse, maybe? There is a BioSQL plugin for it >>> (Bio::DB::Das::BioSQL): >>> >>> http://gmod.org/wiki/GBrowse#About_Databases >> I'm also in the market for a quick and easy front end - from what >> I've >> heard from a colleague, GBrowse can be tricky to install. Also - for >> my >> application we'd like to easily gather sets of proteins and then >> explore >> their annotation. This is a little out of the scope of GBrowse. > > I don't find Gbrowse itself tricky as much as getting BioPerl > installed. One can use Gbrowse for what you want but there are > probably better resources (Ensembl, maybe). > > chris > >> I think there might be a niche needing filling here - would anyone be >> interested in pooling code/resources ? >> >> Jim. 
>> >> -- >> ------------------------------------------------------------------- >> J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group >> Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk >> The University of Dundee is a Scottish Registered Charity, No. >> SC015096. >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Fri Jan 30 19:45:30 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 30 Jan 2009 13:45:30 -0600 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk><49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu><4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> Message-ID: <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> On Jan 30, 2009, at 9:23 AM, Hilmar Lapp wrote: > Having such a webapp would be pretty cool, and I agree with the > argument below that there are numerous small groups or individuals > with this need. (we have some ourselves here ...) 
> > One word of caution as to where to look for lessons I think is the > infamous GMOD gene page and standard web front-end, which has been > labored on in various incarnations for more than half a decade, > without producing a compelling and broadly adopted result. People's > needs and technology obsessions vary from place to place. > > One possibly hugely complicating factor for the GMOD web front-end > was that the target audience were model organism websites, which > themselves have a large and diverse stakeholder community, so > flexibility and configurability became overriding requirements > resulting in bloat of code stacks and features. > > My personal take is that for this to be broadly useful, the primary > target audience should probably be programmers, or programming-savvy > scientists, who can extend and customize a core application at will. > In other words, much in line with the philosophy behind the Bio* > libraries. > > Other than that, keep it simple so I don't have to learn yet another > (namely your templating or clever XML configuration scheme) language > to extend it. I sat next to Mark when he generated a bare-bones > BioSQL-binding in EJB literally in minutes, which I thought was > cool. People rave about Ruby and RoR too as for ease of getting > started. By far the most people out there will be familiar with > Perl, but I'm not sure what the web application framework would be > there that would put me at ease. In the end what may count more than > anything else is critical mass even if it's not everyone's darling > language. Perl web application framework: Catalyst and Jifty (have not tried them myself). RoR gets a lot of press, but I understand the RoR devs tend not to listen to the core ruby devs and (as a consequence) had recently run into issues with the 1.8.7 ruby release, detailed by the always-entertaining chromatic here: http://use.perl.org/~chromatic/journal/37125 chris > My $0.02, and I'd be keen so see what comes out of this. 
If there's > something I can do to tip the balance towards something tangible > happening, let me know. > > -hilmar From gthorisson at gmail.com Fri Jan 30 19:57:42 2009 From: gthorisson at gmail.com (Gudmundur A. Thorisson) Date: Fri, 30 Jan 2009 19:57:42 +0000 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk><49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu><4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> Message-ID: We use the Catalyst MVC framework for our project (http://www.hgvbaseg2p.org). Very good stuff; we combine it with the DBIx::Class ORM and Template Toolkit as the templating engine. Totally recommended. Mummi On 30 Jan 2009, at 19:45, Chris Fields wrote: >> > > Perl web application framework: Catalyst and Jifty (have not tried > them myself). RoR gets a lot of press, but I understand the RoR > devs tend not to listen to the core ruby devs and (as a consequence) > had recently run into issues with the 1.8.7 ruby release, detailed > by the always-entertaining chromatic here: > > http://use.perl.org/~chromatic/journal/37125 > > chris > >> My $0.02, and I'd be keen so see what comes out of this. If there's >> something I can do to tip the balance towards something tangible >> happening, let me know.
>> >> -hilmar > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From cjfields at illinois.edu Fri Jan 30 20:08:11 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 30 Jan 2009 14:08:11 -0600 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk><49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu><4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> Message-ID: <99475964-CFB3-4A27-8024-8A14876533E0@illinois.edu> Another article (as pointed out by Heikki on bioperl-l): http://www.heise-online.co.uk/open/Healthcheck-Perl-The-Perl-Future--/features/112388/0 The last section is all on MVC-oriented frameworks. chris On Jan 30, 2009, at 1:57 PM, Gudmundur A. Thorisson wrote: > We use Catalyst MVC framework for our project (http://www.hgvbaseg2p.org > ). Very good stuff, we combine it with the DBIx::Class ORM and > Template Toolkit as the templating engine. Totally recommended. > > > Mummi > > On 30 Jan 2009, at 19:45, Chris Fields wrote: >>> >> >> Perl web application framework: Catalyst and Jifty (have not tried >> them myself). RoR gets a lot of press, but I understand the RoR >> devs tend not to listen to the core ruby devs and (as a >> consequence) had recently run into issues with the 1.8.7 ruby >> release, detailed by the always-entertaining chromatic here: >> >> http://use.perl.org/~chromatic/journal/37125 >> >> chris >> >>> My $0.02, and I'd be keen so see what comes out of this. If >>> there's something I can do to tip the balance towards something >>> tangible happening, let me know. 
>>> >>> -hilmar >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l From markjschreiber at gmail.com Sat Jan 31 11:03:53 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Sat, 31 Jan 2009 19:03:53 +0800 Subject: [BioSQL-l] Web front-ends to BioSQL In-Reply-To: <99475964-CFB3-4A27-8024-8A14876533E0@illinois.edu> References: <8975119BCD0AC5419D61A9CF1A923E9507E270EF@iahce2ksrv1.iah.bbsrc.ac.uk> <49DFF09F-8169-4D40-94FB-CDCDFC330E82@illinois.edu> <4981EAEC.4070508@compbio.dundee.ac.uk> <982A9E86-4CEA-428C-AF0E-5065C2036C91@illinois.edu> <8975119BCD0AC5419D61A9CF1A923E9507E2711C@iahce2ksrv1.iah.bbsrc.ac.uk> <903901EE-777B-43A8-9CDC-ED400B3E60BB@gmx.net> <5B046A75-AFD3-4CEB-B190-A27106828E9C@illinois.edu> <99475964-CFB3-4A27-8024-8A14876533E0@illinois.edu> Message-ID: <93b45ca50901310303t37905e8ak3819c05f4b94c287@mail.gmail.com> Hi - My feeling is that the diversity of languages, and of frameworks within languages, means that a generic web front end to BioSQL will and should never materialize. What would be a lot more sensible is a generic API in the form of a web service or collection of web services that could be used by (theoretically) any web framework to generate a website. User preferences and requirements will be far too diverse for a generic web front end. - Mark On 1/31/09, Chris Fields wrote: > Another article (as pointed out by Heikki on bioperl-l): > > http://www.heise-online.co.uk/open/Healthcheck-Perl-The-Perl-Future--/features/112388/0 > > The last section is all on MVC-oriented frameworks. > > chris > > On Jan 30, 2009, at 1:57 PM, Gudmundur A. Thorisson wrote: > >> We use Catalyst MVC framework for our project (http://www.hgvbaseg2p.org >> ).
Very good stuff, we combine it with the DBIx::Class ORM and >> Template Toolkit as the templating engine. Totally recommended. >> >> >> Mummi >> >> On 30 Jan 2009, at 19:45, Chris Fields wrote: >>>> >>> >>> Perl web application framework: Catalyst and Jifty (have not tried >>> them myself). RoR gets a lot of press, but I understand the RoR >>> devs tend not to listen to the core ruby devs and (as a >>> consequence) had recently run into issues with the 1.8.7 ruby >>> release, detailed by the always-entertaining chromatic here: >>> >>> http://use.perl.org/~chromatic/journal/37125 >>> >>> chris >>> >>>> My $0.02, and I'd be keen so see what comes out of this. If >>>> there's something I can do to tip the balance towards something >>>> tangible happening, let me know. >>>> >>>> -hilmar >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l >