From jimp at compbio.dundee.ac.uk Thu Oct 1 09:20:12 2009
From: jimp at compbio.dundee.ac.uk (James Procter)
Date: Thu, 01 Oct 2009 14:20:12 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <4ABB6FDC.2020007@compbio.dundee.ac.uk>
References: <4ABB4866.9060200@compbio.dundee.ac.uk>
 <4ABB6FDC.2020007@compbio.dundee.ac.uk>
Message-ID: <4AC4AC8C.8070105@compbio.dundee.ac.uk>

Hello all.

Here's my review of plone4bio+Biosql. Thanks to Peter and Michael who
sent me encouraging emails - sorry it took so long to post! Finally,
please accept my apologies in advance for any unnecessary rambling...
and for my cross-posting to p4bio and biosql-l.

Installing Plone4Bio
--------------------
This basically went according to the instructions, except for two issues:

1. I experienced some problems accessing some python egg repositories,
and had to manually download and build one module before adding it to
the buildout (python build system) configuration. This was possibly
related to our local network config, since Ivan Rossi couldn't reproduce
the problem.

2. Once the download/build/plone-instance generation steps were
finished, the plone server instance that had been built took way too long
to launch. The installation was running off a directory hosted on our
SAN, and I decided the delay was probably due to the large number of
files needed by plone. I ended up moving the whole install onto a
locally attached disk to minimise the time spent statting the files over
the network. In that config, the server comes up after around 40-60 secs
on a lightly loaded Opteron.

Adding a biosql database and browsing
-------------------------------------
It was easy to add connections to a local biosql database - even for a
plone admin novice like myself. All you need is to know how to form the
appropriate python database connector URI - however, a minor patch to
the site's help text is needed to remind certain forgetful users (me)
how to put the database user's password in the ODBC (?) string.

Once added, I could access the source and browse through my bioentry
sequences via the same list interface as shown in the demo. Clicking on
a sequence link gave me the same five tabs (annotation, features,
dbxrefs, sequence, references) as in the demo. However, here is where I
noticed some issues, which I've logged on the plone4bio trac:

* issue #1: Plone4bio uses the bioentry_id primary key as the main
identifier for the bioentry, rather than its accession. E.g. a
sequence's plone4bio record has the URL
http://myploneserver/plone4bio/mybiosqlsource/bioentry.database/bioentry.id

As people on the list will know, the bioentry ID primary key is
autogenerated and only really for internal consumption. Using it as the
primary identifier means it's not possible to link directly to a
sequence's page if you only know its bioentry database and accession.

* issue #2: The imagemap shown under the 'Features' tab is generated
using bioperl from a genbank file emitted by biopython. This is a flaw,
and means lots of info is lost (my biosql db is used to serve protein
sequence DAS annotation, so it has URLs, scores, and lots of notes).

I had to hack the bioperl script to cope with feature labels that contain
spaces in order for the intervals to display correctly (otherwise they
get a start of '-1'). I'd recommend that the image generator is modified
to use a less restrictive format, and/or made easily pluggable to allow
other feature renderers to be used (perhaps even something like dasty).

* issue #3: The search box doesn't search BioSQL datasources. No idea
how hard this would be to fix, but a little plone knowledge is probably
required.

This was a bit of a killer for me - I was hoping for a basic search
interface that worked out of the box, allowing me to focus on providing
more advanced queries. As it is, I don't have the time at this moment to
fix this issue myself.

Suggested Enhancements
----------------------
The Biosql/GenBank data format transformation is an easily fixed bug in
the current plone4bio version, but it stopped me exploring the
das/biojava/bioperl/biopython interoperation issues any further.
However, it also revealed a few aspects of the plone4bio architecture
that might need thinking about:

1. pluggable feature rendering tools - potentially using the biosql
connection directly (already said)
2. easily configured database cross-reference linkout URLs. Typically,
it's bad form to hard-code URLs within a biosql database, and plone4bio
has its own set of URLs that it decorates dbxrefs with. However, these
are currently buried inside the plone4bio python code; they could
instead be configured via a flatfile or even via the web interface.

In summary...
-------------
This process took far longer than I'd expected, and the slow install and
startup time gave me the impression that plone is a heavyweight solution
that may not have sufficient performance for high-volume situations (I'm
sure I'm wrong here).

The functionality available at the time of writing is not enough for my
purposes - but it is a good starting point (particularly if you know how
to develop in plone). However - if issues 1, 2 and 3 were resolved, and
the default .cfg scripts were made more robust and slightly better
commented for python-n00bs like myself, then plone4bio would certainly
be worth installing to provide basic biosql datasource browsing for your
lab or institute.

that's all folks!
Jim.

--
-------------------------------------------------------------------
J. B. Procter (Jalview/ENFIN) Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.

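Issue #1 in the review above comes down to a lookup from (bioentry
database name, accession) to the internal bioentry_id. What follows is
only a rough sketch of that lookup, not plone4bio code: it assumes a
cut-down copy of the BioSQL biodatabase and bioentry tables held in an
in-memory SQLite database, and the names resolve_bioentry_id and
demo_connection, along with the sample rows, are made up for
illustration.

```python
import sqlite3

# Reduced subset of the BioSQL schema (assumption: only the columns
# needed for the lookup are kept; the real tables have more).
SCHEMA = """
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase(biodatabase_id),
    accession TEXT NOT NULL,
    version INTEGER NOT NULL DEFAULT 0,
    UNIQUE (accession, biodatabase_id, version)
);
"""


def resolve_bioentry_id(conn, database_name, accession):
    """Return the internal bioentry_id for (database, accession), or None.

    If several versions of the accession exist, the highest version wins
    (a policy chosen for this sketch only).
    """
    row = conn.execute(
        "SELECT be.bioentry_id FROM bioentry be "
        "JOIN biodatabase bd ON bd.biodatabase_id = be.biodatabase_id "
        "WHERE bd.name = ? AND be.accession = ? "
        "ORDER BY be.version DESC LIMIT 1",
        (database_name, accession),
    ).fetchone()
    return row[0] if row else None


def demo_connection():
    """Build an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO biodatabase VALUES (1, 'uniprot')")
    conn.execute("INSERT INTO bioentry VALUES (4711, 1, 'P12345', 1)")
    conn.execute("INSERT INTO bioentry VALUES (4712, 1, 'P12345', 2)")
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(resolve_bioentry_id(conn, "uniprot", "P12345"))
```

An accession-based URL handler could call something like this and then
redirect to the existing bioentry_id page, so the current URLs would
keep working. Picking the highest version is just one possible policy.
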
From biopython at maubp.freeserve.co.uk Thu Oct 1 10:38:39 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 1 Oct 2009 15:38:39 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <4AC4AC8C.8070105@compbio.dundee.ac.uk>
References: <4ABB4866.9060200@compbio.dundee.ac.uk>
 <4ABB6FDC.2020007@compbio.dundee.ac.uk>
 <4AC4AC8C.8070105@compbio.dundee.ac.uk>
Message-ID: <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>

Thanks for the report James!

On Thu, Oct 1, 2009 at 2:20 PM, James Procter wrote:
>
> * issue #2: The imagemap shown under the 'Features' tab is generated using
> bioperl from a genbank file emitted by biopython. This is a flaw, and means
> lots of info is lost (my biosql db is used to serve protein
> sequence DAS annotation, so it has URLs, scores, and lots of notes).

That is a curious and roundabout way of doing things, with many
data transformations risking losing things at each point.

It would be possible to use Biopython's GenomeDiagram module to
draw the image directly (although the style and capabilities would
differ). I've done this for an in-house TurboGears-based BioSQL
front end, and it was fine for prokaryotic organisms.

Another, more elegant alternative would be to call a BioPerl script which
talks to the BioSQL database directly to get the data to draw the image.

Can you point me at the relevant files in Plone4bio to see their code?
I agree with your general point that a pluggable rendering option might
be best, but that would be a question for the Plone4bio team to debate.

Peter [Biopython Project]

From jimp at compbio.dundee.ac.uk Thu Oct 1 11:08:21 2009
From: jimp at compbio.dundee.ac.uk (James Procter)
Date: Thu, 01 Oct 2009 16:08:21 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
References: <4ABB4866.9060200@compbio.dundee.ac.uk>
 <4ABB6FDC.2020007@compbio.dundee.ac.uk>
 <4AC4AC8C.8070105@compbio.dundee.ac.uk>
 <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
Message-ID: <4AC4C5E5.3000508@compbio.dundee.ac.uk>

Peter wrote:
> Thanks for the report James!
:)

> On Thu, Oct 1, 2009 at 2:20 PM, James Procter wrote:
>> * issue #2: The imagemap shown under the 'Features' tab is generated using
>> bioperl from a genbank file emitted by biopython. This is a flaw, and means
>> lots of info is lost (my biosql db is used to serve protein
>> sequence DAS annotation, so it has URLs, scores, and lots of notes).
>
> That is a curious and round about way of doing things, with many
> data transformations risking loosing things at each point.

I can understand why it was done - if you already have an image renderer
that eats genbank, it's the shortest path :)

> It would be possible to use Biopython's GenomeDiagram module to
> draw the image directly (although the style and capabilities would
> differ). I've done this for an in house TurboGears based BioSQL
> front end, and it was fine for prokaryotic organisms.

Sounds good... I was sure there was a python way to go here. Happy
to test any alternative you can provide ;)

> Another more elegant alternative would be to call a BioPerl script which
> talks to the BioSQL database directly to get the data to draw the image.

Definitely. It does incur the overhead of creating a new database connection
and instantiating another object representation of the same biosql records;
the latter isn't really a problem, but the former could have scalability
implications.

> Can you point me at the relevant files in Plone4bio to see their code?
> I agree with your general point that a pluggable rendering option might
> be best, but that would be a question for the Plone4bio team to debate.

The bioperl bits that generate images/maps for genbank files are here:
https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/perl

The python that does the piping is here:
https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/seqrecord.py

Looking at the code again, I can see that there are well-defined interfaces
- so in principle, plugging in other implementations should be fairly easy.

My issues are here:
Patch for genbank parser: https://www.plone4bio.org/trac/ticket/8
Enhancement for Pluggable Renderers: https://www.plone4bio.org/trac/ticket/9

hope it helps!
Jim.

From ivan at biodec.com Thu Oct 1 11:22:13 2009
From: ivan at biodec.com (Ivan Rossi)
Date: Thu, 1 Oct 2009 17:22:13 +0200 (CEST)
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
References: <4ABB4866.9060200@compbio.dundee.ac.uk>
 <4ABB6FDC.2020007@compbio.dundee.ac.uk>
 <4AC4AC8C.8070105@compbio.dundee.ac.uk>
 <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
Message-ID:

On Thu, 1 Oct 2009, Peter wrote:

> Thanks for the report James!
>
> On Thu, Oct 1, 2009 at 2:20 PM, James Procter wrote:
>>
>> * issue #2: The imagemap shown under the 'Features' tab is generated using
>> bioperl from a genbank file emitted by biopython. This is a flaw, and means
>> lots of info is lost (my biosql db is used to serve protein
>> sequence DAS annotation, so it has URLs, scores, and lots of notes).
>
> That is a curious and round about way of doing things, with many
> data transformations risking loosing things at each point.
>
> It would be possible to use Biopython's GenomeDiagram module to
> draw the image directly (although the style and capabilities would
> differ). I've done this for an in house TurboGears based BioSQL
> front end, and it was fine for prokaryotic organisms.

Hello Peter, happy that you are now on p4b too, and not just many of us on
biopython &;-)

We plan to remove the Bioperl-graphics option at some point, since we
already need Biopython for many things, and we are aware it is something of
a kludge. Furthermore, a full Python implementation would be well integrated
within Zope.

HOWEVER, there are valid technical reasons for the current approach, the
main one being that Bioperl-graphics is VERY advanced by comparison: in
particular, it automatically handles clashes of feature lines and text, and
it has map support (click on a feature line to show a feature summary).
These were not available at the time we evaluated GenomeDiagram (back then
it was not even in the standard distribution, just in Biopython CVS). And
clash handling is a VERY DESIRABLE FEATURE if you always want readable
images when you have lots of features of the same kind.

> Another more elegant alternative would be to call a BioPerl script which
> talks to the BioSQL database directly to get the data to draw the image.
>
> Can you point me at the relevant files in Plone4bio to see their code?
> I agree with your general point that a pluggable rendering option might
> be best, but that would be a question for the Plone4bio team to debate.

A pluggable rendering option would be GREAT. As I said above, we think that
having mixed-language code is a problem, and we would like a pure-Python
implementation. Actually, we started evaluating genometools graphics too
(see http://genometools.org/annotationsketch.html), since it has Python
bindings and looks nice, but other company priorities stalled it (read: we
have to provide a working solution to a customer NOW).

We are open to contributions.

Ivan

--
Ivan Rossi, PhD - ivan AT biodec dot com
BioDec Srl, Via Calzavecchio 20/2, 40033 Casalecchio di Reno (BO), Italy
Phone: (+39)-051-0548263 - Fax: (+39)-051-7459582 - http://www.biodec.com

From biopython at maubp.freeserve.co.uk Thu Oct 1 11:32:01 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 1 Oct 2009 16:32:01 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <4AC4C5E5.3000508@compbio.dundee.ac.uk>
References: <4ABB4866.9060200@compbio.dundee.ac.uk>
 <4ABB6FDC.2020007@compbio.dundee.ac.uk>
 <4AC4AC8C.8070105@compbio.dundee.ac.uk>
 <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
 <4AC4C5E5.3000508@compbio.dundee.ac.uk>
Message-ID: <320fb6e00910010832q4290259bh3b9cead501ebb6f5@mail.gmail.com>

James wrote:
> Peter wrote:
>> James wrote:
>>>
>>> * issue #2: The imagemap shown under the 'Features' tab is generated
>>> using bioperl from a genbank file emitted by biopython. This is a flaw,
>>> and means lots of info is lost (my biosql db is used to serve protein
>>> sequence DAS annotation, so it has URLs, scores, and lots of notes).
>>
>> That is a curious and round about way of doing things, with many
>> data transformations risking loosing things at each point.
>
> I can understand why it was done - if you already have an image renderer
> that eats genbank, its the shortest path :)

That could easily be the case.

>> It would be possible to use Biopython's GenomeDiagram module to
>> draw the image directly (although the style and capabilities would
>> differ). I've done this for an in house TurboGears based BioSQL
>> front end, and it was fine for prokaryotic organisms.
>
> Sounds good... I was sure there was a python way to go here. Happy
> to test any alternative you can provide ;)

Without installing my own copy of Plone4Bio, I would at least need
to see a sample PNG image to try and mimic. However, from Ivan's
email it sounds like they need features we don't currently support.

>> Another more elegant alternative would be to call a BioPerl script which
>> talks to the BioSQL database directly to get the data to draw the image.
>
> definitely. It does incur the overhead of creating a new database connection
> and instantiating another object representation of the same biosql records,
> the latter isn't really a problem but the former could have scalabilty
> implications.

It does look like Plone already has a Biopython (DB)SeqRecord object
in memory, so yes, constructing the BioPerl equivalent from the database
might be a bit of a waste.

>> Can you point me at the relevant files in Plone4bio to see their code?
>> I agree with your general point that a pluggable rendering option might
>> be best, but that would be a question for the Plone4bio team to debate.
>
> The bioperl bits that generate images/maps for genbank files are here:
> https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/perl
>
> The python that does the piping is here:
> https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/seqrecord.py
>
> Looking at the code again, I can see that there are well defined interfaces
> - so in principle, plugging in other instances should be fairly easy.

I'm sure they could also spit out the database primary keys, and pass
those to the BioPerl script, which can use bioperl-db to talk to the
database. It may be that the current GenBank file route is faster though ;)

> My issues are here:
> Patch for genbank parser: https://www.plone4bio.org/trac/ticket/8

Could you attach an example of the problem GenBank files being
generated? Before we blame the BioPerl parser, we should check
that Biopython is producing a valid GenBank file. Offhand, I'm not
sure if feature types are allowed to have spaces in them, for example.

James - how was your BioSQL database populated?

Peter

From biopython at maubp.freeserve.co.uk Thu Oct 1 11:46:28 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 1 Oct 2009 16:46:28 +0100
Subject: [BioSQL-l] [P4b] Plone4bio 1.0 and BioSQL
In-Reply-To:
References: <4ABB4866.9060200@compbio.dundee.ac.uk>
 <4ABB6FDC.2020007@compbio.dundee.ac.uk>
 <4AC4AC8C.8070105@compbio.dundee.ac.uk>
 <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
Message-ID: <320fb6e00910010846n138b55acmbba881f1e59b80c8@mail.gmail.com>

On Thu, Oct 1, 2009 at 4:22 PM, Ivan Rossi wrote:
>
> Hello Peter, happy that you are now on p4b too and not just many of us on
> biopython &;-)

Hi!

Looking at the Plone4bio.org website, I was surprised to see no mention
of BioSQL or Biopython on the install page (while BioPerl is mentioned):
http://plone4bio.org/trac/wiki/Install

I did eventually find the Manifesto page, but think it could be a little
more prominent (or even merged into the main page?)
http://plone4bio.org/trac/wiki/Manifesto

From the way you are using Biopython for GenBank output, I guess you
need at least Biopython 1.51 for the feature support, but other than that
I am not clear how extensively Biopython is used.

P.S. The Manifesto page has a broken link to Plone (first paragraph),
and officially it should be Biopython not BioPython.

> We plan to remove the Bioperl-graphics option at some time, since we
> already need biopython for many things, and we are aware it is somewhat a
> kludge. Furthermore a full python implementation will be well-integrated
> within Zope HOWEVER there are valid technical reasons for that, the main
> one being that Bioperl-graphics is VERY advanced compared, in particular it
> automatically handles clashes of features lines and text, and map support.
> (click on a feature line to show a feature summary). They were not
> available at the time we evaluated GenomeDiagram (at the time it was not
> even in the standard distribution but just within Biopython CVS). And
> clashes-handling is a VERY DESIRABLE FEATURE if you always want readable
> images when you have lots of features of the same kind.

For now, GenomeDiagram requires you to put features on different
tracks explicitly to avoid overlaps or clashes. Not ideal for your
needs.

What did you mean by "map support"? ReportLab's trunk (i.e.
pre 2.4) has good SVG output, and I have been meaning to
contribute basic HTML image map output to them. Either of these
can be used with Biopython's GenomeDiagram and a tiny patch
to make diagrams with clickable features. I found this worked
pretty well.

Peter

From michael.watson at bbsrc.ac.uk Fri Oct 2 06:15:32 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Fri, 2 Oct 2009 11:15:32 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <4AC4AC8C.8070105@compbio.dundee.ac.uk>
References: <4ABB4866.9060200@compbio.dundee.ac.uk>
 <4ABB6FDC.2020007@compbio.dundee.ac.uk>
 <4AC4AC8C.8070105@compbio.dundee.ac.uk>
Message-ID: <8D08960C647E64438CE5740657CBBDC5010D2AC730@iahcexch1.iah.bbsrc.ac.uk>

Hi Jim

Thanks for that. I think this has real potential, but I am lukewarm about
the sequence images - I am not sure I would need them in this context.

Could you expand on this bit?

* issue #3: The search box doesn't search BioSQL datasources. No idea
how hard this would be to fix, but a little plone knowledge probably
required.

So the search box doesn't do the equivalent of a full text search of the
BioSQL database?

Mick

_______________________________________________
BioSQL-l mailing list
BioSQL-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biosql-l

From ivan at biodec.com Tue Oct 6 10:33:36 2009
From: ivan at biodec.com (Ivan Rossi)
Date: Tue, 6 Oct 2009 16:33:36 +0200 (CEST)
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <8D08960C647E64438CE5740657CBBDC5010D2AC730@iahcexch1.iah.bbsrc.ac.uk>
References: <4ABB4866.9060200@compbio.dundee.ac.uk>
 <4ABB6FDC.2020007@compbio.dundee.ac.uk>
 <4AC4AC8C.8070105@compbio.dundee.ac.uk>
 <8D08960C647E64438CE5740657CBBDC5010D2AC730@iahcexch1.iah.bbsrc.ac.uk>
Message-ID:

On Fri, 2 Oct 2009, michael watson (IAH-C) wrote:

> Hi Jim
>
> Thanks for that. I think this has real potential, but I am luke warm
> about the sequence images - I am not sure I would need them in this
> context.

Then do not use them. &;-) You also have the hideable text tables below
the image.

But your comment gives me the opportunity to point out (someone asked about
it off-list) that the images are generated on the fly as needed, so they do
not eat unnecessary space on the server.

Nonetheless, many people think that the feature images are really neat and
useful. That's why you find this kind of image in genome browsers...

> Could you expand on this but?
>
> * issue #3: The search box doesn't search BioSQL datasources. No idea
> how hard this would be to fix, but a little plone knowledge probably
> required.
>
> So the search box doesn't do the equivalent of a full text search of the
> BioSQL database?

I can assure you that IT DOES. Try http://p4bdemo.biodec.com for yourself.

As I already told Jim off-list: if live search does not work, it means that
indexing died at some point. It appears that, at some point during
indexing, the Plone search engine has a relatively large need for RAM.
To index human CDS entries from NCBI and human proteins from Uniprot on p4bdemo.biodec.com we used a virtual machine with 4GB of RAM, so I would say that seems to be the requirement to load a complete euchariotic genome. Could be less. BTW it also took a couple of hours, so do not expect immediate availability of livesearch when you add tens of thousands of sequences. On the contrary browsing is immediately available. I am going to add this information to the Plone4bio wiki in the installation requirements. Anyway I suggest that you test your install using something smaller, such as a bacterial proteome, to verify that everything is up and running, before attacking large databases. More about indexing later, on p4b ML. Ivan > > Mick > > -----Original Message----- > From: biosql-l-bounces at lists.open-bio.org [mailto:biosql-l-bounces at lists.open-bio.org] On Behalf Of James Procter > Sent: 01 October 2009 14:20 > To: BioSQL-l at lists.open-bio.org > Cc: Plone4Bio mailing list > Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL > > > Hello all. > > Here's my review of plone4bio+Biosql. Thanks to Peter and Michael who > sent me encouraging emails - sorry it took so long to post! Finally, > please accept my apologies in advance for any unnecessary rambling... > and for my cross-posting to p4bio and biosql-l. > > Installing Plone4Bio > -------------------- > This basically went according to the instructions, except for two issues: > 1. I experienced some problems accessing some python egg repositories, > and had to manually download and build one module before adding it to > the buildout (python build system) configuration. This was possibly > related to our local network config, since Ivan Rossi couldn't reproduce > the problem. > > 2. Once the download/build/plone-instance generation steps were > finished the plone server instance that had been built took way too long > to launch. 
The installation was running off a directory hosted on our > SAN, and I decided the delay was probably due to the large number of > files needed by plone. I ended up moving the whole install onto a > locally attached disk to minimise the time spent statting the files on a > network. In that config, the server comes up after around 40-60 secs on > a lightly loaded Opteron. > > > Adding a biosql database and browsing > ------------------------------------- > It was easy to add connections to a local biosql database - even for a > plone admin novice like myself. All you need is to know how to form the > appropriate python database connector URI - however, a minor patch to > the site's help text is needed to remind certain forgetful users (me) > how to put the database user's password in the ODBC (?) string. > > Once added, I could access the source and browse through my bioentry > sequences via the same list interface as shown in the demo. Clicking on > a sequence link gave me the same five tabs (annotation, features, > dbxrefs, sequence, references) as in the demo. However, here is where I > noticed some issues which I've logged on the plone4bio trac: > > * issue #1: Plone4bio uses the bioentry_id primary key as the main > identifier for the bioentry, rather than its accession. E.g. a > sequence's plon4bio record has the URL > http://myploneserver/plone4bio/mybiosqlsource/bioentry.database/bioentry.id > > As people on the list will know, the bioentry ID primary key is > autogenerated and only really for internal consumption. Using it as the > primary identifier means it's not possible to link directly to a > sequence's page if you only know its bioentry database and accession. > > * issue #2: The imagemap shown under the 'Features' tab is generated > using bioperl from a genbank file emitted by biopython. This is a flaw, > and means lots of info is lost (my biosql db is used to serve protein > sequence DAS annotation, so it has URLs, scores, and lots of notes). 
> > I had to hack this script to cope with feature labels that contain > spaces in order for the intervals to display correctly (otherwise they > get a start of '-1'). I'd recommend that the image generator is modified > to use a less restrictive format, and/or made easily pluggable to allow > other feature renderers to be used (perhaps even something like dasty). > > * issue #3: The search box doesn't search BioSQL datasources. No idea > how hard this would be to fix, but a little plone knowledge probably > required. > > This was a bit of a killer for me - I was hoping for a basic search > interface that worked out of the box, allowing me to focus on providing > more advanced queries. As it is, I don't have the time at this moment to > fix this issue myself. > > Suggested Enhancements > ---------------------- > The Biosql/GenBank data format transformation is an easily fixed bug in > the current plone4bio version, but it stopped me exploring the > das/biojava/bioperl/biopython interoperation issues any further. > However, it also revealed a few aspects of the plone4bio architecture > that might need thinking about: > > 1. pluggable feature rendering tools - potentially use the biosql > connection directly (already said) > 2. easily configured database cross-reference linkout URLs. Typically, > its bad form to hard-code URLs within a biosql database, and plone4bio > has its own set of URLs that it decorates dbxrefs with. However, these > are currently buried inside the plone4bio python code, but they could be > configured via a flatfile or even via the web interface. > > > In summary... > ------------- > This process took far longer than I'd expected, and the slow install and > startup time gave me the impression that plone is a heavyweight solution > that may not have sufficient performance for high-volume situations (I'm > sure I'm wrong here). 
> > The functionality available at the time of writing is not enough for my > purposes - but it is a good starting point (particularly if you know how > to develop in plone). However - if issues 1,2 and 3 were resolved, and > the default .cfg scripts were made more robust and slightly better > commented for python-n00bs like myself, then plone4bio would certainly > be worth installing to provide basic biosql datasource browsing for your > lab or institute. > > thats all folks! > Jim. > > -- > ------------------------------------------------------------------- > J. B. Procter (Jalview/ENFIN) Barton Bioinformatics Research Group > Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk > The University of Dundee is a Scottish Registered Charity, No. SC015096. > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- Ivan Rossi, PhD - ivan AT biodec dot com, ivan dot rossi3 AT unibo dot it BioDec Srl, Via Calzavecchio 20/2, 40033 Casalecchio di Reno (BO), Italy Phone: (+39)-051-0548263 - Fax: (+39)-051-7459582 - http://www.biodec.com

From biopython at maubp.freeserve.co.uk Thu Oct 1 14:38:39 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 1 Oct 2009 15:38:39 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <4AC4AC8C.8070105@compbio.dundee.ac.uk>
References: <4ABB4866.9060200@compbio.dundee.ac.uk> <4ABB6FDC.2020007@compbio.dundee.ac.uk> <4AC4AC8C.8070105@compbio.dundee.ac.uk>
Message-ID: <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>

Thanks for the report James!

On Thu, Oct 1, 2009 at 2:20 PM, James Procter wrote:
>
> * issue #2: The imagemap shown under the 'Features' tab is generated using
> bioperl from a genbank file emitted by biopython.
> This is a flaw, and means
> lots of info is lost (my biosql db is used to serve protein
> sequence DAS annotation, so it has URLs, scores, and lots of notes).

That is a curious and roundabout way of doing things, with many
data transformations risking losing things at each point.

It would be possible to use Biopython's GenomeDiagram module to
draw the image directly (although the style and capabilities would
differ). I've done this for an in-house TurboGears-based BioSQL
front end, and it was fine for prokaryotic organisms.

Another more elegant alternative would be to call a BioPerl script which
talks to the BioSQL database directly to get the data to draw the image.

Can you point me at the relevant files in Plone4bio to see their code?
I agree with your general point that a pluggable rendering option might
be best, but that would be a question for the Plone4bio team to debate.

Peter
[Biopython Project]

From jimp at compbio.dundee.ac.uk Thu Oct 1 15:08:21 2009
From: jimp at compbio.dundee.ac.uk (James Procter)
Date: Thu, 01 Oct 2009 16:08:21 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
References: <4ABB4866.9060200@compbio.dundee.ac.uk> <4ABB6FDC.2020007@compbio.dundee.ac.uk> <4AC4AC8C.8070105@compbio.dundee.ac.uk> <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
Message-ID: <4AC4C5E5.3000508@compbio.dundee.ac.uk>

Peter wrote:
> Thanks for the report James!

:)

> On Thu, Oct 1, 2009 at 2:20 PM, James Procter wrote:
>> * issue #2: The imagemap shown under the 'Features' tab is generated using
>> bioperl from a genbank file emitted by biopython. This is a flaw, and means
>> lots of info is lost (my biosql db is used to serve protein
>> sequence DAS annotation, so it has URLs, scores, and lots of notes).
>
> That is a curious and roundabout way of doing things, with many
> data transformations risking losing things at each point.

I can understand why it was done - if you already have an image renderer
that eats genbank, it's the shortest path :)

> It would be possible to use Biopython's GenomeDiagram module to
> draw the image directly (although the style and capabilities would
> differ). I've done this for an in-house TurboGears-based BioSQL
> front end, and it was fine for prokaryotic organisms.

Sounds good... I was sure there was a python way to go here. Happy to
test any alternative you can provide ;)

> Another more elegant alternative would be to call a BioPerl script which
> talks to the BioSQL database directly to get the data to draw the image.

Definitely. It does incur the overhead of creating a new database
connection and instantiating another object representation of the same
biosql records; the latter isn't really a problem but the former could
have scalability implications.

> Can you point me at the relevant files in Plone4bio to see their code?
> I agree with your general point that a pluggable rendering option might
> be best, but that would be a question for the Plone4bio team to debate.

The bioperl bits that generate images/maps for genbank files are here:
https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/perl

The python that does the piping is here:
https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/seqrecord.py

Looking at the code again, I can see that there are well-defined
interfaces - so in principle, plugging in other instances should be
fairly easy.

My issues are here:
Patch for genbank parser: https://www.plone4bio.org/trac/ticket/8
Enhancement for Pluggable Renderers: https://www.plone4bio.org/trac/ticket/9

Hope it helps!
Jim.
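
The pluggable-renderer idea raised in this message can be illustrated with a minimal sketch. It assumes nothing about Plone4bio's actual interfaces: `Feature`, `register_renderer`, `render_features` and the "text" renderer are hypothetical names invented for illustration, and a real Plone/Zope implementation would presumably register renderers through the component architecture rather than a plain dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Feature:
    """Hypothetical stand-in for a sequence feature (not a Biopython class)."""
    label: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


# A renderer turns a list of features into bytes (image data, text, ...).
Renderer = Callable[[List[Feature]], bytes]

# Registry mapping a renderer name to its implementation.
_RENDERERS: Dict[str, Renderer] = {}


def register_renderer(name: str) -> Callable[[Renderer], Renderer]:
    """Decorator that registers a renderer function under the given name."""
    def decorator(func: Renderer) -> Renderer:
        _RENDERERS[name] = func
        return func
    return decorator


def render_features(features: List[Feature], renderer: str = "text") -> bytes:
    """Look up the named renderer and apply it to the features."""
    try:
        func = _RENDERERS[renderer]
    except KeyError:
        known = sorted(_RENDERERS)
        raise KeyError(f"unknown renderer {renderer!r}; known: {known}") from None
    return func(features)


@register_renderer("text")
def render_as_text(features: List[Feature]) -> bytes:
    """Trivial renderer: one tab-separated line per feature."""
    lines = [f"{f.label}\t{f.start}\t{f.end}" for f in features]
    return "\n".join(lines).encode("utf-8")
```

With this shape, a BioPerl-backed, GenomeDiagram-backed or DAS-style renderer would each be one more registered function, and labels containing spaces pass through untouched because no intermediate flat-file format is involved.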

From ivan at biodec.com Thu Oct 1 15:22:13 2009
From: ivan at biodec.com (Ivan Rossi)
Date: Thu, 1 Oct 2009 17:22:13 +0200 (CEST)
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
References: <4ABB4866.9060200@compbio.dundee.ac.uk> <4ABB6FDC.2020007@compbio.dundee.ac.uk> <4AC4AC8C.8070105@compbio.dundee.ac.uk> <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
Message-ID:

On Thu, 1 Oct 2009, Peter wrote:

> Thanks for the report James!
>
> On Thu, Oct 1, 2009 at 2:20 PM, James Procter wrote:
>>
>> * issue #2: The imagemap shown under the 'Features' tab is generated using
>> bioperl from a genbank file emitted by biopython. This is a flaw, and means
>> lots of info is lost (my biosql db is used to serve protein
>> sequence DAS annotation, so it has URLs, scores, and lots of notes).
>
> That is a curious and roundabout way of doing things, with many
> data transformations risking losing things at each point.
>
> It would be possible to use Biopython's GenomeDiagram module to
> draw the image directly (although the style and capabilities would
> differ). I've done this for an in-house TurboGears-based BioSQL
> front end, and it was fine for prokaryotic organisms.

Hello Peter, happy that you are now on p4b too, and not just many of us
on biopython &;-)

We plan to remove the Bioperl-graphics option at some point, since we
already need Biopython for many things, and we are aware it is somewhat
of a kludge. Furthermore, a full Python implementation will be well
integrated within Zope.

HOWEVER, there are valid technical reasons for the current approach, the
main one being that Bioperl-graphics is VERY advanced in comparison: in
particular, it automatically handles clashes of feature lines and text,
and it has map support (click on a feature line to show a feature
summary). These were not available at the time we evaluated
GenomeDiagram (at the time it was not even in the standard distribution
but just within Biopython CVS).
And clash handling is a VERY DESIRABLE FEATURE if you always want
readable images when you have lots of features of the same kind.

> Another more elegant alternative would be to call a BioPerl script which
> talks to the BioSQL database directly to get the data to draw the image.
>
> Can you point me at the relevant files in Plone4bio to see their code?
> I agree with your general point that a pluggable rendering option might
> be best, but that would be a question for the Plone4bio team to debate.

A pluggable rendering option would be GREAT. As I said above, we think
that having mixed-language code is a problem, and we would like a
pure-Python implementation. Actually, we started evaluating the
genometools graphics too (see http://genometools.org/annotationsketch.html),
since it has Python bindings and looks nice, but other company
priorities stalled it (read: we have to provide a working solution to a
customer NOW).

We are open to contributions.

Ivan

--
Ivan Rossi, PhD - ivan AT biodec dot com
BioDec Srl, Via Calzavecchio 20/2, 40033 Casalecchio di Reno (BO), Italy
Phone: (+39)-051-0548263 - Fax: (+39)-051-7459582 - http://www.biodec.com

From biopython at maubp.freeserve.co.uk Thu Oct 1 15:32:01 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 1 Oct 2009 16:32:01 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <4AC4C5E5.3000508@compbio.dundee.ac.uk>
References: <4ABB4866.9060200@compbio.dundee.ac.uk> <4ABB6FDC.2020007@compbio.dundee.ac.uk> <4AC4AC8C.8070105@compbio.dundee.ac.uk> <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com> <4AC4C5E5.3000508@compbio.dundee.ac.uk>
Message-ID: <320fb6e00910010832q4290259bh3b9cead501ebb6f5@mail.gmail.com>

James wrote:
> Peter wrote:
>> James wrote:
>>>
>>> * issue #2: The imagemap shown under the 'Features' tab is generated
>>> using bioperl from a genbank file emitted by biopython.
>>> This is a flaw,
>>> and means lots of info is lost (my biosql db is used to serve protein
>>> sequence DAS annotation, so it has URLs, scores, and lots of notes).
>>
>> That is a curious and roundabout way of doing things, with many
>> data transformations risking losing things at each point.
>
> I can understand why it was done - if you already have an image renderer
> that eats genbank, it's the shortest path :)

That could easily be the case.

>> It would be possible to use Biopython's GenomeDiagram module to
>> draw the image directly (although the style and capabilities would
>> differ). I've done this for an in-house TurboGears-based BioSQL
>> front end, and it was fine for prokaryotic organisms.
>
> Sounds good... I was sure there was a python way to go here. Happy
> to test any alternative you can provide ;)

Without me installing my own copy of Plone4Bio, I would at least need to
see a sample PNG image to try and mimic. However, from Ivan's email it
sounds like they need features we don't currently support.

>> Another more elegant alternative would be to call a BioPerl script which
>> talks to the BioSQL database directly to get the data to draw the image.
>
> Definitely. It does incur the overhead of creating a new database connection
> and instantiating another object representation of the same biosql records;
> the latter isn't really a problem but the former could have scalability
> implications.

It does look like Plone already has a Biopython (DB)SeqRecord object in
memory, so yes, constructing the BioPerl equivalent from the database
might be a bit of a waste.

>> Can you point me at the relevant files in Plone4bio to see their code?
>> I agree with your general point that a pluggable rendering option might
>> be best, but that would be a question for the Plone4bio team to debate.
>
> The bioperl bits that generate images/maps for genbank files are here:
> https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/perl
>
> The python that does the piping is here:
> https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/seqrecord.py
>
> Looking at the code again, I can see that there are well-defined interfaces
> - so in principle, plugging in other instances should be fairly easy.

I'm sure they could also spit out the database primary keys, and pass
those to the BioPerl script, which can use bioperl-db to talk to the
database. It may be that the current GenBank file route is faster though ;)

> My issues are here:
> Patch for genbank parser: https://www.plone4bio.org/trac/ticket/8

Could you attach an example of the problem GenBank files being generated?
Before we blame the BioPerl parser, we should check that Biopython is
producing a valid GenBank file. Offhand, I'm not sure if feature types
are allowed to have spaces in them, for example.

James - how was your BioSQL database populated?

Peter

From biopython at maubp.freeserve.co.uk Thu Oct 1 15:46:28 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 1 Oct 2009 16:46:28 +0100
Subject: [BioSQL-l] [P4b] Plone4bio 1.0 and BioSQL
In-Reply-To:
References: <4ABB4866.9060200@compbio.dundee.ac.uk> <4ABB6FDC.2020007@compbio.dundee.ac.uk> <4AC4AC8C.8070105@compbio.dundee.ac.uk> <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
Message-ID: <320fb6e00910010846n138b55acmbba881f1e59b80c8@mail.gmail.com>

On Thu, Oct 1, 2009 at 4:22 PM, Ivan Rossi wrote:
>
> Hello Peter, happy that you are now on p4b too and not just many of us on
> biopython &;-)

Hi!

Looking at the Plone4bio.org website I was surprised to see no mention
of BioSQL or Biopython on the install page (while BioPerl is mentioned):
http://plone4bio.org/trac/wiki/Install

I did eventually find the Manifesto page, but think it could be a little
more prominent (or even merged into the main page?)
http://plone4bio.org/trac/wiki/Manifesto

From the way you are using Biopython for GenBank output, I guess you
need at least Biopython 1.51 for the feature support, but other than
that I am not clear how extensively Biopython is used.

P.S. The Manifesto page has a broken link to Plone (first paragraph),
and officially it should be Biopython not BioPython.

> We plan to remove the Bioperl-graphics option at some point, since we
> already need Biopython for many things, and we are aware it is somewhat
> of a kludge. Furthermore, a full Python implementation will be well
> integrated within Zope.
>
> HOWEVER, there are valid technical reasons for the current approach, the
> main one being that Bioperl-graphics is VERY advanced in comparison: in
> particular, it automatically handles clashes of feature lines and text,
> and it has map support (click on a feature line to show a feature
> summary). These were not available at the time we evaluated
> GenomeDiagram (at the time it was not even in the standard distribution
> but just within Biopython CVS). And clash handling is a VERY DESIRABLE
> FEATURE if you always want readable images when you have lots of
> features of the same kind.

For now, GenomeDiagram requires you to put features on different tracks
explicitly to avoid overlaps or clashes. Not ideal for your needs.

What did you mean by "map support"?

ReportLab's trunk (i.e. pre 2.4) has good SVG output, and I have been
meaning to contribute basic HTML image map output to them. Either of
these can be used with Biopython's GenomeDiagram and a tiny patch to
make diagrams with clickable features. This worked pretty well, I found.
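
To illustrate the point about putting features on different tracks explicitly, here is a minimal sketch of one way to sort overlapping features onto separate tracks before drawing. It uses only the Python standard library; `assign_tracks` and the `(start, end, label)` tuples are hypothetical names chosen for illustration, not part of GenomeDiagram or Bioperl-graphics, and the greedy first-fit strategy is an assumption rather than a description of how either library resolves clashes.

```python
from typing import List, Tuple

# (start, end, label); start is inclusive, end is exclusive
Feature = Tuple[int, int, str]


def assign_tracks(features: List[Feature]) -> List[List[Feature]]:
    """Greedily place features on tracks so no two on a track overlap.

    Features are visited in order of start position, and each one goes on
    the first track whose most recently placed feature has already ended.
    """
    tracks: List[List[Feature]] = []
    for feature in sorted(features, key=lambda f: (f[0], f[1])):
        start = feature[0]
        for track in tracks:
            if track[-1][1] <= start:  # last feature on this track has ended
                track.append(feature)
                break
        else:
            tracks.append([feature])  # every track is busy: open a new one
    return tracks


if __name__ == "__main__":
    demo = [
        (0, 50, "domain A"),
        (40, 90, "domain B"),
        (60, 120, "site"),
        (95, 130, "domain C"),
    ]
    for number, track in enumerate(assign_tracks(demo), start=1):
        print(number, [label for _, _, label in track])
```

Each inner list could then be drawn as its own track, so features of the same kind never sit on top of one another; label text would still need its own clash check.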

Peter

From michael.watson at bbsrc.ac.uk Fri Oct 2 10:15:32 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Fri, 2 Oct 2009 11:15:32 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <4AC4AC8C.8070105@compbio.dundee.ac.uk>
References: <4ABB4866.9060200@compbio.dundee.ac.uk> <4ABB6FDC.2020007@compbio.dundee.ac.uk> <4AC4AC8C.8070105@compbio.dundee.ac.uk>
Message-ID: <8D08960C647E64438CE5740657CBBDC5010D2AC730@iahcexch1.iah.bbsrc.ac.uk>

Hi Jim

Thanks for that. I think this has real potential, but I am lukewarm
about the sequence images - I am not sure I would need them in this
context.

Could you expand on this bit?

* issue #3: The search box doesn't search BioSQL datasources. No idea
how hard this would be to fix, but a little plone knowledge probably
required.

So the search box doesn't do the equivalent of a full text search of the
BioSQL database?

Mick

From ivan at biodec.com Tue Oct 6 14:33:36 2009
From: ivan at biodec.com (Ivan Rossi)
Date: Tue, 6 Oct 2009 16:33:36 +0200 (CEST)
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <8D08960C647E64438CE5740657CBBDC5010D2AC730@iahcexch1.iah.bbsrc.ac.uk>
References: <4ABB4866.9060200@compbio.dundee.ac.uk> <4ABB6FDC.2020007@compbio.dundee.ac.uk> <4AC4AC8C.8070105@compbio.dundee.ac.uk> <8D08960C647E64438CE5740657CBBDC5010D2AC730@iahcexch1.iah.bbsrc.ac.uk>
Message-ID:

On Fri, 2 Oct 2009, michael watson (IAH-C) wrote:

> Hi Jim
>
> Thanks for that. I think this has real potential, but I am lukewarm
> about the sequence images - I am not sure I would need them in this
> context.

Then do not use them. &;-) You also have the hideable text tables below.

But your comment gives me the opportunity to point out (someone asked
off-list) that the images are generated on the fly as needed, so they do
not eat unnecessary space on the server.

Nonetheless, many people think that the feature images are really neat
and useful. That's why you find this kind of image in genome browsers...

> Could you expand on this bit?
>
> * issue #3: The search box doesn't search BioSQL datasources. No idea
> how hard this would be to fix, but a little plone knowledge probably
> required.
>
> So the search box doesn't do the equivalent of a full text search of the
> BioSQL database?

I can assure you that IT DOES. Try http://p4bdemo.biodec.com for
yourself.

As I already told Jim off-list: if live search does not work, it means
that indexing died at some point. It appears that, at some point during
indexing, the Plone search engine has a relatively large need for RAM.
To index human CDS entries from NCBI and human proteins from UniProt on
p4bdemo.biodec.com we used a virtual machine with 4 GB of RAM, so I
would say that seems to be the requirement to load a complete eukaryotic
genome. Could be less.

BTW it also took a couple of hours, so do not expect immediate
availability of live search when you add tens of thousands of sequences.
Browsing, by contrast, is immediately available.

I am going to add this information to the installation requirements on
the Plone4bio wiki. Anyway, I suggest that you test your install using
something smaller, such as a bacterial proteome, to verify that
everything is up and running before attacking large databases.

More about indexing later, on the p4b ML.

Ivan

--
Ivan Rossi, PhD - ivan AT biodec dot com, ivan dot rossi3 AT unibo dot it
BioDec Srl, Via Calzavecchio 20/2, 40033 Casalecchio di Reno (BO), Italy
Phone: (+39)-051-0548263 - Fax: (+39)-051-7459582 - http://www.biodec.com