[BioSQL-l] Plone4bio 1.0 and BioSQL

Ivan Rossi ivan at biodec.com
Tue Oct 6 14:33:36 UTC 2009


On Fri, 2 Oct 2009, michael watson (IAH-C) wrote:

> Hi Jim
>
> Thanks for that.  I think this has real potential, but I am lukewarm 
> about the sequence images - I am not sure I would need them in this 
> context.

Then do not use them. ;-) You also have the hideable text tables below.

But your comment gave me the opportunity to point out (someone asked it 
off-list)  that they are generated on-the-fly as needed, so that they do 
not eat unnecessary space on the server.

Nonetheless, many people think that the feature images are really neat and 
useful. That is why you find this kind of image in genome browsers...

> Could you expand on this bit?
>
> * issue #3: The search box doesn't search BioSQL datasources. No idea
> how hard this would be to fix, but a little plone knowledge probably
> required.
>
> So the search box doesn't do the equivalent of a full text search of the 
> BioSQL database?

I can assure you that IT DOES. Try http://p4bdemo.biodec.com for yourself.

I already told Jim off-list: if live search does not work, it means that 
indexing died at some point. It appears that, at some point during 
indexing, the Plone search engine needs a relatively large amount of RAM. To 
index human CDS entries from NCBI and human proteins from UniProt on 
p4bdemo.biodec.com we used a virtual machine with 4 GB of RAM, so I would 
say that this seems to be the requirement to load a complete eukaryotic 
genome. It could be less. BTW, it also took a couple of hours, so do not 
expect immediate availability of live search when you add tens of thousands 
of sequences. Browsing, by contrast, is immediately available.

I am going to add this information to the Plone4bio wiki in the 
installation requirements.

Anyway I suggest that you test your install using something smaller, such 
as a bacterial proteome, to verify that everything is up and running, 
before attacking large databases.

More about indexing later, on p4b ML.

Ivan

>
> Mick
>
> -----Original Message-----
> From: biosql-l-bounces at lists.open-bio.org [mailto:biosql-l-bounces at lists.open-bio.org] On Behalf Of James Procter
> Sent: 01 October 2009 14:20
> To: BioSQL-l at lists.open-bio.org
> Cc: Plone4Bio mailing list
> Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
>
>
> Hello all.
>
> Here's my review of plone4bio+Biosql. Thanks to Peter and Michael who
> sent me encouraging emails - sorry it took so long to post! Finally,
> please accept my apologies in advance for any unnecessary rambling...
> and for my cross-posting to p4bio and biosql-l.
>
> Installing Plone4Bio
> --------------------
> This basically went according to the instructions, except for two issues:
>  1. I experienced some problems accessing some python egg repositories,
> and had to manually download and build one module before adding it to
> the buildout (python build system) configuration. This was possibly
> related to our local network config, since Ivan Rossi couldn't reproduce
> the problem.
>
>  2. Once the download/build/plone-instance generation steps were
> finished the plone server instance that had been built took way too long
> to launch. The installation was running off a directory hosted on our
> SAN, and I decided the delay was probably due to the large number of
> files needed by plone. I ended up moving the whole install onto a
> locally attached disk to minimise the time spent statting the files on a
> network. In that config, the server comes up after around 40-60 secs on
> a lightly loaded Opteron.
>
>
> Adding a biosql database and browsing
> -------------------------------------
> It was easy to add connections to a local biosql database - even for a
> plone admin novice like myself. All you need is to know how to form the
> appropriate python database connector URI - however, a minor patch to
> the site's help text is needed to remind certain forgetful users (me)
> how to put the database user's password in the ODBC (?) string.
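On the password question above, a minimal sketch, assuming the connector
accepts a URL of the common form driver://user:password@host:port/dbname.
The thread does not confirm the exact format Plone4bio expects, and the
function name, driver name and credentials below are placeholders. The
point is that special characters in the user name or password need to be
percent-encoded before they go into the string.

```python
from urllib.parse import quote


def build_db_uri(driver, user, password, host, dbname, port=None):
    """Assemble driver://user:password@host[:port]/dbname (assumed format)."""
    # Percent-encode the credentials so characters such as '@', ':' or '/'
    # cannot be mistaken for URI delimiters.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    location = f"{host}:{port}" if port is not None else host
    return f"{driver}://{credentials}@{location}/{dbname}"


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(build_db_uri("postgresql", "biosql", "s3cr@t/pw",
                       "localhost", "biosqldb", 5432))
```

The placeholder password "s3cr@t/pw" ends up as "s3cr%40t%2Fpw" inside
the URI, so the only unescaped '@' is the one separating the credentials
from the host.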
>
> Once added, I could access the source and browse through my bioentry
> sequences via the same list interface as shown in the demo. Clicking on
> a sequence link gave me the same five tabs (annotation, features,
> dbxrefs, sequence, references) as in the demo. However, here is where I
> noticed some issues which I've logged on the plone4bio trac:
>
> * issue #1: Plone4bio uses the bioentry_id primary key as the main
> identifier for the bioentry, rather than its accession. E.g. a
> sequence's plone4bio record has the URL
> http://myploneserver/plone4bio/mybiosqlsource/bioentry.database/bioentry.id
>
> As people on the list will know, the bioentry ID primary key is
> autogenerated and only really for internal consumption. Using it as the
> primary identifier means it's not possible to link directly to a
> sequence's page if you only know its bioentry database and accession.
>
> * issue #2: The imagemap shown under the 'Features' tab is generated
> using bioperl from a genbank file emitted by biopython. This is a flaw,
> and means lots of info is lost (my biosql db is used to serve protein
> sequence DAS annotation, so it has URLs, scores, and lots of notes).
>
> I had to hack this script to cope with feature labels that contain
> spaces in order for the intervals to display correctly (otherwise they
> get a start of '-1'). I'd recommend that the image generator is modified
> to use a less restrictive format, and/or made easily pluggable to allow
> other feature renderers to be used (perhaps even something like dasty).
>
> * issue #3: The search box doesn't search BioSQL datasources. No idea
> how hard this would be to fix, but a little plone knowledge probably
> required.
>
> This was a bit of a killer for me - I was hoping for a basic search
> interface that worked out of the box, allowing me to focus on providing
> more advanced queries. As it is, I don't have the time at this moment to
> fix this issue myself.
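Regarding issue #1 above, a minimal sketch of how a (database name,
accession) pair could be resolved to the internal bioentry_id. This is
not Plone4bio code. It assumes the standard BioSQL column names
(biodatabase.name, bioentry.accession, bioentry.version) and uses an
in-memory SQLite database with a cut-down subset of the two tables as a
stand-in for a real BioSQL instance. The function names and the sample
rows are made up for illustration.

```python
import sqlite3


def bioentry_id_for(conn, db_name, accession):
    """Return the bioentry_id of the newest version of an accession, or None."""
    row = conn.execute(
        "SELECT be.bioentry_id FROM bioentry be "
        "JOIN biodatabase bd ON bd.biodatabase_id = be.biodatabase_id "
        "WHERE bd.name = ? AND be.accession = ? "
        "ORDER BY be.version DESC LIMIT 1",
        (db_name, accession),
    ).fetchone()
    return row[0] if row else None


def demo_connection():
    """Build an in-memory database holding a reduced subset of BioSQL tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE biodatabase (
            biodatabase_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE bioentry (
            bioentry_id INTEGER PRIMARY KEY,
            biodatabase_id INTEGER NOT NULL
                REFERENCES biodatabase(biodatabase_id),
            name TEXT NOT NULL,
            accession TEXT NOT NULL,
            version INTEGER NOT NULL DEFAULT 0,
            UNIQUE (accession, biodatabase_id, version)
        );
        -- Made-up sample rows: two versions of the same accession.
        INSERT INTO biodatabase VALUES (1, 'uniprot');
        INSERT INTO bioentry VALUES (42, 1, 'EXAMPLE_HUMAN', 'X00001', 1);
        INSERT INTO bioentry VALUES (43, 1, 'EXAMPLE_HUMAN', 'X00001', 2);
        """
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(bioentry_id_for(conn, "uniprot", "X00001"))
```

A URL built from the database name and accession could then be mapped
to the internal key with a lookup of this kind, so the autogenerated ID
never has to appear in a link.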
>
> Suggested Enhancements
> ----------------------
> The Biosql/GenBank data format transformation is an easily fixed bug in
> the current plone4bio version, but it stopped me exploring the
> das/biojava/bioperl/biopython interoperation issues any further.
> However, it also revealed a few aspects of the plone4bio architecture
> that might need thinking about:
>
>  1. pluggable feature rendering tools - potentially use the biosql
> connection directly (already said)
>  2. easily configured database cross-reference linkout URLs. Typically,
> it's bad form to hard-code URLs within a biosql database, and plone4bio
> has its own set of URLs that it decorates dbxrefs with. However, these
> are currently buried inside the plone4bio python code, but they could be
> configured via a flatfile or even via the web interface.
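For point 2 above, a minimal sketch of what flat-file configuration of
linkout URLs could look like. It is not how Plone4bio stores its URLs;
the section name, the {accession} placeholder, the function names and
the example.org templates are all assumptions chosen for illustration.

```python
import configparser
from urllib.parse import quote

# Hypothetical flat-file content: one URL template per dbxref database.
EXAMPLE_CONFIG = """
[linkouts]
UniProt = https://example.org/uniprot/{accession}
PDB = https://example.org/pdb/{accession}
"""


def load_linkouts(text):
    """Parse the [linkouts] section into a {dbname: url_template} dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep database names case-sensitive
    parser.read_string(text)
    return dict(parser["linkouts"])


def linkout_url(templates, dbname, accession):
    """Return the linkout URL for a dbxref, or None if no template is set."""
    template = templates.get(dbname)
    if template is None:
        return None
    # Percent-encode the accession so it is safe inside a URL path.
    return template.format(accession=quote(accession, safe=""))


if __name__ == "__main__":
    templates = load_linkouts(EXAMPLE_CONFIG)
    print(linkout_url(templates, "UniProt", "P12345"))
```

Keeping the templates in a file of this sort would let a site admin add
or change a linkout without touching the Python code.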
>
>
> In summary...
> -------------
> This process took far longer than I'd expected, and the slow install and
> startup time gave me the impression that plone is a heavyweight solution
> that may not have sufficient performance for high-volume situations (I'm
> sure I'm wrong here).
>
> The functionality available at the time of writing is not enough for my
> purposes - but it is a good starting point (particularly if you know how
> to develop in plone). However - if issues 1, 2 and 3 were resolved, and
> the default .cfg scripts were made more robust and slightly better
> commented for python-n00bs like myself, then plone4bio would certainly
> be worth installing to provide basic biosql datasource browsing for your
> lab or institute.
>
> That's all folks!
> Jim.
>
> -- 
> -------------------------------------------------------------------
> J. B. Procter  (Jalview/ENFIN)  Barton Bioinformatics Research Group
> Phone/Fax:+44(0)1382 388734/345764  http://www.compbio.dundee.ac.uk
> The University of Dundee is a Scottish Registered Charity, No. SC015096.
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

--
Ivan Rossi, PhD - ivan AT biodec dot com, ivan dot rossi3 AT unibo dot it
BioDec Srl, Via Calzavecchio 20/2, 40033 Casalecchio di Reno (BO), Italy
Phone: (+39)-051-0548263 - Fax: (+39)-051-7459582 - http://www.biodec.com


