From jimp at compbio.dundee.ac.uk  Thu Oct  1 09:20:12 2009
From: jimp at compbio.dundee.ac.uk (James Procter)
Date: Thu, 01 Oct 2009 14:20:12 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <4ABB6FDC.2020007@compbio.dundee.ac.uk>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>	<4ABB4866.9060200@compbio.dundee.ac.uk>
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>
Message-ID: <4AC4AC8C.8070105@compbio.dundee.ac.uk>


Hello all.

Here's my review of plone4bio+Biosql. Thanks to Peter and Michael who 
sent me encouraging emails - sorry it took so long to post! Finally, 
please accept my apologies in advance for any unnecessary rambling... 
and for my cross-posting to p4bio and biosql-l.

Installing Plone4Bio
--------------------
This basically went according to the instructions, except for two issues:
  1. I experienced some problems accessing some python egg repositories,
and had to manually download and build one module before adding it to
the buildout (python build system) configuration. This was possibly
related to our local network config, since Ivan Rossi couldn't reproduce
the problem.

  2. Once the download/build/plone-instance generation steps were
finished the plone server instance that had been built took way too long
to launch. The installation was running off a directory hosted on our
SAN, and I decided the delay was probably due to the large number of
files needed by plone. I ended up moving the whole install onto a
locally attached disk to minimise the time spent statting the files on a
network. In that config, the server comes up after around 40-60 secs on
a lightly loaded Opteron.


Adding a biosql database and browsing
-------------------------------------
It was easy to add connections to a local biosql database - even for a
plone admin novice like myself. All you need to know is how to form the
appropriate python database connector URI. However, a minor patch to
the site's help text is needed to remind certain forgetful users (me)
how to put the database user's password in the ODBC (?) string.
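
As a rough illustration of where the password goes, here is a minimal
sketch that assumes the connector accepts the common
driver://user:password@host/database layout. Whether Plone4bio expects
exactly this form is an assumption, and the function name and example
credentials are made up.

```python
from urllib.parse import quote


def build_db_uri(driver, user, password, host, database, port=None):
    """Assemble a driver://user:password@host[:port]/database URI.

    The layout follows the widely used SQLAlchemy-style convention;
    that Plone4bio accepts it unchanged is an assumption.
    """
    # Percent-encode the credentials so characters such as '@' or '/'
    # in a password cannot be mistaken for URI delimiters.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    location = host if port is None else f"{host}:{port}"
    return f"{driver}://{credentials}@{location}/{database}"


print(build_db_uri("postgres", "biosql_user", "s3cr@t", "localhost", "biosql"))
```

The password sits between the colon after the user name and the '@'
before the host, which is the part that is easy to forget.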

Once added, I could access the source and browse through my bioentry
sequences via the same list interface as shown in the demo. Clicking on
a sequence link gave me the same five tabs (annotation, features,
dbxrefs, sequence, references) as in the demo. However, here is where I
noticed some issues which I've logged on the plone4bio trac:

* issue #1: Plone4bio uses the bioentry_id primary key as the main
identifier for the bioentry, rather than its accession. E.g. a
sequence's plone4bio record has the URL
http://myploneserver/plone4bio/mybiosqlsource/bioentry.database/bioentry.id

As people on the list will know, the bioentry ID primary key is
autogenerated and only really for internal consumption. Using it as the
primary identifier means it's not possible to link directly to a
sequence's page if you only know its bioentry database and accession.

* issue #2: The imagemap shown under the 'Features' tab is generated 
using bioperl from a genbank file emitted by biopython. This is a flaw, 
and means lots of info is lost (my biosql db is used to serve protein
sequence DAS annotation, so it has URLs, scores, and lots of notes).

I had to hack this script to cope with feature labels that contain
spaces in order for the intervals to display correctly (otherwise they
get a start of '-1'). I'd recommend that the image generator is modified
to use a less restrictive format, and/or made easily pluggable to allow
other feature renderers to be used (perhaps even something like dasty).

* issue #3: The search box doesn't search BioSQL datasources. No idea 
how hard this would be to fix, but a little plone knowledge probably 
required.

This was a bit of a killer for me - I was hoping for a basic search
interface that worked out of the box, allowing me to focus on providing
more advanced queries. As it is, I don't have the time at this moment to
fix this issue myself.
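
Regarding issue #1: resolving a (database, accession) pair to the
internal bioentry_id is a single join on the BioSQL side, so an
accession-based URL could be mapped onto the existing pages. The sketch
below uses an in-memory SQLite database with a cut-down stand-in for the
biodatabase and bioentry tables (the real schema has more columns); the
function name is hypothetical and this is not how plone4bio does it.

```python
import sqlite3

# Cut-down stand-in for the two BioSQL tables involved; the real schema
# has more columns and constraints.
SCHEMA = """
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase(biodatabase_id),
    accession TEXT NOT NULL,
    version INTEGER NOT NULL DEFAULT 0
);
"""


def bioentry_id_for(conn, database_name, accession):
    """Map a (database name, accession) pair to the internal bioentry_id.

    Returns None when no such entry exists. If several versions share
    an accession, the highest version wins.
    """
    row = conn.execute(
        "SELECT e.bioentry_id FROM bioentry e "
        "JOIN biodatabase d ON d.biodatabase_id = e.biodatabase_id "
        "WHERE d.name = ? AND e.accession = ? "
        "ORDER BY e.version DESC LIMIT 1",
        (database_name, accession),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO biodatabase VALUES (1, 'uniprot')")
conn.execute("INSERT INTO bioentry VALUES (42, 1, 'P12345', 1)")
print(bioentry_id_for(conn, "uniprot", "P12345"))
```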

Suggested Enhancements
----------------------
The Biosql/GenBank data format transformation is an easily fixed bug in
the current plone4bio version, but it stopped me exploring the
das/biojava/bioperl/biopython interoperation issues any further.
However, it also revealed a few aspects of the plone4bio architecture
that might need thinking about:

  1. pluggable feature rendering tools - potentially use the biosql
connection directly (already said)
  2. easily configured database cross-reference linkout URLs. Typically,
it's bad form to hard-code URLs within a biosql database, and plone4bio
has its own set of URLs that it decorates dbxrefs with. These are
currently buried inside the plone4bio python code, but they could be
configured via a flatfile or even via the web interface.
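
To make point 2 concrete, here is a minimal sketch of linkout URLs kept
in a flat file instead of in code. The file layout, the section name,
the {accession} placeholder and the example.org URLs are all invented
for illustration; plone4bio's actual configuration may look nothing
like this.

```python
import configparser

# Hypothetical flat-file content: one URL template per dbxref database
# name, with {accession} as the placeholder. The URLs are placeholders.
EXAMPLE_CONFIG = """
[linkouts]
UniProt = https://example.org/uniprot/{accession}
PDB = https://example.org/pdb/{accession}
"""


def load_linkouts(text):
    """Parse the [linkouts] section into a {dbname: template} dict."""
    # Interpolation is disabled so '%' characters in URLs need no escaping.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep database names case-sensitive
    parser.read_string(text)
    return dict(parser["linkouts"])


def linkout_url(linkouts, dbname, accession):
    """Return the decorated URL for a dbxref, or None if unconfigured."""
    template = linkouts.get(dbname)
    return template.format(accession=accession) if template else None


linkouts = load_linkouts(EXAMPLE_CONFIG)
print(linkout_url(linkouts, "UniProt", "P12345"))
```

A site admin could then add or correct a linkout by editing one line,
without touching the python code.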


In summary...
-------------
This process took far longer than I'd expected, and the slow install and
startup time gave me the impression that plone is a heavyweight solution
that may not have sufficient performance for high-volume situations (I'm 
sure I'm wrong here).

The functionality available at the time of writing is not enough for my 
purposes - but it is a good starting point (particularly if you know how 
to develop in plone). However, if issues 1, 2 and 3 were resolved, and 
the default .cfg scripts were made more robust and slightly better 
commented for python-n00bs like myself, then plone4bio would certainly 
be worth installing to provide basic biosql datasource browsing for your 
lab or institute.

That's all folks!
Jim.

-- 
-------------------------------------------------------------------
J. B. Procter  (Jalview/ENFIN)  Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764  http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.


From biopython at maubp.freeserve.co.uk  Thu Oct  1 10:38:39 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 1 Oct 2009 15:38:39 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <4AC4AC8C.8070105@compbio.dundee.ac.uk>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>
	<4ABB4866.9060200@compbio.dundee.ac.uk>
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>
	<4AC4AC8C.8070105@compbio.dundee.ac.uk>
Message-ID: <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>

Thanks for the report James!

On Thu, Oct 1, 2009 at 2:20 PM, James Procter <jimp at compbio.dundee.ac.uk> wrote:
>
> * issue #2: The imagemap shown under the 'Features' tab is generated using
> bioperl from a genbank file emitted by biopython. This is a flaw, and means
> lots of info is lost (my biosql db is used to serve protein
> sequence DAS annotation, so it has URLs, scores, and lots of notes).

That is a curious and roundabout way of doing things, with many
data transformations risking losing things at each point.

It would be possible to use Biopython's GenomeDiagram module to
draw the image directly (although the style and capabilities would
differ). I've done this for an in house TurboGears based BioSQL
front end, and it was fine for prokaryotic organisms.

Another more elegant alternative would be to call a BioPerl script which
talks to the BioSQL database directly to get the data to draw the image.

Can you point me at the relevant files in Plone4bio to see their code?
I agree with your general point that a pluggable rendering option might
be best, but that would be a question for the Plone4bio team to debate.

Peter
[Biopython Project]

From jimp at compbio.dundee.ac.uk  Thu Oct  1 11:08:21 2009
From: jimp at compbio.dundee.ac.uk (James Procter)
Date: Thu, 01 Oct 2009 16:08:21 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>	
	<4ABB4866.9060200@compbio.dundee.ac.uk>	
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>	
	<4AC4AC8C.8070105@compbio.dundee.ac.uk>
	<320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
Message-ID: <4AC4C5E5.3000508@compbio.dundee.ac.uk>


Peter wrote:
> Thanks for the report James!
:)

> On Thu, Oct 1, 2009 at 2:20 PM, James Procter <jimp at compbio.dundee.ac.uk> wrote:
>> * issue #2: The imagemap shown under the 'Features' tab is generated using
>> bioperl from a genbank file emitted by biopython. This is a flaw, and means
>> lots of info is lost (my biosql db is used to serve protein
>> sequence DAS annotation, so it has URLs, scores, and lots of notes).
> 
> That is a curious and roundabout way of doing things, with many
> data transformations risking losing things at each point.
I can understand why it was done - if you already have an image renderer 
that eats genbank, it's the shortest path :)

> It would be possible to use Biopython's GenomeDiagram module to
> draw the image directly (although the style and capabilities would
> differ). I've done this for an in house TurboGears based BioSQL
> front end, and it was fine for prokaryotic organisms.
Sounds good... I was sure there was a python way to go here. Happy to 
test any alternative you can provide ;)

> Another more elegant alternative would be to call a BioPerl script which
> talks to the BioSQL database directly to get the data to draw the image.
Definitely. It does incur the overhead of creating a new database 
connection and instantiating another object representation of the same 
biosql records. The latter isn't really a problem, but the former could 
have scalability implications.

> Can you point me at the relevant files in Plone4bio to see their code?
> I agree with your general point that a pluggable rendering option might
> be best, but that would be a question for the Plone4bio team to debate.
The bioperl bits that generate images/maps for genbank files are here:
https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/perl

The python that does the piping is here:
https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/seqrecord.py


Looking at the code again, I can see that there are well defined 
interfaces - so in principle, plugging in other instances should be 
fairly easy.

My issues are here:
Patch for genbank parser: https://www.plone4bio.org/trac/ticket/8
Enhancement for Pluggable Renderers : 
https://www.plone4bio.org/trac/ticket/9

hope it helps!
Jim.

From ivan at biodec.com  Thu Oct  1 11:22:13 2009
From: ivan at biodec.com (Ivan Rossi)
Date: Thu, 1 Oct 2009 17:22:13 +0200 (CEST)
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>
	<4ABB4866.9060200@compbio.dundee.ac.uk>
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>
	<4AC4AC8C.8070105@compbio.dundee.ac.uk>
	<320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
Message-ID: <alpine.DEB.1.10.0910011647010.3730@gramsci.bo.biodec.com>

On Thu, 1 Oct 2009, Peter wrote:

> Thanks for the report James!
>
> On Thu, Oct 1, 2009 at 2:20 PM, James Procter <jimp at compbio.dundee.ac.uk> wrote:
>>
>> * issue #2: The imagemap shown under the 'Features' tab is generated using
>> bioperl from a genbank file emitted by biopython. This is a flaw, and means
>> lots of info is lost (my biosql db is used to serve protein
>> sequence DAS annotation, so it has URLs, scores, and lots of notes).
>
> That is a curious and roundabout way of doing things, with many
> data transformations risking losing things at each point.
>
> It would be possible to use Biopython's GenomeDiagram module to
> draw the image directly (although the style and capabilities would
> differ). I've done this for an in house TurboGears based BioSQL
> front end, and it was fine for prokaryotic organisms.

Hello Peter, happy that you are now on p4b too and not just many of us on 
biopython &;-)

We plan to remove the Bioperl-graphics option at some point, since we 
already need biopython for many things, and we are aware it is somewhat 
of a kludge. Furthermore, a full python implementation would be 
well-integrated within Zope. HOWEVER, there are valid technical reasons 
for the current approach, the main one being that Bioperl-graphics is 
VERY advanced by comparison: in particular, it automatically handles 
clashes of feature lines and text, and it has map support (click on a 
feature line to show a feature summary). These were not available when 
we evaluated GenomeDiagram (at the time it was not even in the standard 
distribution, just in Biopython CVS). And clash handling is a VERY 
DESIRABLE FEATURE if you always want readable images when you have lots 
of features of the same kind.

> Another more elegant alternative would be to call a BioPerl script which
> talks to the BioSQL database directly to get the data to draw the image.
>
> Can you point me at the relevant files in Plone4bio to see their code?
> I agree with your general point that a pluggable rendering option might
> be best, but that would be a question for the Plone4bio team to debate.

A pluggable rendering option would be GREAT. As I said above, we think 
that having mixed-language code is a problem, and we would like a 
pure-python implementation.

Actually we started evaluation of genometools graphics too 
(see http://genometools.org/annotationsketch.html) since it has python 
bindings and looks nice, but other company priorities stalled it.
(read: we have to provide a working solution to a customer NOW)

We are open to contributions.

Ivan

--
Ivan Rossi, PhD - ivan AT biodec dot com
BioDec Srl, Via Calzavecchio 20/2, 40033 Casalecchio di Reno (BO), Italy
Phone: (+39)-051-0548263 - Fax: (+39)-051-7459582 - http://www.biodec.com

From biopython at maubp.freeserve.co.uk  Thu Oct  1 11:32:01 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 1 Oct 2009 16:32:01 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <4AC4C5E5.3000508@compbio.dundee.ac.uk>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>
	<4ABB4866.9060200@compbio.dundee.ac.uk>
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>
	<4AC4AC8C.8070105@compbio.dundee.ac.uk>
	<320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
	<4AC4C5E5.3000508@compbio.dundee.ac.uk>
Message-ID: <320fb6e00910010832q4290259bh3b9cead501ebb6f5@mail.gmail.com>

James wrote:
> Peter wrote:
>> James wrote:
>>>
>>> * issue #2: The imagemap shown under the 'Features' tab is generated
>>> using bioperl from a genbank file emitted by biopython. This is a flaw,
>>> and means lots of info is lost (my biosql db is used to serve protein
>>> sequence DAS annotation, so it has URLs, scores, and lots of notes).
>>
>> That is a curious and roundabout way of doing things, with many
>> data transformations risking losing things at each point.
>
> I can understand why it was done - if you already have an image renderer
> that eats genbank, it's the shortest path :)

That could easily be the case.

>> It would be possible to use Biopython's GenomeDiagram module to
>> draw the image directly (although the style and capabilities would
>> differ). I've done this for an in house TurboGears based BioSQL
>> front end, and it was fine for prokaryotic organisms.
>
> Sounds good... I was sure there was a python way to go here. Happy
> to test any alternative you can provide ;)

Without me installing my own copy of Plone4Bio, I would at least need
to see a sample PNG image to try and mimic. However, from Ivan's
email it sounds like they need features we don't currently support.

>> Another more elegant alternative would be to call a BioPerl script which
>> talks to the BioSQL database directly to get the data to draw the image.
>
> Definitely. It does incur the overhead of creating a new database connection
> and instantiating another object representation of the same biosql records.
> The latter isn't really a problem, but the former could have scalability
> implications.

It does look like Plone already has a Biopython (DB)SeqRecord object
in memory, so yes, constructing the BioPerl equivalent from the database
might be a bit of a waste.

>> Can you point me at the relevant files in Plone4bio to see their code?
>> I agree with your general point that a pluggable rendering option might
>> be best, but that would be a question for the Plone4bio team to debate.
>
> The bioperl bits that generate images/maps for genbank files are here:
> https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/perl
>
> The python that does the piping is here:
> https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/seqrecord.py
>
> Looking at the code again, I can see that there are well defined interfaces
> - so in principle, plugging in other instances should be fairly easy.

I'm sure they could also spit out the database primary keys, and pass
them to the BioPerl script, which can use bioperl-db to talk to the
database. It may be that the current GenBank file route is faster though ;)

> My issues are here:
> Patch for genbank parser: https://www.plone4bio.org/trac/ticket/8

Could you attach an example of the problem GenBank files being
generated? Before we blame the BioPerl parser, we should check
that Biopython is producing a valid GenBank file. Offhand, I'm not
sure if feature types are allowed to have spaces in them for example.
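
If it turns out that feature keys must be single whitespace-free tokens
(an assumption here, to be checked against the feature table
definition), one workaround on the writing side would be to normalise
each feature type before the GenBank file is emitted. A minimal sketch
on plain strings, with a made-up function name:

```python
import re


def sanitize_feature_key(key):
    """Collapse runs of whitespace in a feature type to underscores."""
    # Strip leading/trailing blanks first so they do not become
    # underscores, then replace each internal run of whitespace.
    return re.sub(r"\s+", "_", key.strip())


print(sanitize_feature_key("binding site"))
print(sanitize_feature_key("  signal   peptide "))
```

With Biopython this could be applied to each feature's `type` attribute
before writing the record, at the cost of the stored label no longer
matching the database exactly.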

James - how was your BioSQL database populated?

Peter

From biopython at maubp.freeserve.co.uk  Thu Oct  1 11:46:28 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 1 Oct 2009 16:46:28 +0100
Subject: [BioSQL-l] [P4b] Plone4bio 1.0 and BioSQL
In-Reply-To: <alpine.DEB.1.10.0910011647010.3730@gramsci.bo.biodec.com>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>
	<4ABB4866.9060200@compbio.dundee.ac.uk>
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>
	<4AC4AC8C.8070105@compbio.dundee.ac.uk>
	<320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
	<alpine.DEB.1.10.0910011647010.3730@gramsci.bo.biodec.com>
Message-ID: <320fb6e00910010846n138b55acmbba881f1e59b80c8@mail.gmail.com>

On Thu, Oct 1, 2009 at 4:22 PM, Ivan Rossi <ivan at biodec.com> wrote:
>
> Hello Peter, happy that you are now on p4b too and not just many of us on
> biopython &;-)

Hi!

Looking at the Plone4bio.org website I was surprised to see no mention
of BioSQL or Biopython on the install page (while BioPerl is mentioned):
http://plone4bio.org/trac/wiki/Install

I did eventually find the Manifesto page, but think it could be a little
more prominent (or even merged into the main page?)
http://plone4bio.org/trac/wiki/Manifesto

From the way you are using Biopython for GenBank output, I guess
you need at least Biopython 1.51 for the feature support, but other
than that I am not clear how extensively Biopython is used.

P.S. The Manifesto page has a broken link to Plone (first paragraph),
and officially it should be Biopython not BioPython.

> We plan to remove the Bioperl-graphics option at some point, since we
> already need biopython for many things, and we are aware it is somewhat
> of a kludge. Furthermore, a full python implementation would be
> well-integrated within Zope. HOWEVER, there are valid technical reasons
> for the current approach, the main one being that Bioperl-graphics is
> VERY advanced by comparison: in particular, it automatically handles
> clashes of feature lines and text, and it has map support (click on a
> feature line to show a feature summary). These were not available when
> we evaluated GenomeDiagram (at the time it was not even in the standard
> distribution, just in Biopython CVS). And clash handling is a VERY
> DESIRABLE FEATURE if you always want readable images when you have lots
> of features of the same kind.

For now, GenomeDiagram requires you to put features on different
tracks explicitly to avoid overlaps or clashes. Not ideal for your needs.
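
The explicit placement could be automated with a simple greedy pass
over the feature intervals before any drawing happens. The sketch below
is plain Python and not part of GenomeDiagram; it assumes half-open
(start, end) coordinates and only separates overlapping feature boxes,
not their text labels.

```python
def assign_tracks(intervals):
    """Greedily place (start, end) intervals on non-overlapping tracks.

    Intervals are half-open [start, end). Returns a list of 1-based
    track numbers in the same order as the input.
    """
    # Visit intervals by start position, remembering the original index.
    order = sorted(range(len(intervals)), key=lambda i: intervals[i])
    track_ends = []  # track_ends[t] = end of the last interval on track t
    tracks = [0] * len(intervals)
    for i in order:
        start, end = intervals[i]
        for t, last_end in enumerate(track_ends):
            if start >= last_end:  # fits after the last feature here
                track_ends[t] = end
                tracks[i] = t + 1
                break
        else:  # clashes with every existing track, so open a new one
            track_ends.append(end)
            tracks[i] = len(track_ends)
    return tracks


features = [(0, 50), (40, 90), (60, 120), (95, 130)]
print(assign_tracks(features))
```

The resulting numbers could then serve as the track level when creating
tracks in GenomeDiagram, one track per distinct number.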

What did you mean by "map support"? ReportLab's trunk (i.e. pre 2.4)
has good SVG output, and I have been meaning to contribute basic
HTML image map output to them. Either of these can be used with
Biopython's GenomeDiagram and a tiny patch to make diagrams with
click-able features. This worked pretty well I found.
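
For readers unfamiliar with image maps, the HTML side is small: each
feature's box in the rendered image becomes an <area> element. This is
a minimal sketch in plain Python, assuming the pixel coordinates of
each box are already known; it is not the ReportLab patch mentioned
above, and the function name and example values are made up.

```python
from html import escape


def image_map(name, areas):
    """Build an HTML <map> from (x1, y1, x2, y2, href, title) tuples.

    Coordinates are pixel positions of each feature's box in the image.
    """
    lines = [f'<map name="{escape(name)}">']
    for x1, y1, x2, y2, href, title in areas:
        # escape() also escapes quotes, keeping attribute values intact.
        lines.append(
            f'  <area shape="rect" coords="{x1},{y1},{x2},{y2}" '
            f'href="{escape(href)}" title="{escape(title)}">'
        )
    lines.append("</map>")
    return "\n".join(lines)


print(image_map("features", [(10, 20, 110, 30, "#feature-1", "helix 5..55")]))
```

The image itself then refers to the map through its usemap attribute.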

Peter

From michael.watson at bbsrc.ac.uk  Fri Oct  2 06:15:32 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Fri, 2 Oct 2009 11:15:32 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <4AC4AC8C.8070105@compbio.dundee.ac.uk>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>
	<4ABB4866.9060200@compbio.dundee.ac.uk>
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>
	<4AC4AC8C.8070105@compbio.dundee.ac.uk>
Message-ID: <8D08960C647E64438CE5740657CBBDC5010D2AC730@iahcexch1.iah.bbsrc.ac.uk>

Hi Jim

Thanks for that.  I think this has real potential, but I am lukewarm about the sequence images - I am not sure I would need them in this context.

Could you expand on this bit?

* issue #3: The search box doesn't search BioSQL datasources. No idea 
how hard this would be to fix, but a little plone knowledge probably 
required.

So the search box doesn't do the equivalent of a full text search of the BioSQL database?

Mick


From ivan at biodec.com  Tue Oct  6 10:33:36 2009
From: ivan at biodec.com (Ivan Rossi)
Date: Tue, 6 Oct 2009 16:33:36 +0200 (CEST)
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <8D08960C647E64438CE5740657CBBDC5010D2AC730@iahcexch1.iah.bbsrc.ac.uk>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>
	<4ABB4866.9060200@compbio.dundee.ac.uk>
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>
	<4AC4AC8C.8070105@compbio.dundee.ac.uk>
	<8D08960C647E64438CE5740657CBBDC5010D2AC730@iahcexch1.iah.bbsrc.ac.uk>
Message-ID: <alpine.DEB.1.10.0910051530320.3721@gramsci.bo.biodec.com>

On Fri, 2 Oct 2009, michael watson (IAH-C) wrote:

> Hi Jim
>
> Thanks for that.  I think this has real potential, but I am lukewarm 
> about the sequence images - I am not sure I would need them in this 
> context.

Then do not use them. &;-) You also have the hideable text tables below.

But your comment gives me the opportunity to point out (someone asked 
about it off-list) that they are generated on-the-fly as needed, so they 
do not eat unnecessary space on the server.

Nonetheless, many people think that the feature images are really neat 
and useful. That's why you find this kind of image on genome browsers...

> Could you expand on this bit?
>
> * issue #3: The search box doesn't search BioSQL datasources. No idea
> how hard this would be to fix, but a little plone knowledge probably
> required.
>
> So the search box doesn't do the equivalent of a full text search of the 
> BioSQL database?

I can assure you that IT DOES. Try http://p4bdemo.biodec.com for yourself.

I already told Jim off-list: if live-search does not work, it means that 
indexing died at some point. It appears that, at some point during 
indexing, the Plone search engine has a relatively large need for RAM. To 
index human CDS entries from NCBI and human proteins from Uniprot on 
p4bdemo.biodec.com we used a virtual machine with 4GB of RAM, so I would 
say that seems to be the requirement to load a complete eukaryotic genome. 
Could be less. BTW it also took a couple of hours, so do not expect 
immediate availability of livesearch when you add tens of thousands of 
sequences. Browsing, by contrast, is available immediately.

I am going to add this information to the Plone4bio wiki in the 
installation requirements.

Anyway I suggest that you test your install using something smaller, such 
as a bacterial proteome, to verify that everything is up and running, 
before attacking large databases.

More about indexing later, on p4b ML.

Ivan

>
> Mick
>
> -----Original Message-----
> From: biosql-l-bounces at lists.open-bio.org [mailto:biosql-l-bounces at lists.open-bio.org] On Behalf Of James Procter
> Sent: 01 October 2009 14:20
> To: BioSQL-l at lists.open-bio.org
> Cc: Plone4Bio mailing list
> Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
>
>
> Hello all.
>
> Here's my review of plone4bio+Biosql. Thanks to Peter and Michael who
> sent me encouraging emails - sorry it took so long to post! Finally,
> please accept my apologies in advance for any unnecessary rambling...
> and for my cross-posting to p4bio and biosql-l.
>
> Installing Plone4Bio
> --------------------
> This basically went according to the instructions, except for two issues:
>  1. I experienced some problems accessing some python egg repositories,
> and had to manually download and build one module before adding it to
> the buildout (python build system) configuration. This was possibly
> related to our local network config, since Ivan Rossi couldn't reproduce
> the problem.
>
>  2. Once the download/build/plone-instance generation steps were
> finished the plone server instance that had been built took way too long
> to launch. The installation was running off a directory hosted on our
> SAN, and I decided the delay was probably due to the large number of
> files needed by plone. I ended up moving the whole install onto a
> locally attached disk to minimise the time spent statting the files on a
> network. In that config, the server comes up after around 40-60 secs on
> a lightly loaded Opteron.
>
>
> Adding a biosql database and browsing
> -------------------------------------
> It was easy to add connections to a local biosql database - even for a
> plone admin novice like myself. All you need is to know how to form the
> appropriate python database connector URI - however, a minor patch to
> the site's help text is needed to remind certain forgetful users (me)
> how to put the database user's password in the ODBC (?) string.
>
> Once added, I could access the source and browse through my bioentry
> sequences via the same list interface as shown in the demo. Clicking on
> a sequence link gave me the same five tabs (annotation, features,
> dbxrefs, sequence, references) as in the demo. However, here is where I
> noticed some issues which I've logged on the plone4bio trac:
>
> * issue #1: Plone4bio uses the bioentry_id primary key as the main
> identifier for the bioentry, rather than its accession. E.g. a
> sequence's plon4bio record has the URL
> http://myploneserver/plone4bio/mybiosqlsource/bioentry.database/bioentry.id
>
> As people on the list will know, the bioentry ID primary key is
> autogenerated and only really for internal consumption. Using it as the
> primary identifier means it's not possible to link directly to a
> sequence's page if you only know its bioentry database and accession.
>
> * issue #2: The imagemap shown under the 'Features' tab is generated
> using bioperl from a genbank file emitted by biopython. This is a flaw,
> and means lots of info is lost (my biosql db is used to serve protein
> sequence DAS annotation, so it has URLs, scores, and lots of notes).
>
> I had to hack this script to cope with feature labels that contain
> spaces in order for the intervals to display correctly (otherwise they
> get a start of '-1'). I'd recommend that the image generator is modified
> to use a less restrictive format, and/or made easily pluggable to allow
> other feature renderers to be used (perhaps even something like dasty).
>
> * issue #3: The search box doesn't search BioSQL datasources. No idea
> how hard this would be to fix, but a little plone knowledge probably
> required.
>
> This was a bit of a killer for me - I was hoping for a basic search
> interface that worked out of the box, allowing me to focus on providing
> more advanced queries. As it is, I don't have the time at this moment to
> fix this issue myself.
>
> Suggested Enhancements
> ----------------------
> The Biosql/GenBank data format transformation is an easily fixed bug in
> the current plone4bio version, but it stopped me exploring the
> das/biojava/bioperl/biopython interoperation issues any further.
> However, it also revealed a few aspects of the plone4bio architecture
> that might need thinking about:
>
>  1. pluggable feature rendering tools - potentially use the biosql
> connection directly (already said)
>  2. easily configured database cross-reference linkout URLs. Typically,
> it's bad form to hard-code URLs within a biosql database, and plone4bio
> has its own set of URLs that it decorates dbxrefs with. These are
> currently buried inside the plone4bio python code, but they could be
> configured via a flatfile or even via the web interface.
>
>
> In summary...
> -------------
> This process took far longer than I'd expected, and the slow install and
> startup time gave me the impression that plone is a heavyweight solution
> that may not have sufficient performance for high-volume situations (I'm
> sure I'm wrong here).
>
> The functionality available at the time of writing is not enough for my
> purposes - but it is a good starting point (particularly if you know how
> to develop in plone). However - if issues 1,2 and 3 were resolved, and
> the default .cfg scripts were made more robust and slightly better
> commented for python-n00bs like myself, then plone4bio would certainly
> be worth installing to provide basic biosql datasource browsing for your
> lab or institute.
>
> that's all folks!
> Jim.
>
> -- 
> -------------------------------------------------------------------
> J. B. Procter  (Jalview/ENFIN)  Barton Bioinformatics Research Group
> Phone/Fax:+44(0)1382 388734/345764  http://www.compbio.dundee.ac.uk
> The University of Dundee is a Scottish Registered Charity, No. SC015096.
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

--
Ivan Rossi, PhD - ivan AT biodec dot com, ivan dot rossi3 AT unibo dot it
BioDec Srl, Via Calzavecchio 20/2, 40033 Casalecchio di Reno (BO), Italy
Phone: (+39)-051-0548263 - Fax: (+39)-051-7459582 - http://www.biodec.com

From jimp at compbio.dundee.ac.uk  Thu Oct  1 13:20:12 2009
From: jimp at compbio.dundee.ac.uk (James Procter)
Date: Thu, 01 Oct 2009 14:20:12 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <4ABB6FDC.2020007@compbio.dundee.ac.uk>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>	<4ABB4866.9060200@compbio.dundee.ac.uk>
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>
Message-ID: <4AC4AC8C.8070105@compbio.dundee.ac.uk>


Hello all.

Here's my review of plone4bio+Biosql. Thanks to Peter and Michael who 
sent me encouraging emails - sorry it took so long to post! Finally, 
please accept my apologies in advance for any unnecessary rambling... 
and for my cross-posting to p4bio and biosql-l.

Installing Plone4Bio
--------------------
This basically went according to the instructions, except for two issues:
  1. I experienced some problems accessing some python egg repositories,
and had to manually download and build one module before adding it to
the buildout (python build system) configuration. This was possibly
related to our local network config, since Ivan Rossi couldn't reproduce
the problem.

  2. Once the download/build/plone-instance generation steps were
finished the plone server instance that had been built took way too long
to launch. The installation was running off a directory hosted on our
SAN, and I decided the delay was probably due to the large number of
files needed by plone. I ended up moving the whole install onto a
locally attached disk to minimise the time spent statting the files on a
network. In that config, the server comes up after around 40-60 secs on
a lightly loaded Opteron.


Adding a biosql database and browsing
-------------------------------------
It was easy to add connections to a local biosql database - even for a
plone admin novice like myself. All you need is to know how to form the
appropriate python database connector URI - however, a minor patch to
the site's help text is needed to remind certain forgetful users (me)
how to put the database user's password in the ODBC (?) string.
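
A minimal sketch of forming such a connection string with the password
included. It assumes the common driver://user:password@host/database
layout; the exact scheme names that plone4bio accepts are not stated in
this thread, and build_db_uri, the user name and the password are all
made up for illustration.

```python
from urllib.parse import quote


def build_db_uri(driver, user, password, host, database, port=None):
    """Return a URI of the form driver://user:password@host[:port]/database.

    Hypothetical helper, not part of plone4bio.
    """
    # Percent-encode the credentials so characters such as '@' or ':'
    # in a password cannot be mistaken for URI delimiters.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    location = host if port is None else f"{host}:{port}"
    return f"{driver}://{credentials}@{location}/{database}"


# Illustrative values only.
uri = build_db_uri("postgres", "biosql_user", "s3cr@t", "localhost", "biosql")
print(uri)
```

The password sits between the ':' after the user name and the '@' before
the host, which is the part that is easy to forget.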

Once added, I could access the source and browse through my bioentry
sequences via the same list interface as shown in the demo. Clicking on
a sequence link gave me the same five tabs (annotation, features,
dbxrefs, sequence, references) as in the demo. However, here is where I
noticed some issues which I've logged on the plone4bio trac:

* issue #1: Plone4bio uses the bioentry_id primary key as the main
identifier for the bioentry, rather than its accession. E.g. a
sequence's plone4bio record has the URL
http://myploneserver/plone4bio/mybiosqlsource/bioentry.database/bioentry.id

As people on the list will know, the bioentry ID primary key is
autogenerated and only really for internal consumption. Using it as the
primary identifier means it's not possible to link directly to a
sequence's page if you only know its bioentry database and accession.
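
To make the point concrete, here is a rough sketch of the lookup that an
accession-based URL would need: resolving (biodatabase name, accession)
to the internal bioentry_id. It runs against an in-memory SQLite database
with a cut-down stand-in for the biodatabase and bioentry tables (only
the columns needed here). The function name and the sample row are
invented, and none of this is plone4bio code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for two BioSQL tables; the real schema has more columns.
conn.executescript("""
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase(biodatabase_id),
    accession TEXT NOT NULL,
    version INTEGER NOT NULL DEFAULT 0,
    UNIQUE (accession, biodatabase_id, version)
);
INSERT INTO biodatabase (biodatabase_id, name) VALUES (1, 'uniprot');
INSERT INTO bioentry (bioentry_id, biodatabase_id, accession, version)
    VALUES (4711, 1, 'P12345', 1);
""")


def bioentry_id_for(conn, db_name, accession):
    """Return the bioentry_id of the highest version, or None if no match."""
    row = conn.execute(
        """
        SELECT e.bioentry_id
        FROM bioentry e
        JOIN biodatabase d ON d.biodatabase_id = e.biodatabase_id
        WHERE d.name = ? AND e.accession = ?
        ORDER BY e.version DESC
        LIMIT 1
        """,
        (db_name, accession),
    ).fetchone()
    return row[0] if row else None


print(bioentry_id_for(conn, "uniprot", "P12345"))
```

With a lookup like this behind the URL handler, a link would only need
the database name and accession, and the autogenerated key could stay
internal.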

* issue #2: The imagemap shown under the 'Features' tab is generated 
using bioperl from a genbank file emitted by biopython. This is a flaw, 
and means lots of info is lost (my biosql db is used to serve protein
sequence DAS annotation, so it has URLs, scores, and lots of notes).

I had to hack this script to cope with feature labels that contain
spaces in order for the intervals to display correctly (otherwise they
get a start of '-1'). I'd recommend that the image generator is modified
to use a less restrictive format, and/or made easily pluggable to allow
other feature renderers to be used (perhaps even something like dasty).

* issue #3: The search box doesn't search BioSQL datasources. No idea 
how hard this would be to fix, but a little plone knowledge probably 
required.

This was a bit of a killer for me - I was hoping for a basic search
interface that worked out of the box, allowing me to focus on providing
more advanced queries. As it is, I don't have the time at this moment to
fix this issue myself.

Suggested Enhancements
----------------------
The Biosql/GenBank data format transformation is an easily fixed bug in
the current plone4bio version, but it stopped me exploring the
das/biojava/bioperl/biopython interoperation issues any further.
However, it also revealed a few aspects of the plone4bio architecture
that might need thinking about:

  1. pluggable feature rendering tools - potentially use the biosql
connection directly (already said)
  2. easily configured database cross-reference linkout URLs. Typically,
it's bad form to hard-code URLs within a biosql database, and plone4bio
has its own set of URLs that it decorates dbxrefs with. These are
currently buried inside the plone4bio python code, but they could be
configured via a flatfile or even via the web interface.
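
As a sketch of the flatfile idea in point 2, the linkout templates could
live in an INI-style file read with Python's configparser. The section
name, the {id} placeholder, the example.org URLs and both function names
are assumptions made for illustration; this is not how plone4bio stores
its URLs today.

```python
import configparser
from urllib.parse import quote

# Hypothetical flatfile contents; in practice this would be read from disk.
LINKOUT_CONFIG = """
[linkouts]
# dbname = URL template; {id} is replaced by the dbxref accession
exampledb = http://db.example.org/entry/{id}
otherdb = http://other.example.org/lookup?acc={id}
"""


def load_linkouts(text):
    """Parse the config text into a {dbname: url_template} dictionary."""
    # interpolation=None stops configparser from treating '%' specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["linkouts"])


def linkout_url(linkouts, dbname, accession):
    """Return the linkout URL for a dbxref, or None for an unknown database."""
    # configparser lower-cases option names, so look up in lower case.
    template = linkouts.get(dbname.lower())
    if template is None:
        return None
    return template.format(id=quote(accession, safe=""))


linkouts = load_linkouts(LINKOUT_CONFIG)
print(linkout_url(linkouts, "ExampleDB", "AB 12"))
```

Adding or correcting a linkout then means editing one line of
configuration rather than the python source.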


In summary...
-------------
This process took far longer than I'd expected, and the slow install and
startup time gave me the impression that plone is a heavyweight solution
that may not have sufficient performance for high-volume situations (I'm 
sure I'm wrong here).

The functionality available at the time of writing is not enough for my 
purposes - but it is a good starting point (particularly if you know how 
to develop in plone). However - if issues 1,2 and 3 were resolved, and 
the default .cfg scripts were made more robust and slightly better 
commented for python-n00bs like myself, then plone4bio would certainly 
be worth installing to provide basic biosql datasource browsing for your 
lab or institute.

that's all folks!
Jim.

-- 
-------------------------------------------------------------------
J. B. Procter  (Jalview/ENFIN)  Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764  http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.


From biopython at maubp.freeserve.co.uk  Thu Oct  1 14:38:39 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 1 Oct 2009 15:38:39 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <4AC4AC8C.8070105@compbio.dundee.ac.uk>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>
	<4ABB4866.9060200@compbio.dundee.ac.uk>
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>
	<4AC4AC8C.8070105@compbio.dundee.ac.uk>
Message-ID: <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>

Thanks for the report James!

On Thu, Oct 1, 2009 at 2:20 PM, James Procter <jimp at compbio.dundee.ac.uk> wrote:
>
> * issue #2: The imagemap shown under the 'Features' tab is generated using
> bioperl from a genbank file emitted by biopython. This is a flaw, and means
> lots of info is lost (my biosql db is used to serve protein
> sequence DAS annotation, so it has URLs, scores, and lots of notes).

That is a curious and roundabout way of doing things, with many
data transformations risking losing things at each point.

It would be possible to use Biopython's GenomeDiagram module to
draw the image directly (although the style and capabilities would
differ). I've done this for an in house TurboGears based BioSQL
front end, and it was fine for prokaryotic organisms.

Another more elegant alternative would be to call a BioPerl script which
talks to the BioSQL database directly to get the data to draw the image.

Can you point me at the relevant files in Plone4bio to see their code?
I agree with your general point that a pluggable rendering option might
be best, but that would be a question for the Plone4bio team to debate.

Peter
[Biopython Project]


From jimp at compbio.dundee.ac.uk  Thu Oct  1 15:08:21 2009
From: jimp at compbio.dundee.ac.uk (James Procter)
Date: Thu, 01 Oct 2009 16:08:21 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>	
	<4ABB4866.9060200@compbio.dundee.ac.uk>	
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>	
	<4AC4AC8C.8070105@compbio.dundee.ac.uk>
	<320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
Message-ID: <4AC4C5E5.3000508@compbio.dundee.ac.uk>


Peter wrote:
> Thanks for the report James!
:)

> On Thu, Oct 1, 2009 at 2:20 PM, James Procter <jimp at compbio.dundee.ac.uk> wrote:
>> * issue #2: The imagemap shown under the 'Features' tab is generated using
>> bioperl from a genbank file emitted by biopython. This is a flaw, and means
>> lots of info is lost (my biosql db is used to serve protein
>> sequence DAS annotation, so it has URLs, scores, and lots of notes).
> 
> That is a curious and roundabout way of doing things, with many
> data transformations risking losing things at each point.
I can understand why it was done - if you already have an image renderer
that eats genbank, it's the shortest path :)

> It would be possible to use Biopython's GenomeDiagram module to
> draw the image directly (although the style and capabilities would
> differ). I've done this for an in house TurboGears based BioSQL
> front end, and it was fine for prokaryotic organisms.
Sounds good... I was sure there was a python way to go here. Happy to 
test any alternative you can provide ;)

> Another more elegant alternative would be to call a BioPerl script which
> talks to the BioSQL database directly to get the data to draw the image.
Definitely. It does incur the overhead of creating a new database
connection and instantiating another object representation of the same
biosql records; the latter isn't really a problem, but the former could
have scalability implications.

> Can you point me at the relevant files in Plone4bio to see their code?
> I agree with your general point that a pluggable rendering option might
> be best, but that would be a question for the Plone4bio team to debate.
The bioperl bits that generate images/maps for genbank files are here:
https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/perl

The python that does the piping is here:
https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/seqrecord.py


Looking at the code again, I can see that there are well defined 
interfaces - so in principle, plugging in other instances should be 
fairly easy.
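
For what it's worth, the kind of pluggability I have in mind could be as
simple as a registry of renderer functions keyed by name. This is a
generic sketch only: the names RENDERERS, register_renderer and render
are hypothetical and do not reflect the actual plone4bio interfaces, and
the toy text renderer stands in for a real image generator.

```python
RENDERERS = {}  # maps a renderer name to a callable


def register_renderer(name):
    """Decorator that registers a renderer function under the given name."""
    def decorator(func):
        RENDERERS[name] = func
        return func
    return decorator


@register_renderer("text")
def render_text(features, seq_length):
    """Draw each (label, start, end) feature as a row of '=' characters."""
    lines = []
    for label, start, end in features:
        bar = " " * start + "=" * (end - start) + " " * (seq_length - end)
        lines.append(bar + " " + label)
    return "\n".join(lines)


def render(features, seq_length, renderer="text"):
    """Dispatch to the named renderer; unknown names raise ValueError."""
    try:
        func = RENDERERS[renderer]
    except KeyError:
        raise ValueError(f"unknown renderer: {renderer!r}") from None
    return func(features, seq_length)


# Feature labels are passed as plain strings, so spaces are harmless here.
print(render([("binding site", 2, 5)], 8))
```

Because the features are handed over as data rather than via an
intermediate file format, labels with spaces need no special handling.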

My issues are here:
Patch for genbank parser: https://www.plone4bio.org/trac/ticket/8
Enhancement for Pluggable Renderers : 
https://www.plone4bio.org/trac/ticket/9

hope it helps!
Jim.


From ivan at biodec.com  Thu Oct  1 15:22:13 2009
From: ivan at biodec.com (Ivan Rossi)
Date: Thu, 1 Oct 2009 17:22:13 +0200 (CEST)
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>
	<4ABB4866.9060200@compbio.dundee.ac.uk>
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>
	<4AC4AC8C.8070105@compbio.dundee.ac.uk>
	<320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
Message-ID: <alpine.DEB.1.10.0910011647010.3730@gramsci.bo.biodec.com>

On Thu, 1 Oct 2009, Peter wrote:

> Thanks for the report James!
>
> On Thu, Oct 1, 2009 at 2:20 PM, James Procter <jimp at compbio.dundee.ac.uk> wrote:
>>
>> * issue #2: The imagemap shown under the 'Features' tab is generated using
>> bioperl from a genbank file emitted by biopython. This is a flaw, and means
>> lots of info is lost (my biosql db is used to serve protein
>> sequence DAS annotation, so it has URLs, scores, and lots of notes).
>
> That is a curious and roundabout way of doing things, with many
> data transformations risking losing things at each point.
>
> It would be possible to use Biopython's GenomeDiagram module to
> draw the image directly (although the style and capabilities would
> differ). I've done this for an in house TurboGears based BioSQL
> front end, and it was fine for prokaryotic organisms.

Hello Peter, happy that you are now on p4b too and not just many of us on 
biopython &;-)

We plan to remove the Bioperl-graphics option at some point, since we
already need biopython for many things, and we are aware it is somewhat of
a kludge. Furthermore, a full python implementation would be well
integrated within Zope. HOWEVER, there are valid technical reasons for the
current choice, the main one being that Bioperl-graphics is VERY advanced
by comparison: in particular, it automatically handles clashes of feature
lines and text, and it has map support (click on a feature line to show a
feature summary). These were not available at the time we evaluated
GenomeDiagram (back then it was not even in the standard distribution,
just in Biopython CVS). And clash handling is a VERY DESIRABLE FEATURE if
you always want readable images when you have lots of features of the
same kind.

> Another more elegant alternative would be to call a BioPerl script which
> talks to the BioSQL database directly to get the data to draw the image.
>
> Can you point me at the relevant files in Plone4bio to see their code?
> I agree with your general point that a pluggable rendering option might
> be best, but that would be a question for the Plone4bio team to debate.

A pluggable rendering option would be GREAT. As I said above, we think
that having mixed-language code is a problem, and we would like a
pure-python implementation.

Actually we started evaluation of genometools graphics too 
(see http://genometools.org/annotationsketch.html) since it has python 
bindings and looks nice, but other company priorities stalled it.
(read: we have to provide a working solution to a customer NOW)

We are open to contribution.

Ivan

--
Ivan Rossi, PhD - ivan AT biodec dot com
BioDec Srl, Via Calzavecchio 20/2, 40033 Casalecchio di Reno (BO), Italy
Phone: (+39)-051-0548263 - Fax: (+39)-051-7459582 - http://www.biodec.com


From biopython at maubp.freeserve.co.uk  Thu Oct  1 15:32:01 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 1 Oct 2009 16:32:01 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <4AC4C5E5.3000508@compbio.dundee.ac.uk>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>
	<4ABB4866.9060200@compbio.dundee.ac.uk>
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>
	<4AC4AC8C.8070105@compbio.dundee.ac.uk>
	<320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
	<4AC4C5E5.3000508@compbio.dundee.ac.uk>
Message-ID: <320fb6e00910010832q4290259bh3b9cead501ebb6f5@mail.gmail.com>

James wrote:
> Peter wrote:
>> James wrote:
>>>
>>> * issue #2: The imagemap shown under the 'Features' tab is generated
>>> using bioperl from a genbank file emitted by biopython. This is a flaw,
>>> and means lots of info is lost (my biosql db is used to serve protein
>>> sequence DAS annotation, so it has URLs, scores, and lots of notes).
>>
>> That is a curious and roundabout way of doing things, with many
>> data transformations risking losing things at each point.
>
> I can understand why it was done - if you already have an image renderer
> that eats genbank, it's the shortest path :)

That could easily be the case.

>> It would be possible to use Biopython's GenomeDiagram module to
>> draw the image directly (although the style and capabilities would
>> differ). I've done this for an in house TurboGears based BioSQL
>> front end, and it was fine for prokaryotic organisms.
>
> Sounds good... I was sure there was a python way to go here. Happy
> to test any alternative you can provide ;)

Without me installing my own copy of Plone4Bio, I would at least need
to see a sample PNG image to try and mimic. However, from Ivan's
email it sounds like they need features we don't currently support.

>> Another more elegant alternative would be to call a BioPerl script which
>> talks to the BioSQL database directly to get the data to draw the image.
>
> Definitely. It does incur the overhead of creating a new database connection
> and instantiating another object representation of the same biosql records;
> the latter isn't really a problem, but the former could have scalability
> implications.

It does look like Plone already has a Biopython (DB)SeqRecord object
in memory, so yes, constructing the BioPerl equivalent from the database
might be a bit of a waste.

>> Can you point me at the relevant files in Plone4bio to see their code?
>> I agree with your general point that a pluggable rendering option might
>> be best, but that would be a question for the Plone4bio team to debate.
>
> The bioperl bits that generate images/maps for genbank files are here:
> https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/perl
>
> The python that does the piping is here:
> https://www.plone4bio.org/trac/browser/plone4bio.base/trunk/src/plone4bio/base/png/seqrecord.py
>
> Looking at the code again, I can see that there are well defined interfaces
> - so in principle, plugging in other instances should be fairly easy.

I'm sure they could also spit out the database primary keys, and pass
those to the BioPerl script, which can use bioperl-db to talk to the
database. It may be that the current GenBank file route is faster though ;)

> My issues are here:
> Patch for genbank parser: https://www.plone4bio.org/trac/ticket/8

Could you attach an example of the problem GenBank files being
generated? Before we blame the BioPerl parser, we should check
that Biopython is producing a valid GenBank file. Off hand, I'm not
sure if feature types are allowed to have spaces in them for example.

James - how was your BioSQL database populated?

Peter


From biopython at maubp.freeserve.co.uk  Thu Oct  1 15:46:28 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 1 Oct 2009 16:46:28 +0100
Subject: [BioSQL-l] [P4b] Plone4bio 1.0 and BioSQL
In-Reply-To: <alpine.DEB.1.10.0910011647010.3730@gramsci.bo.biodec.com>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>
	<4ABB4866.9060200@compbio.dundee.ac.uk>
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>
	<4AC4AC8C.8070105@compbio.dundee.ac.uk>
	<320fb6e00910010738g7330855h54b5bf8785c6d1ec@mail.gmail.com>
	<alpine.DEB.1.10.0910011647010.3730@gramsci.bo.biodec.com>
Message-ID: <320fb6e00910010846n138b55acmbba881f1e59b80c8@mail.gmail.com>

On Thu, Oct 1, 2009 at 4:22 PM, Ivan Rossi <ivan at biodec.com> wrote:
>
> Hello Peter, happy that you are now on p4b too and not just many of us on
> biopython &;-)

Hi!

Looking at the Plone4bio.org website I was surprised to see no mention
of BioSQL, Biopython on the install page (while BioPerl is mentioned):
http://plone4bio.org/trac/wiki/Install

I did eventually find the Manifesto page, but think it could be a little
more prominent (or even merged into the main page?)
http://plone4bio.org/trac/wiki/Manifesto

From the way you are using Biopython for GenBank output, I guess
you need at least Biopython 1.51 for the feature support, but other
than that I am not clear how extensively Biopython is used.

P.S. The Manifesto page has a broken link to Plone (first paragraph),
and officially it should be Biopython not BioPython.

> We plan to remove the Bioperl-graphics option at some point, since we
> already need biopython for many things, and we are aware it is somewhat of
> a kludge. Furthermore, a full python implementation would be well
> integrated within Zope. HOWEVER, there are valid technical reasons for the
> current choice, the main one being that Bioperl-graphics is VERY advanced
> by comparison: in particular, it automatically handles clashes of feature
> lines and text, and it has map support (click on a feature line to show a
> feature summary). These were not available at the time we evaluated
> GenomeDiagram (back then it was not even in the standard distribution,
> just in Biopython CVS). And clash handling is a VERY DESIRABLE FEATURE if
> you always want readable images when you have lots of features of the
> same kind.

For now, GenomeDiagram requires you to put features on different
tracks explicitly to avoid overlaps or clashes. Not ideal for your needs.
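
One way to do that explicit placement up front is a greedy pass that
puts each feature on the lowest track where it does not overlap anything
already there. The sketch below is plain Python and independent of
GenomeDiagram; assign_tracks is a hypothetical helper name, and features
are assumed to be (label, start, end) tuples with half-open coordinates.

```python
def assign_tracks(features):
    """Map each (label, start, end) feature to the lowest free track.

    Coordinates are treated as half-open intervals [start, end), so a
    feature may start exactly where the previous one on its track ends.
    Returns a list of (label, track_index) in order of start position.
    """
    track_ends = []  # end coordinate of the last feature placed on each track
    placement = []
    for label, start, end in sorted(features, key=lambda f: (f[1], f[2])):
        for track, last_end in enumerate(track_ends):
            if start >= last_end:
                # Fits after the last feature on this track.
                track_ends[track] = end
                break
        else:
            # Clashes with every existing track, so open a new one.
            track = len(track_ends)
            track_ends.append(end)
        placement.append((label, track))
    return placement


features = [("a", 0, 10), ("b", 5, 15), ("c", 10, 20), ("d", 12, 18)]
print(assign_tracks(features))
```

Each resulting track index could then be mapped to a separate track or
feature set in whichever drawing library is used.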

What did you mean by "map support"? ReportLab's trunk (i.e. pre 2.4)
has good SVG output, and I have been meaning to contribute basic
HTML image map output to them. Either of these can be used with
Biopython's GenomeDiagram and a tiny patch to make diagrams with
click-able features. This worked pretty well I found.
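
For illustration, basic HTML image map output amounts to emitting one
<area> element per feature rectangle. The sketch below assumes the pixel
coordinates of each drawn feature are already known from whatever
produced the image; image_map is a made-up function, not a ReportLab or
GenomeDiagram API, and the href values are placeholders.

```python
from html import escape


def image_map(name, areas):
    """Build an HTML <map> from (label, href, (x1, y1, x2, y2)) tuples.

    Coordinates are pixel positions of each feature's bounding box.
    """
    lines = [f'<map name="{escape(name)}">']
    for label, href, (x1, y1, x2, y2) in areas:
        # escape() also converts quotes, so labels are safe in attributes.
        lines.append(
            f'  <area shape="rect" coords="{x1},{y1},{x2},{y2}" '
            f'href="{escape(href)}" alt="{escape(label)}" '
            f'title="{escape(label)}">'
        )
    lines.append("</map>")
    return "\n".join(lines)


html_map = image_map("features", [("binding site", "#f1", (10, 20, 110, 30))])
print(html_map)
```

The image itself would reference the map with usemap="#features" on its
<img> tag.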

Peter


From michael.watson at bbsrc.ac.uk  Fri Oct  2 10:15:32 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Fri, 2 Oct 2009 11:15:32 +0100
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <4AC4AC8C.8070105@compbio.dundee.ac.uk>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>
	<4ABB4866.9060200@compbio.dundee.ac.uk>
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>
	<4AC4AC8C.8070105@compbio.dundee.ac.uk>
Message-ID: <8D08960C647E64438CE5740657CBBDC5010D2AC730@iahcexch1.iah.bbsrc.ac.uk>

Hi Jim

Thanks for that.  I think this has real potential, but I am lukewarm about the sequence images - I am not sure I would need them in this context.

Could you expand on this bit?

* issue #3: The search box doesn't search BioSQL datasources. No idea 
how hard this would be to fix, but a little plone knowledge probably 
required.

So the search box doesn't do the equivalent of a full text search of the BioSQL database?

Mick


From ivan at biodec.com  Tue Oct  6 14:33:36 2009
From: ivan at biodec.com (Ivan Rossi)
Date: Tue, 6 Oct 2009 16:33:36 +0200 (CEST)
Subject: [BioSQL-l] Plone4bio 1.0 and BioSQL
In-Reply-To: <8D08960C647E64438CE5740657CBBDC5010D2AC730@iahcexch1.iah.bbsrc.ac.uk>
References: <alpine.DEB.1.10.0909151447330.5426@bendaldolum.bo.biodec.com>
	<4ABB4866.9060200@compbio.dundee.ac.uk>
	<4ABB6FDC.2020007@compbio.dundee.ac.uk>
	<4AC4AC8C.8070105@compbio.dundee.ac.uk>
	<8D08960C647E64438CE5740657CBBDC5010D2AC730@iahcexch1.iah.bbsrc.ac.uk>
Message-ID: <alpine.DEB.1.10.0910051530320.3721@gramsci.bo.biodec.com>

On Fri, 2 Oct 2009, michael watson (IAH-C) wrote:

> Hi Jim
>
> Thanks for that.  I think this has real potential, but I am lukewarm
> about the sequence images - I am not sure I would need them in this
> context.

Then do not use them. &;-) You also have the hideable text tables below.

But your comment gave me the opportunity to point out (someone asked it 
off-list)  that they are generated on-the-fly as needed, so that they do 
not eat unnecessary space on the server.

Nonetheless many people think that the feature images are really neat and 
useful. That's why you find this kind of images on genome browsers...

> Could you expand on this bit?
>
> * issue #3: The search box doesn't search BioSQL datasources. No idea
> how hard this would be to fix, but a little plone knowledge probably
> required.
>
> So the search box doesn't do the equivalent of a full text search of the 
> BioSQL database?

I can assure you that IT DOES. Try http://p4bdemo.biodec.com for yourself.

I already told Jim off-list: if live-search does not work, it means that 
indexing died at some point. It appears that, at some point during 
indexing, the Plone search engine needs a relatively large amount of RAM. 
To index human CDS entries from NCBI and human proteins from Uniprot on 
p4bdemo.biodec.com we used a virtual machine with 4GB of RAM, so I would 
say that is roughly the requirement to load a complete eukaryotic genome. 
It could be less. BTW it also took a couple of hours, so do not expect 
livesearch to be available immediately when you add tens of thousands of 
sequences. Browsing, by contrast, is available immediately.

I am going to add this information to the Plone4bio wiki in the 
installation requirements.

Anyway I suggest that you test your install using something smaller, such 
as a bacterial proteome, to verify that everything is up and running, 
before attacking large databases.
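
As a rough illustration of that sanity check, here is a minimal sketch 
that counts the entries per namespace before you point plone4bio at a 
database. It is not part of plone4bio. It assumes only the standard BioSQL 
biodatabase and bioentry tables (a reduced subset of their columns), uses 
an in-memory SQLite database as a stand-in for a real server, and the 
function names and sample rows are invented for the example.

```python
import sqlite3

# Reduced subset of the BioSQL schema (biodatabase + bioentry), just enough
# for the count below. A real BioSQL install has more tables and columns.
SCHEMA = """
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry (
    bioentry_id    INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase(biodatabase_id),
    name           TEXT NOT NULL,
    accession      TEXT NOT NULL,
    version        INTEGER NOT NULL DEFAULT 0
);
"""


def entries_per_namespace(conn):
    """Return {biodatabase name: number of bioentry rows} (hypothetical helper)."""
    cursor = conn.execute(
        "SELECT d.name, COUNT(e.bioentry_id) "
        "FROM biodatabase d LEFT JOIN bioentry e "
        "ON e.biodatabase_id = d.biodatabase_id "
        "GROUP BY d.name ORDER BY d.name"
    )
    return dict(cursor.fetchall())


def build_demo():
    """Create an in-memory stand-in database with three made-up entries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO biodatabase (biodatabase_id, name) "
        "VALUES (1, 'test_proteome')"
    )
    conn.executemany(
        "INSERT INTO bioentry (biodatabase_id, name, accession) "
        "VALUES (1, ?, ?)",
        [("PROT_A", "X00001"), ("PROT_B", "X00002"), ("PROT_C", "X00003")],
    )
    return conn


if __name__ == "__main__":
    print(entries_per_namespace(build_demo()))
```

Against a real install you would open a connection with your own database 
driver instead of build_demo(); a namespace holding a few thousand entries 
is a more sensible first indexing test than one holding tens of thousands.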

More about indexing later, on p4b ML.

Ivan

>
> Mick

--
Ivan Rossi, PhD - ivan AT biodec dot com, ivan dot rossi3 AT unibo dot it
BioDec Srl, Via Calzavecchio 20/2, 40033 Casalecchio di Reno (BO), Italy
Phone: (+39)-051-0548263 - Fax: (+39)-051-7459582 - http://www.biodec.com