[Bioperl-l] Graphics BSML module appropriate in BioPerl?

Charles Tilford charles.tilford@bms.com
Thu, 28 Mar 2002 14:43:04 -0500


Lincoln Stein wrote:
> BSML is supported and will round-trip BioPerl sequences quite well, but the
> BioPerl parser is a memory hog and rather slow.

Apologies for the performance issues of SeqIO::bsml. This was my
first XML project, and I chose XML::DOM for parsing, largely
because it allowed me to randomly access the tree. I agree with
Lincoln: it is slow and unwieldy. If I had more time, I'd redo it as a
SAX parser.

This is a good opportunity to pose a related question to the BioPerl
community. I have several large pieces of code that generate enhanced
BSML output. The modules use SeqIO::bsml to capture standard file
formats as BSML structures, then apply additional formatting for
clean display in the LabBook viewing software; the default display of
an unformatted sequence (such as one generated by SeqIO::bsml) is
pretty scruffy.

We are using the code in-house to easily format database entries
(which are converted into BioPerl Seq objects) as BSML for interactive
browsing. There are about five modules comprising 180 KB of internally
documented Perl code, and I've gotten clearance to release them to the
public domain.

BioPerl seems like the most relevant place to put them, but there are
a few concerns:

First, I know there is some wariness about including modules in
BioPerl that are associated more with the display of data than with
its storage and analysis.

Second, the primary formatting module (BsmlHelper) is built on a
custom validating XML-writing module that I wrote specifically for
this task (ValWriter). This module is required for BsmlHelper (which
inherits from ValWriter), but otherwise has nothing to do with BioPerl.

Third, as Lincoln has noticed with SeqIO::bsml, the process is quite
slow. This is because ValWriter inherits from XML::DOM, which makes for
very flexible construction of the document but is also a real dog
for memory and performance (one of the reasons we finally upgraded
to 64-bit Perl was to accommodate some of the larger BsmlHelper
jobs). [If I were rewriting *this* module, I'd store the data
structures in an internal custom hash and construct the XML document
on the fly only as it was being sent for output. Time issues again...]

So... where do people think this should go? For that matter, is anyone
even *interested* in this functionality? I can provide example BSML
files or PDF versions of the output if anyone is curious.

The basic easyview() method of BsmlHelper is very convenient for
making nice-looking BSML views (once you have a sequence object in
hand, it takes about five lines of code plus a stylesheet designation).
Several additional methods allow more sophisticated control of image
formatting. Much of what I do is preprocessed, so performance is not
so critical. For small input files (say 100 KB), it's fast enough for
most users on our system (roughly 10 seconds).

-Charles

-- 
Charles Tilford, Bioinformatics-Applied Genomics
Bristol-Myers Squibb PRI, Hopewell 3A039
P.O. Box 5400, Princeton, NJ 08543-5400, (609) 818-3213
charles.tilford@bms.com