Bioperl: XML/BioPerl - design proposal

Wayne Parrott wayne@workingobjects.com
Wed, 20 Jan 1999 16:55:07 -0600


BioPerl'ers:

I occasionally lurk on the periphery of this mailing list, mainly because
my work is mostly Java-specific. I've been head-down since just before
Christmas and missed the XML thread. Much of my work is consulting-based,
and I have recently developed a couple of frameworks for a client to
integrate XML data and XML adapters into their drug discovery
system. One of the recent posts in this thread caught my attention. It
described a plan to transform application output, such as Blast reports,
into XML. I agree this is the way to go. But before "just going and doing
it", allow me to propose a design model for consideration.

Following is the design for a couple of Java frameworks I've recently
developed: the Blast Parsing Framework and the BlastXML SDK. One of my
key design goals was to maximize reuse. Instead of writing a straight
Blast->XML transformer, I created the Blast Parsing Framework to provide
basic Blast parsing and element handling functionality. It's at this
level that I write application-specific document processing logic. I
then reused the Blast Parsing Framework by extending it to build the
BlastXML SDK, which includes a Blast->XML transformer.


Blast Parsing Framework Design

The Blast Parsing Framework (BPF) is designed to simplify the processing
of multiple forms of Blast results. I modeled the BPF after SAX, i.e., it
separates the parsing/recognition responsibility from the handling of
parsed elements. The modularity of the framework allows different
handlers and parsers to be plugged in. Below is a class diagram for the
framework (in reality I've abstracted many of the details into another
framework known as the Abstract Parsing Framework for use in other
contexts; it's not shown here). Applications which use the BPF simply
implement an IBlastElementHandler class that processes the parsed
elements according to its requirements. The Blast2XML handler is really
part of the BlastXML SDK. Not shown is the BlastObjectModel that holds
an in-memory representation of the Blast data. Also not shown is the
BlastResultVisitor, an implementor of IBlastElementProducer, that is
able to convert all or parts of the Blast Object Model into a series of
parsed elements, just as the parser does. The BlastObjectModelVisitor can
be used to serialize the object model. This is a useful framework
independent of XML.

                         sends ->
   IBlastElement   parsedElements to
      Producer ----------------------- IBlastElementHandler
          ^                                     ^
          |extends                              | implements
          |                     ---------------------------------
     IBlastParser              |          |            |         |
          ^                Blast2HTML     |       DrugDiscovery  |
          |implements                     |       Knowledgebase  |
          |                     BlastObjectModelBuilder  BlastXML::Blast2XML
       ----------    
      |          |       
 NCBIBlastParser |
            LocalBlastParser 
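
The relationships in the diagram can be sketched in Java roughly as
follows. This is only an illustrative sketch, not the actual framework
code: the three interface names come from the diagram, but every method
signature, the ToyBlastParser and HitCollector classes, and the element
names ("BlastResult", "Hit", "id", "score") are assumptions made up for
the example. The toy parser recognizes only ">" hit lines and "Score ="
lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BpfSketch {

    /** Receives parsed elements; each application supplies its own implementation. */
    interface IBlastElementHandler {
        void startElement(String name);
        void elementData(String name, String value);
        void endElement(String name);
    }

    /** Any source of parsed elements (a parser, or a visitor over an object model). */
    interface IBlastElementProducer {
        void setHandler(IBlastElementHandler handler);
    }

    /** A producer that recognizes elements in raw report text. */
    interface IBlastParser extends IBlastElementProducer {
        void parse(Iterable<String> reportLines);
    }

    /** Hypothetical toy parser: only ">" hit lines and "Score =" lines are recognized. */
    static class ToyBlastParser implements IBlastParser {
        private IBlastElementHandler handler;

        public void setHandler(IBlastElementHandler handler) {
            this.handler = handler;
        }

        public void parse(Iterable<String> reportLines) {
            if (handler == null) {
                throw new IllegalStateException("no handler plugged in");
            }
            boolean inHit = false;
            handler.startElement("BlastResult");
            for (String line : reportLines) {
                String t = line.trim();
                if (t.startsWith(">")) {
                    // A new hit begins; close the previous one first.
                    if (inHit) {
                        handler.endElement("Hit");
                    }
                    handler.startElement("Hit");
                    handler.elementData("id", t.substring(1).trim());
                    inHit = true;
                } else if (inHit && t.startsWith("Score =")) {
                    handler.elementData("score", t.substring("Score =".length()).trim());
                }
            }
            if (inHit) {
                handler.endElement("Hit");
            }
            handler.endElement("BlastResult");
        }
    }

    /** Hypothetical application handler: collects hit ids and scores. */
    static class HitCollector implements IBlastElementHandler {
        final List<String> ids = new ArrayList<>();
        final List<String> scores = new ArrayList<>();

        public void startElement(String name) { }

        public void elementData(String name, String value) {
            if (name.equals("id")) {
                ids.add(value);
            } else if (name.equals("score")) {
                scores.add(value);
            }
        }

        public void endElement(String name) { }
    }

    public static void main(String[] args) {
        List<String> report = Arrays.asList(
                ">hit1 example protein",
                " Score = 42 bits",
                ">hit2 another protein",
                " Score = 17 bits");
        HitCollector collector = new HitCollector();
        IBlastParser parser = new ToyBlastParser();
        parser.setHandler(collector);   // any handler can be plugged in here
        parser.parse(report);
        System.out.println(collector.ids + " " + collector.scores);
    }
}
```

Because the parser knows only the handler interface, swapping
HitCollector for a Blast2HTML or Blast2XML style handler would require
no change to the parser.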
        



BlastXML SDK Design

BlastXML SDK is used to create and process BlastXML results. The SDK
includes the Blast2XML translator depicted in the BPF class diagram, and
an adapter for reusing BPF handlers. The BPF's parser is used to
transform a Blast report into a series of BPF parsed elements that are
sent to a Blast2XML handler. The Blast2XML handler transforms the
element data into a BlastXML document. The SDK uses a SAX-compliant XML
parser, e.g., IBM's xml4j or Sun's Project X parser, for recognizing and
validating a BlastXML document. An implementation of SAX::DocHandler
known as the BPFAdapter class converts BlastXML elements into BPF
parsed element notifications and sends them to its
BPF::IBlastElementHandlers. Thus, all of the IBlastElementHandlers
depicted in the BPF class diagram are reusable with BlastXML. Not shown
is the BlastResultXMLPrintVisitor. This class is used to serialize a
Blast Object Model as a BlastXML document.


             sends BlastXML ->
               elements to
 SAX::Parser ---------------- SAX::DocHandler
                                      ^
                                      | implements
                                      |
                                  BPFAdapter
                                      |
                                      | sends BPF parsed
                                      |  elements to 
                                      |              
                          BPF::IBlastElementHandler
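
The adapter idea can be sketched in Java along these lines. Again this
is only an illustration under stated assumptions: it uses the SAX2
DefaultHandler from the standard JAXP API in place of the SAX
DocHandler interface named above, the IBlastElementHandler methods are
invented for the example, and the XML element names are hypothetical
and not taken from the BlastXML DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class BlastXmlAdapterSketch {

    /** Assumed shape of the BPF handler interface (method names are hypothetical). */
    interface IBlastElementHandler {
        void startElement(String name);
        void elementData(String name, String value);
        void endElement(String name);
    }

    /** Turns SAX callbacks into BPF parsed element notifications. */
    static class BPFAdapter extends DefaultHandler {
        // Hypothetical convention: these elements are containers; all others carry data.
        private static final Set<String> CONTAINERS =
                new HashSet<>(Arrays.asList("BlastResult", "Hit"));

        private final IBlastElementHandler target;
        private final StringBuilder text = new StringBuilder();

        BPFAdapter(IBlastElementHandler target) {
            this.target = target;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0);
            if (CONTAINERS.contains(qName)) {
                target.startElement(qName);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);   // may arrive in several chunks
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (CONTAINERS.contains(qName)) {
                target.endElement(qName);
            } else {
                target.elementData(qName, text.toString().trim());
            }
            text.setLength(0);
        }
    }

    /** Parses an XML string and returns the BPF notifications it produced. */
    static List<String> replay(String xml) {
        final List<String> events = new ArrayList<>();
        IBlastElementHandler recorder = new IBlastElementHandler() {
            public void startElement(String name) { events.add("start " + name); }
            public void elementData(String name, String value) {
                events.add("data " + name + "=" + value);
            }
            public void endElement(String name) { events.add("end " + name); }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new BPFAdapter(recorder));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<BlastResult><Hit><id>hit1</id><score>42</score></Hit></BlastResult>";
        for (String event : replay(xml)) {
            System.out.println(event);
        }
    }
}
```

The recorder here stands in for any existing BPF handler; the handler
cannot tell whether its notifications originated from a raw Blast
report or from a BlastXML document.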



I'm considering porting the frameworks to Perl but thought I would get
in sync with the group's activities before "just doing it". I'm new to
Perl but have quite a bit of experience with loosely typed languages and
scripting concepts. I'm planning to familiarize myself with the BioPerl
tools and components before starting. Would anybody care to comment on
the design and/or how elements of BioPerl may be used to realize such a
design?

By the way, interested parties may download the Java frameworks from my
website. I'm in the process of adding a BlastXML server to my site. The
server should be on-line by next Monday. The BlastXML DTD is located at
www.workingobjects.com/blastxml/blastxml.dtd.

Wayne
-- 
-----------------------------------------------------------------------
 Wayne Parrott                   email: wayne@workingobjects.com      |
 WorkingObjects.com              voice: (972)491-3704                 |
 "Distributed Object Technology    fax: (972)491-7284                 |
   for Life Sciences"              web: http://www.workingobjects.com |
----------------------------------------------------------------------- 
 "The main thing, is to keep the main thing, the main thing" 
   lyrics by Scott Krippayne