Bioperl: XML/BioPerl

Philip Bourne bourne@sdsc.edu
Fri, 1 Jan 1999 08:31:52 -0800 (PST)


Happy New Year...

The use of XML is compelling; however, one thing we have learned as we
prepare to support the PDB starting in Feb. 99 is the diversity of
hardware and software used to access PDB data. It will be some time before
a large percentage of the PDB community is using browsers with XML support.
This is to our advantage since, as Steve points out, we are trying to get a
lot of basic tasks done right now. However, we must be ready to introduce
new technologies at the right time. To this end I have had a rotation
student - Peng Yang - working with XML. In collaboration with David
Goodsell at Scripps we have been looking at making David's beautiful text
and graphical descriptions of various protein families (see for
example www.amazon.com under author Goodsell) available as part
of a PDB Web query. This is primarily aimed at an audience who find the
direct link to the Medline abstract too detailed and need more of an
overview. Using a library from IBM (I think), Peng has written a prototype
XML browser which allows you to browse initial portions of the book
material (using a preliminary protein-based DTD) and from there make
direct queries of a PDB database. This preliminary work is the subject of
a grant application and Peng has indicated he wants to make it a thesis
project.

In my preliminary excursions into XML I have been struck by the ease with
which the mmCIF dictionary could be mapped into a DTD (I may be naive
here and need to explore further). mmCIF supports multiple
biologically relevant descriptions. While mmCIF forms the basis of our
conceptual schema for the new PDB, the user can be presented with
different views of the data, including, of course, files conforming to PDB
v2.2 format. XML (based on a rigorous DTD) could potentially be another
view. John Westbrook and I, with other folks in the RCSB
(http://www.rcsb.org), will discuss this further and help in any way that
time permits and that makes sense to the community. After Feb. 99, when
data are flowing via the RCSB to the community (including BNL), hopefully
we will be in a better position to make some firm plans. In the meantime
please include us in this dialog.
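
As a rough illustration of the mmCIF-to-DTD mapping mentioned above, here
is a minimal Python sketch. It assumes a hand-written toy dictionary (one
category with a few items and made-up mandatory flags), not the real mmCIF
dictionary, and the function names are hypothetical. Each category becomes
an element whose children are its items, and optional items are marked
with "?".

```python
# Hypothetical, hand-written fragment standing in for an mmCIF dictionary:
# category name -> list of (item name, mandatory?) pairs. The mandatory
# flags are illustrative, not copied from the real dictionary.
TOY_DICTIONARY = {
    "atom_site": [
        ("id", True),
        ("type_symbol", True),
        ("Cartn_x", True),
        ("B_iso_or_equiv", False),
    ],
}


def category_to_dtd(category, items):
    """Return DTD declarations for one category and its items."""
    if not items:
        # A category without items has no children to declare.
        return f"<!ELEMENT {category} EMPTY>"
    # Element names are global in a DTD, so qualify each item with its
    # category to keep names unique across categories.
    names = [f"{category}.{item}" for item, _ in items]
    # Optional items get the '?' occurrence indicator.
    children = ", ".join(
        name if mandatory else name + "?"
        for name, (_, mandatory) in zip(names, items)
    )
    lines = [f"<!ELEMENT {category} ({children})>"]
    lines += [f"<!ELEMENT {name} (#PCDATA)>" for name in names]
    return "\n".join(lines)


def dictionary_to_dtd(dictionary):
    """Join the declarations for every category in the toy dictionary."""
    return "\n\n".join(
        category_to_dtd(category, items)
        for category, items in dictionary.items()
    )


if __name__ == "__main__":
    print(dictionary_to_dtd(TOY_DICTIONARY))
```

A real mapping would also have to deal with item data types, repeated
rows within a category, and relationships between categories, none of
which this sketch attempts.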

Cheers, Phil Bourne
for the RCSB
 
On Thu, 31 Dec 1998, Steven E. Brenner wrote:

> 
> [Note to Readers on cc list: there has recently been considerable
> discussion of XML on the bioperl mailing list (see footer for details),
> and the following issue is coming up.]
> 
> > 
> > PDB is moving to San Diego, and the people involved have been quite devoted 
> > to CIF, but again, I have not heard anything specific.  There is a real 
> > need for a biological entities (as opposed to crystallographic unit cells) 
> > view of PDB, and XML would be a great place to start.
> > 
> > David States
> > 
> 
> I have spoken with Helen Berman and John Westbrook (of the new PDB)
> regarding using XML for biological macromolecules.  I cannot speak for
> them; however, they seem very open to the possibility of providing XML
> versions of the data.
> 
> As they point out, the hard problem is to define the data structures and
> collect data which reliably conforms to it.  Once that's done, producing
> reports in different formats is comparatively easy.  While I think they
> are currently taking a wait-and-see attitude to XML (perhaps because they
> are heavily overburdened with other projects right now -- they are trying
> to do an immense amount of work with a limited staff), I think that if XML
> does become established, we will see PDB providing data (in the
> crystallographic view) in that format.
> 
> I do not know of any plans to provide a "biological" view.  However, my
> impression is that the PDB is very open to suggestions from the community.
> 
> 
> Steven Brenner
> 
> 


=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================