Bioperl: trying XML in biology...

Lincoln Stein lstein@cshl.org
Tue, 3 Aug 1999 12:16:44 -0400 (EDT)


I've just put together an xml2boulder converter (65 lines of code).
Here's what's nice about it.

Say you have an XML file that looks like this (real life example):

<contig>
 <acc>NT_001817</acc>
 <name>Hs1_1938</name>
 <chr>1</chr>
 <labs>SC</labs>
 <partslist>
   <part>
    <cpos1>0</cpos1>
    <cpos2>41014</cpos2>
    <acc>AL049198</acc>
    <seqlen>41015</seqlen>
    <clone>590K14</clone>
    <ctype>pac</ctype>
   </part>
   <part>
    <cpos1>40915</cpos1>
    <cpos2>125326</cpos2>
    <acc>AL033533</acc>
    <seqlen>84412</seqlen>
    <clone>973M2</clone>
   </part>
 </partslist>
</contig>

Then after converting it into boulder format, you can access the bits
and pieces as methods, like so:

    # build the object from the XML file, then read tags as methods
    my $xml = Boulder::XML->new($xmlfile);
    my $chromosome = $xml->chr;
    my @parts = $xml->partslist->parts;
    for my $part (@parts) {
       my $accession = $part->acc;
    }
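
To make the tags-as-methods idea concrete, here is a minimal,
self-contained sketch. It is not the xml2boulder code and does not use
Boulder at all: the TinyNode package and its parse method are
hypothetical names invented for this illustration, and the parser only
copes with plain elements and text (no attributes, comments, CDATA or
entities). It relies on Perl's AUTOLOAD to turn a tag name into a method.

```perl
use strict;
use warnings;

# Hypothetical illustration only; NOT the Boulder::XML implementation.
package TinyNode;

our $AUTOLOAD;

# Parse a simple XML fragment (elements and text only) into a tree.
# Each node is a hash of tag name => [values]; a value is either a
# string (leaf element) or another TinyNode (element with children).
sub parse {
    my ($class, $xml) = @_;
    my $root  = bless {}, $class;
    my @nodes = ($root);   # stack of currently open nodes
    my @names;             # names of currently open elements
    my @text  = ('');      # text buffer for each open node
    while ($xml =~ m{<(/?)([\w.:-]+)>|([^<]+)}g) {
        my ($close, $name, $chars) = ($1, $2, $3);
        if (defined $chars) {            # character data
            $text[-1] .= $chars;
        }
        elsif (!$close) {                # opening tag
            my $node = bless {}, $class;
            push @{ $nodes[-1]{$name} }, $node;
            push @nodes, $node;
            push @names, $name;
            push @text,  '';
        }
        else {                           # closing tag
            my $open = pop @names;
            die "mismatched </$name>\n"
                unless defined $open && $open eq $name;
            my $node = pop @nodes;
            my $buf  = pop @text;
            if (!%$node) {               # no children: store the text
                $buf =~ s/^\s+|\s+$//g;
                $nodes[-1]{$name}[-1] = $buf;
            }
        }
    }
    die "unclosed <$names[-1]>\n" if @names;
    return $root;
}

# Any other method call is treated as a tag lookup: all values in
# list context, the first value in scalar context.
sub AUTOLOAD {
    my $self = shift;
    (my $tag = $AUTOLOAD) =~ s/.*:://;
    return if $tag eq 'DESTROY';
    my @values = @{ $self->{$tag} || [] };
    return wantarray ? @values : $values[0];
}

package main;

my $doc = TinyNode->parse(<<'XML');
<contig>
 <acc>NT_001817</acc>
 <chr>1</chr>
 <partslist>
   <part><acc>AL049198</acc><clone>590K14</clone></part>
   <part><acc>AL033533</acc><clone>973M2</clone></part>
 </partslist>
</contig>
XML

my $contig = $doc->contig;
print "chromosome: ", $contig->chr, "\n";
for my $part ($contig->partslist->part) {
    print "part: ", $part->acc, " (clone ", $part->clone, ")\n";
}
```

Note that in this sketch the method name is exactly the tag name (part,
not parts), and a tag that shares its name with a real method such as
parse cannot be reached this way.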

I haven't written the corresponding boulder2xml converter, but if
people are interested, I'll send out what I've got now for beta
testing and experimentation.

Lincoln

Catherine Letondal writes:
 > 
 > Hi,
 > 
 > We have been experimenting with XML here at Pasteur (thanks to an XML
 > course by S. Bortzmeyer).
 > 
 > The main idea is to first convince authors of biological software how XML
 > makes it easy to parse a program's output, and that producing
 > XML output is easy as well (well, as long as you are the author of the code).
 > 
 > That's why this page: http://www-alt.pasteur.fr/~letondal/XML/
 > shows examples of programs developed locally at Pasteur, to which we have
 > added an '-x' option so that they produce XML output (CDS, satellites).
 > 
 > It also shows an XMLized fasta output with a parser, just to show
 > that an XML parser is very easy to write, and how it could lower the
 > difficulty of parsing this kind of text.
 > 
 > We have also tried to see what a Genbank entry would look like in XML. This is
 > naive of course, since only the databank authors can decide on such a format, but
 > these are mainly examples. P. Bouige has also written a converter from
 > Genbank to XML that works with the DTD we have set up.
 > 
 > We have also made simple examples of more general objects, like Phylip trees,
 > but I guess that such objects will be discussed at the Bioperl
 > workshop.
 > 
 > 
 > -- 
 > Catherine Letondal -- letondal@pasteur.fr -- +33 (1) 40 61 31 91
 > =========== Bioperl Project Mailing List Message Footer =======
 > Project URL: http://bio.perl.org/
 > For info about how to (un)subscribe, where messages are archived, etc:
 > http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
 > ====================================================================

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org                                   Cold Spring Harbor, NY
========================================================================