[Bioperl-l] XML parser preference?

Chris Fields cjfields at uiuc.edu
Thu Aug 10 02:14:59 UTC 2006


Mauricio,

Sorry, I didn't mean to imply that I want to use XML::LibXML.  I was
just pointing out that most Perl-XML users who do DOM-like parsing
seem to migrate toward XML::LibXML (not in Bioperl) or XML::Twig
(which Bioperl uses) for large files, and toward XML::Simple for
small stuff.  Fewer people seem to use XML::DOM these days.

XML::Twig is nice because I can process 'chunks' of XML at a time,
but it may be overkill for some of the smaller XML data returned
from NCBI via eutils.  I'll need to push EUtilities hard to try to
maximize the returned XML and get an idea of just how much XML data
is returned for esearch/elink (epost XML is always very small, so no
worries there).

XML::Simple and XML::Twig are both available for pretty much all OSes
(Win, *nix), so I'll stick with one of those.  I was actually quite
surprised that XML::Simple isn't used anywhere in Bioperl.  It's very
easy to use and relies on XML::SAX or XML::Parser on the back end, so
having expat around speeds things up quite a bit.

Chris

On Aug 9, 2006, at 7:11 PM, Mauricio Herrera Cuadra wrote:

> Robert & Chris,
>
> I have no doubt that XML::LibXML is a great parser (I've used it a
> few times); the problem with it is that it runs on top of the
> libxml2 C library. On *nix systems it's fairly simple to get this
> dependency compiled and running, but what about other OSes (e.g.
> Windows)?
>
> Introducing XML::LibXML as a dependency into the toolkit would
> probably make EUtilities a module not usable by everyone,
> especially those who use BioPerl on an OS where installing/compiling
> C dependencies can be a headache.
>
> Mauricio.
>
> Chris Fields wrote:
>> Rob,
>>
>> There seems to be a general shift away from the older XML::Parser
>> and XML::DOM parsers toward XML::SAX and XML::Twig, as the former
>> two are not under active development.  For SAX parsing, we seem to
>> be moving toward XML::SAX (the recent transition of
>> SearchIO::blastxml was the start).  However, nothing has been done
>> for tree-like (DOM) parsing.
>> In fact, both the XML::DOM and XML::Twig docs recommend XML::LibXML
>> over XML::DOM.  But AFAIK XML::LibXML isn't used anywhere in
>> Bioperl, and I think adding it would be more of a burden.
>> Grr... I wish I had checked Bioperl's dependencies before I started!
>> Chris
>>> -----Original Message-----
>>> From: Robert Buels [mailto:rmb32 at cornell.edu]
>>> Sent: Wednesday, August 09, 2006 5:40 PM
>>> To: Chris Fields
>>> Cc: bioperl-l at lists.open-bio.org
>>> Subject: Re: [Bioperl-l] XML parser preference?
>>>
>>> I don't think it really matters. Every parser has its own strengths.
>>>
>>> If you've written something that already works well but are
>>> concerned about adding yet another XML parser to Bioperl's external
>>> dependencies, pick a parser that (a) is already being used somewhere
>>> else in Bioperl and (b) requires the fewest changes to your
>>> already-working code.
>>>
>>> Since you're already using XML::Simple, which is basically a DOM
>>> parser, I would say go with another DOM parser that's already being
>>> used in Bioperl. How about XML::DOM?
>>>
>>> Rob
>>>
>>> Chris Fields wrote:
>>>> All,
>>>>
>>>> I am finishing up the EUtilities modules in bioperl-live.  I'm using
>>>> XML::Simple to grab the IDs and other information from the XML
>>>> returned by NCBI for esearch/elink/epost queries, but I noticed that
>>>> no other Bioperl modules use this particular module.
>>>>
>>>> It comes with ActiveState Perl by default (the reason I use it), but
>>>> I found, after the fact, that other Perl distributions do not include
>>>> it (the Perl on Mac OS X was one).  I don't necessarily want to add
>>>> another XML parser requirement for Bioperl users on top of the four
>>>> or so already present, so I'm considering changing.
>>>>
>>>> I have a preference for SAX (hehe), but XML::Twig might also be an
>>>> option.  Any thoughts?
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher - Switzer Lab
>>>> Dept. of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>> --
>>> Robert Buels
>>> SGN Bioinformatics Analyst
>>> 252A Emerson Hall, Cornell University
>>> Ithaca, NY  14853
>>> Tel: 503-889-8539
>>> rmb32 at cornell.edu
>>> http://www.sgn.cornell.edu
>
> -- 
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
