[Biopython-dev] XML parsing library for new modules

Peter Cock p.j.a.cock at googlemail.com
Mon May 4 08:15:04 EDT 2009


On Fri, May 1, 2009 at 1:28 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Eric;
> Thanks for summarizing the issues. I know Peter is taking a few well
> deserved days off but I suspect he will have some thoughts when he
> returns. We'd love to hear the experience of others who have used
> different python XML parsers.

I would be interested to hear Michiel's views on this, as he knows
more about the specifics of the existing XML parsers in Biopython
(e.g. Bio.Entrez).

> My lean is towards ElementTree for reasons of code clarity. SAX
> parsers require a lot of boilerplate style code. They also can be
> tricky with nested elements; I always find myself using a lot of "if
> in_tag; else if in_tag" style code. ElementTree eliminates a lot of
> these issues which should result in easier to maintain code.
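
To make the comparison above concrete, here is a minimal sketch that
extracts the same values with both APIs. The XML snippet, the tag names
(hits, hit, id, len) and the function names are made up for illustration;
they are not taken from any Biopython parser. It uses the xml.etree import
path from Python 2.5+.

```python
import xml.sax
from xml.etree import ElementTree

# Hypothetical input, invented purely for this comparison.
XML = (b"<hits>"
       b"<hit><id>A1</id><len>10</len></hit>"
       b"<hit><id>B2</id><len>25</len></hit>"
       b"</hits>")


class HitHandler(xml.sax.ContentHandler):
    """SAX version: we have to track by hand which tag we are inside."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.ids = []
        self._in_id = False
        self._text = []

    def startElement(self, name, attrs):
        if name == "id":
            self._in_id = True
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._in_id:
            self._text.append(content)

    def endElement(self, name):
        if name == "id":
            self.ids.append("".join(self._text))
            self._in_id = False


def ids_with_sax(data):
    handler = HitHandler()
    xml.sax.parseString(data, handler)
    return handler.ids


def ids_with_elementtree(data):
    """ElementTree version: the nesting is expressed by the tree itself."""
    root = ElementTree.fromstring(data)
    return [hit.findtext("id") for hit in root.findall("hit")]
```

Both functions return the same list of identifiers; the difference is
only in how much state tracking the calling code has to carry.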

We have been trying to avoid external library dependencies where
possible (moving away from Martel for parsing has really helped here).
Given ElementTree and cElementTree are included with Python 2.5+,
this is only an issue for Biopython running on Python 2.4.  Both
ElementTree and cElementTree are available as separate downloads
(with Windows installers).  I think under their licence we could even
bundle them with Biopython if need be.

So, while it is a shame ElementTree isn't part of Python 2.4, if it is
the best technical solution, that shouldn't stop us from using it.  Note
we should ONLY use those core features which are included with
Python 2.5+ itself.
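
For what it's worth, supporting both cases could look something like the
sketch below. This is only an assumption about how we might do it, not
existing Biopython code: try the copies bundled with Python 2.5+ first,
then fall back to the separately downloaded packages, which install as
the top level modules cElementTree and elementtree.

```python
try:
    # Python 2.5+: C implementation bundled with the standard library
    from xml.etree import cElementTree as ElementTree
except ImportError:
    try:
        # Pure Python implementation bundled with the standard library
        from xml.etree import ElementTree
    except ImportError:
        try:
            # Python 2.4 with the separate cElementTree download
            import cElementTree as ElementTree
        except ImportError:
            # Python 2.4 with the separate ElementTree download
            from elementtree import ElementTree

# Whichever import succeeded, the rest of the code uses one name.
root = ElementTree.fromstring("<a><b>text</b></a>")
```

Code written against the common subset of the API then doesn't need to
care which of the four modules was actually imported.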

Peter

P.S. I wonder if our BLAST XML parser would get a big speed boost
if we switched it to ElementTree instead of xml.sax?

