[Biopython] processing XML files in Biopython

Peter Cock p.j.a.cock at googlemail.com
Tue Jun 7 07:47:27 UTC 2011


On Tue, Jun 7, 2011 at 3:24 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Mon, 6/6/11, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> > And, if that is correct, what is the advantage of
>> > using Bio.Entrez.parse
>> > over using another Python XML lib?
>>
>> If you're not scared of XML, not much.
>>
> That is a misconception, to say the least.
> Bio.Entrez parses the DTD associated with the XML file, and
> is therefore able to store the information in the XML file as a
> Python object in a sensible way. In addition, Bio.Entrez.parse
> can handle multi-gigabyte XML files (such as the ones from
> the Entrez Gene database). I'd like to see you do that with
> another Python XML lib.

I was probably being too glib. My point was that if you are
already experienced with another Python XML library, you may find
it more productive to use that. The case where you only want to
pull out one or two fields is an interesting one, because there
is no need to parse all the other data into objects in memory.
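
As a rough illustration of that last case, here is a minimal sketch
using xml.etree.ElementTree.iterparse from the standard library. The
sample document, its tag names (RecordSet, Record, Symbol) and the
iter_field helper are all made up for this example; real Entrez XML
uses its own DTD-defined tags, so treat this as an outline of the
streaming approach rather than working Entrez code.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample data; real Entrez XML uses different, DTD-defined tags.
SAMPLE = b"""<RecordSet>
  <Record><Id>1</Id><Symbol>A1BG</Symbol><Summary>long text</Summary></Record>
  <Record><Id>2</Id><Symbol>A2M</Symbol><Summary>more long text</Summary></Record>
</RecordSet>"""


def iter_field(handle, record_tag, field_tag):
    """Yield the text of one child field per record, reading incrementally."""
    # "end" events fire once an element and all of its children are parsed.
    for event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == record_tag:
            # findtext looks at direct children only; None if the field is absent.
            yield elem.findtext(field_tag)
            # Discard the record's children; the emptied element itself
            # stays attached to the root.
            elem.clear()


symbols = list(iter_field(io.BytesIO(SAMPLE), "Record", "Symbol"))
print(symbols)
```

Only the one requested field is kept per record; everything else is
thrown away as soon as the record has been read.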

Peter


