[Biopython-dev] Bio.Entrez XML parsing

Sean Davis sdavis2 at mail.nih.gov
Mon Mar 31 00:51:07 UTC 2008


On Sun, Mar 30, 2008 at 10:49 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>  > Eric, could you attach your taxonomy XML code to this bug?
>  > We'd probably want to start by adding taxonomy XML parsing
>  > to Bio.Entrez (which I assume you are using to fetch the XML data).
>
>  I've done some thinking about XML parsers for Bio.Entrez.
>
>  I propose to add a function read() to Bio.Entrez, which returns a record suitable for the type of XML file we're trying to read (as determined by the corresponding DTD file).
>
>  Now, the various XML types can be very different from each other, and I think the actual parsing should be done by a specialized submodule of Bio.Entrez. For example, one Bio.Entrez.EInfo, one Bio.Entrez.ESummary, and so on. For Bio.Entrez.EFetch, there seem to be many different XMLs, so we'd probably have a number of submodules for it (one of them for the taxonomy XML).
>
>  The first tag received by the read() function in Bio.Entrez tells it which type of XML it is receiving (have a look at the XML files shown in chapter 6 of the tutorial for some examples), and can then decide which of the submodules of Bio.Entrez should be used for the actual parsing. Otherwise, the read() function in Bio.Entrez does very little; the actual work is done by the submodules.
>
>  If the read() function encounters an XML type for which no parser is yet available, it can raise a NotImplementedError exception.
>
>  Comments, anybody?

This makes sense.  However, it seems there should also be a way to
dynamically "register" a parser with read(), so that users can extend
their local installation with a specialized parser.  Or am I missing
something?

Sean


