[Biopython] processing XML files in Biopython

Peter Cock p.j.a.cock at googlemail.com
Mon Jun 6 14:37:53 UTC 2011


On Mon, Jun 6, 2011 at 3:30 PM, Reece Hart <reece at harts.net> wrote:
> On Mon, Jun 6, 2011 at 6:35 AM, Peter Cock wrote:
>>
>> If you want to use the XML, then the Bio.Entrez.parse() function should
>> turn it into a nested structure of Python objects (dicts and lists). Or,
>> there are several built-in XML parsers that come with Python, such
>> as ElementTree. That could be more efficient if you just wanted to
>> get one or two bits of information like a GeneID.
>
> In addition, the Bio.Entrez parser is not namespace-aware and therefore
> won't parse some NCBI XML at all (e.g., downloaded dbSNP files). Can
> someone with more experience here please corroborate?

For dbSNP, see http://bugzilla.open-bio.org/show_bug.cgi?id=2771.
Have you hit problems with Entrez XML from any other databases?

> And, if that is correct, what is the advantage of using Bio.Entrez.parse
> over using another Python XML lib?

Not much, if you're comfortable working with XML directly.
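
For illustration, here is a minimal sketch of the ElementTree route for
pulling out one field (such as a GeneID) from namespaced XML. The sample
document, its namespace URI, the element names, and the helper
extract_gene_ids() are all made up for this example and do not match any
real NCBI schema; only the ElementTree calls themselves are standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the namespace URI, element names and values
# are invented for illustration, not taken from a real NCBI file.
SAMPLE_XML = """
<Records xmlns="http://example.org/ns/demo">
  <Record>
    <GeneID>1001</GeneID>
    <Symbol>geneA</Symbol>
  </Record>
  <Record>
    <GeneID>1002</GeneID>
    <Symbol>geneB</Symbol>
  </Record>
</Records>
"""

# Map a prefix of our choosing onto the document's default namespace,
# so the search path below can refer to namespaced elements.
NS = {"d": "http://example.org/ns/demo"}


def extract_gene_ids(xml_text):
    """Return the text of every GeneID element, at any depth."""
    root = ET.fromstring(xml_text)
    return [elem.text for elem in root.findall(".//d:GeneID", NS)]


gene_ids = extract_gene_ids(SAMPLE_XML)
print(gene_ids)
```

For a real file you would swap in the namespace URI declared at the top
of that file, and for very large downloads ET.iterparse() lets you walk
the records without loading the whole tree into memory.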

Peter



