[Biopython] parsing Entrez SNP XML files

Peter Cock p.j.a.cock at googlemail.com
Fri Sep 6 08:42:22 UTC 2013


On Fri, Sep 6, 2013 at 8:38 AM, Gerard Schaafsma
<Gerard.Schaafsma at med.lu.se> wrote:
> Hi,
>
> I am trying to parse XML files which I downloaded from the NCBI site
> (ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/) containing
> records from the SNP (dbSNP) database.
>
> When I do:
>
> import sys
> from Bio import Entrez
>
> xmlFile = sys.argv[1]  # path to the downloaded dbSNP XML file
> handle = open(xmlFile)
> records = Entrez.parse(handle)
>
> for record in records:
>   for k, v in record.items():
>     print(k, v)
>
> I get the following error message:
>
> NotImplementedError: The Bio.Entrez parser cannot handle XML data that
> make use of XML namespaces

Yes. Sadly, unlike most of the NCBI XML files, the dbSNP files don't
come with a DTD file describing the object model, and the Bio.Entrez
parser requires one:

http://bugzilla.open-bio.org/show_bug.cgi?id=2771
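To check a file yourself, you could sniff its start with the standard library's expat bindings. NCBI XML that Bio.Entrez can handle typically opens with a DOCTYPE naming its DTD, whereas the dbSNP files put their root element in an XML namespace. This is a minimal sketch, not part of Biopython; `sniff_xml` is a hypothetical helper name.

```python
import xml.parsers.expat


class _RootFound(Exception):
    """Raised internally to stop parsing as soon as the root tag is seen."""


def sniff_xml(head):
    """Return (doctype_name, root_namespace) for XML bytes; either may be None.

    Parsing stops at the root start tag, so `head` can be just the first
    few kilobytes of a large file, provided the root tag is inside it.
    """
    found = {"doctype": None, "root_ns": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["doctype"] = name

    def on_start(name, attrs):
        # With namespace processing on, a namespaced tag arrives as "uri local"
        if " " in name:
            found["root_ns"] = name.split(" ", 1)[0]
        raise _RootFound()

    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    try:
        parser.Parse(head, True)
    except _RootFound:
        pass
    return found["doctype"], found["root_ns"]
```

For example, `sniff_xml(handle.read(65536))` on a file opened in binary mode would report a DOCTYPE name for a typical EFetch result and a namespace URI (with no DOCTYPE) for a dbSNP file.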

Unless the NCBI change this, you will have to use an alternative
XML parser. Python comes with several, including ElementTree,
which is quite popular.
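A minimal sketch of the ElementTree route, using only the standard library. It assumes each dbSNP record is an `Rs` element carrying an `rsId` attribute (check this against the files you downloaded); `local_name`, `iter_records` and `open_dbsnp` are hypothetical helper names. Rather than hard-coding the namespace URI, it compares tags with ElementTree's `{namespace}` prefix stripped.

```python
import gzip
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip ElementTree's '{namespace}' prefix: '{uri}Rs' -> 'Rs'."""
    return tag.rsplit("}", 1)[-1]


def open_dbsnp(path):
    """Open a dbSNP XML file in binary mode, decompressing .gz files."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rb")


def iter_records(source, record_tag="Rs"):
    """Yield each element whose local name is record_tag (assumed 'Rs').

    The element is complete when yielded; it is cleared once the caller
    asks for the next record, which frees its children.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == record_tag:
            yield elem
            elem.clear()


# Example usage (hypothetical file name):
# with open_dbsnp("ds_ch1.xml.gz") as handle:
#     for rs in iter_records(handle):
#         # Unprefixed attributes carry no namespace, so plain names work
#         print(rs.get("rsId"))
```

`iterparse` streams the document instead of loading it all at once, which matters for the large per-chromosome dumps. Clearing each record after use frees its subtree, although the emptied elements themselves stay attached to the root.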

Peter


