[Biopython] parsing Entrez SNP XML files

Michiel de Hoon mjldehoon at yahoo.com
Fri Sep 6 11:37:52 UTC 2013


It's a bit more complicated than that. Bio.Entrez can parse XML files that come with a DTD, which is the vast majority of XML files from NCBI Entrez. Apparently the dbSNP database uses an XML Schema instead of a DTD, so Bio.Entrez would need an XML Schema parser to be able to handle XML files from dbSNP. I won't be able to look into this myself, but volunteers are very welcome to take it on.
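
As a rough illustration (not part of Bio.Entrez), one way to tell in advance whether a file falls into the namespace-using category is to look at its root element with the standard library. The helper name root_namespace and the two sample documents below are made up for this sketch; ElementTree reports a namespaced tag as "{uri}localname", which is all the check relies on.

```python
import io
import xml.etree.ElementTree as ET


def root_namespace(source):
    """Return the namespace URI of the root element, or None if it has none.

    `source` is assumed to be an open binary file handle.
    """
    # The first "start" event belongs to the root element.
    for _event, elem in ET.iterparse(source, events=("start",)):
        tag = elem.tag
        # ElementTree writes namespaced tags as "{uri}localname".
        if tag.startswith("{"):
            return tag[1:].split("}", 1)[0]
        return None
    return None


# Made-up documents for illustration only; the namespace URI is a placeholder.
namespaced = io.BytesIO(
    b'<ExchangeSet xmlns="http://example.org/hypothetical"><Rs/></ExchangeSet>'
)
plain = io.BytesIO(b"<eSummaryResult><DocSum/></eSummaryResult>")

print(root_namespace(namespaced))
print(root_namespace(plain))
```

A non-None result suggests the file uses XML namespaces and is therefore likely to hit the NotImplementedError quoted below.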

Best,
-Michiel.




________________________________
From: Peter Cock <p.j.a.cock at googlemail.com>
To: Gerard Schaafsma <Gerard.Schaafsma at med.lu.se> 
Cc: Biopython Mailing List <biopython at lists.open-bio.org> 
Sent: Friday, September 6, 2013 5:42 PM
Subject: Re: [Biopython] parsing Entrez SNP XML files
 

On Fri, Sep 6, 2013 at 8:38 AM, Gerard Schaafsma
<Gerard.Schaafsma at med.lu.se> wrote:
> Hi,
>
> I am trying to parse XML files which I downloaded from the NCBI site
> (ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/) containing
> records from the SNP (dbSNP) database.
>
> When I do:
>
> import sys
> from Bio import Entrez
>
> handle = open(xmlFile)
> records = Entrez.parse(handle)
>
> for record in records:
>   for k, v in record.items():
>     print k, v
>
> I get the following error message:
>
> NotImplementedError: The Bio.Entrez parser cannot handle XML data that
> make use of XML namespaces

Yes, sadly, unlike most of the NCBI XML files, the dbSNP ones don't
come with a DTD file describing the object model, and the Bio.Entrez
parser requires one:

http://bugzilla.open-bio.org/show_bug.cgi?id=2771

Unless the NCBI change this, you will have to use an alternative
XML parser - Python comes with several including ElementTree
which is quite popular.
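
A minimal sketch of the ElementTree route, under stated assumptions: the helper names local_name and iter_records are hypothetical, and the sample document (namespace URI, element name "Rs", attribute names) is a made-up placeholder, not the real dbSNP layout. It streams the file with iterparse and ignores the namespace by comparing local tag names only.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the "{namespace}" prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_records(source, record_tag):
    """Yield one dict of attributes per element whose local name is record_tag."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == record_tag:
            yield dict(elem.attrib)
            # Release the record's children to keep memory use low.
            elem.clear()


# Made-up sample; namespace URI, element and attribute names are placeholders.
sample = io.BytesIO(
    b'<ExchangeSet xmlns="http://example.org/hypothetical">'
    b'<Rs rsId="1" snpClass="snp"/><Rs rsId="2" snpClass="in-del"/>'
    b"</ExchangeSet>"
)

for record in iter_records(sample, "Rs"):
    for k, v in record.items():
        print(k, v)
```

For a real file, pass an open handle (for example open(xmlFile, "rb")) and whatever record element name the file actually uses. Only attributes are collected here; child elements would need their own handling before the call to clear().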

Peter
_______________________________________________
Biopython mailing list  -  Biopython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython


