[Biopython] parsing Entrez SNP XML files

Gerard Schaafsma Gerard.Schaafsma at med.lu.se
Fri Sep 6 07:38:33 UTC 2013


Hi,

I am trying to parse XML files which I downloaded from the NCBI site
(ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/) containing
records from the SNP (dbSNP) database.

When I do:

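import sys
from Bio import Entrez

# Path to the downloaded dbSNP XML file, given on the command line
xmlFile = sys.argv[1]

# Open in binary mode so the parser can deal with the encoding itself
handle = open(xmlFile, "rb")
records = Entrez.parse(handle)

for record in records:
    for k, v in record.items():
        print("%s %s" % (k, v))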

I get the following error message:

NotImplementedError: The Bio.Entrez parser cannot handle XML data that
make use of XML namespaces

I am using Biopython 1.62 on a Linux PC (kernel 3.2.0-52-generic,
x86_64).

Searching for this error message suggests that it might have something
to do with the DTD files from NCBI, but since I am using the newest
Biopython version, I would expect these to be up to date.

Moreover, the first two lines of the XML file do not mention any DTD
file, only an XML Schema (XSD) location and a default namespace:

<?xml version="1.0" encoding="UTF-8"?>
<ExchangeSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum"
xsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/docsum
ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.4.xsd" specVersion="3.4"
dbSnpBuild="138" generated="2013-08-01 17:06">
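
For what it is worth, the namespace the error message refers to seems to
be the default one declared on the root element. Below is a minimal
sketch of how I checked this, using the standard library's
xml.etree.ElementTree rather than Bio.Entrez. It works on a shortened
copy of the header above: the xsi:schemaLocation attribute is left out
and a closing tag is added, purely so that the snippet is well-formed on
its own. The variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the file header. The closing tag is added here only
# to make the snippet well-formed; the real file contains many records.
header = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<ExchangeSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    b'xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum" '
    b'specVersion="3.4" dbSnpBuild="138" generated="2013-08-01 17:06">'
    b'</ExchangeSet>'
)

root = ET.fromstring(header)

# ElementTree writes a namespaced tag as "{namespace-uri}local-name"
namespace, local_name = root.tag[1:].split("}")

print(namespace)                  # the default namespace of the document
print(local_name)                 # the tag name without the namespace
print(root.attrib["dbSnpBuild"])  # ordinary attributes are unaffected
```

So every element in the file lives in the docsum namespace, which is
presumably what the Bio.Entrez parser is objecting to.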


Has anyone run into the same problem and found a solution?

Best regards,
Gerard


-- 
Gerard Schaafsma
Lund University
Department of Experimental Medical Science
Protein Structure and Bioinformatics Group
Hs 66, BMC D10
Box 117
22100 Lund
Sweden



