[Biopython-dev] NCBI DTD File

Peter Cock p.j.a.cock at googlemail.com
Wed Sep 9 09:50:02 UTC 2015


Hi Lev,

Which version of Biopython do you have, and which GI number(s) fail?

The very fact the problem tag was "Error" suggests it was actually
an error message, not a sequence record - perhaps a temporary error?
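
As a rough sketch of how to check that guess, the raw XML can be
inspected for "Error" elements before it goes to the parser. This uses
only the standard library. The helper name find_error_messages and the
sample error payload are hypothetical; the real NCBI error layout may
well differ.

```python
import xml.etree.ElementTree as ET


def find_error_messages(xml_text):
    """Return the text of every element named 'Error' in the XML."""
    root = ET.fromstring(xml_text)
    # root.iter() visits the root element and all of its descendants
    return [(el.text or "").strip() for el in root.iter("Error")]


# Hypothetical error payload (an assumption, not a captured NCBI reply)
sample_error = "<GBSet><Error>temporary failure</Error></GBSet>"
# An empty GBSet with no records and no error
sample_empty = "<GBSet>\n\n</GBSet>"

print(find_error_messages(sample_error))
print(find_error_messages(sample_empty))
```

A non-empty result would suggest the server returned an error message
rather than a sequence record, so retrying later may be enough.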

This worked for me:

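from Bio import Entrez
# NCBI asks for a contact email address with each request
Entrez.email = "..."
# Fetch one protein record as XML
handle = Entrez.efetch(db="protein", id="12345678", retmode="xml")
# validate=True raises ValidationError for tags missing from the DTD
record = Entrez.read(handle, validate=True)
handle.close()
print(record)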

Using some id values like "1" could give an "empty" XML record,
which to me looks like an NCBI bug:

<?xml version="1.0"?>
 <!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN"
"http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd">
 <GBSet>

</GBSet>

Biopython parses this as [] (an empty list), which is reasonable.

Other values like "0" and "-1" give an HTTP Error 400: Bad Request
(which is good - a nice clear and explicit error).
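
A minimal sketch of handling that case, assuming Python 3 where the
failure surfaces as urllib.error.HTTPError. The names fetch_or_none and
fake_fetch are hypothetical; fake_fetch only stands in for the network
call so the example runs offline.

```python
from urllib.error import HTTPError


def fetch_or_none(fetch, gi):
    """Return fetch(gi), or None if the server answers HTTP 400."""
    try:
        return fetch(gi)
    except HTTPError as err:
        if err.code == 400:
            return None  # bad identifier, e.g. "0" or "-1"
        raise  # any other HTTP error is unexpected; let it propagate


def fake_fetch(gi):
    # Stand-in for the real request (an assumption for illustration)
    if gi in ("0", "-1"):
        raise HTTPError("http://example.invalid", 400, "Bad Request", None, None)
    return []


print(fetch_or_none(fake_fetch, "0"))
print(fetch_or_none(fake_fetch, "1"))
```

With Biopython, fetch could be a small function that calls
Entrez.efetch(db="protein", id=gi, retmode="xml") and passes the handle
to Entrez.read. An empty list result would still need its own check.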

See also:

Peter


On Fri, Sep 4, 2015 at 8:16 PM, Lev Tsypin <ltsypin at uchicago.edu> wrote:
> Hi Peter,
>
> This is me trying to get protein sequences from the protein database. I have
> a gi code in the variable 'gi' that I pass into the Entrez.efetch function.
> Specifically, I use:
>
>         handle = Entrez.efetch(db='protein', id=gi, retmode='xml')
>         record = Entrez.read(handle)
>
> Best,
> Lev
>
> On Fri, Sep 4, 2015 at 11:12 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> Hi Lev,
>>
>> Which database was this with? Each has somewhat different XML behaviour.
>>
>> The NCBI have been quite good about versioning the DTD files -
>> normally they add new files rather than edit an existing DTD file. So
>> unless you've had a warning from Biopython there should be no reason
>> to download a new DTD file.
>>
>> Peter
>>
>> On Fri, Sep 4, 2015 at 3:44 PM, Lev Tsypin <ltsypin at uchicago.edu> wrote:
>> > Hi all,
>> >
>> > I am encountering this error when using Bio.Entrez:
>> >
>> > Bio.Entrez.Parser.ValidationError: Failed to find tag 'Error' in the
>> > DTD. To
>> > skip all tags that are not represented in the DTD, please call
>> > Bio.Entrez.read or Bio.Entrez.parse with validate=False.
>> >
>> > I've found a discussion of the same issue from about a year ago, so I
>> > figure
>> > the NCBI updated their DTD file in a strange way. I found several
>> > solutions: would you recommend that I download the new DTD file into my
>> > local copy of Biopython or run Entrez.read with validate=False?
>> >
>> > Best regards,
>> > Lev Tsypin
>> >
>> > _______________________________________________
>> > Biopython-dev mailing list
>> > Biopython-dev at mailman.open-bio.org
>> > http://mailman.open-bio.org/mailman/listinfo/biopython-dev
>
>

