[Biopython-dev] NCBI DTD File

Lev Tsypin ltsypin at uchicago.edu
Wed Sep 9 15:28:06 UTC 2015


Hi Peter,

It seems that it was indeed a temporary error. Thanks for your help!

Best,
Lev
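
For anyone else who hits a transient failure like this, retrying the request after a short pause is one way to cope. The following is only a rough sketch, not something from Biopython itself: fetch_with_retry, its parameters, and the zero-argument fetch callable are hypothetical names chosen for illustration, and the broad except clause is an assumption you would likely narrow in real code.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is any zero-argument callable (hypothetical here). The wait
    doubles after each failure: delay, 2 * delay, 4 * delay, ...
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # deliberately broad for this sketch
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay * (2 ** attempt))  # back off before retrying
    # Every attempt failed: re-raise the most recent error.
    raise last_error
```

In practice fetch could be a small function that calls Entrez.efetch and Entrez.read as in the messages below, so that a ValidationError caused by a temporary NCBI error message triggers another attempt.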

On Wed, Sep 9, 2015 at 4:50 AM, Peter Cock <p.j.a.cock at googlemail.com>
wrote:

> Hi Lev,
>
> Which version of Biopython do you have, and which GI number(s) fail?
>
> The very fact the problem tag was "Error" suggests it was actually
> an error message, not a sequence record - perhaps a temporary error?
>
> This worked for me:
>
> from Bio import Entrez
> Entrez.email = "..."
> handle = Entrez.efetch(db="protein", id="12345678", retmode="xml")
> record = Entrez.read(handle, validate=True)
> handle.close()
> print(record)
>
> Using some id values like "1" could give an "empty" XML record,
> which to me looks like an NCBI bug:
>
> <?xml version="1.0"?>
>  <!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN"
> "http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd">
>  <GBSet>
>
> </GBSet>
>
> This is parsed as [] which is reasonable (empty list).
>
> Other values like "0" and "-1" give an HTTP Error 400: Bad Request
> (which is good - a nice clear and explicit error).
>
> See also:
>
> Peter
>
>
> On Fri, Sep 4, 2015 at 8:16 PM, Lev Tsypin <ltsypin at uchicago.edu> wrote:
> > Hi Peter,
> >
> > This happens when I fetch protein sequences from the protein database. I
> > have a GI number in the variable 'gi' that I pass to the Entrez.efetch
> > function. Specifically, I use:
> >
> >         handle = Entrez.efetch(db='protein', id=gi, retmode='xml')
> >         record = Entrez.read(handle)
> >
> > Best,
> > Lev
> >
> > On Fri, Sep 4, 2015 at 11:12 AM, Peter Cock <p.j.a.cock at googlemail.com>
> > wrote:
> >>
> >> Hi Lev,
> >>
> >> Which database was this with? Each has somewhat different XML behaviour.
> >>
> >> The NCBI have been quite good about versioning the DTD files -
> >> normally they add new files rather than edit an existing DTD file. So
> >> unless you've had a warning from Biopython there should be no reason
> >> to download a new DTD file.
> >>
> >> Peter
> >>
> >> On Fri, Sep 4, 2015 at 3:44 PM, Lev Tsypin <ltsypin at uchicago.edu> wrote:
> >> > Hi all,
> >> >
> >> > I am encountering this error when using Bio.Entrez:
> >> >
> >> > Bio.Entrez.Parser.ValidationError: Failed to find tag 'Error' in the
> >> > DTD. To skip all tags that are not represented in the DTD, please call
> >> > Bio.Entrez.read or Bio.Entrez.parse with validate=False.
> >> >
> >> > I've found a discussion of the same issue from about a year ago, so I
> >> > figure the NCBI updated their DTD file in a strange way. I found
> >> > several solutions: would you recommend that I download the new DTD
> >> > file into my local copy of Biopython or run Entrez.read with
> >> > validate=False?
> >> >
> >> > Best regards,
> >> > Lev Tsypin
> >> >
> >> > _______________________________________________
> >> > Biopython-dev mailing list
> >> > Biopython-dev at mailman.open-bio.org
> >> > http://mailman.open-bio.org/mailman/listinfo/biopython-dev
> >
> >
>
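
The cases discussed in this thread (an error message instead of a record, an "empty" GBSet, and a normal record set) can be told apart by inspecting the raw XML before handing it to the validating parser. The sketch below uses only the standard library rather than Bio.Entrez. classify_response is a hypothetical helper, and HYPOTHETICAL_ERROR is an invented payload, since the actual error document NCBI returned is not shown in the thread.

```python
import xml.etree.ElementTree as ET

# The "empty" record quoted in the thread for id="1".
EMPTY_GBSET = b"""<?xml version="1.0"?>
<!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd">
<GBSet>

</GBSet>
"""

# Assumption: an invented payload, only to exercise the 'Error' tag case.
HYPOTHETICAL_ERROR = b"<GBSet><Error>temporarily unavailable</Error></GBSet>"


def classify_response(xml_bytes):
    """Return 'error', 'empty', or 'records' for a raw XML payload."""
    root = ET.fromstring(xml_bytes)
    # iter() visits the root as well as every descendant element.
    if next(root.iter("Error"), None) is not None:
        return "error"
    # No child elements at all: the "empty" record case.
    if len(root) == 0:
        return "empty"
    return "records"


if __name__ == "__main__":
    print(classify_response(EMPTY_GBSET))
    print(classify_response(HYPOTHETICAL_ERROR))
```

A check like this only looks at tag names, so it is a coarse filter; passing validate=True to Entrez.read remains the stricter test.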