[Biopython] Entrez.einfo(db='pubmed') error

ming.xue at boehringer-ingelheim.com
Wed Nov 20 17:57:25 UTC 2013


Peter,

You are right, and thanks for the quick help.

Ming Xue

-----Original Message-----
From: Peter Cock [mailto:p.j.a.cock at googlemail.com] 
Sent: Wednesday, November 20, 2013 12:39 PM
To: Xue,Ming (IS BP R&DM) BI-US-R
Cc: Biopython Mailing List
Subject: Re: [Biopython] Entrez.einfo(db='pubmed') error

On Wed, Nov 20, 2013 at 4:54 PM,  <ming.xue at boehringer-ingelheim.com> wrote:
> Hello,
>
> I am using Python 2.7.3 and Biopython 1.62 (1.63b had the same issue).
>
>>>> hd = Entrez.einfo(db='pubmed')
>>>> Entrez.read(hd)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "Bio/Entrez/__init__.py", line 367, in read
>     record = handler.read(handle)
>   File "Bio/Entrez/Parser.py", line 184, in read
>     self.parser.ParseFile(handle)
>   File "Bio/Entrez/Parser.py", line 300, in startElementHandler
>     raise ValidationError(name)
> Bio.Entrez.Parser.ValidationError: Failed to find tag 'DbBuild' in the DTD. To skip all tags that are not represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse with validate=False.
>
>
>>>> Entrez.read(hd, validate=False)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "Bio/Entrez/__init__.py", line 367, in read
>     record = handler.read(handle)
>   File "Bio/Entrez/Parser.py", line 194, in read
>     raise NotXMLError(e)
> Bio.Entrez.Parser.NotXMLError: Failed to parse the XML data (syntax error: line 1, column 0). Please make sure that the input data are in XML format.

Hi Ming,

I think your mistake is trying to parse the *same* handle, which the first (failed) call has already partly read from. This should work:

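from Bio import Entrez
Entrez.email = "your.name@example.com"  # placeholder; NCBI asks for a real contact address
hd = Entrez.einfo(db='pubmed')  # a fresh handle, not the partly-read one
record = Entrez.read(hd, validate=False)  # skip tags missing from the DTD, e.g. 'DbBuild'
hd.close()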

i.e. the problem is that the failed parsing attempt had already read (and discarded) the first part of the data, or possibly all of it, so the second attempt started mid-document.
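To make the failure mode concrete, here is a minimal stdlib-only sketch (Python 3, not Biopython) that mimics it: a first "attempt" reads some bytes and gives up, and a second parse of the same handle then starts in the middle of the document. The XML string and the helper name `retry_on_same_handle` are hypothetical stand-ins, not real EInfo output or a Biopython API; ElementTree is used only because, like Bio.Entrez, it parses with expat.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, much-shortened stand-in for an EInfo XML reply.
XML = b'<?xml version="1.0"?>\n<eInfoResult><DbInfo><DbName>pubmed</DbName></DbInfo></eInfoResult>'


def retry_on_same_handle(handle, consumed=40):
    """Hypothetical helper: simulate a first parse that reads `consumed` bytes
    and fails, followed by a second parse of the SAME handle."""
    handle.read(consumed)  # the failed first attempt has already eaten these bytes
    try:
        return ET.parse(handle).getroot().tag
    except ET.ParseError as err:
        # The retry starts mid-document, so the parser does not see valid XML at its start.
        return "ParseError: %s" % err


print(retry_on_same_handle(io.BytesIO(XML)))    # reports a ParseError
print(ET.parse(io.BytesIO(XML)).getroot().tag)  # fresh handle: eInfoResult
```

The second line is the equivalent of Peter's fix: a brand-new handle parses cleanly, while reusing the consumed one reproduces an error much like the NotXMLError above.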

With a file-based handle you could call handle.seek(0) to return to the start, but network handles cannot be rewound like this.
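If you do need to parse the same reply more than once, one workaround (my suggestion, not something from the thread) is to read the network response once into memory and wrap it in io.BytesIO, which does support seek(0). The sketch below is Python 3 and stdlib-only; `NetworkLikeHandle` is a hypothetical stand-in for a real network handle, and ElementTree stands in for Entrez.read. Whether an in-memory copy is accepted by Entrez.read in a given Biopython release (e.g. 1.62 on Python 2) is an assumption I have not checked.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, much-shortened stand-in for an EInfo XML reply.
XML = b'<?xml version="1.0"?>\n<eInfoResult><DbInfo><DbName>pubmed</DbName></DbInfo></eInfoResult>'


class NetworkLikeHandle:
    """Hypothetical stand-in for a network handle: read() works, seek() does not."""

    def __init__(self, data):
        self._data = io.BytesIO(data)

    def read(self, size=-1):
        return self._data.read(size)

    def seekable(self):
        return False

    def seek(self, *args):
        raise io.UnsupportedOperation("network handles cannot be rewound")


net = NetworkLikeHandle(XML)
local = io.BytesIO(net.read())     # read the whole reply from the network once
net_exhausted = net.read() == b""  # nothing is left on the network handle

local.read(40)  # a first (failed) parse attempt consumes part of the copy...
local.seek(0)   # ...but the in-memory copy, unlike the network handle, can be rewound
root = ET.parse(local).getroot()
print(root.tag)  # eInfoResult
```

The trade-off is memory: the whole reply is held in RAM, which is fine for an EInfo result but may not be for very large EFetch downloads.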

Regards,

Peter
