[Biopython] Entrez.einfo(db='pubmed') error

Peter Cock p.j.a.cock at googlemail.com
Wed Nov 20 17:38:31 UTC 2013


On Wed, Nov 20, 2013 at 4:54 PM,  <ming.xue at boehringer-ingelheim.com> wrote:
> Hello,
>
> I am using python 2.7.3 and biopython 1.6.2 (1.6.3b had the same issue).
>
>>>> hd = Entrez.einfo(db='pubmed')
>>>> Entrez.read(hd)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "Bio/Entrez/__init__.py", line 367, in read
>     record = handler.read(handle)
>   File "Bio/Entrez/Parser.py", line 184, in read
>     self.parser.ParseFile(handle)
>   File "Bio/Entrez/Parser.py", line 300, in startElementHandler
>     raise ValidationError(name)
> Bio.Entrez.Parser.ValidationError: Failed to find tag 'DbBuild' in the DTD. To skip all tags that are not represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse with validate=False.
>
>
>>>> Entrez.read(hd, validate=False)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "Bio/Entrez/__init__.py", line 367, in read
>     record = handler.read(handle)
>   File "Bio/Entrez/Parser.py", line 194, in read
>     raise NotXMLError(e)
> Bio.Entrez.Parser.NotXMLError: Failed to parse the XML data (syntax error: line 1, column 0). Please make sure that the input data are in XML format.

Hi Ming,

I think your mistake is trying to parse the *same* handle
again after it has already been partly read. Fetching a
fresh handle should work:

from Bio import Entrez

# Request a fresh handle; the earlier one was consumed by the failed parse
hd = Entrez.einfo(db='pubmed')
record = Entrez.read(hd, validate=False)
hd.close()

That is, the failed parsing attempt had already read (and
thrown away) the first part of the data (or maybe all of it),
so the second call had nothing valid left to parse.

With a file-based handle, you could do handle.seek(0) to
return to the start - but network handles cannot be
restarted like this.
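
If you do want to be able to retry parsing without a second
request, one option is to copy the response into memory first.
The following is only a rough sketch of that idea: OneShotHandle
and buffer_handle are made-up names for illustration (they are
not part of Biopython), and the XML payload is a placeholder
rather than real einfo output.

```python
import io


class OneShotHandle:
    """Hypothetical stand-in for a network handle: read() only, no seek()."""

    def __init__(self, payload):
        self._stream = io.BytesIO(payload)

    def read(self, size=-1):
        return self._stream.read(size)

    def close(self):
        self._stream.close()


def buffer_handle(handle):
    """Copy all remaining bytes from `handle` into a seekable BytesIO."""
    try:
        return io.BytesIO(handle.read())
    finally:
        handle.close()


# Placeholder payload, not a real NCBI response
net = OneShotHandle(b"<eInfoResult>...</eInfoResult>")
buf = buffer_handle(net)

first = buf.read()     # a first (possibly failed) parse consumes the data
leftover = buf.read()  # nothing is left, as with the second read in the report
buf.seek(0)            # rewinding works because the data now lives in memory
second = buf.read()    # the full payload is available again
```

In principle the handle returned by Entrez.einfo could be
buffered the same way and the buffer parsed after each seek(0),
but that part is an assumption and is not tested here.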

Regards,

Peter


