[Biopython] Entrez.einfo(db='pubmed') error
Peter Cock
p.j.a.cock at googlemail.com
Wed Nov 20 17:38:31 UTC 2013
On Wed, Nov 20, 2013 at 4:54 PM, <ming.xue at boehringer-ingelheim.com> wrote:
> Hello,
>
> I am using Python 2.7.3 and Biopython 1.62 (1.63b had the same issue).
>
> >>> hd = Entrez.einfo(db='pubmed')
> >>> Entrez.read(hd)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "Bio/Entrez/__init__.py", line 367, in read
> record = handler.read(handle)
> File "Bio/Entrez/Parser.py", line 184, in read
> self.parser.ParseFile(handle)
> File "Bio/Entrez/Parser.py", line 300, in startElementHandler
> raise ValidationError(name)
> Bio.Entrez.Parser.ValidationError: Failed to find tag 'DbBuild' in the DTD. To skip all tags that are not represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse with validate=False.
>
>
> >>> Entrez.read(hd, validate=False)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "Bio/Entrez/__init__.py", line 367, in read
> record = handler.read(handle)
> File "Bio/Entrez/Parser.py", line 194, in read
> raise NotXMLError(e)
> Bio.Entrez.Parser.NotXMLError: Failed to parse the XML data (syntax error: line 1, column 0). Please make sure that the input data are in XML format.

Hi Ming,

I think the mistake is trying to parse the *same* handle again
after it has already been partly read. This should work:

hd = Entrez.einfo(db='pubmed')
record = Entrez.read(hd, validate=False)
hd.close()

The failed parsing attempt read (and threw away) the first part
of the response, or maybe all of it, so the second Entrez.read
call started partway through the data and found no valid XML.
With a file-based handle you could call handle.seek(0) to
return to the start, but network handles cannot be rewound
like this, so you need to call Entrez.einfo again to get a
fresh handle.
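
If querying NCBI a second time is undesirable, one possible workaround is to copy the whole response into memory first, which gives a handle that can be rewound. The following is only a sketch under assumptions: `buffer_handle` and `FakeNetworkHandle` are hypothetical names, not part of Biopython, and `FakeNetworkHandle` merely stands in for the handle returned by Entrez.einfo so that the example runs without network access.

```python
import io


def buffer_handle(handle):
    """Read everything from `handle` and return a seekable in-memory copy.

    Hypothetical helper, not part of Biopython.
    """
    try:
        data = handle.read()
    finally:
        handle.close()
    if not isinstance(data, bytes):
        # Text-mode handle: encode so the result is always a bytes buffer.
        data = data.encode("utf-8")
    return io.BytesIO(data)


class FakeNetworkHandle(object):
    """Stand-in for a network response: readable once, with no seek()."""

    def __init__(self, payload):
        self._buf = io.BytesIO(payload)

    def read(self, size=-1):
        return self._buf.read(size)

    def close(self):
        self._buf.close()


payload = b"<eInfoResult><DbInfo><DbName>pubmed</DbName></DbInfo></eInfoResult>"

# With Biopython this would be buffer_handle(Entrez.einfo(db='pubmed')).
handle = buffer_handle(FakeNetworkHandle(payload))

# A failed parse consumes part of the stream, simulated here by a short read.
first_try = handle.read(10)

# Unlike the network handle, the buffered copy can be rewound and re-read.
handle.seek(0)
second_try = handle.read()
```

With a buffered handle like this, a first Entrez.read(handle) that raises ValidationError could be followed by handle.seek(0) and Entrez.read(handle, validate=False). The cost is that the whole response is held in memory, which is fine for a small einfo result but may not suit very large downloads.
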
Regards,
Peter