[Biopython] Issue with Bio.Entrez and protein in Biopython 1.60

Michiel de Hoon mjldehoon at yahoo.com
Sat Jan 26 22:28:29 EST 2013


Hi Cristian,

--- On Fri, 1/25/13, Cristian Alejandro Rojas <alejandro.0317 at gmail.com> wrote:
> I'm having a issue using Bio.Entrez to search a protein. I'm
> doing  this:
> 
> >>> handle=Entrez.esearch(db="protein",
> term="insulin AND homo")
> >>> record=Entrez.read(handle)
> Traceback (most recent call last):

It works for me now, so this may have been a temporary glitch on the E-Utilities side:

>>> from Bio import Entrez
>>> handle=Entrez.esearch(db="protein", term="insulin AND homo")
>>> record = Entrez.read(handle)
>>> print record
{u'Count': '3956', u'RetMax': '20', u'IdList': ['443497968', '443497970', '443428106', '443428104', '443428107', '443428105', '83700231', '21361212', '4505143', '66472382', '443287675', '443287677', '419636284', '375298744', '341940804', '341940253', '332278248', '317373577', '317373571', '317373494'], u'TranslationStack': [{u'Count': '32050', u'Field': 'All Fields', u'Term': 'insulin[All Fields]', u'Explode': 'N'}, {u'Count': '0', u'Field': 'Organism', u'Term': '"Homo"[Organism]', u'Explode': 'N'}, {u'Count': '10253279', u'Field': 'All Fields', u'Term': 'homo[All Fields]', u'Explode': 'N'}, 'OR', 'GROUP', 'AND'], u'TranslationSet': [{u'To': '"Homo"[Organism] OR homo[All Fields]', u'From': 'homo'}], u'RetStart': '0', u'QueryTranslation': 'insulin[All Fields] AND ("Homo"[Organism] OR homo[All Fields])'}

> I'm having a issue with einfo() too, check at this:
> 
> >>> handler=Entrez.einfo(db="protein")
> >>> record=Entrez.read(handler)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File
> "/usr/lib/pymodules/python2.7/Bio/Entrez/__init__.py", line
> 351, in
> read
>     record = handler.read(handle)
>   File
> "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line
> 169, in
> read
>     self.parser.ParseFile(handle)
>   File
> "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line
> 285, in
> startElementHandler
>     raise ValidationError(name)
> Bio.Entrez.Parser.ValidationError: Failed to find tag
> 'Build' in the DTD.
> To skip all tags that are not represented in the DTD, please
> call
> Bio.Entrez.read or Bio.Entrez.parse with validate=False.
> 

This error message means exactly what it says. To see the raw XML that the E-Utilities return, try:

>>> handle = Entrez.einfo(db="protein")
>>> print handle.read()
<?xml version="1.0"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD eInfoResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eInfo_020511.dtd">
<eInfoResult>
	<DbInfo>
	<DbName>protein</DbName>
	<MenuName>Protein</MenuName>
	<Description>Protein sequence record</Description>
	<Build>Build130126-0031m.1</Build>
...

If you look at the DTD file at http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eInfo_020511.dtd, you'll see that "Build" is not mentioned anywhere, yet it is present in the XML returned by the E-Utilities. The error message tells you that the XML is not consistent with its DTD, and that you can have the parser skip such undeclared tags (they will then be missing from the parsed record) by passing validate=False:

>>> handle = Entrez.einfo(db="protein")
>>> record = Entrez.read(handle, validate=False)
>>> print record
{u'DbInfo': {u'Count': '73259352', u'LastUpdate': '2013/01/26 08:18', u'MenuName': 'Protein', u'Description': 'Protein sequence record', u'LinkList': [{u'DbTo': 'bioproject', u'Menu': 'BioProject Links', u'Name': 'protein_bioproject', u'Description': 'Proteins related to BioProjects'}, {u'DbTo': 'biosystems', u'Menu': 'BioSystem Links', u'Name': 'protein_biosystems', u'Description': 'Pathways and other biosystems containing the current prot...
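
If you want to see for yourself which tags are the problem, here is a rough, standard-library-only sketch of the kind of check behind that error. It is not how Bio.Entrez validates internally. The DTD fragment is an abbreviated stand-in that I made up for illustration (not the real eInfo_020511.dtd), the XML is the snippet above trimmed and closed so that it parses, and the helper names declared_elements and undeclared_tags are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Abbreviated, illustrative DTD fragment (an assumption for this sketch,
# NOT the full NCBI eInfo DTD). Note that it does not declare "Build".
DTD_TEXT = """
<!ELEMENT eInfoResult (DbInfo)>
<!ELEMENT DbInfo (DbName, MenuName, Description)>
<!ELEMENT DbName (#PCDATA)>
<!ELEMENT MenuName (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
"""

# Trimmed copy of the einfo response shown above, closed so that it parses.
XML_TEXT = """<eInfoResult>
<DbInfo>
<DbName>protein</DbName>
<MenuName>Protein</MenuName>
<Description>Protein sequence record</Description>
<Build>Build130126-0031m.1</Build>
</DbInfo>
</eInfoResult>"""


def declared_elements(dtd_text):
    """Return the set of element names declared with <!ELEMENT ...>.

    This is a crude regex scan, good enough for a quick look at a DTD,
    but not a real DTD parser.
    """
    return set(re.findall(r"<!ELEMENT\s+([\w.-]+)", dtd_text))


def undeclared_tags(xml_text, dtd_text):
    """Return the tags used in the XML that the DTD does not declare, sorted."""
    declared = declared_elements(dtd_text)
    used = set(element.tag for element in ET.fromstring(xml_text).iter())
    return sorted(used - declared)


print(undeclared_tags(XML_TEXT, DTD_TEXT))
```

With the fragment above this reports 'Build' as the only undeclared tag, which is the same tag the ValidationError complains about. To run it against the real thing, you would paste in the text of the DTD file and the output of handle.read() instead.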

Best,
-Michiel.


