[Biopython] Issue with Bio.Entrez and protein in Biopython 1.60

Sun Jan 27 06:18:17 UTC 2013

Hi Michiel,

It was not a glitch at the E-utilities. I had to download and compile
Biopython from the official website (previously I was using the Biopython
package from the Ubuntu repositories). After that, it works perfectly.

Thank you

On 26/01/13 22:28, Michiel de Hoon wrote:
> Hi Cristian,
>
> --- On Fri, 1/25/13, Cristian Alejandro Rojas <alejandro.0317 at gmail.com> wrote:
>> I'm having an issue using Bio.Entrez to search a protein. I'm
>> doing this:
>>
>>>>> handle=Entrez.esearch(db="protein", term="insulin AND homo")
>>>>> record=Entrez.read(handle)
>> Traceback (most recent call last):
> It works for me now, so this may have been a temporary glitch at the E-Utilities:
>
>>>> from Bio import Entrez
>>>> handle=Entrez.esearch(db="protein", term="insulin AND homo")
>>>> record = Entrez.read(handle)
>>>> print record
> {u'Count': '3956', u'RetMax': '20', u'IdList': ['443497968', '443497970', '443428106', '443428104', '443428107', '443428105', '83700231', '21361212', '4505143', '66472382', '443287675', '443287677', '419636284', '375298744', '341940804', '341940253', '332278248', '317373577', '317373571', '317373494'], u'TranslationStack': [{u'Count': '32050', u'Field': 'All Fields', u'Term': 'insulin[All Fields]', u'Explode': 'N'}, {u'Count': '0', u'Field': 'Organism', u'Term': '"Homo"[Organism]', u'Explode': 'N'}, {u'Count': '10253279', u'Field': 'All Fields', u'Term': 'homo[All Fields]', u'Explode': 'N'}, 'OR', 'GROUP', 'AND'], u'TranslationSet': [{u'To': '"Homo"[Organism] OR homo[All Fields]', u'From': 'homo'}], u'RetStart': '0', u'QueryTranslation': 'insulin[All Fields] AND ("Homo"[Organism] OR homo[All Fields])'}
>
>> I'm having an issue with einfo() too; look at this:
>>
>>>>> handler=Entrez.einfo(db="protein")
>>>>> record=Entrez.read(handler)
>> Traceback (most recent call last):
>>    File "<stdin>", line 1, in <module>
>>    File "/usr/lib/pymodules/python2.7/Bio/Entrez/__init__.py", line 351, in read
>>      record = handler.read(handle)
>>    File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 169, in read
>>      self.parser.ParseFile(handle)
>>    File "/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py", line 285, in startElementHandler
>>      raise ValidationError(name)
>> Bio.Entrez.Parser.ValidationError: Failed to find tag 'Build' in the DTD.
>> To skip all tags that are not represented in the DTD, please call
>> Bio.Entrez.read or Bio.Entrez.parse with validate=False.
>>
> This error message means exactly what it says. To see what the E-Utilities return, try
>
>>>> handle = Entrez.einfo(db="protein")
>>>> print handle.read()
> <?xml version="1.0"?>
> <!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD eInfoResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eInfo_020511.dtd">
> <eInfoResult>
> 	<DbInfo>
> 	<DbName>protein</DbName>
> 	<MenuName>Protein</MenuName>
> 	<Description>Protein sequence record</Description>
> 	<Build>Build130126-0031m.1</Build>
> ...
>
> If you look at the DTD file at http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eInfo_020511.dtd, you'll see that "Build" is not mentioned anywhere. But it is present in the XML file. The error message tells you that the XML file is not consistent with its DTD, but that you can ignore such tags by using validate=False:
>
>>>> handle = Entrez.einfo(db="protein")
>>>> record = Entrez.read(handle, validate=False)
>>>> print record
> {u'DbInfo': {u'Count': '73259352', u'LastUpdate': '2013/01/26 08:18', u'MenuName': 'Protein', u'Description': 'Protein sequence record', u'LinkList': [{u'DbTo': 'bioproject', u'Menu': 'BioProject Links', u'Name': 'protein_bioproject', u'Description': 'Proteins related to BioProjects'}, {u'DbTo': 'biosystems', u'Menu': 'BioSystem Links', u'Name': 'protein_biosystems', u'Description': 'Pathways and other biosystems containing the current prot...
>
> Best,
> -Michiel.
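The core of Michiel's explanation is that validation fails whenever the XML uses a tag its DTD never declares. As a rough illustration only (this is not how Bio.Entrez implements its validation), the standard-library sketch below lists such tags. The DTD text is a hypothetical, heavily trimmed stand-in, not the real eInfo_020511.dtd; the XML is the start of the eInfo reply quoted above, with closing tags added so it parses.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for a DTD (NOT the real eInfo_020511.dtd).
# Only the <!ELEMENT name ...> declarations matter for this check.
DTD_TEXT = """
<!ELEMENT eInfoResult (DbInfo)>
<!ELEMENT DbInfo (DbName, MenuName, Description)>
<!ELEMENT DbName (#PCDATA)>
<!ELEMENT MenuName (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
"""

# Start of the eInfo reply shown in the thread (DOCTYPE omitted, closing tags added).
XML_TEXT = """<eInfoResult>
  <DbInfo>
    <DbName>protein</DbName>
    <MenuName>Protein</MenuName>
    <Description>Protein sequence record</Description>
    <Build>Build130126-0031m.1</Build>
  </DbInfo>
</eInfoResult>"""


def declared_elements(dtd_text):
    """Return the set of element names declared with <!ELEMENT name ...>."""
    return set(re.findall(r"<!ELEMENT\s+([A-Za-z_][\w.-]*)", dtd_text))


def undeclared_tags(xml_text, dtd_text):
    """Return tags used in the XML but absent from the DTD, in document order."""
    declared = declared_elements(dtd_text)
    missing = []
    # iter() walks the root element and all of its descendants.
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag not in declared and elem.tag not in missing:
            missing.append(elem.tag)
    return missing


print(undeclared_tags(XML_TEXT, DTD_TEXT))  # ['Build']
```

A tag reported here is the kind of tag that makes a validating parser raise an error like the one in the traceback above; as the thread shows, the practical options were validate=False or a newer Biopython than the Ubuntu package.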