[Biopython] Eftech and db='bioproject'... DTD problem?

Nicolas Joannin nicolas.joannin at gmail.com
Tue Jun 18 10:41:38 UTC 2013


Hi again,

I got a reply from Scott at NCBI:

"Yes this is the "normal" but it is an oversight as a dtd was never created
for this database. I will have to open a ticket to the developers to create
this and have it included in the XML and on the DTD web page."

Hopefully it will be updated soon!
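
In the meantime, one possible workaround is to skip Entrez.read and parse the raw XML directly. The sketch below is only an illustration under stated assumptions: the helper names (fetch_bioproject_xml, element_tags), the e-mail address, and the SAMPLE_XML fragment are hypothetical. Only the first four element names in the sample come from the output quoted further down this thread; everything else is a placeholder.

```python
import xml.etree.ElementTree as ET


def fetch_bioproject_xml(project_id, email):
    """Return the raw efetch response for a BioProject ID (needs network access).

    Hypothetical helper. Biopython is imported lazily so that the parsing
    helper below can be used without Biopython installed.
    """
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.efetch(db="bioproject", id=project_id)
    try:
        return handle.read()  # raw XML, no DTD validation involved
    finally:
        handle.close()


def element_tags(xml_text):
    """Return the tag of every element in document order."""
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]


# Placeholder document: only the element names RecordSet, DocumentSummary,
# Project and ProjectID are taken from the thread; the content is made up.
SAMPLE_XML = """<?xml version="1.0"?>
<RecordSet><DocumentSummary>
    <Project>
        <ProjectID>placeholder</ProjectID>
    </Project>
</DocumentSummary></RecordSet>"""

if __name__ == "__main__":
    print(element_tags(SAMPLE_XML))
    # With network access and Biopython installed, something like this
    # should work (the e-mail address is a placeholder):
    # print(element_tags(fetch_bioproject_xml("20431", "you@example.org")))
```

Because ElementTree does not validate against a DTD, the missing DTD no longer matters, but the result is a plain element tree rather than the dictionary-like records that Entrez.read normally returns.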
Best regards,

Nicolas



Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan



On Thu, Jun 13, 2013 at 9:45 PM, Nicolas Joannin
<nicolas.joannin at gmail.com> wrote:

> Hi Michiel,
>
> Thanks for the suggestion. I will do so and post any response I get!
>
> Best regards,
> Nicolas
>
>
>
>
> On Thu, Jun 13, 2013 at 12:32 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>> The XML returned by Entrez for this query starts like this:
>> <?xml version="1.0"?>
>> <RecordSet><DocumentSummary>
>>     <Project>
>>         <ProjectID>
>> ...
>> so it does not contain any information regarding the relevant DTD needed
>> to parse this XML.
>> I would suggest checking with NCBI to find out what the appropriate way is
>> to access the bioproject database through Entrez.
>>
>> Best,
>> -Michiel.
>>
>>
>>   ------------------------------
>>  *From:* Nicolas Joannin <nicolas.joannin at gmail.com>
>> *To:* Biopython Mailing List <biopython at lists.open-bio.org>
>> *Sent:* Wednesday, June 12, 2013 12:04 PM
>> *Subject:* [Biopython] Eftech and db='bioproject'... DTD problem?
>>
>> Hello,
>>
>> I'm trying to use Entrez.efetch to retrieve info about a BioProject.
>> However, I get a DTD error (see output below).
>> Using "validate=False" avoids the error, but results in an empty string
>> output.
>>
>> Any idea as to how I can read BioProject data from Entrez?
>>
>> Best regards,
>> Nicolas
>>
>> Example output:
>>
>> >>> h=Entrez.efetch(db='bioproject', id='20431')
>> >>> p=Entrez.read(h)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/Bio/Entrez/__init__.py", line 368, in read
>>     record = handler.read(handle)
>>   File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/Bio/Entrez/Parser.py", line 184, in read
>>     self.parser.ParseFile(handle)
>>   File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/Bio/Entrez/Parser.py", line 300, in startElementHandler
>>     raise ValidationError(name)
>> Bio.Entrez.Parser.ValidationError: Failed to find tag 'RecordSet' in the
>> DTD. To skip all tags that are not represented in the DTD, please call
>> Bio.Entrez.read or Bio.Entrez.parse with validate=False.
>>
>>
>> Nicolas Joannin, Ph.D.
>> Bioinformatics Center
>> Kyoto University, Uji campus, Japan
>> _______________________________________________
>> Biopython mailing list  -  Biopython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
>>
>>
>>
>


