[BioPython] BLAST XML problem?

Iddo Friedberg idoerg at burnham.org
Wed Jan 11 12:08:01 EST 2006


Peter wrote:

> Iddo Friedberg wrote:
>
>> Slight correction to my previous email: using biopython from CVS, and 
>> python 2.3 as you can see from the stack dump
>>
>> Iddo Friedberg wrote:
>>
>>> Not sure what we're doing wrong here...
>>>
>>> Using the cookbook example, biopython 1.41, python 2.2 (our Zope 
>>> needs that Python version, sorry):
>>>
>>> from Bio.Blast import NCBIXML
>>>
>>> b_parser = NCBIXML.BlastParser()
>>> b_record = b_parser.parse(blast_out)
>>>
>>>
>>> Breaks on "Alejandro Schäffer", in the XML <BlastOutput_reference> 
>>> tag. The ä seems to cause the error. Replace it with a regular "a" 
>>> and everything is hunky-dory.
>>
>
> Is the lower-case a with umlaut in the XML file as a literal ä, or 
> written as a character reference like &auml; or &#228; instead? 
> (ampersand sequences, aka character entities)


It's a literal ä, not a character entity.

>
> Also, what character set does the blast_out XML file claim to be in? 
> And does that fit with the inclusion of an a-umlaut as a character?


I haven't the foggiest... :)
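
One way to answer the character-set question is to read the encoding named in the XML declaration and try to decode the raw bytes with it. The following is only a minimal sketch, assuming a modern Python 3 rather than the Python 2.3 used in this thread. The helper names declared_encoding and first_bad_byte are hypothetical and not part of Biopython. It relies on the XML rule that a file with no declared encoding (and no byte-order mark) is treated as UTF-8.

```python
import re

# Hypothetical helpers for illustration; not part of Biopython.
# Matches the encoding pseudo-attribute of a leading XML declaration.
_DECL_RE = re.compile(rb'^<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(data):
    """Return the encoding named in the XML declaration (default: utf-8)."""
    match = _DECL_RE.match(data)
    return match.group(1).decode("ascii") if match else "utf-8"


def first_bad_byte(data):
    """Return the offset of the first byte that fails to decode in the
    declared encoding, or None if the whole document decodes cleanly."""
    try:
        data.decode(declared_encoding(data))
    except UnicodeDecodeError as err:
        return err.start
    return None
```

If first_bad_byte returns an offset that points at the ä, the file's bytes do not match the encoding it claims, which would be consistent with a single-byte (for example ISO-8859-1) ä sitting in a file that is being read as UTF-8.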

>
> It may be the NCBI's fault for producing a bad XML file...
>

Yeah, well, I still have to deal with it :(  In any case, why is this 
cropping up now? Schäffer has been in NCBI for years...
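
A possible stopgap for dealing with such a file is to rewrite the non-ASCII bytes as numeric character references before handing the data to a parser. This is a sketch under a loud assumption: that the stray bytes really are ISO-8859-1, which has not been confirmed for this file. It uses modern Python 3 and the standard library only; latin1_to_ascii_xml is a hypothetical helper name, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET


def latin1_to_ascii_xml(data):
    """Decode bytes as ISO-8859-1 (an assumption about the input) and
    re-encode as pure ASCII, so that a literal ä becomes &#228;."""
    return data.decode("iso-8859-1").encode("ascii", "xmlcharrefreplace")


# Made-up sample: no declared encoding, but a raw Latin-1 ä (byte 0xE4).
raw = (b'<?xml version="1.0"?>\n'
       b'<BlastOutput_reference>Alejandro Sch\xe4ffer</BlastOutput_reference>')

fixed = latin1_to_ascii_xml(raw)
root = ET.fromstring(fixed)  # parses, because the bytes are now plain ASCII
```

The result is valid in ASCII, UTF-8, and ISO-8859-1 alike, so it should parse regardless of what the declaration says. If the input is in some other encoding, the first decode step would silently produce the wrong characters.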

The file is available at
http://iddo-friedberg.org/biopy_bad_blast.xml
in case anyone wants to have a look-see.

Thanks,

Iddo



-- 
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9949
http://iddo-friedberg.org


