[BioPython] BLAST XML problem?

Iddo Friedberg idoerg at gmail.com
Wed Jan 11 17:14:09 EST 2006




Sebastian Bassi wrote:

>On 1/11/06, Peter <biopython at maubp.freeserve.co.uk> wrote:
>  
>
>><?xml version="1.0" encoding="ISO-8859-1"?>
>>Instead it's just:
>><?xml version="1.0"?>
>>Short term solutions which I have just tried and got to work:
>>(1) Edit the offending character by hand (as you did)
>>(2) Specify encoding="ISO-8859-1" by editing the first line by hand
>>(3) Convert the file to Unicode (doubles the size)
>>    
>>
>
>I have a 4th solution, that doesn't involve XML editing, so it will
>"fix" the problem for other users:
>4) Change Biopython or XML parser to assume encoding = ISO-8859-1 when
>there is no encoding information.
>  
>

OK, I was actually going to do this.
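
For the archive, here is a rough sketch of what option 4 could look like. It 
is not the Biopython code: parse_assuming_latin1 is a made-up name, and it 
uses the Python 3 standard library's xml.etree.ElementTree, whose XMLParser 
takes an encoding argument that overrides whatever the file says. The sketch 
only applies that override when the XML declaration names no encoding.

```python
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration that carries an explicit encoding attribute.
_DECL_ENCODING = re.compile(rb'^<\?xml[^>]*\bencoding\s*=', re.IGNORECASE)


def parse_assuming_latin1(data):
    """Parse XML bytes, assuming ISO-8859-1 when no encoding is declared.

    Hypothetical helper for illustration only.
    """
    if _DECL_ENCODING.match(data):
        # The file declares its own encoding, so leave the parser alone.
        parser = ET.XMLParser()
    else:
        # No declaration: override the parser's UTF-8 default.
        parser = ET.XMLParser(encoding="iso-8859-1")
    parser.feed(data)
    return parser.close()
```

One caveat: a file that starts with a UTF-16 byte-order mark but has no 
encoding declaration would be misread by this, so a real fix would want to 
check for a BOM first.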

I found a bit of code that will detect a file's encoding from its first two 
bytes. I was planning to pass its return value to the BLAST XML parser.

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/363841
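
The recipe itself is not reproduced here. As a minimal sketch of the same 
idea, assuming the detection is done by looking for a byte-order mark (BOM) 
at the start of the file, something like the following would do. The name 
guess_encoding and the ISO-8859-1 fallback are my own choices for 
illustration, and a BOM can be two to four bytes long, so the sketch looks 
at more than the first two.

```python
import codecs

# Byte-order marks, longest first, so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, fallback="iso-8859-1"):
    """Return a codec name based on the leading bytes of data.

    Hypothetical helper: the fallback is an assumption, used only
    when no byte-order mark is found.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return fallback
```

A file without a BOM tells you nothing this way, which is why a fallback is 
still needed.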

If that had not worked, I would have hard-coded ISO-8859-1.

But...

When I generated a new XML file from NCBI to test the encoding-detection 
module, the encoding used for the ä had actually changed! Everything works now.

So... are there Biopython fans with a (very) quick response time 
at NCBI?

Spooky...

> I wonder if <?xml version="1.0"?> is a W3C-valid first line for an XML
> file. If this is OK (from the point of view of the XML standard), then
> the parser should be corrected; if not, according to the standard, the
> file should be rejected for non-compliance (this is not HTML, where the
> browser client can accept and correct invalid code; the specification
> states that XML should validate before being used).


I believe that the default is UTF-8 when no encoding is declared, and that 
<?xml version="1.0"?> is valid.
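
A quick way to see that default in action, sketched with Python 3's 
xml.etree.ElementTree rather than the Biopython parser. The Hit_def content 
is made-up test data and parses is a throwaway helper. The same Latin-1 
bytes are rejected when no encoding is declared (the parser assumes UTF-8) 
and accepted once the declaration names ISO-8859-1.

```python
import xml.etree.ElementTree as ET

# A made-up element whose text contains an a-umlaut, stored as the
# single Latin-1 byte 0xE4 (not valid UTF-8 in this position).
latin1_body = "<Hit_def>B\u00e4r</Hit_def>".encode("iso-8859-1")

no_decl = b'<?xml version="1.0"?>' + latin1_body
with_decl = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_body


def parses(data):
    """Return True if data parses as XML, False on a parse error."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(parses(no_decl))    # no encoding declared: treated as UTF-8
print(parses(with_decl))  # ISO-8859-1 declared explicitly
```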

./I

-- 

Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9949
http://iddo-friedberg.org



