[BioPython] BLAST XML problem?
Iddo Friedberg
idoerg at gmail.com
Wed Jan 11 15:36:31 EST 2006
Sebastian Bassi wrote:
>On 1/11/06, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
>
>><?xml version="1.0" encoding="ISO-8859-1"?>
>>Instead it's just:
>><?xml version="1.0"?>
>>Short term solutions which I have just tried and got to work:
>>(1) Edit the offending character by hand (as you did)
>>(2) Specify encoding="ISO-8859-1" by editing the first line by hand
>>(3) Convert the file to Unicode (doubles the size)
>>
>>
>
>I have a 4th solution, that doesn't involve XML editing, so it will
>"fix" the problem for other users:
>4) Change Biopython or XML parser to assume encoding = ISO-8859-1 when
>there is no encoding information.
>
>
OK, I was actually going to do this.
I found a bit of code that will detect file encoding from the first two
bytes. I was planning to put the return value into the BLAST XML parser.
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/363841
If that had not worked, I would have hard-coded ISO-8859-1 as the fallback.
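
For illustration, here is a minimal sketch of that idea. It is not the code from the recipe above and not anything in Biopython. The function name guess_xml_encoding and its fallback parameter are made up for this example. It looks for a byte-order mark, then for an encoding= attribute in the XML declaration, and only then falls back to ISO-8859-1 (which is the workaround discussed here, not what the XML spec prescribes):

```python
import codecs
import re

# Byte-order marks, UTF-32 first: the UTF-32-LE mark begins with the
# same two bytes as the UTF-16-LE mark, so order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# encoding="..." inside an ASCII-compatible XML declaration.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def guess_xml_encoding(data, fallback="ISO-8859-1"):
    """Guess the encoding of XML bytes (hypothetical helper).

    The ISO-8859-1 fallback is an assumption taken from this thread;
    a strict parser would assume UTF-8 when nothing is declared.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    return fallback


if __name__ == "__main__":
    print(guess_xml_encoding(b'<?xml version="1.0"?><BlastOutput/>'))
```

The returned name could then be used to decode the file before handing it to the parser, or to rewrite the first line with an explicit encoding.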
But...
When I generated a new XML file from NCBI to test the encoding-detection
module, the code used for the ä actually changed! Everything works now.
So... are there Biopython fans with a (very) quick response time
at NCBI?
Spooky...
> I wonder if <?xml version="1.0"?> is a W3C-valid first line for an XML
> file. If this is OK (from the point of view of the XML standard), then
> the parser should be corrected; if not, according to the standard, the
> file should be rejected for non-compliance (this is not HTML, where the
> browser client can accept and correct invalid code; the specification
> states that XML should validate before being used).
I believe that the default is UTF-8, and that <?xml version="1.0"?> is
valid.
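
A quick way to see the effect of that default, as a rough sketch using the standard library's ElementTree and not the Biopython BLAST parser: the <hit> element and the try_parse helper are invented for this example. The same ISO-8859-1 bytes are rejected when no encoding is declared, because the parser then treats them as UTF-8, and accepted once the declaration names the encoding.

```python
import xml.etree.ElementTree as ET

# One element containing a-umlaut, stored as the single byte 0xE4 (ISO-8859-1).
body = "<hit>\xe4</hit>".encode("iso-8859-1")


def try_parse(xml_bytes):
    """Return the root element's text, or None if the bytes do not parse."""
    try:
        return ET.fromstring(xml_bytes).text
    except ET.ParseError:
        return None


# No encoding declared: 0xE4 is not a valid UTF-8 sequence here, so parsing fails.
no_decl = try_parse(b'<?xml version="1.0"?>' + body)

# Encoding declared: the same bytes parse cleanly.
with_decl = try_parse(b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body)

if __name__ == "__main__":
    print(repr(no_decl), repr(with_decl))
```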
./I
--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9949
http://iddo-friedberg.org