[BioPython] BLAST XML problem?

Peter biopython at maubp.freeserve.co.uk
Wed Jan 11 15:13:42 EST 2006


Sebastian Bassi wrote:
> On 1/11/06, Peter <biopython at maubp.freeserve.co.uk> wrote:
> 
>><?xml version="1.0" encoding="ISO-8859-1"?>
>>Instead it's just:
>><?xml version="1.0"?>
>>Short-term solutions which I have just tried and got to work:
>>(1) Edit the offending character by hand (as you did)
>>(2) Specify encoding="ISO-8859-1" by editing the first line by hand
>>(3) Convert the file to Unicode (doubles the size)
> 
> 
> I have a 4th solution, that doesn't involve XML editing, so it will
> "fix" the problem for other users:
> 4) Change Biopython or XML parser to assume encoding = ISO-8859-1 when
> there is no encoding information.

Well yes, that did cross my mind.  I even went off to try and find out
how to do this, but failed.  Any ideas?
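
One way to get the effect of option 4 without touching the parser itself
might be to patch the declaration in memory before parsing. This is only a
rough sketch for a current Python: ensure_encoding is a made-up helper name
(not anything in Biopython), the ISO-8859-1 default is just the guess from
this thread, and the sample data is invented. It uses the standard library's
xml.etree.ElementTree purely to show that the patched bytes parse.

```python
import re
import xml.etree.ElementTree as ET

# XML declaration at the very start: version first, then any other attributes.
_XML_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*(?P<version>"[^"]*"|\'[^\']*\')(?P<rest>[^>]*)\?>'
)
_BOMS = (b"\xef\xbb\xbf", b"\xff\xfe", b"\xfe\xff")


def ensure_encoding(data, default="ISO-8859-1"):
    """Hypothetical helper: declare `default` if the XML names no encoding."""
    enc = default.encode("ascii")
    if data.startswith(_BOMS):
        return data  # a byte order mark already identifies the encoding
    match = _XML_DECL.match(data)
    if match is None:
        # No XML declaration at all, so prepend a complete one.
        return b'<?xml version="1.0" encoding="' + enc + b'"?>\n' + data
    if b"encoding" in match.group("rest"):
        return data  # encoding already declared, leave it alone
    # The encoding attribute has to follow the version attribute directly.
    fixed_decl = (
        b"<?xml version=" + match.group("version")
        + b' encoding="' + enc + b'"'
        + match.group("rest") + b"?>"
    )
    return fixed_decl + data[match.end():]


# Invented sample with one ISO-8859-1 byte (0xE9, e-acute) and no encoding.
raw = b'<?xml version="1.0"?>\n<Hit_def>caf\xe9</Hit_def>'
fixed = ensure_encoding(raw)
print(fixed.splitlines()[0])
print(ascii(ET.fromstring(fixed).text))
```

Files that already declare an encoding pass through unchanged, so this only
alters the ambiguous ones. It is still a guess about what the bytes mean, so
it is a workaround rather than a real fix.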

> I wonder if <?xml version="1.0"?> is a W3C-valid first line for an XML
> file. If this is OK (from the point of view of the XML standard), then
> the parser should be corrected; if not, according to the standard, the
> file should be rejected for non-compliance

You sound like you know a lot more about XML than I do.  Would you be
able to find out one way or the other?  This would be useful information
for trying to get the NCBI to make a change.

Iddo's bad file validates fine according to www.xmlvalidation.com (I
pasted the file contents in).  The NCBI DTD files are here:

http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd
http://www.ncbi.nlm.nih.gov/dtd/NCBI_Entity.mod.dtd
http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.mod.dtd

I think this means declaring the encoding is optional, although the
validation program may simply have identified the encoding on its own.
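
As far as I can tell from the XML 1.0 specification, the encoding
declaration is indeed optional, but a file with neither a declaration nor a
byte order mark is then supposed to be UTF-8. On that reading the short
first line is legal, and it is the stray ISO-8859-1 byte that breaks the
parser. A quick way to locate such a byte is sketched below;
first_non_utf8_byte is a made-up name and the sample data is invented.

```python
def first_non_utf8_byte(data):
    """Return (offset, byte value) of the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset where decoding failed.
        return err.start, data[err.start]
    return None


# Invented sample: 0xE9 is e-acute in ISO-8859-1 but not valid UTF-8 here.
sample = b'<?xml version="1.0"?>\n<Hit_def>caf\xe9</Hit_def>'
print(first_non_utf8_byte(sample))
```

The offset it reports is the place to look when editing the offending
character by hand.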

> (this is not HTML where the
> browser client can accept and correct invalid code, the specification
> states that XML should validate before being used).

Which is good, unless you are trying to deal with bad XML produced by a 
third party.  I'm sure the NCBI will fix this, if it is their problem. 
It just might take a while.

Peter


