[BioPython] BLAST XML problem?
Peter
biopython at maubp.freeserve.co.uk
Wed Jan 11 15:13:42 EST 2006
Sebastian Bassi wrote:
> On 1/11/06, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
>><?xml version="1.0" encoding="ISO-8859-1"?>
>>Instead it's just:
>><?xml version="1.0"?>
>>Short-term solutions which I have just tried and got to work:
>>(1) Edit the offending character by hand (as you did)
>>(2) Specify encoding="ISO-8859-1" by editing the first line by hand
>>(3) Convert the file to Unicode (doubles the size)
>
>
> I have a 4th solution, that doesn't involve XML editing, so it will
> "fix" the problem for other users:
> 4) Change Biopython or XML parser to assume encoding = ISO-8859-1 when
> there is no encoding information.
Well yes, that did cross my mind. I even went off to try and find out
how to do this, but failed. Any ideas?
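One possible workaround, short of changing the parser itself, is to patch the XML declaration in memory before handing the data to any parser. The following is only a minimal sketch under that assumption: the function name with_default_encoding and the sample record are hypothetical and not part of Biopython, and it only handles files that start directly with an ASCII-compatible XML declaration.

```python
import re
import xml.etree.ElementTree as ET

# Matches the start of an XML declaration up to and including the version value.
VERSION_RE = re.compile(rb'<\?xml\s+version\s*=\s*("[^"]*"|\'[^\']*\')')


def with_default_encoding(data, default=b"ISO-8859-1"):
    """Return data with an encoding declaration added if none is present.

    Hypothetical helper: input that does not begin with an XML declaration,
    or that already declares an encoding, is returned unchanged.
    """
    match = VERSION_RE.match(data)
    if match is None:
        return data
    end_of_decl = data.find(b"?>", match.end())
    if end_of_decl == -1 or b"encoding" in data[match.end():end_of_decl]:
        return data
    # The encoding must come straight after the version (and before any
    # standalone attribute), so insert it right behind the matched part.
    return data[:match.end()] + b' encoding="' + default + b'"' + data[match.end():]


# Made-up sample: \xe9 is "e acute" in ISO-8859-1 and is not valid UTF-8 here.
sample = b'<?xml version="1.0"?>\n<Hit_def>caf\xe9</Hit_def>'
patched = with_default_encoding(sample)
print(patched.splitlines()[0])
print(ET.fromstring(patched).text)
```

This changes nothing on disk and leaves files that already declare an encoding alone, but it does bake in the guess that undeclared files are ISO-8859-1, which may be wrong for other data.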
> I wonder if <?xml version="1.0"?> is a W3C-valid first line for an XML
> file. If this is OK (from the point of view of the XML standard), then
> the parser should be corrected; if not, according to the standard, the
> file should be rejected for non-compliance.
You sound like you know a lot more about XML than I do. Would you be
able to find out one way or the other? This would be useful information
for trying to get the NCBI to make a change.
Iddo's bad file validates fine according to www.xmlvalidation.com (tested
by cutting and pasting its contents). The NCBI DTD files are here:
http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd
http://www.ncbi.nlm.nih.gov/dtd/NCBI_Entity.mod.dtd
http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.mod.dtd
I think this means that declaring the encoding may be optional, although
the validation program may simply have identified the encoding on its own.
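A quick way to probe this locally, independent of what www.xmlvalidation.com does internally, is to feed raw bytes to Python's bundled expat-based parser. The sketch below rests on the XML 1.0 rule that a document with no encoding declaration and no byte order mark is read as UTF-8; the helper name parses and the one-element sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parses(data):
    """Return True if the bytes parse as well-formed XML, False otherwise."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# Hypothetical documents; \xe9 is "e acute" in ISO-8859-1.
no_encoding = b'<?xml version="1.0"?><Hit_def>caf\xe9</Hit_def>'
latin1 = b'<?xml version="1.0" encoding="ISO-8859-1"?><Hit_def>caf\xe9</Hit_def>'
ascii_only = b'<?xml version="1.0"?><Hit_def>cafe</Hit_def>'

for label, doc in [("no encoding, Latin-1 byte", no_encoding),
                   ("ISO-8859-1 declared", latin1),
                   ("no encoding, ASCII only", ascii_only)]:
    print(label, "->", parses(doc))
```

If this reading is right, the first line without an encoding is legal in itself, and the trouble only starts when the body contains bytes that are not valid UTF-8.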
> (this is not HTML, where the
> browser client can accept and correct invalid code; the specification
> states that XML should validate before being used).
Which is good, unless you are trying to deal with bad XML produced by a
third party. I'm sure the NCBI will fix this, if it is their problem.
It just might take a while.
Peter