[Biopython-dev] "Your XML file did not start with <?xml"

Cymon Cox cy at cymon.org
Sun Jun 14 11:23:23 EDT 2009


Folks,

I've been using qblast recently and have been getting a lot of invalid
replies from NCBI, which end in an error of this sort:

Traceback (most recent call last):
  File "test_NCBI_qblast.py", line 71, in <module>
    record = NCBIXML.read(handle)
  File "/home/cymon/git/github-master/Bio/Blast/NCBIXML.py", line 564, in read
    first = iterator.next()
  File "/home/cymon/git/github-master/Bio/Blast/NCBIXML.py", line 611, in parse
    % XML_START)
ValueError: Your XML file did not start with <?xml...

Which is true: NCBI is returning just "\n\n" instead of XML. If you code
around this and keep polling, the results eventually arrive:

$ git diff NCBIWWW.py
diff --git a/Bio/Blast/NCBIWWW.py b/Bio/Blast/NCBIWWW.py
index 324ab2a..e4243fa 100644
--- a/Bio/Blast/NCBIWWW.py
+++ b/Bio/Blast/NCBIWWW.py
@@ -807,10 +807,14 @@ def qblast(program, database, sequence,
         results = handle.read()
         # XML results don't have the Status tag when finished
         if results.find("Status=") < 0:
+            if results == "\n\n":
+                print "Results == '\\n\\n': continuing..."
+                continue
             break
         i = results.index("Status=")
         j = results.index("\n", i)
         status = results[i+len("Status="):j].strip()
+        print "Status=%s" % status
         if status.upper() == "READY":
             break

$ python test_NCBI_qblast.py
Checking Bio.Blast.NCBIWWW.qblast() with various queries
qblast('blastp', 'nr', '160837788', ...)
Status=WAITING
Status=WAITING
Status=WAITING
qblast('blastn', 'nr', 'GTACCTTGATTTCGTATTCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGACTCTACTACCTTTACCC', ...)
qblast('blastx', 'nr', ">gi|116660609|gb|EG558220.1|EG558220 CR02019H04 Leaf
CR02 cDNA library Catharanthus roseus cDNA clone CR02019H04 5', mRNA
sequence\nCTCCATTCCCTCTCTATTTTCAGTCTAATCAAATTAGAGCTTAAAAGAATGAGATTTTTAACAAATAAAA\nAAACATAGGGGAGATTTCATAAAAGTTATATTAGTGATTTGAAGAATATTTTAGTCTATTTTTTTTTTTT\nTCTTTTTTTGATGAAGAAAGGGTATATAAAATCAAGAATCTGGGGTGTTTGTGTTGACTTGGGTCGGGTG\nTGTATAATTCTTGATTTTTTCAGGTAGTTGAAAAGGTAGGGAGAAAAGTGGAGAAGCCTAAGCTGATATT\nGAAATTCATATGGATGGAAAAGAACATTGGTTTAGGATTGGATCAAAAAATAGGTGGACATGGAACTGTA\nCCACTACGTCCTTACTATTTTTGGCCGAGGAAAGATGCTTGGGAAGAACTTAAAACAGTTTTAGAAAGCA\nAGCCATGGATTTCTCAGAAGAAAATGATTATACTTCTTAATCAGGCAACTGATATTATCAATTTATGGCA\nGCAGAGTGGTGGCTCCTTGTCCCAGCAGCAGTAATTACTTTTTTTTCTCTTTTTGTTTCCAAATTAAGAA\nACATTAGTATCATATGGCTATTTGCTCAATTGCAGATTTCTTTCTTTTGTGAATG",
...)
Status=WAITING
Results == '\n\n': continuing...
Results == '\n\n': continuing...
Results == '\n\n': continuing...
Done
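
For anyone who wants to try the idea without patching NCBIWWW.py, here is a
minimal standalone sketch of the same polling logic. It is not the Biopython
code: poll_for_results, fetch, delay and max_tries are hypothetical names, and
fetch stands in for whatever makes the HTTP request. It also treats any
whitespace-only reply as "not ready", which is slightly broader than the exact
"\n\n" test in the diff above, and it gives up after a fixed number of tries
rather than looping forever.

```python
import time


def poll_for_results(fetch, delay=0.0, max_tries=20):
    """Call fetch() until it returns finished results.

    fetch is a hypothetical zero-argument callable that returns the
    response body as text (a stand-in for the real HTTP request).
    """
    for _ in range(max_tries):
        results = fetch()
        if "Status=" not in results:
            # Finished XML has no Status tag, but a blank body such as
            # "\n\n" is not a result either, so wait and ask again.
            if not results.strip():
                time.sleep(delay)
                continue
            return results
        # Extract the value after "Status=" up to the end of that line.
        i = results.index("Status=")
        j = results.find("\n", i)
        if j < 0:
            j = len(results)
        status = results[i + len("Status="):j].strip()
        if status.upper() == "READY":
            return results
        time.sleep(delay)
    raise RuntimeError("no results after %d attempts" % max_tries)


# Canned replies (made up for illustration) in the order seen above:
# one WAITING status, two blank bodies, then something that looks like XML.
canned = iter(["Status=WAITING\n", "\n\n", "\n\n",
               '<?xml version="1.0"?><BlastOutput/>'])
xml_text = poll_for_results(lambda: next(canned))
```

With a real fetch you would want a non-zero delay between requests; the
default of 0.0 here is only so the canned example runs instantly.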

Anyone else seen this? Am I just unlucky enough to have a flaky internet
connection?

Cheers, C.