[Biopython-dev] [Bug 2821] NCBIXML.parse only returns results for non-empty hits rather than one per query sequence

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Fri Apr 24 05:02:35 EDT 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2821





------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-04-24 05:02 EST -------
What version of BLAST do you have, and (assuming it's less than, say, 10 MB) could
you attach the XML file to this bug?

From memory, this is a limitation of the raw XML file from the NCBI - there is
no way to tell if there were additional queries with no hits (so Biopython
can't help directly).  I have not checked BLAST 2.2.20, but had been meaning to
ask the NCBI about this.  They may not regard it as a "bug", but it was
annoying.

I have used two workarounds in my own code.

(1) Load a list of the query IDs into memory, and as you go through the BLAST
results you can see which queries don't appear - and therefore had no hits.

(2) Use the .next() methods on a FASTA iterator on the query file, and the
NCBIXML iterator on the BLAST XML file to step through the two files in sync. 
I have some code to do this somewhere... maybe I should turn this into a
cookbook recipe for the wiki: http://biopython.org/wiki/Category:Cookbook
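
A minimal sketch of both workarounds, written against plain sequences of IDs
so that it runs on its own. The function names (queries_without_hits,
step_in_sync) and the file names in the comments are hypothetical, not part of
Biopython. It assumes the ID taken from each BLAST record matches the ID in
the query FASTA file, and for (2) that BLAST reports results in the same order
as the queries.

```python
def queries_without_hits(query_ids, result_ids):
    """Workaround (1): return the query IDs missing from the BLAST results."""
    seen = set(result_ids)
    # Preserve the original query order in the output.
    return [qid for qid in query_ids if qid not in seen]


def step_in_sync(query_ids, results, result_id=lambda r: r):
    """Workaround (2): yield (query_id, result_or_None) for every query.

    Assumes results come in query order, with no-hit queries simply absent.
    result_id extracts the query ID from a result object.
    """
    results = iter(results)
    pending = next(results, None)
    for qid in query_ids:
        if pending is not None and result_id(pending) == qid:
            yield qid, pending
            pending = next(results, None)
        else:
            # No record for this query, so it had no hits.
            yield qid, None


# Toy data standing in for the two files.
queries = ["q1", "q2", "q3", "q4"]
hits = ["q1", "q3"]
missing = queries_without_hits(queries, hits)
paired = list(step_in_sync(queries, hits))

# With Biopython the inputs might be built roughly like this (an assumption:
# the first word of record.query matches the FASTA record id):
#   from Bio import SeqIO
#   from Bio.Blast import NCBIXML
#   query_ids = [rec.id for rec in SeqIO.parse("queries.fasta", "fasta")]
#   result_ids = [rec.query.split(None, 1)[0]
#                 for rec in NCBIXML.parse(open("results.xml"))]
```

The first function needs all the query IDs in memory; the second only ever
holds one pending BLAST record, which matters for large result files.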

Peter

