[Biopython-dev] [Bug 2821] NCBIXML.parse only returns results for non-empty hits rather than one per query sequence
bugzilla-daemon at portal.open-bio.org
Fri Apr 24 09:02:35 UTC 2009
http://bugzilla.open-bio.org/show_bug.cgi?id=2821
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-04-24 05:02 EST -------
What version of BLAST do you have, and (assuming it's less than say 10 MB) could
you attach the XML file to this bug?
From memory this is a limitation of the raw XML file from the NCBI - there is
no way to tell if there were additional queries with no hits (so Biopython
can't help directly). I have not checked BLAST 2.2.20, but had been meaning to
ask the NCBI about this. They may not regard it as a "bug", but it was
annoying.
I have used two workarounds in my own code.
(1) Load a list of the query IDs into memory, and as you go through the BLAST
results you can see which queries don't appear - and therefore had no hits.
(2) Use the .next() methods on a FASTA iterator on the query file, and the
NCBIXML iterator on the BLAST XML file to step through the two files in sync.
I have some code to do this somewhere... maybe I should turn this into a
cookbook recipe for the wiki: http://biopython.org/wiki/Category:Cookbook
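
In the meantime, here is a rough sketch of both workarounds. This is not the
code I mentioned above, just an illustration. The helper names are made up for
this example. It assumes the BLAST records come out in the same order as the
queries in the FASTA file, and that the first word of each BLAST record's
query field matches the FASTA record id, which may not hold for every BLAST
version or set of options.

```python
def queries_without_hits(query_ids, blast_query_ids):
    """Workaround (1): return the query IDs that never appear in the BLAST output."""
    seen = set(blast_query_ids)
    return [query_id for query_id in query_ids if query_id not in seen]


def pair_queries_with_results(query_ids, blast_records, get_id):
    """Workaround (2): step through queries and BLAST records in sync.

    Yields (query_id, record) for every query, with record set to None
    when the BLAST output has no entry for that query. Assumes both
    inputs are in the same order.
    """
    blast_iter = iter(blast_records)
    pending = next(blast_iter, None)
    for query_id in query_ids:
        if pending is not None and get_id(pending) == query_id:
            yield query_id, pending
            pending = next(blast_iter, None)
        else:
            yield query_id, None
    if pending is not None:
        # A leftover record means the two files were not in the same order
        # (or the ID extraction in get_id does not match the FASTA ids).
        raise ValueError("Unmatched BLAST record: %r" % (get_id(pending),))


def iter_blast_per_query(fasta_path, xml_path):
    """Hypothetical wrapper showing how the helper might be used with Biopython."""
    # Imported here so the helpers above can be used without Biopython installed.
    from Bio import SeqIO
    from Bio.Blast import NCBIXML

    with open(xml_path) as xml_handle:
        query_ids = (rec.id for rec in SeqIO.parse(fasta_path, "fasta"))
        records = NCBIXML.parse(xml_handle)
        # Assumption: the first word of record.query is the FASTA identifier.
        for pair in pair_queries_with_results(
            query_ids, records, lambda rec: rec.query.split()[0]
        ):
            yield pair
```

With something like this, looping over iter_blast_per_query("queries.fasta",
"results.xml") (file names are placeholders) should give one item per query
sequence, with None wherever the XML file had nothing for that query.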
Peter