[Biopython-dev] [Bug 2821] New: NCBIXML.parse only returns results for non-empty hits rather than one per query sequence
bugzilla-daemon at portal.open-bio.org
Thu Apr 23 23:56:49 EDT 2009
http://bugzilla.open-bio.org/show_bug.cgi?id=2821
Summary: NCBIXML.parse only returns results for non-empty hits
rather than one per query sequence
Product: Biopython
Version: 1.50b
Platform: Other
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: camilla at ip.id.au
I used NCBIStandalone.blastall to BLAST all records in the query database
VEKY.faa (a FASTA-format file of 226 proteins) against the target database
VPOO.faa (a FASTA-format file of 80 proteins), looking for significantly
similar proteins.
Many of the 'VEKY' proteins do not have a significant hit in the 'VPOO'
database (which is what I expect and this is fine).
To access the results, I iterate using a loop like the following to parse the
raw BLAST results in XML format:
    from Bio.Blast import NCBIXML

    blast_out = _open_file(outraw_file, 'r')
    blast_records = NCBIXML.parse(blast_out)
    for b_record in blast_records:
        # deal with each record here
        pass
However, instead of getting 226 records as I expect, some of which would have
descriptions and alignments fields of length zero, this returns only 64
records: the ones that had at least one hit.
My problem is that I'd like to work out which VEKY query sequence each
'b_record' corresponds to, but so far I have not been able to find any such
information in the b_record. And because the parser doesn't produce one record
per query sequence, I cannot infer that information from the order of the
query sequences in my input VEKY.faa file.
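As a cross-check that does not depend on NCBIXML, the raw XML could be read directly with the Python standard library. The sketch below is only an illustration: the helper names and the tiny sample inputs are made up, and it assumes that each <Iteration> element in the BLAST XML carries the FASTA header of its query in <Iteration_query-def>, with the sequence identifier as the first word.

```python
import io
import xml.etree.ElementTree as ET


def queries_in_blast_xml(handle):
    """Yield (query_def, hit_count) for each <Iteration> in BLAST XML output."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "Iteration":
            query_def = elem.findtext("Iteration_query-def")
            hit_count = len(elem.findall("Iteration_hits/Hit"))
            yield query_def, hit_count
            elem.clear()  # release the finished iteration to save memory


def fasta_ids(handle):
    """Return the first word of every FASTA header line."""
    return [line[1:].split()[0] for line in handle if line.startswith(">")]


# Tiny made-up inputs standing in for VEKY.faa and the raw XML results file.
fasta = io.StringIO(">q1 first\nMKV\n>q2 second\nMLL\n>q3 third\nMAA\n")
xml = io.BytesIO(b"""<BlastOutput><BlastOutput_iterations>
<Iteration><Iteration_query-def>q1 first</Iteration_query-def>
<Iteration_hits><Hit/></Iteration_hits></Iteration>
<Iteration><Iteration_query-def>q3 third</Iteration_query-def>
<Iteration_hits><Hit/><Hit/></Iteration_hits></Iteration>
</BlastOutput_iterations></BlastOutput>""")

# Map each query identifier found in the XML to its number of hits.
hits = {qdef.split()[0]: n for qdef, n in queries_in_blast_xml(xml)}
# Queries present in the FASTA file but absent from the XML output.
missing = [qid for qid in fasta_ids(fasta) if qid not in hits]
print(hits, missing)
```

Run against the real files, something along these lines would show which queries appear in the XML at all and which are absent, which may help tell whether the missing records are dropped by the parser or were never written by blastall.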
Do you know how I can get around this problem?
Warm thanks in advance for any help or tips,
Camilla