[Biopython-dev] [Bug 2821] New: NCBIXML.parse only returns results for non-empty hits rather than one per query sequence

Fri Apr 24 03:56:49 UTC 2009

http://bugzilla.open-bio.org/show_bug.cgi?id=2821

           Summary: NCBIXML.parse only returns results for non-empty hits
                    rather than one per query sequence
           Product: Biopython
           Version: 1.50b
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: camilla at ip.id.au

I used NCBIStandalone.blastall to BLAST all records in query database VEKY.faa
(a FASTA-format file of 226 proteins) against target database VPOO.faa (a
FASTA-format file of 80 proteins), looking for significantly similar proteins.

Many of the 'VEKY' proteins do not have a significant hit in the 'VPOO'
database (which is what I expect and this is fine).

To access the results, I iterate using a loop like the following to parse the
raw BLAST results in XML format:

    from Bio.Blast import NCBIXML

    blast_out = _open_file(outraw_file, 'r')
    blast_records = NCBIXML.parse(blast_out)
    for b_record in blast_records:
        # deal with each record here
        pass

However, instead of getting 226 records as I expect, some of which would have
empty descriptions and alignments lists, this returns only 64 records: the
records for queries that had at least one hit.

My problem is that I'd like to work out which VEKY query sequence each
'b_record' corresponds to. But so far I have not been able to find any such
information in the b_record. And because it doesn't produce one per query
sequence, I cannot infer that information from the order of the query sequences
in my input VEKY.faa file.
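
To make the goal concrete, here is a rough sketch of the bookkeeping I would
like to do. The helper names read_fasta_ids and missing_queries are made up for
illustration, the data is a toy example rather than my real files, and the
sketch assumes the query ID behind each record can be recovered somehow, which
is exactly the step I have not managed so far.

```python
def read_fasta_ids(lines):
    """Return the ID (first word after '>') of each FASTA record, in file order."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def missing_queries(fasta_ids, record_query_ids):
    """Return the query IDs, in input order, that have no parsed record."""
    seen = set(record_query_ids)
    return [qid for qid in fasta_ids if qid not in seen]


# Toy stand-in for the query FASTA file (made-up data).
example_fasta = [">prot1 first protein", "MKV", ">prot2", "MLL", ">prot3", "MAA"]

# Hypothetical: query IDs recovered from the records the parser did return.
example_record_ids = ["prot1", "prot3"]

no_hit_queries = missing_queries(read_fasta_ids(example_fasta), example_record_ids)
print(no_hit_queries)
```

With one record per query (or a query name on each record), the queries with
no hits would fall out directly, without relying on the order of the input
file.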

Do you know how I can get around this problem?

Warm thanks in advance for any help or tips,
Camilla
