[Biopython-dev] [Bug 2157] New: Blast.NCBIXML loses query information for queries 2-n

bugzilla-daemon at portal.open-bio.org
Mon Dec 4 18:54:03 UTC 2006


           Summary: Blast.NCBIXML loses query information for queries 2-n
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: critical
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: kael.fischer at gmail.com

This is untested for blastall output versions other than 2.2.14-15 and I have
only looked at blastn.

XMLParser: 1 Blast Record instance = all submitted query sequences
Traditional BlastParser: 1 Blast Record instance = 1 query sequence 
(for versions of BlastParser/blastall where it can parse)

The names and lengths of all queries after the first one are lost during
parsing.  The data are in the XML output at the top level of each
<Iteration>.  For the data structure to be isomorphic to that of the original
BlastParser and to capture this important information, NCBIXML.parser should
return a list of records (one per XML <Iteration>).  Also, an
iterator/generator mechanism over the <Iteration>s would have the added
benefit of a smaller memory footprint for very large results.
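
To illustrate the one-record-per-<Iteration> idea, here is a minimal sketch
using only the standard library's xml.etree.ElementTree, not the NCBIXML code
itself.  QueryRecord and iter_query_records are hypothetical names made up for
this example, and the tag names (Iteration_query-def, Iteration_query-len,
Hit_def) are what I assume the blastall XML output uses; the sample document
is a hand-written stub, not real BLAST output.

```python
import io
import xml.etree.ElementTree as ET
from collections import namedtuple

# Hypothetical record type for illustration only; not Biopython's Record class.
QueryRecord = namedtuple("QueryRecord", ["query", "query_letters", "hit_defs"])


def iter_query_records(handle):
    """Yield one QueryRecord per <Iteration> element in a BLAST XML stream."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag != "Iteration":
            continue
        length = elem.findtext("Iteration_query-len")
        yield QueryRecord(
            query=elem.findtext("Iteration_query-def"),
            query_letters=int(length) if length is not None else None,
            hit_defs=[hit.findtext("Hit_def") for hit in elem.iter("Hit")],
        )
        # Empty the finished <Iteration> element so its hits can be freed.
        elem.clear()


# Hand-written stub with two queries (assumed tag layout, not real output).
SAMPLE = b"""<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query_one</Iteration_query-def>
      <Iteration_query-len>120</Iteration_query-len>
      <Iteration_hits><Hit><Hit_def>subject A</Hit_def></Hit></Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_query-def>query_two</Iteration_query-def>
      <Iteration_query-len>85</Iteration_query-len>
      <Iteration_hits></Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

records = list(iter_query_records(io.BytesIO(SAMPLE)))
for record in records:
    print(record.query, record.query_letters, len(record.hit_defs))
```

Because the generator yields each record as soon as its <Iteration> closes,
a caller that loops over it directly (rather than building a list as above)
only holds one query's hits at a time.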

It has been suggested that XMLParser be used in lieu of BlastParser, as
BlastParser is broken for new-ish versions of blastall (see bug 2090).  All
code that uses record.query or record.query_letters, or in some other way
relies on the documented
(http://www.bioinformatics.org/bradstuff/bp/tut/images/BlastRecord.png) data
structure of 1 record per query is broken when using NCBIXML because of this.

