[BioPython] Problem with blast xml

Sebastian Bassi sbassi at gmail.com
Thu Oct 4 06:47:44 UTC 2007

I am having a problem that does not originate in Biopython, but it
affects the Biopython (1.43) XML BLAST parser.
I have two XML files: one can be parsed and the other can't.
Here are the commands I ran to produce them:

sbassi@xubuntu:~/blast-2.2.16/bin$ ./blastall -p blastn -d /media/vic300/BLASTdb/ecoli.nt -i /media/vic300/INTA/mitofragsB2-TAB.fasta -e 0.0001 -m 7 -o TABB2.xml
sbassi@xubuntu:~/blast-2.2.16/bin$ ./blastall -p blastn -d /media/vic300/BLASTdb/ecoli.nt -i /media/vic300/INTA/mitofragsB2-TABv2.fasta -e 0.0001 -m 7 -o

The only relevant difference is the input file. The sequences differ,
but the output files should have the same format (shouldn't they?).
When I parse the files, however, I find that this is not the case.
This is the file that parses without problems:

>>> from Bio.Blast import NCBIXML
>>> bout=open('bioinfo/INTA/TABB2.xml')
>>> b_records=NCBIXML.parse(bout)
>>> x=b_records.next()
>>> y=b_records.next()
>>> x.query
u'fragment 31'
>>> y.query
u'fragment 67'
>>> x.alignments
[<Bio.Blast.Record.Alignment instance at 0xb659850c>]
>>> y.alignments
[<Bio.Blast.Record.Alignment instance at 0xb65a3c6c>,
<Bio.Blast.Record.Alignment instance at 0xb65a3cec>,
<Bio.Blast.Record.Alignment instance at 0xb65a3d8c>,
<Bio.Blast.Record.Alignment instance at 0xb65a3e8c>,
<Bio.Blast.Record.Alignment instance at 0xb65a3f8c>,
<Bio.Blast.Record.Alignment instance at 0xb65a3e4c>,
<Bio.Blast.Record.Alignment instance at 0xb65aa1ac>]
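To see what is actually in each file independently of NCBIXML, one option is to read the XML with Python's standard-library ElementTree and count the Hit elements under each Iteration. This is a minimal sketch, not part of Biopython: the name hits_per_query is hypothetical, and it assumes the element layout visible in the fragments quoted in this message (Iteration_query-def, Iteration_hits/Hit) and that each file is a single well-formed XML document. If ET.parse raises a ParseError instead, that is itself a clue about the file.

```python
import xml.etree.ElementTree as ET

def hits_per_query(root):
    """Return [(query_def, hit_count), ...], one entry per <Iteration>.

    `root` is the root element of a BLAST XML (-m 7) document.
    Hypothetical helper for cross-checking the parser, not a Biopython API.
    """
    summary = []
    for iteration in root.iter("Iteration"):
        # findtext() returns None when the element is absent; fall back to "".
        # strip() drops padding such as the trailing space in "fragment 31 ".
        query = (iteration.findtext("Iteration_query-def") or "").strip()
        # Each <Hit> sits inside <Iteration_hits>; finding none means no hits.
        hits = iteration.findall("Iteration_hits/Hit")
        summary.append((query, len(hits)))
    return summary
```

For example, hits_per_query(ET.parse('bioinfo/INTA/TABB2v2.xml').getroot()) would report fragment 1 with a count of 0 if that query really had no hits in the file, which helps separate a difference in the data from a problem in the parser.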

Now let's look at the file that seems to be malformed:

>>> bout=open('bioinfo/INTA/TABB2v2.xml')
>>> b_records=NCBIXML.parse(bout)
>>> x=b_records.next()
>>> y=b_records.next()
>>> x.query
u'fragment 1'
>>> y.query
u'fragment 57'
>>> x.alignments
>>> y.alignments
[<Bio.Blast.Record.Alignment instance at 0xb65a374c>]

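The first record (fragment 1) comes back with no alignments at all:
x.alignments echoes nothing at the prompt, so it seems to be None
rather than even an empty list.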
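Whatever the cause turns out to be, code that loops over the records can be made robust to it by treating a missing alignments value (None) the same as an empty list. A minimal sketch, with the hypothetical helper name split_by_hits; it only relies on each record having (or lacking) an alignments attribute, as Bio.Blast.Record objects do.

```python
def split_by_hits(records):
    """Split parsed BLAST records into (with_hits, without_hits) lists.

    A record counts as having no hits when its `alignments` attribute is
    None, an empty list, or missing altogether. Hypothetical helper.
    """
    with_hits, without_hits = [], []
    for record in records:
        # getattr() covers a missing attribute; truthiness covers None and [].
        if getattr(record, "alignments", None):
            with_hits.append(record)
        else:
            without_hits.append(record)
    return with_hits, without_hits
```

For example, with_hits, without_hits = split_by_hits(NCBIXML.parse(open('bioinfo/INTA/TABB2v2.xml'))) followed by [r.query for r in without_hits] would list the queries that came back empty, regardless of whether the parser gave them None or [].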

Here is a fragment of the "normal" one (TABB2.xml):

      <Iteration_query-def>fragment 31 </Iteration_query-def>
          <Hit_def>Escherichia coli K-12 MG1655 section 199 of 400 of the complete genome</Hit_def>

Here is a fragment of the "malformed" one (TABB2v2.xml):


Why is this happening? Is this expected behavior?
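To test the assumption that both runs produce the same format, one rough check is to collect every tag path (parent/child/...) that occurs in each file and compare the two sets. This is a hypothetical helper built on the standard-library ElementTree, not a Biopython feature. Because paths are pooled over the whole file, it only reveals elements that appear in one file and never in the other, and such differences can come from the data rather than from a change in format.

```python
import xml.etree.ElementTree as ET

def element_paths(root):
    """Return the set of tag paths, e.g. 'BlastOutput/BlastOutput_iterations',
    that occur anywhere under `root` (root included)."""
    paths = set()

    def walk(elem, prefix):
        # Build this element's path from its parent's path.
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths

def structural_diff(root_a, root_b):
    """Return (only_in_a, only_in_b): sorted tag paths found in just one document."""
    a, b = element_paths(root_a), element_paths(root_b)
    return sorted(a - b), sorted(b - a)
```

For example, structural_diff(ET.parse('bioinfo/INTA/TABB2.xml').getroot(), ET.parse('bioinfo/INTA/TABB2v2.xml').getroot()) returns two empty lists if every element type seen in one file also occurs somewhere in the other.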

I uploaded the XML files here:

