[BioPython] help with NCBIWWW parser

Fri Sep 15 09:14:38 UTC 2006

Edoardo Saccenti wrote:
>> Hi Folks!
>>
>> I'm trying to parse the output of blast search done using the NCBIWWW
>> qblast.

Thanks for sending me the file. It looks like you got an XML file back 
from the NCBI using NCBIWWW.qblast, but you are trying to read it with 
the HTML parser.

qblast takes an optional format_type argument, which now defaults to 
"XML".  You can also choose "HTML", "Text" or "ASN.1".

If you have plain text output, try NCBIStandalone.BlastParser()
If you have HTML output, try NCBIWWW.BlastParser()
If you have XML output, try NCBIXML.BlastParser()
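
If you are not sure which of the three you have, looking at the start of 
the file usually settles it. A rough sketch of that check follows; the 
helper name guess_blast_format and the marker strings it looks for are my 
own assumptions and not part of Biopython, so treat it as a starting 
point only:

```python
# Parser names taken from the list above, keyed by format_type.
PARSER_FOR_FORMAT = {
    "Text": "NCBIStandalone.BlastParser",
    "HTML": "NCBIWWW.BlastParser",
    "XML": "NCBIXML.BlastParser",
}


def guess_blast_format(text):
    """Guess whether BLAST output is "XML", "HTML" or "Text".

    Hypothetical helper (not a Biopython function). The markers below
    are heuristics and may misfire on unusual output.
    """
    # Only the beginning of the output is inspected, case-insensitively.
    head = text.lstrip()[:2000].lower()
    if head.startswith("<?xml") or "<blastoutput" in head:
        return "XML"
    if "<html" in head or "<!doctype html" in head:
        return "HTML"
    # Anything without recognisable markup is assumed to be plain text.
    return "Text"


def suggest_parser(text):
    """Return the name of the parser that matches the guessed format."""
    return PARSER_FOR_FORMAT[guess_blast_format(text)]
```

For example, suggest_parser(open("my_blast").read()) should name the XML 
parser for the file you sent me.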

In theory, using XML should be the most reliable as it is a file format 
designed for computers to read.

The HTML output contains lots of formatting to make it look pretty in a 
web browser, and that markup changes fairly often.

The plain text output is fairly simple, but again the NCBI makes minor 
changes every so often (and their standalone tools produce a slightly 
different format to the web tools).

I can read your XML file using:

from Bio.Blast import NCBIXML

# "my_blast" is the XML file saved from NCBIWWW.qblast
blast_out = open("my_blast", "r")
b_parser = NCBIXML.BlastParser()
b_record = b_parser.parse(blast_out)
blast_out.close()
print(b_record.query)

I hope that helps,

(If you are submitting multiple queries, then you will need to use an 
iterator... but that is another can of worms).
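
To give a flavour of what iterating means here, this is a minimal sketch 
that uses only the Python standard library rather than the Biopython 
parser. It assumes all the queries come back in a single XML document 
with one <Iteration> element per query, which may not hold for every 
BLAST version. The SAMPLE data and the function name iter_query_names 
are made up for illustration:

```python
import io
import xml.etree.ElementTree as ET

# Tiny hand-written stand-in for BLAST XML output with two queries.
SAMPLE = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-def>query_one</Iteration_query-def>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_query-def>query_two</Iteration_query-def>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def iter_query_names(handle):
    """Yield the query description of each <Iteration>, one at a time."""
    for event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "Iteration":
            yield elem.findtext("Iteration_query-def")
            # Free the finished element so large files stay manageable.
            elem.clear()


names = list(iter_query_names(io.BytesIO(SAMPLE.encode("utf-8"))))
print(names)
```

With a real file you would pass open("my_blast", "rb") instead of the 
in-memory sample.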

Peter