[BioPython] help with NCBIWWW parser
Peter
biopython at maubp.freeserve.co.uk
Fri Sep 15 09:14:38 UTC 2006
Edoardo Saccenti wrote:
>> Hi Folks!
>>
>> I'm trying to parse the output of blast search done using the NCBIWWW
>> qblast.
Thanks for sending me the file. It looks like you got an XML file back
from the NCBI using NCBIWWW.qblast, but you are trying to read it with
the HTML parser.
qblast takes an optional argument, format_type, which now defaults to
"XML". You can also choose "HTML", "Text" or "ASN.1".
If you have plain text output, try NCBIStandalone.BlastParser()
If you have HTML output, try NCBIWWW.BlastParser()
If you have XML output, try NCBIXML.BlastParser()
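If you are not sure which of these formats a saved result file is in, a quick look at the start of the file is usually enough to tell. The sketch below is only an illustration using the standard library: the helper name guess_blast_format, the marker strings it looks for and the PARSER_FOR_FORMAT table are assumptions made for this example, not part of Biopython.

```python
def guess_blast_format(text):
    """Guess whether BLAST output is HTML, XML or plain text.

    Hypothetical helper: it only inspects the first few hundred
    characters, so treat the answer as a hint, not a guarantee.
    """
    head = text[:500].lstrip().lower()
    # Check for HTML first, since some HTML pages also start with "<?xml".
    if "<html" in head or "<!doctype html" in head:
        return "HTML"
    if head.startswith("<?xml") or "<blastoutput" in head:
        return "XML"
    return "Text"


# Map each guess to the parser class suggested above (names only, as strings).
PARSER_FOR_FORMAT = {
    "Text": "NCBIStandalone.BlastParser",
    "HTML": "NCBIWWW.BlastParser",
    "XML": "NCBIXML.BlastParser",
}
```

Calling guess_blast_format(open("my_blast").read()) and looking the result up in PARSER_FOR_FORMAT would then point at the parser to try first.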
In theory, using XML should be the most reliable as it is a file format
designed for computers to read.
The HTML output contains lots of formatting to make it look pretty in a
web browser, and it changes fairly often.
The plain text output is fairly simple, but again the NCBI makes minor
changes every so often (and their standalone tools produce a slightly
different format to the web tools).
I can read your XML file using:
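from Bio.Blast import NCBIXML
# Open the saved qblast result (XML) for reading
blast_out = open("my_blast","r")
# Create the XML parser and parse the file into a single BLAST record
b_parser = NCBIXML.BlastParser()
b_record = b_parser.parse(blast_out)
# Show the query the search was run with
print b_record.query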
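As a cross-check that does not depend on any particular parser class, the same kind of information can be pulled out of BLAST XML with Python's standard library alone. This is a minimal sketch, not how Biopython does it: the element names (BlastOutput_query-def, Hit_def, Hsp_evalue) are the ones used in the NCBI BLAST XML output, while the function name summarise_blast_xml and the tiny sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written stand-in for a BLAST XML result (not real data).
SAMPLE_XML = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_query-def>example query (made up)</BlastOutput_query-def>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>example hit A (made up)</Hit_def>
          <Hit_hsps><Hsp><Hsp_evalue>1e-50</Hsp_evalue></Hsp></Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def summarise_blast_xml(xml_text):
    """Return (query description, [(hit title, best e-value), ...])."""
    root = ET.fromstring(xml_text)
    query = root.findtext("BlastOutput_query-def")
    hits = []
    for hit in root.iter("Hit"):
        title = hit.findtext("Hit_def")
        # A hit can have several HSPs; keep the smallest e-value.
        evalues = [float(e.text) for e in hit.iter("Hsp_evalue")]
        hits.append((title, min(evalues) if evalues else None))
    return query, hits
```

For a real file, summarise_blast_xml(open("my_blast").read()) would be the equivalent call.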
I hope that helps,
(If you are submitting multiple queries, then you will need to use an
iterator... but that is another can of worms).
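To give a rough idea of what iterating over multiple queries involves, here is a sketch using only the standard library. It assumes the results arrive as a single XML document with one Iteration element per query, each carrying an Iteration_query-def element; output laid out differently would need different handling. The function name iter_queries and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_queries(handle):
    """Yield (query description, [hit titles]) one query at a time."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "Iteration":
            query = elem.findtext("Iteration_query-def")
            titles = [h.findtext("Hit_def") for h in elem.iter("Hit")]
            yield query, titles
            elem.clear()  # drop this query's subtree before reading the next


# Made-up two-query result, just to exercise the generator.
SAMPLE_MULTI = b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query one (made up)</Iteration_query-def>
      <Iteration_hits>
        <Hit><Hit_def>hit A (made up)</Hit_def></Hit>
        <Hit><Hit_def>hit B (made up)</Hit_def></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_query-def>query two (made up)</Iteration_query-def>
      <Iteration_hits></Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""

results = list(iter_queries(io.BytesIO(SAMPLE_MULTI)))
```

Because the generator yields one query at a time and clears each subtree afterwards, a large result file does not have to be held in memory all at once.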
Peter