[BioPython] help with NCBIWWW parser

Fri Sep 15 13:23:37 UTC 2006

I realised it was indeed XML... my fault for not reading the
instructions more carefully.

Thanks a lot for the time you spent on this,
Edoardo

On Fri, 2006-09-15 at 10:14 +0100, Peter wrote:
> Edoardo Saccenti wrote:
> >> Hi Folks!
> >>
> >> I'm trying to parse the output of blast search done using the NCBIWWW
> >> qblast.
> 
> Thanks for sending me the file, it looks like you have got an XML file 
> back from the NCBI using NCBIWWW.qblast but you are trying to use the 
> HTML parser to read it.
> 
> qblast takes an optional argument of format_type which now defaults to 
> XML.  You can also choose "HTML", "Text" or "ASN.1".
> 
> If you have plain text output, try NCBIStandalone.BlastParser()
> If you have HTML output, try NCBIWWW.BlastParser()
> If you have XML output, try NCBIXML.BlastParser()
> 
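
The format-to-parser mapping above could be sketched roughly as below.
This is only an illustration: guess_blast_format and PARSER_FOR_FORMAT
are hypothetical names, not part of Biopython, and the sniffing rules
are an assumption about what the start of each output type looks like.

```python
# Hypothetical helper (not part of Biopython): guess which kind of BLAST
# output a string holds, so that the matching parser can be chosen.

def guess_blast_format(text):
    """Return "XML", "HTML" or "Text" based on the start of the output."""
    # Only the beginning of the output is inspected, case-insensitively.
    head = text.lstrip()[:500].lower()
    if head.startswith("<?xml") or "<blastoutput" in head:
        return "XML"
    if "<html" in head or head.startswith("<!doctype html"):
        return "HTML"
    # Assumption: anything that is not markup is treated as plain text.
    return "Text"


# Parser names exactly as listed in the message above.
PARSER_FOR_FORMAT = {
    "XML": "NCBIXML.BlastParser",
    "HTML": "NCBIWWW.BlastParser",
    "Text": "NCBIStandalone.BlastParser",
}

if __name__ == "__main__":
    sample = '<?xml version="1.0"?>\n<BlastOutput>...</BlastOutput>'
    print(PARSER_FOR_FORMAT[guess_blast_format(sample)])
```

Checking the first few lines of the saved file this way is a cheap
sanity test before handing it to a parser that expects another format.
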
> In theory, using XML should be the most reliable as it is a file format 
> designed for computers to read.
> 
> The HTML output also contains lots of formatting to make it look pretty 
> on a web browser - and also changes fairly often.
> 
> The plain text output is fairly simple, but again the NCBI makes minor 
> changes every so often (and their standalone tools produce a slightly 
> different format to the web tools).
> 
> I can read your XML file using:
> 
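> from Bio.Blast import NCBIXML
> # Open the saved qblast output (an XML file)
> blast_out = open("my_blast","r")
> b_parser = NCBIXML.BlastParser()
> # parse() returns a single BLAST record for the query
> b_record = b_parser.parse(blast_out)
> print b_record.query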
> 
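
To see roughly what is being read out of the XML, here is a sketch that
uses only the Python standard library instead of NCBIXML. It is not how
Biopython does it. SAMPLE is a made-up, heavily trimmed document and
read_query_info is a hypothetical helper; the element names are the
BlastOutput_* tags found in NCBI BLAST XML output.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed sample; real BLAST XML has many more elements.
SAMPLE = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_query-def>example query (made up)</BlastOutput_query-def>
  <BlastOutput_query-len>42</BlastOutput_query-len>
</BlastOutput>
"""


def read_query_info(xml_text):
    """Return the program name, query description and query length."""
    root = ET.fromstring(xml_text)
    return {
        "program": root.findtext("BlastOutput_program"),
        "query": root.findtext("BlastOutput_query-def"),
        "query_len": int(root.findtext("BlastOutput_query-len")),
    }


if __name__ == "__main__":
    info = read_query_info(SAMPLE)
    print(info["query"])
```

For real work the Biopython parser is the better choice, since it also
handles the hits and alignments; this only shows why XML is easy for a
program to read.
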
> I hope that helps,
> 
> (If you are submitting multiple queries, then you will need to use an 
> iterator... but that is another can of worms).
> 
> Peter
> 
>