[Biopython-dev] Blast
Michiel De Hoon
mdehoon at c2b2.columbia.edu
Wed Oct 5 13:37:22 EDT 2005
Hi everybody,
Fixing the Blast problem turned out to be easier than I expected, because
Biopython already contains a parser (written by Bertrand Frottier) for
Blast XML output. This Biopython project keeps amazing me.
So I just made XML output the default for qblast, and updated the
Tutorial/Cookbook chapter on Blast. Feel free to test it, and let me know if
there are any problems.
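
For anyone testing the change, here is a minimal sketch of the qblast-to-XML-parser round trip. It assumes a current Biopython release in which Bio.Blast.NCBIWWW.qblast accepts a format_type argument and Bio.Blast.NCBIXML.parse yields records with alignments and hsps; the API at the time of this message may have differed. The helper names blast_and_parse and summarize, and the default program and database, are made up for illustration.

```python
def blast_and_parse(sequence, program="blastn", database="nt"):
    """Submit a query to NCBI BLAST over the network and return parsed records."""
    # Imported lazily so that loading this sketch does not require Biopython.
    from Bio.Blast import NCBIWWW, NCBIXML

    # Request XML explicitly instead of relying on the default.
    handle = NCBIWWW.qblast(program, database, sequence, format_type="XML")
    try:
        return list(NCBIXML.parse(handle))
    finally:
        handle.close()


def summarize(records, e_value_threshold=0.001):
    """Return (hit title, e-value) pairs for every HSP below the threshold."""
    hits = []
    for record in records:
        for alignment in record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < e_value_threshold:
                    hits.append((alignment.title, hsp.expect))
    return hits
```

Calling summarize(blast_and_parse(my_sequence)) would contact NCBI, so it needs network access and may take a while to return.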
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
-----Original Message-----
From: biopython-dev-bounces at portal.open-bio.org on behalf of Michiel De Hoon
Sent: Sat 10/1/2005 8:50 PM
To: Jeffrey Chang; biopython-dev at biopython.org
Subject: RE: [Biopython-dev] Blast
Thanks, Jeff. Currently, qblast in Bio.Blast.NCBIWWW can already return text
output via the format_type argument. Unfortunately, the standalone blast and
www-blast return slightly different text output, so we'd have to fix the
parser in Bio.Blast.NCBIStandalone for it to handle www-blast text output.
I found out that both standalone blast and www-blast can also return XML
output, which is identical (as far as I can tell) in both cases. I would
think that a parser that reads this XML output would be the most stable option.
So I propose the following:
1) Let qblast return XML output by default; text and html output can be
returned by setting the format_type argument to qblast appropriately.
2) Write an XML parser that can read blast output from standalone and www
blast.
3) In a few versions, deprecate the text parser in NCBIStandalone and the
html parser in NCBIWWW. (This will only affect users of the text parser in
NCBIStandalone, since the html parser in NCBIWWW is already out of date and
cannot parse current blast output anyway.)
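
To illustrate point 2, here is a rough sketch of how a single XML reader could serve both sources, using only the Python standard library. The element names (Hit, Hit_def, Hsp_evalue) are the ones used in NCBI's BLAST XML output as far as I know; the sample document is hand-written and heavily abbreviated, not real blast output, and the function name best_hits is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample for illustration only; not real BLAST output.
SAMPLE_XML = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_program>blastn</BlastOutput_program>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>hypothetical sequence A</Hit_def>
          <Hit_hsps>
            <Hsp><Hsp_evalue>1e-20</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>hypothetical sequence B</Hit_def>
          <Hit_hsps>
            <Hsp><Hsp_evalue>0.5</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def best_hits(xml_text, max_evalue=0.001):
    """Return (definition, lowest e-value) for each hit at or below max_evalue."""
    root = ET.fromstring(xml_text)
    results = []
    for hit in root.iter("Hit"):
        definition = hit.findtext("Hit_def")
        evalues = [float(node.text) for node in hit.iter("Hsp_evalue")]
        if evalues and min(evalues) <= max_evalue:
            results.append((definition, min(evalues)))
    return results
```

Because the function only looks at element names, it does not care whether the XML came from standalone blast or from www-blast, which is the stability argument made above.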
Any objections, anybody?
--Michiel.
-----Original Message-----
From: Jeffrey Chang [mailto:jeffrey.chang at duke.edu]
Sent: Thu 9/29/2005 10:16 PM
To: Michiel De Hoon
Cc: biopython-dev at biopython.org
Subject: Re: [Biopython-dev] Blast
On Sep 29, 2005, at 1:46 PM, Michiel De Hoon wrote:
> To my surprise, the parser in Blast.NCBIWWW tries to parse HTML output
> instead of text output. My guess is that the HTML output changes more
> often and is more difficult to parse than text output. So isn't it
> possible to make NCBIWWW.qblast return text output instead of HTML and
> parse that instead?
> So my question is, why was the choice made to parse HTML instead of
> text? Is it simply because blast-on-the-web couldn't return text output
> in the past?
You are right. It was done that way in the past when the only way to
use NCBI's BLAST was to use the HTML output. (Actually, there was a
version that you could access through a proprietary non-HTTP
protocol, but the databases were not updated as frequently.) Now
that we can get text, perhaps it is time to encourage users to use
the text output instead. I believe the HTML parser is a few versions
behind now and can no longer parse current BLAST output.
Jeff