[Biopython-dev] [BioPython] Need help parsing Blastoutput
Michiel De Hoon
mdehoon at c2b2.columbia.edu
Wed Apr 19 20:14:00 UTC 2006
> Peter wrote:
> If anyone has a matched set of Blast output files which BioPython can
> parse and could email them to me, that would be great. I might even turn
> it into a short addition to the test suite, i.e. the same data in both the
> XML and plain text formats.
> According to my notes, I was getting lists for the following with the
> plain text output, which are now integers using the XML parser:
> hsp.gaps
> hsp.positives
> hsp.identities
I took the query from the blast text output of the first Blast test in the
Biopython test suite and ran it with the online blast, generating both XML and
plain text output. The text-based parser chokes on this new text output, but
we can still see from the text output what the result should have been.
With the XML parser, you are right that hsp.gaps, hsp.positives, and
hsp.identities are integers now, while they are lists with the text-based
parser (running the text-based parser on the blast text output in the test
suite does indeed give lists). What happens is that if the Blast output looks
like this:
Identities = 28/87 (32%), Positives = 44/87 (50%), Gaps = 12/87 (13%)
then the text-based parser returns
hsp.identities = (28, 87)
hsp.positives = (44, 87)
hsp.gaps = (12, 87)
while the XML parser returns
hsp.identities = 28
hsp.positives = 44
hsp.gaps = 12
With the XML parser, we can still get the 87 from len(hsp.query).
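For code that has to cope with both parsers in the meantime, here is a minimal sketch of a normalizing helper. The name as_count_and_total is hypothetical and not part of Biopython; it assumes the text parser gives a (count, total) pair and the XML parser a bare integer, with the total supplied separately (for example from len(hsp.query), as noted above).

```python
def as_count_and_total(value, aligned_length):
    """Return (count, total) for either parser's representation.

    Hypothetical helper, not part of Biopython.
    """
    if isinstance(value, (tuple, list)):
        # Text-parser style: the total is already part of the value.
        count, total = value
        return int(count), int(total)
    # XML-parser style: a bare count, so the total must be supplied.
    return int(value), int(aligned_length)
```

With the example above, as_count_and_total((28, 87), 87) and as_count_and_total(28, 87) both give the same pair, so downstream code need not care which parser produced the hsp.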
Actually, I like the XML parser output a bit better, but we can change it to
the text parser's output if preferred.
Do you know of any other inconsistencies between the parsers?
If not, I suggest raising a deprecation warning with the text-based Blast
parser, so users won't waste time trying to figure out why it doesn't work.
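A rough sketch of what such a warning could look like, using the standard warnings module. The function name parse_plain_text_blast is a hypothetical stand-in for the text parser's entry point, not the actual Biopython API, and the message wording is only a suggestion.

```python
import warnings


def parse_plain_text_blast(handle):
    """Hypothetical stand-in for the text-based parser's entry point."""
    warnings.warn(
        "The plain-text Blast parser is deprecated; "
        "consider using the XML parser instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    # Actual parsing is omitted in this sketch.
    return None
```

Note that Python may hide DeprecationWarning by default depending on the version and how the code is run, so users might need -W default to see it.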
--Michiel.