[Biopython-dev] Proposed addition to Standalone BLAST

Brad Chapman chapmanb at arches.uga.edu
Thu Oct 12 14:27:24 EDT 2000


Hello again;
	More blast stuff from me -- can you tell I've had to parse a lot
of BLAST reports recently? :-).

	Anyways, I've been using the standalone BLAST parser to parse
some big ol' BLAST runs that I'm doing, and I noticed that occasionally
blastall will report an error while running. This is a pretty uninformative
error, and will generally say something about being unable to
calculate parameters during the BLAST. Well, I investigated further and
found out that BLAST quits trying to run a search when it gets to a junk
sequence like this:

>gi|9854647|gb|BE599574.1|BE599574 PI1_77_C09.g1_A002 Pathogen induced 1
(PI1) Sorghum bicolor cDNA, mRNA sequence
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTTTTTTTTTTTTTTTTTTTTTTTT
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAA

Right, so this is a useless junk sequence and BLAST is right to bomb out on
it.

The report that BLAST generates on something like this is attached.
Basically, the problem is that the result is a truncated report missing all
of the statistics at the end. This causes the parser to run out of lines
without finding the statistics it is looking for and generate a SyntaxError.

What I'd like to propose is that the parser generate a new exception for
these kinds of reports, an NCBIStandalone.BlastError exception, indicating
a failure in BLAST, not in the parser.
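
To make the idea concrete, here is a minimal sketch of what such an
exception could look like. This is not the attached patch: the check_report
helper is hypothetical, and it assumes that a report names its query on a
line starting with "Query=" and that a complete report has a statistics line
starting with "Lambda". It is written in current Python syntax.

```python
class BlastError(Exception):
    """BLAST itself failed, so the report is incomplete (illustrative only)."""

    def __init__(self, message, query=None):
        # Store (message, query) in args so the query is also args[1].
        super().__init__(message, query)
        self.query = query


def check_report(lines):
    """Return the query name, or raise BlastError if statistics are missing.

    Assumptions (not taken from the real parser): the query is named on a
    line starting with 'Query=', and a complete report contains a line
    starting with 'Lambda' in its trailing statistics section.
    """
    query = None
    has_stats = False
    for line in lines:
        if line.startswith("Query="):
            query = line[len("Query="):].strip()
        elif line.lstrip().startswith("Lambda"):
            has_stats = True
    if not has_stats:
        raise BlastError("report ended before the statistics section", query)
    return query
```

The point of the separate exception type is that callers can tell "BLAST
gave up on this query" apart from "the parser hit a format it does not
understand", and still get the query name back.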

The reason I want to do this is that I would like to rig the exception up
to return the query that failed in this way, so that I can easily send
some messages to the owners of these sequences, asking them to kindly
remove the sequence from GenBank.

Anyways, attached is a patch (NCBIStandalone.diff) that implements this
type of exception-raising behavior for the BlastParser, which allows you
to parse like this:

try:
     b_record = iterator.next()
except NCBIStandalone.BlastError, info:
     print 'Got a blast error on query', info[1]
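
Extending the try/except above to a whole run, a rough sketch of collecting
every failed query might look like the following. The BlastError class here
is a stand-in for the proposed exception, collect_failed_queries is a
hypothetical helper, and the code assumes an iterator that follows the
current Python protocol (next() raising StopIteration at the end) and that
can keep going after it has raised an error for one record.

```python
class BlastError(Exception):
    """Stand-in for the proposed NCBIStandalone.BlastError (illustrative)."""


def collect_failed_queries(iterator):
    """Drain a record iterator and return (records, failed_queries).

    Assumption: the iterator raises BlastError(message, query) for a
    truncated report and can still be advanced afterwards. A plain
    generator would not work here, since it is finished once it raises.
    """
    records = []
    failed = []
    while True:
        try:
            record = next(iterator)
        except StopIteration:
            break
        except BlastError as info:
            # The failed query travels as the second exception argument.
            failed.append(info.args[1])
            continue
        records.append(record)
    return records, failed
```

The list of failed queries is then what you would use to contact the
sequence owners, while the good records are processed as usual.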

Do people think this is a good idea and something that can get into the
standalone parser? Comments are very welcome!

Brad
-------------- next part --------------
A non-text attachment was scrubbed...
Name: problem.blast
Type: application/x-unknown
Size: 834 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20001012/60782c48/problem.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: NCBIStandalone.diff
Type: application/x-unknown
Size: 2024 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20001012/60782c48/NCBIStandalone.bin

