[Biopython-dev] Proposed addition to Standalone BLAST

Jeffrey Chang jchang at SMI.Stanford.EDU
Thu Oct 19 19:57:40 EDT 2000


I'm not sure what's going on, but it looks like BLAST may be masking out
low-complexity regions and ending up with little or nothing to search
with.  Unfortunately, there's nothing in the output that clearly tells us
what's going on.  For example, it'd be nice if there were a message
explaining why the parameters are missing.

Although something's clearly wrong here, I'm hesitant to try and diagnose
the error within the parser.  I don't know what's a real syntax error and
what's a BLAST error.

However, perhaps we can push the error detection higher up.  Possible 
solutions might be:
1) develop a Parser that could catch a SyntaxError, do some diagnostics
on the Record, and then raise a BlastError
2) make the parameters section optional in the Scanner, and then let the
user either check the Record, or adapt the Consumer to check
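
To make option 1 concrete, here is a rough sketch in present-day Python.
Everything in it is hypothetical: DiagnosingParser, this stand-in BlastError,
and the assumption that the wrapped parser has a parse(handle) method and
raises SyntaxError on a truncated report. It is not the NCBIStandalone code.
Since I don't know how to tell a real syntax error from a BLAST failure, the
diagnostic is left as a callable that the caller supplies.

```python
import io


class BlastError(Exception):
    """Hypothetical: BLAST itself failed, as opposed to a parser problem."""

    def __init__(self, message, query=None):
        # Keep (message, query) in args so the query is also args[1].
        super().__init__(message, query)
        self.query = query


class DiagnosingParser:
    """Hypothetical wrapper around any parser with a parse(handle) method."""

    def __init__(self, parser, diagnose):
        self._parser = parser
        # diagnose(text) returns the failed query name, or None if the
        # report does not look like a BLAST failure.
        self._diagnose = diagnose

    def parse(self, handle):
        text = handle.read()  # buffer the report so it can be re-examined
        try:
            return self._parser.parse(io.StringIO(text))
        except SyntaxError:
            query = self._diagnose(text)
            if query is None:
                raise  # as far as we can tell, a genuine syntax error
            raise BlastError("BLAST did not complete the search", query)
```

The cost is that the whole report is buffered in memory before parsing, which
seems acceptable for a single report but would need rethinking for an iterator
over a large file.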

Would either of these be helpful?  Or something else?
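
For option 2, the check on the user's side could be quite small. The sketch
below uses a made-up, heavily simplified Record (the real one has many more
fields) and assumes the Scanner would leave parameters as None when the
section is absent. The helper names are also invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """Hypothetical, simplified stand-in for a parsed BLAST record."""
    query: str
    alignments: list = field(default_factory=list)
    # None means the report ended before a parameters section was found.
    parameters: Optional[dict] = None


def search_completed(record):
    """Return True if the report included its parameters section."""
    return record.parameters is not None


def failed_queries(records):
    """Collect the query names of records whose parameters are missing."""
    return [r.query for r in records if not search_completed(r)]
```

This keeps the parser free of any guessing: it only reports what was in the
file, and the caller decides whether a missing section counts as an error.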

Jeff





On 12 Oct 2000, Brad Chapman wrote:

> Hello again;
> 	More blast stuff from me -- can you tell I've had to parse a lot
> of BLAST reports recently? :-).
> 
> 	Anyways, I've been using the standalone BLAST parser to parse
> some big ol' BLAST runs that I'm doing, and I noticed that occasionally
> blastall will report an error while running. This is a pretty uninformative
> error, and will generally say something about being unable to
> calculate parameters during the BLAST. Well, I investigated further and
> found out that BLAST quits trying to run a search when it gets to a junk
> sequence like this:
> 
> >gi|9854647|gb|BE599574.1|BE599574 PI1_77_C09.g1_A002 Pathogen induced 1
> (PI1) Sorghum bicolor cDNA, mRNA sequence
> TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTTTTTTTTTTTTTTTTTTTTTTTT
> TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
> TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAA
> 
> Right, so this is useless junk sequence and BLAST is right to bomb out on
> it. 
> 
> The report that BLAST generates on something like this is attached.
> Basically, the problem is that it is a truncated report missing all of the
> statistics at the end. This causes the parser to run out of lines without
> finding the statistics it is looking for and generate a SyntaxError.
> 
> What I'd like to propose is that the parser generate a new exception for
> these kinds of reports, an NCBIStandalone.BlastError exception, indicating
> a failure in Blast, not in the parser.
> 
> The reason I want to do this is that I would like to rig the exception up
> to return the query that failed in this way, so that I can easily send
> some messages to the owners of these sequences, asking them to kindly
> remove the sequence from GenBank.
> 
> Anyways, attached is a patch (NCBIStandalone.diff) that implements this
> type of exception-raising behavior for the BlastParser, which allows you
> to parse like this:
> 
> try:
>      b_record = iterator.next()
> except NCBIStandalone.BlastError, info:
>      print 'Got a blast error on query', info[1]
> 
> Do people think this is a good idea and something that can get into the
> standalone parser? Comments are very welcome!
> 
> Brad



