[Biopython-dev] Proposed addition to Standalone BLAST
Brad Chapman
chapmanb at arches.uga.edu
Tue Nov 7 19:30:55 EST 2000
Me:
> > I use copy.deepcopy() to copy the handle
Jeff checks on me:
> Are you sure you can copy file handles in this way? It's not working for
> me using Python 2.0 on Solaris:
[]
Oops -- I should have checked the simplest case. Doh! Thanks for the
good catch. Apparently, copy.deepcopy() can copy your magical
File.StringHandles but not regular ol' file handles. I was only parsing
the output from the iterator, so I completely missed this. A new
version is attached which should work for this case -- it converts
any handle that isn't a StringHandle into a StringHandle before
proceeding. This way there shouldn't be any extra overhead when using
the iterator, but the parser can still accept a plain file handle.
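For readers on a current Python, the buffering idea can be sketched as below. This is a minimal Python 3 illustration, not the patch itself: io.StringIO stands in for File.StringHandle, and the helper names buffer_handle and independent_copy are hypothetical. Instead of copy.deepcopy(), the copy is built from getvalue(), so it always starts at the beginning of the buffered text.

```python
import io


def buffer_handle(handle):
    """Return an in-memory, re-readable handle with the same text as `handle`.

    io.StringIO handles are returned unchanged, so no extra copy is made for
    them; anything else with a read() method (an open file, for example) is
    read completely into a new io.StringIO.
    """
    if isinstance(handle, io.StringIO):
        return handle
    return io.StringIO(handle.read())


def independent_copy(string_handle):
    """Return a second io.StringIO over the same text, positioned at the start.

    getvalue() returns the whole buffer regardless of the current read
    position, so the copy can be read without disturbing the original.
    """
    return io.StringIO(string_handle.getvalue())
```

A caller would parse buffer_handle(h) and keep an independent_copy() of it for error reporting, roughly mirroring shandle and copy_handle in the patch.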
[BlastErrorParser taking a file to write bad reports to]
Jeff asks:
> Can we make this function take a handle instead of the name of a file?
> That would allow people to use sys.stderr, if they want the bad files to
> go to STDERR. The tradeoff is that it would place the burden of creating
> a handle on the client.
Andrew agrees:
> Guess I'm a purist. (Does that mean I should be using Lisp? :)
> Passing file handles is The Right Thing.
Agreed on all counts. Biopython does use file handles for almost
everything, so not having a handle here is actually strange and
awkward. I've switched this over in the new attached patch.
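A minimal sketch of the handle-based interface, in Python 3 and with a hypothetical helper name (save_bad_report): the function only needs an object with a write() method, so a caller can pass sys.stderr, an open file, or an in-memory buffer. Passing nothing discards the report, matching the optional bad_report_handle in the patch.

```python
def save_bad_report(report_text, bad_report_handle=None):
    """Write a report that failed to parse to `bad_report_handle`, if given.

    Any object with a write() method works: an open file, sys.stderr, or an
    io.StringIO. With no handle, the report is simply dropped.
    """
    if bad_report_handle is not None:
        bad_report_handle.write(report_text)
```

To keep the old filename behaviour, the caller opens the file itself, e.g. `with open("bad_reports.txt", "a") as out: save_bad_report(text, out)` (the filename is only an example). That is the burden Jeff mentions shifting to the client.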
Thanks for the comments! Please let me know of anything else at all.
Brad
-------------- next part --------------
*** NCBIStandalone.py.orig Thu Oct 12 13:32:21 2000
--- NCBIStandalone.py Tue Nov 7 19:17:35 2000
***************
*** 36,41 ****
--- 36,42 ----
import re
import popen2
from types import *
+ import copy
from Bio import File
from Bio.ParserSupport import *
***************
*** 471,476 ****
--- 472,563 ----
consumer.end_parameters()
+ class LowQualityBlastError(Exception):
+ """Error caused by running a low quality sequence through BLAST.
+
+ When low quality sequences (like GenBank entries containing only
+ stretches of a single nucleotide) are BLASTed, BLAST will generate
+ an error and will not be able to perform the search.
+ This error should be raised for the BLAST reports produced
+ in this case.
+ """
+ pass
+
+ class BlastErrorParser:
+ """Attempt to catch and diagnose BLAST errors while parsing.
+
+ This utilizes the BlastParser class but adds an additional layer
+ on top of it that attempts to diagnose SyntaxErrors which may
+ actually indicate problems with the BLAST run itself.
+
+ Current BLAST problems this detects are:
+ o LowQualityBlastError - When BLASTing really low quality sequences
+ (i.e. some GenBank entries which are just short stretches of a single
+ nucleotide), BLAST will report an error with the sequence and be
+ unable to search with it. This will lead to a badly formatted
+ BLAST report that the parsers choke on. The parser will convert the
+ SyntaxError to a LowQualityBlastError and attempt to provide useful
+ information.
+ """
+ def __init__(self, bad_report_handle = None):
+ """Initialize a parser that tries to catch BlastErrors.
+
+ Arguments:
+ o bad_report_handle - An optional argument specifying a handle
+ where bad reports should be sent. This would allow you to save
+ all of the bad reports to a file, for instance. If no handle
+ is specified, the bad reports will not be saved.
+ """
+ self._bad_report_handle = bad_report_handle
+
+ self._b_parser = BlastParser()
+
+ def parse(self, handle):
+ """Parse a handle, attempting to diagnose errors.
+ """
+ if isinstance(handle, File.StringHandle):
+ shandle = handle
+ else:
+ shandle = File.StringHandle(handle.read())
+
+ # copy the handle so we have it if we find an error
+ copy_handle = copy.deepcopy(shandle)
+
+ try:
+ return self._b_parser.parse(shandle)
+ except SyntaxError, msg:
+ # if we have a bad_report_handle, save the report to it first
+ if self._bad_report_handle:
+ # copy the handle so we can write it
+ error_handle = copy.deepcopy(copy_handle)
+ # send the info to the error handle
+ self._bad_report_handle.write(error_handle.read())
+
+ # now we want to try and diagnose the error
+ self._diagnose_error(copy_handle, self._b_parser._consumer.data)
+
+ # if we got here we can't figure out the problem
+ # so we should pass along the syntax error we got
+ raise SyntaxError, msg
+
+ def _diagnose_error(self, handle, data_record):
+ """Attempt to diagnose an error in the passed handle.
+
+ Arguments:
+ o handle - The handle potentially containing the error
+ o data_record - The data record partially created by the consumer.
+ """
+ line = handle.readline()
+
+ while line:
+ # 'Searchingdone' instead of 'Searching......done' seems
+ # to indicate a failure to perform the BLAST due to
+ # low quality sequence
+ if line[:13] == 'Searchingdone':
+ raise LowQualityBlastError("Blast failure occurred on query: ",
+ data_record.query)
+ line = handle.readline()
+
class BlastParser:
"""Parses BLAST data into a Record.Blast object.
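The control flow of BlastErrorParser.parse() and _diagnose_error() can be sketched in Python 3 as below. This is an illustrative re-implementation under stated assumptions, not Biopython code: ErrorTranslatingParser and the toy StrictParser (a stand-in for BlastParser) are hypothetical, and unlike the patch, the sketch does not report the query name from the consumer's partial record. A bare `raise` re-raises the original SyntaxError, with its traceback, when no known failure is found.

```python
import io


class LowQualityBlastError(Exception):
    """Raised when a report looks like BLAST refused a low quality query."""


class StrictParser:
    """Hypothetical toy parser standing in for BlastParser.

    Returns the text after 'Query=' and raises SyntaxError if there is none.
    """

    def parse(self, handle):
        for line in handle:
            if line.startswith("Query="):
                return line[len("Query="):].strip()
        raise SyntaxError("no Query= line found")


class ErrorTranslatingParser:
    """Wrap a parser and turn recognisable SyntaxErrors into clearer errors."""

    def __init__(self, inner_parser, bad_report_handle=None):
        self._inner = inner_parser
        self._bad_report_handle = bad_report_handle

    def parse(self, handle):
        # Buffer the whole report so it can be re-read after a failure.
        text = handle.read()
        try:
            return self._inner.parse(io.StringIO(text))
        except SyntaxError:
            # Save the raw report first, if the caller asked for that.
            if self._bad_report_handle is not None:
                self._bad_report_handle.write(text)
            # May raise LowQualityBlastError instead.
            self._diagnose_error(io.StringIO(text))
            # Nothing recognised: re-raise the original SyntaxError.
            raise

    def _diagnose_error(self, handle):
        for line in handle:
            # 'Searchingdone' (no progress dots) is the failure
            # signature the patch looks for.
            if line.startswith("Searchingdone"):
                raise LowQualityBlastError("BLAST failed on the query")
```

For example, ErrorTranslatingParser(StrictParser()).parse(io.StringIO("Query= seq1\n")) returns "seq1", while a report containing only "Searchingdone" raises LowQualityBlastError.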