[Biopython-dev] Proposed addition to Standalone BLAST

Brad Chapman chapmanb at arches.uga.edu
Tue Nov 7 19:30:55 EST 2000


Me:
> > I use copy.deepcopy() to copy the handle

Jeff checks on me:
> Are you sure you can copy file handles in this way?  It's not working for
> me using Python 2.0 on Solaris:
[]

Ooops -- I should have checked the simplest case. Doh! Thanks for the
good catch. Apparently, copy.deepcopy() can copy your magical
File.StringHandles but not regular ol' file handles. I was just using
the output from the iterator to parse, so I completely missed this. A new
version is attached which should work for this case -- it converts
things that aren't StringHandles to a StringHandle before
proceeding. This way there shouldn't be any extra overhead for using
the iterator, but it can handle taking a simple file.
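
To illustrate the idea of buffering a plain handle so the report can be
re-read after a failed parse, here is a minimal sketch. It is not the code
in the patch: it assumes modern Python and uses io.StringIO as a stand-in
for File.StringHandle, and it rewinds with tell()/seek() instead of
copy.deepcopy(). The names to_rereadable and parse_with_fallback are
hypothetical.

```python
import io


def to_rereadable(handle):
    """Return an in-memory handle holding the unread contents of `handle`.

    io.StringIO stands in for File.StringHandle here (an assumption);
    handles that are already in-memory are passed through untouched.
    """
    if isinstance(handle, io.StringIO):
        return handle
    return io.StringIO(handle.read())


def parse_with_fallback(handle, parse, diagnose):
    """Run parse(handle); on SyntaxError, rewind and run diagnose(handle).

    `parse` and `diagnose` are hypothetical callables that each take a handle.
    If `diagnose` does not raise something more specific, the original
    SyntaxError is re-raised.
    """
    buffered = to_rereadable(handle)
    start = buffered.tell()      # remember where the report begins
    try:
        return parse(buffered)
    except SyntaxError:
        buffered.seek(start)     # rewind so the report can be read again
        diagnose(buffered)
        raise
```

Because the conversion only happens for handles that are not already
in-memory, a caller that passes an in-memory handle pays no extra copying
cost, which is the same trade-off described above.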

[BlastErrorParser taking a file to write bad reports to]
Jeff asks:
> Can we make this function take a handle instead of the name of a file?  
> That would allow people to use sys.stderr, if they want the bad files to
> go to STDERR.  The tradeoff is that it would place the burden of creating
> a handle on the client.

Andrew agrees:
> Guess I'm a purist. (Does that mean I should be using Lisp? :) 
> Passing file handles is The Right Thing.

Agreed on all counts. Biopython does use file handles for almost
everything, so not having a handle here is actually strange and
awkward. I've switched this over in the new attached patch.
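
As a rough sketch of why a handle is more flexible than a file name: any
object with a write() method can receive the bad reports, including
sys.stderr. This is an illustration in modern Python, not the patch code;
save_bad_report is a hypothetical helper, and it tests "is None" where the
patch tests the handle's truth value.

```python
import io
import sys


def save_bad_report(report_text, bad_report_handle=None):
    """Write report_text to the handle if one was given.

    Returns True if the report was written, False if no handle was supplied.
    The caller owns the handle and is responsible for opening and closing it.
    """
    if bad_report_handle is None:
        return False
    bad_report_handle.write(report_text)
    return True


# Any writable handle works: an in-memory buffer, an open file, or sys.stderr.
buffer = io.StringIO()
save_bad_report("Searchingdone\n", buffer)
save_bad_report("Searchingdone\n", sys.stderr)
```

The cost, as Jeff notes, is that the client has to create (and close) the
handle itself.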

Thanks for the comments! Please let me know of anything else at all.

Brad

-------------- next part --------------
*** NCBIStandalone.py.orig	Thu Oct 12 13:32:21 2000
--- NCBIStandalone.py	Tue Nov  7 19:17:35 2000
***************
*** 36,41 ****
--- 36,42 ----
  import re
  import popen2
  from types import *
+ import copy
  
  from Bio import File
  from Bio.ParserSupport import *
***************
*** 471,476 ****
--- 472,563 ----
  
          consumer.end_parameters()
  
+ class LowQualityBlastError(Exception):
+     """Error caused by running a low quality sequence through BLAST.
+ 
+     When low quality sequences (like GenBank entries containing only
+     stretches of a single nucleotide) are BLASTed, they will result in
+     BLAST generating an error and not being able to perform the BLAST
+     search. This error should be raised for the BLAST reports produced
+     in this case.
+     """
+     pass
+ 
+ class BlastErrorParser:
+     """Attempt to catch and diagnose BLAST errors while parsing.
+ 
+     This utilizes the BlastParser class but adds an additional layer
+     of complexity on top of it by attempting to diagnose SyntaxErrors
+     that may actually indicate problems during BLAST parsing.
+ 
+     Current BLAST problems this detects are:
+     o LowQualityBlastError - When BLASTing really low quality sequences
+     (i.e. some GenBank entries which are just short stretches of a single
+     nucleotide), BLAST will report an error with the sequence and be
+     unable to search with this. This will lead to a badly formatted
+     BLAST report that the parsers choke on. The parser will convert the
+     SyntaxError to a LowQualityBlastError and attempt to provide useful
+     information.
+     """
+     def __init__(self, bad_report_handle = None):
+         """Initialize a parser that tries to catch BlastErrors.
+ 
+         Arguments:
+         o bad_report_handle - An optional argument specifying a handle
+         where bad reports should be sent. This would allow you to save
+         all of the bad reports to a file, for instance. If no handle
+         is specified, the bad reports will not be saved.
+         """
+         self._bad_report_handle = bad_report_handle
+         
+         self._b_parser = BlastParser()
+ 
+     def parse(self, handle):
+         """Parse a handle, attempting to diagnose errors.
+         """
+         if isinstance(handle, File.StringHandle):
+             shandle = handle
+         else:
+             shandle = File.StringHandle(handle.read())
+ 
+         # copy the handle so we have it if we find an error
+         copy_handle = copy.deepcopy(shandle)
+ 
+         try:
+             return self._b_parser.parse(shandle)
+         except SyntaxError, msg:
+             # if we have a bad_report_handle, save the info to it first
+             if self._bad_report_handle:
+                 # copy the handle so we can write it
+                 error_handle = copy.deepcopy(copy_handle)
+                 # send the info to the error handle
+                 self._bad_report_handle.write(error_handle.read())
+ 
+             # now we want to try and diagnose the error
+             self._diagnose_error(copy_handle, self._b_parser._consumer.data)
+ 
+             # if we got here we can't figure out the problem
+             # so we should pass along the syntax error we got
+             raise SyntaxError, msg
+ 
+     def _diagnose_error(self, handle, data_record):
+         """Attempt to diagnose an error in the passed handle.
+ 
+         Arguments:
+         o handle - The handle potentially containing the error
+         o data_record - The data record partially created by the consumer.
+         """
+         line = handle.readline()
+ 
+         while line:
+             # 'Searchingdone' instead of 'Searching......done' seems
+             # to indicate a failure to perform the BLAST due to
+             # low quality sequence
+             if line[:13] == 'Searchingdone':
+                 raise LowQualityBlastError("Blast failure occurred on query: ",
+                                            data_record.query)
+             line = handle.readline()
+             
  class BlastParser:
      """Parses BLAST data into a Record.Blast object.
  

