[Bioperl-l] SeqIO::genbank crash special case

jdiggans@genelogic.com jdiggans@genelogic.com
Tue, 11 Dec 2001 17:30:27 -0500


I recently came across a horribly mis-formatted GenBank record on our local
copy that caused SeqIO::genbank to choke. I've fixed the problem in my
local copy but was wondering if bioperl has a policy for what to do in
bizarre use cases?

The problem appears here (around lines 285-288 of my copy of genbank.pm):

     # to the last line read before returning
     my $ftunit = $self->_read_FTHelper_GenBank(\$buffer);
     # process ftunit
     $ftunit->_generic_seqfeature($seq);

$ftunit is never tested to ensure it's defined before being used. If
_read_FTHelper_GenBank fails and returns undef (my current issue), the
script dies messily on the method call. I've patched mine to:

     # to the last line read before returning
     my $ftunit = $self->_read_FTHelper_GenBank(\$buffer);

     # process ftunit - if there is a problem, warn and skip this FT unit
     if( defined($ftunit) ) {
          $ftunit->_generic_seqfeature($seq);
     } else {
          $self->warn("Unexpected feature error - FTUnit undefined, skipping");
          unless( ($buffer =~ /^\s{5,5}\S+/) or ($buffer =~ /^\S+/)) {
               $buffer = $self->_readline;
          }
     }
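
For anyone who wants to try the warn-and-skip idea outside of bioperl, here
is a minimal standalone sketch. It is not the genbank.pm code: read_feature
and parse_features are hypothetical names, and the one-line
"key start..end" feature format is a simplifying assumption made only for
illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the feature-table parser: returns a hashref for a
# well-formed feature line, or undef when the location cannot be parsed.
sub read_feature {
    my ($line) = @_;
    return unless defined $line;
    if ( $line =~ /^\s{5}(\S+)\s+(\d+)\.\.(\d+)\s*$/ ) {
        return { key => $1, start => $2, end => $3 };
    }
    return;    # undef signals a malformed feature
}

# Walk the feature lines, keeping good features and recording a warning for
# bad ones instead of calling a method on undef and dying.
sub parse_features {
    my (@lines) = @_;
    my ( @features, @warnings );
    for my $line (@lines) {
        last if $line =~ /^\S/;    # new top-level section, e.g. ORIGIN
        my $ft = read_feature($line);
        if ( defined $ft ) {
            push @features, $ft;
        }
        else {
            push @warnings, "skipping malformed feature line: $line";
        }
    }
    return ( \@features, \@warnings );
}

my @lines = (
    "     gene            1..100",
    "     CDS             10..",       # assumed example of a truncated range
    "     exon            20..80",
    "ORIGIN",
);
my ( $features, $warnings ) = parse_features(@lines);
printf "%d features kept, %d skipped\n", scalar @$features, scalar @$warnings;
```

The point of the sketch is only that one bad feature costs a warning rather
than the whole parse; whether that is the right default is the policy
question below.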

Is it worth adding some version of this to genbank.pm to allow a parse to
recover from a single poorly-formatted entry in a feature table? Or, within
the bioperl mentality, 'should' this kind of error be considered fatal?

This particular record happened to have an oddly-placed carriage return in
the middle of a feature range. That completely confused the
_read_FTHelper_GenBank routine, which returned undef, and a method was then
called on that undefined value.

-j

-------------------------------------------------
James Diggans
Bioinformatics Programmer
Gene Logic, Inc.
Phone: 301.987.1756
FAX: 301.987.1701