[Bioperl-l] SeqIO::genbank crash special case

Ewan Birney birney@ebi.ac.uk
Wed, 12 Dec 2001 07:23:08 +0000 (GMT)


On Tue, 11 Dec 2001 jdiggans@genelogic.com wrote:

> I recently came across a horribly mis-formatted GenBank record on our local
> copy that caused SeqIO::genbank to choke. I've fixed the problem in my
> local copy but was wondering if bioperl has a policy for what to do in
> bizarre use cases?

There is a philosophical point about whether we should throw or warn in
these cases. I guess we warn, and then if someone sets the severity flag
we blow up. Hmmm.
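
To make the warn-versus-throw idea concrete, here is a minimal sketch. The
Sketch::Parser package, its "strict" flag and the problem() method are all
made up for illustration; this is not the Bio::Root interface, only the
shape of the policy: warn by default, die when the caller asks for strictness.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser object; NOT the real Bio::Root API.
package Sketch::Parser;

sub new {
    my ($class, %args) = @_;
    # strict => 1 turns recoverable problems into fatal errors (assumed flag)
    my $self = { strict => $args{strict} ? 1 : 0, warnings => [] };
    return bless $self, $class;
}

# Report a recoverable problem: record and print a warning by default,
# or die if the strict flag is set.
sub problem {
    my ($self, $msg) = @_;
    die "FATAL: $msg\n" if $self->{strict};
    push @{ $self->{warnings} }, $msg;
    warn "WARNING: $msg\n";
    return;
}

package main;

# Default behaviour: warn and carry on.
my $lenient = Sketch::Parser->new;
$lenient->problem("malformed feature, skipping");

# Strict behaviour: the same problem becomes an exception.
my $strict = Sketch::Parser->new(strict => 1);
my $ok     = eval { $strict->problem("malformed feature"); 1 };
my $error  = $@;
```

The appeal of this arrangement is that casual users get through a large
flat file with a few warnings, while anyone validating data can opt in to
hard failures without changing the parser code.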



> 
> The problem appears here:
> 
>      # to the last line read before returning
>      my $ftunit = $self->_read_FTHelper_GenBank(\$buffer);
>      # process ftunit
>      $ftunit->_generic_seqfeature($seq);
> 
> $ftunit is never tested to ensure it's defined before being used. In the
> event something happens in _read_FTHelper_GenBank (my current issue) the
> script ends up dying messily. I've patched mine to:
> 
>      # to the last line read before returning
>      my $ftunit = $self->_read_FTHelper_GenBank(\$buffer);
> 
>      # process ftunit - if there is a problem, warn and skip this FT unit
>      if( defined($ftunit) ) {
>           $ftunit->_generic_seqfeature($seq);
>      } else {
>           $self->warn("Unexpected feature error - FTUnit undefined, skipping");
>           unless( ($buffer =~ /^\s{5,5}\S+/) or ($buffer =~ /^\S+/)) {
>                $buffer = $self->_readline;
>           }
>      }
> 
> Is it worth adding some version of this to genbank.pm to allow a parse to
> recover from a single poorly-formatted entry in a feature table? Or within
> the bioperl mentality 'should' this kind of error be considered something
> terminal?
> 
> This particular record happened to have an oddly-placed carriage return in
> the middle of a feature range, completely confusing the
> _read_FTHelper_GenBank routine and returning undef which then had a sub
> called on it.
> 


Ok. I am going to apply your patch.
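
For anyone following along, the skip-and-recover behaviour can be sketched
outside of genbank.pm roughly as follows. This is a toy under stated
assumptions, not the patched code: parse_features() is a hypothetical
helper, and it only understands simple "start..end" locations. What it
keeps from the patch is the recovery rule: after a bad unit, advance until
a line looks like a new feature key (five spaces, then text) or a new
top-level section (text in column one).

```perl
use strict;
use warnings;

# Hypothetical miniature of the skip-and-recover idea; NOT the genbank.pm code.
# A well-formed feature line here is five spaces, a key, then start..end.
sub parse_features {
    my @lines = @_;
    my (@features, @skipped);
    my $i = 0;
    while ($i < @lines) {
        my $line = $lines[$i];
        last if $line =~ /^\S/;    # next top-level section: stop parsing
        if ($line =~ /^\s{5}(\S+)\s+(\d+)\.\.(\d+)\s*$/) {
            push @features, { key => $1, start => $2, end => $3 };
            $i++;
            next;
        }
        # Malformed unit: note it, then advance to the next feature key
        # or top-level line, mirroring the two regexes in the patch.
        push @skipped, $line;
        $i++;
        $i++ while $i < @lines
            && $lines[$i] !~ /^\s{5}\S+/
            && $lines[$i] !~ /^\S+/;
    }
    return (\@features, \@skipped);
}

my ($features, $skipped) = parse_features(
    "     gene            10..20",
    "     CDS             30..",      # range broken by a stray line break
    "                     40",
    "     exon            50..60",
    "ORIGIN",
);
```

In this toy input the broken CDS entry is recorded in @$skipped and its
continuation line is passed over, while the gene and exon features on
either side are still parsed.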



> -j
> 
> -------------------------------------------------
> James Diggans
> Bioinformatics Programmer
> Gene Logic, Inc.
> Phone: 301.987.1756
> FAX: 301.987.1701