[Bioperl-l] possible bug printing GenBank feature qualfiers

Chris Fields cjfields at uiuc.edu
Fri Mar 31 20:27:18 UTC 2006


Well, I'm running off bioperl-live now using WinXP and latest ActivePerl
(5.8.8.817) and, although all tests pass (SeqIO and genbank), I keep getting
the same error and erroneous output using Scott's (albeit very simple)
sequence example.  The problem seems to pop up on the input end, not output
(the 'no feature key' in the below output only shows up when -verbose is
turned on with the input SeqIO object).  

Questions:

1) Is the below example Scott gave valid GenBank format?  I don't know, but
it looks okay.
2) If so, should it work?  Yes, no question.
3) And if it is supposed to, why isn't it working here?  Don't know, but any
of the mentioned fixes don't do anything (get rid of the error) for me.
Scott gets it to work but my guess is that it is b/c he's using a Linux/UNIX
flavor.  Can't wait 'til I get my MacTel (4 more months....)

I'm personally not too worried about it at the moment as anything I passed
through SeqIO has worked w/o a problem, even on WinXP.  It's just a bit
frustrating to see something fail here that seems to work elsewhere.

So here's what I did:

input.gbff
====================================
LOCUS       MY_LOCUS                  10 aa            linear   UNK
DEFINITION  my description.
ACCESSION   12345
FEATURES             Location/Qualifiers
      misc_feature    1..10
                      /foo="0"
ORIGIN
         1 atggagaact
//
====================================

Run through this:
====================================
use Bio::SeqIO;

my $inputFilename = "input.gbff";
my $outputFilename = "output.gbff";

my $in  = Bio::SeqIO->new(-verbose  => 1,
                          -file   => $inputFilename,
                           -format => "genbank");

my $out = Bio::SeqIO->new(-verbose => 0,
                          -file => ">$outputFilename",
                           -format => "genbank");

my $sequence = $in->next_seq();
$out->write_seq($sequence);
====================================

Gets this error:

====================================
C:\Perl\Scripts\gb_test>test.pl
no feature key!

-------------------- WARNING ---------------------
MSG: Unexpected error in feature table for  Skipping feature, attempting to
recover
STACK Bio::SeqIO::genbank::next_seq
C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583
STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18
sequence length is 10
====================================

And this output, which is missing the feature, somewhat expected judging
from the error (output.gbff)

====================================
LOCUS       MY_LOCUS                  10 aa            linear   linear 
DEFINITION  my description.
ACCESSION   12345
KEYWORDS    .
FEATURES             Location/Qualifiers
ORIGIN      
        1 atggagaact
//
====================================

I wouldn't be a bit surprised if it is a WinXP-specific issue, so I'll give
it a try this weekend on Mac OS X using the latest CVS to see what happens.


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar
> Lapp
> Sent: Friday, March 31, 2006 1:51 PM
> To: Chris Fields
> Cc: Scott Markel; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] possible bug printing GenBank feature qualfiers
> 
> The only problem with Heikki's version of the line is that if the
> value is undefined you get a (ugly) warning from perl stating that you
> printed an undefined value. Since normally tags should always have a
> value (even if an empty string) this is a rather theoretical issue.
> 
>    -hilmar
> 
> On 3/31/06, Chris Fields <cjfields at uiuc.edu> wrote:
> > Sorry about that; stupid Outlook sent my mail before I had a chance to
> > finish it up.
> >
> > The Bio::Annotation::Simple fix sounds best, but the problem is that CVS
> > shows a fix on this line by Heikki after 1.5.1 was released:
> >
> >        fix to allow 0 values despite operator overload (Paul Mooney)
> >
> > which changed the overload to:
> >
> >          use overload '""' => sub { $_[0]->value};
> >
> > I'll try out your fix here to see if it breaks anything (can't see why
> it
> > would), but I may need to dig through the archives a little to see why
> this
> > latest change was made.  If everything works and passes tests I'll roll
> back
> > the commit I made to Bio::SeqIO::genbank earlier today.
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> > > Sent: Friday, March 31, 2006 12:23 PM
> > > To: Scott Markel
> > > Cc: bioperl-l at lists.open-bio.org; Chris Fields
> > > Subject: Re: [Bioperl-l] possible bug printing GenBank feature
> qualfiers
> > >
> > > Scott,
> > >
> > > your fix assumes that $value in reality is not a scalar but a hash ref
> > > and that it has a key "value".
> > >
> > > Apparently in your test environment this is all indeed true, but there
> > > is no guarantee that this will still be true tomorrow when you next
> > > update from CVS (or install a new version).
> > >
> > > It seems to me that making feature tag values Bio::AnnotationI objects
> > > and the stringification overload is what is interfering here. More
> > > specifically, the broken overload in Bio::Annotation::SimpleValue
> > >
> > >       use overload '""' => sub { $_[0]->value || ''};
> > >
> > > will lead exactly to the behavior you see (b/c $_[0]->value evaluates
> > > to false if the value is '0').
> > >
> > > You say you build and populate the feature dynamically - are you using
> > > Bio::SeqFeature::Annotated for this? Bio::SeqFeature::Generic is
> slated
> > > to get this behavior reverted, i.e., will return to using scalars for
> > > tag values. (Or so I recall ...)
> > >
> > > To fix the problem for you now, I suggest you either fix the overload
> > > statement above to be
> > >
> > >       use overload '""' => sub { defined($_[0]->value) ? $_[0]->value
> : ''
> > > };
> > >
> > > I suppose this should in fact be committed to the repository - does
> > > anybody see any damage from this change?
> > >
> > > Or, if you do want to mess with the GenBank format writer, protect the
> > > conversion to string and use the object access method:
> > >
> > >       if (ref($value) && $value->isa("Bio::Annotation::SimpleValue"))
> {
> > >               # convert SimpleValue object to represented (string)
> value
> > >               $value = $value->value;
> > >       }
> > >
> > > Hth,
> > >
> > >       -hilmar
> > >
> > > On Mar 31, 2006, at 9:31 AM, Scott Markel wrote:
> > >
> > > > Chris,
> > > >
> > > > Looks like I made my test case too simple.  In our application,
> > > > which calls BioPerl, I'm creating the feature with the zero-
> > > > valued qualifier.  It's not being read in from a file, so
> > > > my only issue is with writing GenBank files.  The real feature
> > > > is one for a primer binding site.  The qualifier contains the
> > > > number of mismatches.  The one line change of
> > > >
> > > >      $value = $value->{"value"}
> > > >
> > > > definitely fixes our problem and causes no regression
> > > > failures in our application.
> > > >
> > > > Scott
> > > >
> > > > Chris Fields wrote:
> > > >
> > > >> I tried this on WinXP (I'm using bioperl-live) and got a warning:
> > > >>
> > > >> -------------------- WARNING ---------------------
> > > >> MSG: Unexpected error in feature table for  Skipping feature,
> > > >> attempting to
> > > >> recover
> > > >> ---------------------------------------------------
> > > >>
> > > >> Running using debugging shows that no feature key was found in
> > > >> _read_FTHelper_GenBank.  So I'm getting an error, but on input not
> > > >> output.
> > > >> In fact, turning on -verbose in the SeqIO input object gives the
> > > >> below extra
> > > >> output, whereas turning -verbose on only in the output object just
> > > >> gives the
> > > >> warning above.
> > > >>
> > > >> ====================================
> > > >> C:\Perl\Scripts\gb_test>test.pl
> > > >> no feature key!
> > > >>
> > > >> -------------------- WARNING ---------------------
> > > >> MSG: Unexpected error in feature table for  Skipping feature,
> > > >> attempting to
> > > >> recover
> > > >> STACK Bio::SeqIO::genbank::next_seq
> > > >> C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583
> > > >> STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18
> > > >> sequence length is 10
> > > >> ====================================
> > > >>
> > > >> The sequence came back w/o any features in the feature table, which
> > > >> is what
> > > >> I would expect from this error:
> > > >> ====================================
> > > >> LOCUS       MY_LOCUS                  10 aa            linear
> linear
> > > >> DEFINITION  my description.
> > > >> ACCESSION   12345
> > > >> KEYWORDS    .
> > > >> FEATURES             Location/Qualifiers
> > > >> ORIGIN
> > > >>         1 atggagaact
> > > >> //
> > > >> ====================================
> > > >>
> > > >> Adding the extra line before the s/// didn't help any (warning
> still
> > > >> pops
> > > >> up, no change in output).  Anybody out there with any ideas?
> > > >>
> > > >> Christopher Fields
> > > >> Postdoctoral Researcher - Switzer Lab
> > > >> Dept. of Biochemistry
> > > >> University of Illinois Urbana-Champaign
> > > >>
> > > >>
> > > >>





More information about the Bioperl-l mailing list