[Bioperl-l] possible bug printing GenBank feature qualfiers
Scott Markel
smarkel at scitegic.com
Fri Mar 31 19:17:19 UTC 2006
Hilmar,
Thanks for the detailed reply. The dynamic features we create
are added using Bio::SeqFeature::Generic.
I'm happy to use any of your suggestions. If the overload
statement in Bio::Annotation::SimpleValue could be changed
in the repository, that's probably the cleanest solution from
my point of view. The GenBank format writer is just an
instance of SimpleValue's use.
Since I don't use bioperl-live in our product environment, I
typically handle issues like this either in our code or try
to be as faithful as I can to the direction bioperl-live is
heading. In the latter case, I usually make the same change(s)
in our copy of the released version.
Scott
Hilmar Lapp wrote:
> Scott,
>
> your fix assumes that $value in reality is not a scalar but a hash ref
> and that it has a key "value".
>
> Apparently in your test environment this is all indeed true, but there
> is no guarantee that this will still be true tomorrow when you next
> update from CVS (or install a new version).
>
> It seems to me that making feature tag values Bio::AnnotationI objects
> and the stringification overload is what is interfering here. More
> specifically, the broken overload in Bio::Annotation::SimpleValue
>
> use overload '""' => sub { $_[0]->value || ''};
>
> will lead exactly to the behavior you see (b/c $_[0]->value evaluates to
> false if the value is '0').
>
> You say you build and populate the feature dynamically - are you using
> Bio::SeqFeature::Annotated for this? Bio::SeqFeature::Generic is slated
> to get this behavior reverted, i.e., will return to using scalars for
> tag values. (Or so I recall ...)
>
> To fix the problem for you now, I suggest you either fix the overload
> statement above to be
>
> use overload '""' => sub { defined($_[0]->value) ? $_[0]->value : '' };
>
> I suppose this should in fact be committed to the repository - does
> anybody see any damage from this change?
>
> Or, if you do want to mess with the GenBank format writer, protect the
> conversion to string and use the object access method:
>
> if (ref($value) && $value->isa("Bio::Annotation::SimpleValue")) {
> # convert SimpleValue object to represented (string) value
> $value = $value->value;
> }
>
> Hth,
>
> -hilmar
>
> On Mar 31, 2006, at 9:31 AM, Scott Markel wrote:
>
>> Chris,
>>
>> Looks like I made my test case too simple. In our application,
>> which calls BioPerl, I'm creating the feature with the zero-
>> valued qualifier. It's not being read in from a file, so
>> my only issue is with writing GenBank files. The real feature
>> is one for a primer binding site. The qualifier contains the
>> number of mismatches. The one line change of
>>
>> $value = $value->{"value"}
>>
>> definitely fixes our problem and causes no regression
>> failures in our application.
>>
>> Scott
>>
>> Chris Fields wrote:
>>
>>> I tried this on WinXP (I'm using bioperl-live) and got a warning:
>>>
>>> -------------------- WARNING ---------------------
>>> MSG: Unexpected error in feature table for Skipping feature,
>>> attempting to
>>> recover
>>> ---------------------------------------------------
>>>
>>> Running using debugging shows that no feature key was found in
>>> _read_FTHelper_GenBank. So I'm getting an error, but on input not
>>> output.
>>> In fact, turning on -verbose in the SeqIO input object gives the
>>> below extra
>>> output, whereas turning -verbose on only in the output object just
>>> gives the
>>> warning above.
>>>
>>> ====================================
>>> C:\Perl\Scripts\gb_test>test.pl
>>> no feature key!
>>>
>>> -------------------- WARNING ---------------------
>>> MSG: Unexpected error in feature table for Skipping feature,
>>> attempting to
>>> recover
>>> STACK Bio::SeqIO::genbank::next_seq
>>> C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583
>>> STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18
>>> sequence length is 10
>>> ====================================
>>>
>>> The sequence came back w/o any features in the feature table, which
>>> is what
>>> I would expect from this error:
>>> ====================================
>>> LOCUS MY_LOCUS 10 aa linear linear
>>> DEFINITION my description.
>>> ACCESSION 12345
>>> KEYWORDS .
>>> FEATURES Location/Qualifiers
>>> ORIGIN
>>> 1 atggagaact
>>> //
>>> ====================================
>>>
>>> Adding the extra line before the s/// didn't help any (warning still
>>> pops
>>> up, no change in output). Anybody out there with any ideas?
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept. of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>> bounces at lists.open-bio.org] On Behalf Of Scott Markel
>>>> Sent: Thursday, March 30, 2006 7:18 PM
>>>> To: bioperl-l at lists.open-bio.org
>>>> Subject: [Bioperl-l] possible bug printing GenBank feature qualfiers
>>>>
>>>> In our upgrade from BioPerl 1.4 to 1.5.1 we tripped over the
>>>> following.
>>>>
>>>> Annotation tags used by Bio::SeqIO::FTHelper were strings and
>>>> are now Bio::Annotation::SimpleValue. In the _print_GenBank_FTHelper
>>>> subroutine of Bio::SeqIO::genbank the following code still
>>>> assumes that tags are strings.
>>>>
>>>> foreach my $tag ( keys %{$fth->field} ) {
>>>> foreach my $value ( @{$fth->field->{$tag}} ) {
>>>> $value =~ s/\"/\"\"/g;
>>>>
>>>> If the tag value was a zero, an empty string is written.
>>>>
>>>> We think that
>>>>
>>>> $value = $value->{"value"};
>>>>
>>>> should be added before the s/// call.
>>>>
>>>> Here's our test case. Note that the qualifier value for "foo"
>>>> is changed to an empty string.
>>>>
>>>> Input file
>>>>
>>>> ====================================
>>>> LOCUS MY_LOCUS 10 aa linear UNK
>>>> DEFINITION my description.
>>>> ACCESSION 12345
>>>> FEATURES Location/Qualifiers
>>>> misc_feature 1..10
>>>> /foo="0"
>>>> ORIGIN
>>>> 1 atggagaact
>>>> //
>>>> ====================================
>>>>
>>>> Perl code
>>>> ====================================
>>>> use strict;
>>>> use warnings;
>>>>
>>>> use Bio::SeqIO;
>>>>
>>>> my $inputFilename = "input.gbff";
>>>> my $outputFilename = "output.gbff";
>>>>
>>>> my $in = Bio::SeqIO->new(-file => $inputFilename,
>>>> -format => "genbank");
>>>> my $out = Bio::SeqIO->new(-file => ">$outputFilename",
>>>> -format => "genbank");
>>>>
>>>> my $sequence = $in->next_seq();
>>>> $out->write_seq($sequence);
>>>> ====================================
>>>>
>>>> Output file
>>>> ====================================
>>>> LOCUS MY_LOCUS 10 aa linear linear
>>>> DEFINITION my description.
>>>> ACCESSION 12345
>>>> KEYWORDS .
>>>> FEATURES Location/Qualifiers
>>>> misc_feature 1..10
>>>> /foo=""
>>>> ORIGIN
>>>> 1 atggagaact
>>>> //
>>>> ====================================
>>>>
>>>> I'll add this to bugzilla, but first I want to make sure
>>>> I'm not missing something obvious.
>>>>
>>>> Scott
--
Scott Markel, Ph.D.
Principal Bioinformatics Architect email: smarkel at scitegic.com
SciTegic Inc. mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253
San Diego, CA 92123 fax: +1 858 279 8804
USA web: http://www.scitegic.com
More information about the Bioperl-l
mailing list