[Bioperl-l] Long /labels are wrapped, but can't be read
Chris Fields
cjfields at illinois.edu
Tue Sep 29 23:54:04 EDT 2009
Adam,
Not sure, but this could be a case of 'both'. Qualifiers whose values
are quoted and those whose values aren't are currently distinguished
via a global hash lookup (%FTQUAL_NO_QUOTE) due to the way the parser
works; there is some logic behind this, though I can't quite recall at
the moment why it is this way. You could set a hash key for the label
in cases where it isn't quoted; that should work. You can also test
out the Bio::SeqIO::embldriver version (-format => 'embldriver').
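The tag-based lookup Chris describes can be pictured with a minimal, self-contained sketch. `%NO_QUOTE` and `format_qualifier` are hypothetical stand-ins, not BioPerl's code. `label` is in the table because the EMBL output in this thread writes it unquoted, and `codon_start` is an assumed extra entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a no-quote lookup table such as
# %FTQUAL_NO_QUOTE; the entries here are assumptions for illustration.
my %NO_QUOTE = map { $_ => 1 } qw(label codon_start);

# Hypothetical helper: render one qualifier as it might appear in an
# EMBL feature table, before any line wrapping.
sub format_qualifier {
    my ($tag, $value) = @_;
    # Tags found in the lookup table are written bare, whatever the value.
    return "/$tag=$value" if $NO_QUOTE{$tag};
    # Everything else is quoted; embedded double quotes are doubled,
    # the usual escaping convention in EMBL/GenBank feature tables.
    (my $escaped = $value) =~ s/"/""/g;
    return qq{/$tag="$escaped"};
}

print format_qualifier(label => 'short_label'), "\n";   # /label=short_label
print format_qualifier(note  => 'say "hi"'),    "\n";   # /note="say ""hi"""
```

Because the decision depends only on the tag, a bare tag stays bare no matter how long its value gets, which is the situation in Adam's report.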
If the above doesn't work out, it's worth filing a bug for this
behavior, though I'm not sure how easy it will be to fix.
chris
On Sep 28, 2009, at 2:51 AM, Adam Sjøgren wrote:
> Hi.
>
>
> I am wondering whether this is a buglet or just a case of "Don't do
> that":
>
> If I set a very long /label on a feature and output the sequence in
> EMBL format, the qualifier value gets wrapped, but not quoted.
>
> When BioPerl reads such a file, an exception is thrown.
>
> I probably shouldn't be setting very long labels... But oughtn't
> BioPerl throw an exception when a too-long label is set, automatically
> quote the value when it is long enough to be wrapped, or know how to
> read a wrapped yet unquoted value?
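Adam's second option, quoting a value once it is too long for one line, could look roughly like the sketch below. `ft_qualifier_lines` is a hypothetical helper, not part of BioPerl. It assumes the standard EMBL feature-table layout (80 columns, qualifier text starting in column 22), and its hard wrap every 59 characters is a simplification.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# EMBL feature-table layout: "FT" padded to 21 columns, qualifier text
# in columns 22-80, i.e. 59 characters of text per line.
my $PREFIX = 'FT' . (' ' x 19);
my $WIDTH  = 80 - length($PREFIX);

# Hypothetical helper sketching Adam's second option: a qualifier that
# would normally be written bare ($bare true) is quoted anyway once it
# is too long to fit on a single line.
sub ft_qualifier_lines {
    my ($tag, $value, $bare) = @_;
    my $text = "/$tag=$value";
    if (!$bare || length($text) > $WIDTH) {
        (my $escaped = $value) =~ s/"/""/g;    # double embedded quotes
        $text = qq{/$tag="$escaped"};
    }
    # Naive hard wrap every $WIDTH characters; a real writer would
    # rather break at whitespace where the value has any.
    my @lines;
    for (my $i = 0; $i < length($text); $i += $WIDTH) {
        push @lines, $PREFIX . substr($text, $i, $WIDTH);
    }
    return @lines;
}

my $label = 'averylonglabelthisisindeedbutitoughttoworkanywaydontyouthink';
print "$_\n" for ft_qualifier_lines(label => $label, 1);
```

For the long label from the script, this yields two FT lines: the first exactly 80 columns wide, the second ending in the closing quote, so a reader can tell where the value ends.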
>
> I will be happy to try and provide a patch for whichever solution is
> preferred.
>
> Here is an example script:
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use IO::String;
>
> use Bio::Seq;
> use Bio::SeqFeature::Generic;
> use Bio::SeqIO;
>
> print 'BioPerl ' . $Bio::Root::Version::VERSION . "\n";
>
> my $seq=Bio::Seq->new(-seq=>'ATG');
> my $feature=Bio::SeqFeature::Generic->new(-primary=>'misc_feature',
>                                           -start=>1, -end=>3);
> $feature->add_tag_value(label =>
>     'averylonglabelthisisindeedbutitoughttoworkanywaydontyouthink');
> $seq->add_SeqFeature($feature);
>
> my $out_string=out($seq);
> print $out_string;
>
> my $fh=IO::String->new($out_string);
> my $in=Bio::SeqIO->new(-fh=>$fh, -format=>'EMBL');
> my $in_seq=$in->next_seq;
>
> print "Done\n";
>
> sub out {
>     my ($seq)=@_;
>
>     my $string='';
>     my $fh=IO::String->new($string);
>     my $out=Bio::SeqIO->new(-fh=>$fh, -format=>'EMBL');
>     $out->write_seq($seq);
>
>     return $string;
> }
>
> Which gives this output when run:
>
> BioPerl 1.0069
> ID unknown; SV 1; linear; unassigned DNA; STD; UNC; 3 BP.
> XX
> AC unknown;
> XX
> XX
> FH Key Location/Qualifiers
> FH
> FT misc_feature 1..3
> FT /label=averylonglabelthisisindeedbutitoughttoworkanywaydont
> FT youthink
> XX
> SQ Sequence 3 BP; 1 A; 0 C; 1 G; 1 T; 0 other;
>
> atg
> 3
> //
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Can't see new qualifier in: youthink
> from:
> /label=averylonglabelthisisindeedbutitoughttoworkanywaydont
> youthink
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw Bio/Root/Root.pm:368
> STACK: Bio::SeqIO::embl::_read_FTHelper_EMBL Bio/SeqIO/embl.pm:1294
> STACK: Bio::SeqIO::embl::next_seq Bio/SeqIO/embl.pm:392
> STACK: /z/home/adsj/bugs/bioperl/embl/embl.pl:24
> -----------------------------------------------------------
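The exception comes from the reader meeting `youthink`, a line that does not start a new qualifier. Adam's third option, reading a wrapped unquoted value, might be sketched as follows; `parse_qualifiers` is hypothetical and is not how `Bio::SeqIO::embl` is implemented. It takes the qualifier text of each FT line (column 22 onward) as input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical reader-side sketch of Adam's third option. A line that
# does not start a new "/qualifier" is glued onto the previous qualifier
# instead of raising "Can't see new qualifier".
sub parse_qualifiers {
    my @quals;    # list of [tag, value] pairs
    for my $line (@_) {
        if ($line =~ m{^/(\w+)(?:=(.*))?$}) {
            push @quals, [ $1, defined $2 ? $2 : '' ];
        }
        elsif (@quals) {
            # Continuation: appended verbatim. For an unquoted value the
            # reader cannot tell a mid-word break (as in this thread) from
            # a break at a space, so no separator is inserted.
            $quals[-1][1] .= $line;
        }
        else {
            die "Continuation line with no preceding qualifier: $line\n";
        }
    }
    # Strip enclosing quotes and undo the "" escaping of quoted values.
    for my $q (@quals) {
        if ($q->[1] =~ /^"(.*)"$/s) {
            my $inner = $1;
            $inner =~ s/""/"/g;
            $q->[1] = $inner;
        }
    }
    return @quals;
}

my @q = parse_qualifiers(
    '/label=averylonglabelthisisindeedbutitoughttoworkanywaydont',
    'youthink',
);
print "$_->[0] = $_->[1]\n" for @q;
# label = averylonglabelthisisindeedbutitoughttoworkanywaydontyouthink
```

The weak spot is the continuation branch: without quotes, the reader has to guess whether to join with a space, which is one argument for quoting long values on output instead.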
>
> If I change the value to include double quotes (simulating embl.pm
> quoting the value), BioPerl reads the EMBL string it produces fine:
>
> -----------------------------------------------------------
> adsj at ala:~/work/bioperl/bioperl-live$ perl -I. ~/bugs/bioperl/embl/embl.pl
> BioPerl 1.0069
> ID unknown; SV 1; linear; unassigned DNA; STD; UNC; 3 BP.
> XX
> AC unknown;
> XX
> XX
> FH Key Location/Qualifiers
> FH
> FT misc_feature 1..3
> FT /label=""averylonglabelthisisindeedbutitoughttoworkanywaydo
> FT ntyouthink""
> XX
> SQ Sequence 3 BP; 1 A; 0 C; 1 G; 1 T; 0 other;
>
> atg
> 3
> //
> Done
>
>
> Best regards,
>
> Adam
>
> --
> Adam Sjøgren
> adsj at novozymes.com
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l