[Bioperl-l] Long /labels are wrapped, but can't be read

Adam Sjøgren adsj at novozymes.com
Mon Sep 28 07:51:15 UTC 2009


  Hi.


I am wondering whether this is a buglet or just a case of "Don't do
that":

If I set a very long /label on a feature and output the sequence in EMBL
format, the qualifier value gets wrapped, but not quoted.

When BioPerl reads such a file, an exception is thrown.

I probably shouldn't be setting very long labels... But shouldn't BioPerl
then either throw an exception when a label that is too long is set,
automatically quote the value when it is long enough to be wrapped, or
know how to read a wrapped but unquoted value?
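Here is a minimal sketch of the second option, which quotes the value whenever it has to be wrapped. It is written in plain Perl and does not use BioPerl. The helper format_qualifier is hypothetical and is not embl.pm's code. The 59-character width is an assumption taken from the wrapped output further down, where a 21-character "FT" prefix plus 59 characters of qualifier text fill an 80-column line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: format one feature qualifier
# as EMBL FT lines, quoting the value whenever it would have to be wrapped.
my $PREFIX = 'FT' . ' ' x 19;   # assumed: qualifier text starts at column 22
my $WIDTH  = 59;                # assumed: 80 columns minus the 21-char prefix

sub format_qualifier {
    my ($name, $value) = @_;
    my $text = "/$name=$value";
    if (length($text) > $WIDTH) {
        # Double any embedded quotes, then wrap the whole value in quotes
        (my $escaped = $value) =~ s/"/""/g;
        $text = qq{/$name="$escaped"};
    }
    # Cut the text into chunks of at most $WIDTH characters, one per FT line
    my @chunks = $text =~ /(.{1,$WIDTH})/gs;
    return join '', map { "$PREFIX$_\n" } @chunks;
}

print format_qualifier('label',
    'averylonglabelthisisindeedbutitoughttoworkanywaydontyouthink');
```

Short values that fit on one line are left unquoted, as /label is written in the current output. Embedded double quotes are doubled, following the EMBL convention for quoted values.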

I will be happy to try and provide a patch for whichever solution is
preferred.

Here is an example script:

  #!/usr/bin/perl

  use strict;
  use warnings;

  use IO::String;

  use Bio::Seq;
  use Bio::SeqFeature::Generic;
  use Bio::SeqIO;

  print 'BioPerl ' . $Bio::Root::Version::VERSION . "\n";

  my $seq=Bio::Seq->new(-seq=>'ATG');
  my $feature=Bio::SeqFeature::Generic->new(-primary=>'misc_feature', -start=>1, -end=>3);
  $feature->add_tag_value(label=>'averylonglabelthisisindeedbutitoughttoworkanywaydontyouthink');
  $seq->add_SeqFeature($feature);

  my $out_string=out($seq);
  print $out_string;

  my $fh=IO::String->new($out_string);
  my $in=Bio::SeqIO->new(-fh=>$fh, -format=>'EMBL');
  my $in_seq=$in->next_seq;

  print "Done\n";

  sub out {
      my ($seq)=@_;

      my $string='';
      my $fh=IO::String->new($string);
      my $out=Bio::SeqIO->new(-fh=>$fh, -format=>'EMBL');
      $out->write_seq($seq);

      return $string;
  }

It gives this output when run:

  BioPerl 1.0069
  ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 3 BP.
  XX
  AC   unknown;
  XX
  XX
  FH   Key             Location/Qualifiers
  FH
  FT   misc_feature    1..3
  FT                   /label=averylonglabelthisisindeedbutitoughttoworkanywaydont
  FT                   youthink
  XX
  SQ   Sequence 3 BP; 1 A; 0 C; 1 G; 1 T; 0 other;
       atg                                                                       3
  //

  ------------- EXCEPTION: Bio::Root::Exception -------------
  MSG: Can't see new qualifier in: youthink
  from:
  /label=averylonglabelthisisindeedbutitoughttoworkanywaydont
  youthink

  STACK: Error::throw
  STACK: Bio::Root::Root::throw Bio/Root/Root.pm:368
  STACK: Bio::SeqIO::embl::_read_FTHelper_EMBL Bio/SeqIO/embl.pm:1294
  STACK: Bio::SeqIO::embl::next_seq Bio/SeqIO/embl.pm:392
  STACK: /z/home/adsj/bugs/bioperl/embl/embl.pl:24
  -----------------------------------------------------------
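For the third option, the reader would need to treat a continuation line that does not start with "/" as part of the previous qualifier, instead of throwing "Can't see new qualifier". Below is a hypothetical sketch of that rule. It is not embl.pm's actual parser. The function parse_qualifiers takes only the qualifier lines of one feature, not the key/location line. It appends continuation text with no separator, which suits an unquoted wrapped value like the one above. A quoted free-text value that was wrapped at a space may need the space re-inserted, and this sketch does not attempt that.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not BioPerl's parser: collect the qualifiers of one
# feature from its FT qualifier lines (without the key/location line).
sub parse_qualifiers {
    my @ft_lines = @_;
    my @quals;
    for my $line (@ft_lines) {
        (my $text = $line) =~ s/^FT\s+//;   # strip "FT" and the padding
        if ($text =~ m{^/}) {
            push @quals, $text;             # a new qualifier starts here
        } elsif (@quals) {
            $quals[-1] .= $text;            # continuation: append, no separator
        } else {
            die "Continuation line before any qualifier: $line\n";
        }
    }
    # Split "/name=value" into [name, value], unquoting "..." values
    return map {
        my ($name, $value) = m{^/([^=]+)=?(.*)$}s;
        if ($value =~ /^"(.*)"$/s) {
            ($value = $1) =~ s/""/"/g;      # undo doubled embedded quotes
        }
        [$name, $value];
    } @quals;
}

my @parsed = parse_qualifiers(
    'FT                   /label=averylonglabelthisisindeedbutitoughttoworkanywaydont',
    'FT                   youthink',
);
print "$_->[0] => $_->[1]\n" for @parsed;
# prints: label => averylonglabelthisisindeedbutitoughttoworkanywaydontyouthink
```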

If I change the value to include double quotes (simulating embl.pm
quoting the value), BioPerl can read the EMBL string it produces fine:

  -----------------------------------------------------------
  adsj at ala:~/work/bioperl/bioperl-live$ perl -I. ~/bugs/bioperl/embl/embl.pl 
  BioPerl 1.0069
  ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 3 BP.
  XX
  AC   unknown;
  XX
  XX
  FH   Key             Location/Qualifiers
  FH
  FT   misc_feature    1..3
  FT                   /label=""averylonglabelthisisindeedbutitoughttoworkanywaydo
  FT                   ntyouthink""
  XX
  SQ   Sequence 3 BP; 1 A; 0 C; 1 G; 1 T; 0 other;
       atg                                                                       3
  //
  Done


  Best regards,

     Adam

-- 
                                                          Adam Sjøgren
                                                    adsj at novozymes.com