[Bioperl-l] Long /labels are wrapped, but can't be read

Chris Fields cjfields at illinois.edu
Tue Sep 29 23:54:04 EDT 2009


Adam,

Not sure, but this could be a case of 'both'.  Labels that are quoted
and those that aren't are currently distinguished via a global hash
lookup (%FTQUAL_NO_QUOTE) due to the way the parser works; there is
some logic behind this, I just can't quite recall at the moment why it
is this way.  You could set a hash key for the label in cases where it
isn't quoted; that should work.  You can also test out the
Bio::SeqIO::embldriver version (-format => 'embldriver').
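
To illustrate the kind of lookup I mean, here is a rough sketch only. It is
not the actual embl.pm code: %no_quote and format_qualifier are made-up
names, and I'm assuming the table lists the qualifiers that are written
without quotes, as the name %FTQUAL_NO_QUOTE suggests.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser's lookup table: qualifiers listed
# here are written without surrounding quotes. This is NOT BioPerl's
# actual %FTQUAL_NO_QUOTE, only an illustration of the lookup.
my %no_quote = map { $_ => 1 } qw(codon_start label);

# Hypothetical helper: format one qualifier the way a writer consulting
# such a table might.
sub format_qualifier {
    my ($name, $value) = @_;
    return $no_quote{$name}
        ? "/$name=$value"
        : qq{/$name="$value"};
}

# With 'label' in the table, the value is written bare.
print format_qualifier(label => 'short'), "\n";

# Changing the table entry flips the behaviour for that qualifier.
delete $no_quote{label};
print format_qualifier(label => 'short'), "\n";
```

The point is just that one table entry decides quoting per qualifier name,
so changing the entry for 'label' changes how every label is handled.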

If the above doesn't work out, it's worth filing a bug for this
behavior, though I'm not sure how easy it will be to fix.

chris

On Sep 28, 2009, at 2:51 AM, Adam Sjøgren wrote:

>  Hi.
>
>
> I am wondering whether this is a buglet or just a case of "Don't do
> that":
>
> If I set a very long /label on a feature and output the sequence in EMBL
> format, the qualifier value gets wrapped, but not quoted.
>
> When BioPerl reads such a file, an exception is thrown.
>
> I probably shouldn't be setting very long labels... But oughtn't BioPerl
> throw an exception when a too long label is set, or automatically quote
> the value when it is long enough to be wrapped, or know how to read a
> wrapped yet unquoted value?
>
> I will be happy to try and provide a patch for whichever solution is
> preferred.
>
> Here is an example script:
>
>  #!/usr/bin/perl
>
>  use strict;
>  use warnings;
>
>  use IO::String;
>
>  use Bio::Seq;
>  use Bio::SeqFeature::Generic;
>  use Bio::SeqIO;
>
>  print 'BioPerl ' . $Bio::Root::Version::VERSION . "\n";
>
>  my $seq=Bio::Seq->new(-seq=>'ATG');
>  my $feature=Bio::SeqFeature::Generic->new(-primary=>'misc_feature', -start=>1, -end=>3);
>  $feature->add_tag_value(label=>'averylonglabelthisisindeedbutitoughttoworkanywaydontyouthink');
>  $seq->add_SeqFeature($feature);
>
>  my $out_string=out($seq);
>  print $out_string;
>
>  my $fh=IO::String->new($out_string);
>  my $in=Bio::SeqIO->new(-fh=>$fh, -format=>'EMBL');
>  my $in_seq=$in->next_seq;
>
>  print "Done\n";
>
>  sub out {
>      my ($seq)=@_;
>
>      my $string='';
>      my $fh=IO::String->new($string);
>      my $out=Bio::SeqIO->new(-fh=>$fh, -format=>'EMBL');
>      $out->write_seq($seq);
>
>      return $string;
>  }
>
> Which gives this output when run:
>
>  BioPerl 1.0069
>  ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 3 BP.
>  XX
>  AC   unknown;
>  XX
>  XX
>  FH   Key             Location/Qualifiers
>  FH
>  FT   misc_feature    1..3
>  FT                   /label=averylonglabelthisisindeedbutitoughttoworkanywaydont
>  FT                   youthink
>  XX
>  SQ   Sequence 3 BP; 1 A; 0 C; 1 G; 1 T; 0 other;
>       atg                                                                       3
>  //
>
>  ------------- EXCEPTION: Bio::Root::Exception -------------
>  MSG: Can't see new qualifier in: youthink
>  from:
>  /label=averylonglabelthisisindeedbutitoughttoworkanywaydont
>  youthink
>
>  STACK: Error::throw
>  STACK: Bio::Root::Root::throw Bio/Root/Root.pm:368
>  STACK: Bio::SeqIO::embl::_read_FTHelper_EMBL Bio/SeqIO/embl.pm:1294
>  STACK: Bio::SeqIO::embl::next_seq Bio/SeqIO/embl.pm:392
>  STACK: /z/home/adsj/bugs/bioperl/embl/embl.pl:24
>  -----------------------------------------------------------
>
> If I change the value to include "-quotes ("simulating" that embl.pm
> quotes the value), BioPerl can read the EMBL string it produces fine:
>
>  -----------------------------------------------------------
>  adsj at ala:~/work/bioperl/bioperl-live$ perl -I. ~/bugs/bioperl/embl/embl.pl
>  BioPerl 1.0069
>  ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 3 BP.
>  XX
>  AC   unknown;
>  XX
>  XX
>  FH   Key             Location/Qualifiers
>  FH
>  FT   misc_feature    1..3
>  FT                   /label=""averylonglabelthisisindeedbutitoughttoworkanywaydo
>  FT                   ntyouthink""
>  XX
>  SQ   Sequence 3 BP; 1 A; 0 C; 1 G; 1 T; 0 other;
>       atg                                                                       3
>  //
>  Done
>
>
>  Best regards,
>
>     Adam
>
> -- 
>                                                          Adam Sjøgren
>                                                    adsj at novozymes.com
>



