[Bioperl-l] Invalid EMBL files generated in rare circumstances; line wrapping

Fields, Christopher J cjfields at illinois.edu
Mon Sep 29 15:41:10 UTC 2014


I can reproduce that on the master branch. I think it’s a weird side effect of the text wrapping: if you remove the space at the end of the string of X’s and let the module wrap the line itself, it works fine. Frankly, I don’t think we’ve ever run into it before.

If possible can you file it as a bug on GitHub?
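
In the meantime, here is a minimal sketch of a check you could run over the generated text to catch this case. It only looks for a feature-table (FT) line that holds a bare "/" with no qualifier name, as in your output. Note that find_orphan_slashes is a made-up helper name, not part of BioPerl, and the sample string is a shortened stand-in for real output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the 1-based line numbers
# of EMBL feature-table lines that contain only "/" after the FT prefix,
# i.e. where the "/" was separated from its qualifier name.
sub find_orphan_slashes {
    my ($embl_text) = @_;
    my @bad;
    my $n = 0;
    for my $line (split /\n/, $embl_text) {
        $n++;
        push @bad, $n if $line =~ m{^FT\s+/\s*$};
    }
    return @bad;
}

# Shortened stand-in for the broken output from the report.
my $sample = join "\n",
    'FT   CDS             1..4',
    'FT                   /',
    'FT                   note="XXXX',
    'FT                   X"',
    '';

my @bad = find_orphan_slashes($sample);
print "orphan '/' on line(s): @bad\n" if @bad;
```

With the script from your mail you could pass $string to the helper after write_seq and warn or die if it returns anything.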

chris

On Sep 29, 2014, at 10:17 AM, Adam Sjøgren <adsj at novozymes.com> wrote:

>  Hi.
> 
> If you craft a tag on a feature sneakily (or if you are unlucky)
> Bio::SeqIO will create invalid EMBL, separating the "/" from the
> qualifier name:
> 
>    ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 4 BP.
>    XX
>    AC   unknown;
>    XX
>    XX
>    XX
>    FH   Key             Location/Qualifiers
>    FH
>    FT   CDS             1..4
>    FT                   /
>    FT                   note="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>    FT                   X"
>    XX
>    SQ   Sequence 4 BP; 1 A; 1 C; 1 G; 1 T; 0 other;
>         actg                                                                      4
>    //
> 
> In this example "/" and "note" are on separate lines, which is wrong; at
> least BioPerl does not accept it itself.
> 
> Here is a script to create the above output (BioPerl 1.6.901 used):
> 
>    #!/usr/bin/perl
> 
>    use strict;
>    use warnings;
> 
>    use Bio::Seq::RichSeq;
>    use Bio::SeqFeature::Generic;
>    use IO::String;
>    use Bio::SeqIO;
> 
>    my $seq=Bio::Seq::RichSeq->new(-display_id=>'TEST', -seq=>'actg');
>    my $cds=Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', -start=>1, -end=>4);
>    $cds->add_tag_value(note=>'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX X');
>    $seq->add_SeqFeature($cds);
> 
>    my $string;
>    my $str=IO::String->new($string);
>    my $io=Bio::SeqIO->new(-fh=>$str, -format=>'embl');
>    $io->write_seq($seq);
>    print $string;
> 
> Changing the position of the space in the note makes the difference.
> 
> Maybe there is a bug lurking in the line wrapping/formatting code
> somewhere...
> 
> Does this sound like a bug to anyone else?
> 
>  Best regards,
> 
>    Adam
> 
> -- 
>                                                          Adam Sjøgren
>                                                    adsj at novozymes.com
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/bioperl-l



