[Bioperl-l] Invalid EMBL files generated in rare circumstances; line wrapping

Adam Sjøgren adsj at novozymes.com
Mon Sep 29 15:17:31 UTC 2014


  Hi.

If you craft a tag value on a feature sneakily (or if you are simply
unlucky), Bio::SeqIO will write invalid EMBL in which the "/" is
separated from the qualifier name by a line break:

    ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 4 BP.
    XX
    AC   unknown;
    XX
    XX
    XX
    FH   Key             Location/Qualifiers
    FH
    FT   CDS             1..4
    FT                   /
    FT                   note="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    FT                   X"
    XX
    SQ   Sequence 4 BP; 1 A; 1 C; 1 G; 1 T; 0 other;
         actg                                                                      4
    //

In this example "/" and "note" end up on separate lines, which is wrong;
at least, BioPerl itself does not accept the file when reading it back.
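
A quick way to spot the problem in generated output is to look for FT
lines whose qualifier column holds nothing but "/". The following is
only a minimal sketch in plain Perl; the helper name
orphan_slash_lines is made up for illustration and is not part of
BioPerl:

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based line numbers of FT lines
# whose only content after the "FT" prefix is a lone "/".
sub orphan_slash_lines {
    my ($embl) = @_;
    my @hits;
    my $n = 0;
    for my $line (split /\n/, $embl) {
        $n++;
        push @hits, $n if $line =~ m{^\s*FT\s+/\s*$};
    }
    return @hits;
}

# A shortened version of the feature table shown above.
my $sample = join "\n",
    'FT   CDS             1..4',
    'FT                   /',
    'FT                   note="XXXX',
    'FT                   X"';

print "orphan '/' on line(s): @{[ orphan_slash_lines($sample) ]}\n";
```

A correctly written qualifier line such as `FT   /note="..."` is not
reported, because the pattern requires the "/" to be the last
non-blank character on the line.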

Here is a script that produces the above output (using BioPerl 1.6.901):

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Bio::Seq::RichSeq;
    use Bio::SeqFeature::Generic;
    use IO::String;
    use Bio::SeqIO;

    # A 4 bp sequence with a single CDS feature spanning all of it.
    my $seq=Bio::Seq::RichSeq->new(-display_id=>'TEST', -seq=>'actg');
    my $cds=Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', -start=>1, -end=>4);
    $cds->add_tag_value(note=>'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX X');
    $seq->add_SeqFeature($cds);

    # Write the EMBL record into an in-memory string and print it.
    my $string;
    my $str=IO::String->new($string);
    my $io=Bio::SeqIO->new(-fh=>$str, -format=>'embl');
    $io->write_seq($seq);
    print $string;

Whether the problem occurs depends on the position of the space in the
note: moving the space makes the difference.
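
To see which positions of the space are affected, one could sweep over
a range of positions and check each output for a lone "/". This is a
rough sketch only: it reuses the calls from the script above, the
helper names and the 40..60 range are arbitrary choices for
illustration, and no particular result is claimed for any position.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: $pos X's, one space, then a single trailing X.
sub note_with_space_after {
    my ($pos) = @_;
    return ('X' x $pos) . ' X';
}

# Hypothetical helper: true if some FT line holds nothing but "/".
sub has_orphan_slash {
    my ($embl) = @_;
    return $embl =~ m{^FT +/ *$}m ? 1 : 0;
}

# Same construction as the script above, parameterised on the note.
sub embl_for_note {
    my ($note) = @_;
    my $seq = Bio::Seq::RichSeq->new(-display_id => 'TEST', -seq => 'actg');
    my $cds = Bio::SeqFeature::Generic->new(
        -primary_tag => 'CDS', -start => 1, -end => 4);
    $cds->add_tag_value(note => $note);
    $seq->add_SeqFeature($cds);

    my $string;
    my $str = IO::String->new($string);
    my $io  = Bio::SeqIO->new(-fh => $str, -format => 'embl');
    $io->write_seq($seq);
    return $string;
}

# Load the modules at run time so a missing install gives a clear message.
my $have_bioperl = eval {
    require Bio::Seq::RichSeq;
    require Bio::SeqFeature::Generic;
    require Bio::SeqIO;
    require IO::String;
    1;
};

if ($have_bioperl) {
    for my $pos (40 .. 60) {    # arbitrary range around the line width
        my $embl = embl_for_note(note_with_space_after($pos));
        printf "space after %2d X's: %s\n", $pos,
            has_orphan_slash($embl) ? 'orphan "/"' : 'ok';
    }
}
else {
    print "BioPerl or IO::String not installed; skipping sweep\n";
}
```

Any position reported as 'orphan "/"' would be a candidate test case
for the line wrapping code.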

Maybe there is a bug lurking in the line wrapping/formatting code
somewhere...

Does this sound like a bug to anyone else?

  Best regards,

    Adam

-- 
                                                          Adam Sjøgren
                                                    adsj at novozymes.com


