[Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values

Lincoln Stein lstein at cshl.edu
Fri Feb 23 17:16:01 UTC 2007


Hi Malcolm,

You're quite right, and I appreciate your work in tracking down and fixing
it. Before you commit the patch, can you confirm that the loader is working
correctly so that comma-separated values are read back into the data
structure as multiple attributes?
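
To make the question concrete, here is a minimal sketch of the round trip I
have in mind, in plain Perl. It does not use the real loader or
Bio::DB::SeqFeature. The helpers gff3_escape, gff3_unescape,
attributes_to_string and string_to_attributes are hypothetical names written
only for illustration, and the escaping is a simplified stand-in for
$self->escape, so treat it as a picture of the expected behavior rather than
the actual implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $self->escape: percent-encode the characters
# that are structural in GFF3 column 9 (plus '%' itself).
sub gff3_escape {
    my ($s) = @_;
    $s =~ s/([%;=&,\t\n\r])/sprintf("%%%02X", ord($1))/ge;
    return $s;
}

# Reverse of gff3_escape: decode %XX sequences.
sub gff3_unescape {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Serialize { tag => [values] } as one tag=v1,v2 pair per tag.
# Tags are sorted only to make the output deterministic.
sub attributes_to_string {
    my ($attr) = @_;
    return join ';', map {
        my $tag = $_;
        gff3_escape($tag) . '=' . join(',', map { gff3_escape($_) } @{ $attr->{$tag} });
    } sort keys %$attr;
}

# Parse column 9 back into { tag => [values] }. Splitting happens before
# unescaping, so an escaped comma inside a value is not treated as a separator.
sub string_to_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $value;
        push @{ $attr{ gff3_unescape($tag) } },
            map { gff3_unescape($_) } split /,/, $value;
    }
    return \%attr;
}

my $col9 = attributes_to_string({ Name => ['mec'], foo => [qw(bar blat)] });
print "$col9\n";                              # Name=mec;foo=bar,blat

my $back = string_to_attributes($col9);
print join('|', @{ $back->{foo} }), "\n";     # bar|blat
```

Because the parser in this sketch appends to the value list for each tag, the
older foo=bar;foo=blat form would also come back as two values of foo, which
is the behavior I would hope for from files written before the patch.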

Lincoln

On 2/23/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
> Lincoln, and other Bio::DB::SeqFeature wanderers:
>
> I find that generating GFF from a Bio::DB::SeqFeature using gff3_string
> does not respect the following:
>
> "Multiple attributes of the same type are indicated by separating the
> values with the comma "," character"  (c.f.
> http://www.sequenceontology.org/gff3.shtml)
>
> This one-liner demonstrates the problem:
>
> perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id =>
> "J", -start => 1, -end => 2, -primary_tag => 'PH', -source => 'A',
> -name => 'mec', -attributes => {foo =>  [qw(bar blat)]})->gff3_string'
> J       A       PH      1       2       .       .       .
> foo=bar;foo=blat;Name=mec
>
> Do you agree this is a problem?
>
> The fix is in the post-sig patch to
> /Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the
> stylistic privilege of promoting any ID, Parent, or Name attribute to
> the front of column 9, so output is now:
>
> J       A       PH      1       2       .       .       .
> Name=mec;foo=bar,blat
>
> Do you agree this is better?
>
> I am poised to commit it, as well as the functionally same patch to the
> equivalent function in Bio/Graphics/FeatureBase.pm
>
> All clear?
>
> -- Malcolm Cook
>
>
>
> *** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25
> --- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000
> ***************
> *** 481,494 ****
>       next if $t eq 'load_id';
>       next if $t eq 'parent_id';
>       foreach (@values) { s/\s+$// } # get rid of trailing whitespace
> !
> !     push @result,join '=',$self->escape($t),$self->escape($_) foreach
> @values;
>     }
>     my $id   = $self->primary_id;
>     my $name = $self->display_name;
> !   push @result,"ID=".$self->escape($id)                     if defined
> $id;
> !   push @result,"Parent=".$self->escape($parent->primary_id) if defined
> $parent;
> !   push @result,"Name=".$self->escape($name)                   if
> defined $name;
>     return join ';',@result;
>   }
>
> --- 481,498 ----
>       next if $t eq 'load_id';
>       next if $t eq 'parent_id';
>       foreach (@values) { s/\s+$// } # get rid of trailing whitespace
> !
> !     # Multiple attributes of the same type are indicated by
> !     # separating the values with the comma "," character - per
> !     # http://www.sequenceontology.org/gff3.shtml
> !     push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
>     }
>     my $id   = $self->primary_id;
>     my $name = $self->display_name;
> !   unshift @result,"ID=".$self->escape($id)                     if
> defined $id;
> !   unshift @result,"Parent=".$self->escape($parent->primary_id) if
> defined $parent;
> !   unshift @result,"Name=".$self->escape($name)                   if
> defined $name;
>     return join ';',@result;
>   }
>
>
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


