[Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values

Cook, Malcolm MEC at stowers-institute.org
Fri Feb 23 18:46:00 UTC 2007


Lincoln,
 
OK.  I'll do that...
 
...let's see, a quick squiz at Bio/DB/SeqFeature/Store/ .... 
 
...ok - parse_attributes _looks_ right to me
 
...so, let's try it
 
#load a feature into a new database:
 
bp_seqfeature_load.PLS -dsn 'dbi:mysql:database=test;host=mysql-dev' \
  -create -user test -pass test \
  <(echo -e "J\tA\tPH\t1\t2\t.\t.\t.\tfoo=bar,blat;Name=mec\n")
 
#It loaded ok.  Now, let's print it out in GFF3:
 
perl -MBio::DB::SeqFeature::Store -e '
  foreach (Bio::DB::SeqFeature::Store->new(
             -dsn => "dbi:mysql:database=test;host=mysql-dev;user=test;password=test"
           )->features(-type => "PH:A")) {
    print $_->gff3_string . "\n";
  }'
J A PH 1 2 . . . Name=mec;ID=1;foo=bar,blat

#output looks good to me

Note: I also tried loading the attributes as foo=bar;foo=blat, and they came
back as foo=bar,blat. So the loader accepts either form.
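For readers following along, the behaviour confirmed above (comma-separated values and repeated tags both ending up as one multi-valued attribute) can be illustrated with a small standalone Perl sketch. It does not use BioPerl; parse_col9 and unescape_gff3 are hypothetical names chosen for illustration, not the loader's own parse_attributes, and the %XX decoding follows the GFF3 escaping convention as an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): decode %XX escapes in a
# GFF3 column-9 tag or value.
sub unescape_gff3 {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Hypothetical sketch of column-9 parsing: split pairs on ';', the tag
# from its value on the first '=', and multiple values on ','.
# Values for a repeated tag accumulate in the same array, so
# "foo=bar,blat" and "foo=bar;foo=blat" yield the same structure.
sub parse_col9 {
    my ($col9) = @_;
    my %attrs;
    for my $pair (split /;/, $col9) {
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $value;    # skip empty or malformed pairs
        push @{ $attrs{ unescape_gff3($tag) } },
            map { unescape_gff3($_) } split /,/, $value;
    }
    return \%attrs;
}

my $joined   = parse_col9('foo=bar,blat;Name=mec');
my $repeated = parse_col9('foo=bar;foo=blat;Name=mec');
print join(',', @{ $joined->{foo} }), "\n";      # bar,blat
print join(',', @{ $repeated->{foo} }), "\n";    # bar,blat
```

Splitting on ',' only after percent-decoding is deferred means a literal comma inside a value (written %2C) survives as part of that value rather than starting a new one.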

I'll commit later today.

--Malcolm  

 



________________________________

	From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
	Sent: Friday, February 23, 2007 11:16 AM
	To: Cook, Malcolm
	Cc: bioperl list; lstein at cshl.org
	Subject: Re: Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
	
	
	Hi Malcolm,
	
	You're quite right, and I appreciate your work in tracking down and
	fixing it. Before you commit the patch, can you confirm that the loader
	is working correctly, so that comma-separated values are read back into
	the data structure as multiple attributes?
	
	Lincoln
	
	
	On 2/23/07, Cook, Malcolm <MEC at stowers-institute.org> wrote: 

		Lincoln, and other Bio::DB::SeqFeature wanderers:
		
		I find that generating GFF from a Bio::DB::SeqFeature using
		gff3_string does not respect the following:
		
		"Multiple attributes of the same type are indicated by separating
		the values with the comma "," character" (cf.
		http://www.sequenceontology.org/gff3.shtml)
		
		This one-liner demonstrates the problem:
		
		perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id => "J", -start => 1, -end => 2, -primary_tag => "PH", -source => "A", -name => "mec", -attributes => {foo => [qw(bar blat)]})->gff3_string'
		J       A       PH      1       2       .       .       .       foo=bar;foo=blat;Name=mec
		
		Do you agree this is a problem?
		
		The fix is in the post-sig patch to
		/Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the
		stylistic privilege of promoting any ID, Parent, or Name attribute
		to the front of column 9, so the output is now:
		
		J       A       PH      1       2       .       .       .       Name=mec;foo=bar,blat
		
		Do you agree this is better? 
		
		I am poised to commit it, as well as the functionally identical patch
		to the equivalent function in Bio/Graphics/FeatureBase.pm.
		
		All clear?
		
		-- Malcolm Cook
		
		
		
		*** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25
		--- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000
		***************
		*** 481,494 ****
		      next if $t eq 'load_id';
		      next if $t eq 'parent_id';
		      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
		!
		!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
		    }
		    my $id   = $self->primary_id;
		    my $name = $self->display_name;
		!   push @result,"ID=".$self->escape($id) if defined $id;
		!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
		!   push @result,"Name=".$self->escape($name) if defined $name;
		    return join ';',@result;
		  }
		
		--- 481,498 ----
		      next if $t eq 'load_id';
		      next if $t eq 'parent_id';
		      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
		!
		!      push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
		!     # NO! Multiple attributes of the same type are indicated by
		!     # separating the values with the comma "," character - per
		!     # http://www.sequenceontology.org/gff3.shtml.  Do it this way:
		!     #push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
		    }
		    my $id   = $self->primary_id;
		    my $name = $self->display_name;
		!   unshift @result,"ID=".$self->escape($id) if defined $id;
		!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
		!   unshift @result,"Name=".$self->escape($name) if defined $name;
		    return join ';',@result;
		  }
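The column-9 output the patch aims for (one tag=value pair per tag, values joined with ",", and Name, Parent, ID moved to the front in the order the unshift calls leave them) can be sketched in standalone Perl. This is an illustration only: format_col9 and escape_gff3 are hypothetical names, not BioPerl methods, and the set of characters escaped here (tab, newline, carriage return, ";", "=", "%", "&", ",") is an assumption based on the GFF3 reserved characters, not a copy of NormalizedFeature's own escape method.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's escape method): percent-encode the
# characters GFF3 reserves in column 9.
sub escape_gff3 {
    my ($s) = @_;
    $s =~ s/([\t\n\r;=%&,])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Hypothetical sketch: one tag=value pair per tag, multiple values
# joined with ',', and Name, Parent, ID moved to the front (the order
# produced by the patch's three unshift calls). Other tags are sorted.
sub format_col9 {
    my ($attrs) = @_;    # hashref: tag => arrayref of values
    my @front = grep { exists $attrs->{$_} } qw(Name Parent ID);
    my %is_front;
    $is_front{$_} = 1 for @front;
    my @rest = sort { $a cmp $b } grep { !$is_front{$_} } keys %$attrs;
    my @pairs;
    for my $tag (@front, @rest) {
        my @values = @{ $attrs->{$tag} };    # copy, so the caller's data is untouched
        s/\s+$// for @values;                # strip trailing whitespace, as the patch does
        push @pairs, escape_gff3($tag) . '='
                   . join(',', map { escape_gff3($_) } @values);
    }
    return join ';', @pairs;
}

print format_col9({ foo => [qw(bar blat)], Name => ['mec'] }), "\n";
# Name=mec;foo=bar,blat
```

Because each value is escaped before the join, a comma that is part of a value is emitted as %2C and cannot be confused with the value separator.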
		
		
		
		




	-- 
	Lincoln D. Stein
	Cold Spring Harbor Laboratory
	1 Bungtown Road
	Cold Spring Harbor, NY 11724
	(516) 367-8380 (voice)
	(516) 367-8389 (fax)
	FOR URGENT MESSAGES & SCHEDULING, 
	PLEASE CONTACT MY ASSISTANT, 
	SANDRA MICHELSEN, AT michelse at cshl.edu 




