[Bioperl-l] Odd problem with get_tag_values

Adlai Burman adlai at refenestration.com
Fri Feb 24 21:55:57 UTC 2012


Jason,
Your first solution did indeed do the trick (though I'm not sure why). There was no need for the "else" branch. I'm not sure why some records with a full set of "gene" tags would not parse without the has_tag check, but everything parsed with it.
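
For anyone finding this thread later, here is a minimal sketch of how the has_tag check can be folded into the original one-line map. It is an illustration only, not necessarily the code that was actually run: the features are built in memory rather than read from a file, and the gene names and coordinates are made up. Taking element [0] matters because get_tag_values returns a list, and a feature with two 'gene' values would otherwise misalign the key/value pairs in the hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;

# Hypothetical in-memory features standing in for a parsed GenBank record.
my @features = (
    Bio::SeqFeature::Generic->new(
        -primary_tag => 'CDS', -start => 1, -end => 90, -strand => 1,
        -tag => { gene => 'rbcL' },
    ),
    Bio::SeqFeature::Generic->new(
        -primary_tag => 'CDS', -start => 100, -end => 190, -strand => -1,
        -tag => { product => 'hypothetical protein' },    # no 'gene' tag
    ),
    Bio::SeqFeature::Generic->new(
        -primary_tag => 'CDS', -start => 200, -end => 290, -strand => -1,
        -tag => { gene => 'psbA' },
    ),
);

# Keep only CDS features that actually carry a 'gene' tag, then build the
# hash. The [0] index takes the first 'gene' value only.
my %strands =
    map  { ( ($_->get_tag_values('gene'))[0], $_->strand ) }
    grep { $_->primary_tag eq 'CDS' && $_->has_tag('gene') } @features;

print "$_\t$strands{$_}\n" for sort keys %strands;
```

The CDS without a 'gene' tag is silently skipped here; use the if/else form from Jason's reply if those features need a fallback name.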

Brian, you were right.

Thanks again,

Adlai
On Feb 24, 2012, at 10:21 PM, Jason Stajich wrote:

> Not all CDS features will be annotated with a 'gene' tag. Annotation practice varies, and there is no requirement that every CDS feature carry a gene tag.
> 
> You can protect your query by testing with has_tag first; we often do this when dealing with data from the wild.
> 
> my %strands;
> for my $cds ( grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) {
>   if( $cds->has_tag('gene') ) {
>     my ($gene) = $cds->get_tag_values('gene'); # get the 1st one, this returns a list
>     $strands{$gene} = $cds->strand;
>   } else { # look in alternative places for a name, e.g. locus
>     ...
>   }
> }
> 
> An alternative is to loop through your list of tags in order of preference
> 
> my %strands;
> for my $cds ( grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) {
>   my $seen = 0; # reset for every CDS feature
>   for my $tag ( qw(gene locus name product accession note) ) {
>     if( $cds->has_tag($tag) ) {
>       my ($name) = $cds->get_tag_values($tag); # get the 1st one, this returns a list
>       $strands{$name} = $cds->strand;
>       $seen = 1;
>       last;
>     }
>   }
>   if( ! $seen ) {
>     warn("no tag found for feature at ", $cds->location->to_FTstring, "\n");
>   }
> }
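
The preference-order loop above can also be wrapped in a small helper so the calling code stays short. This is only a sketch under stated assumptions: first_tag_value is a hypothetical name, not a BioPerl function, and the features below are invented in memory instead of being read from a GenBank file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;

# Hypothetical helper (not part of BioPerl): return the first value of the
# first tag in @$prefs that the feature carries, or undef if none is present.
sub first_tag_value {
    my ($feat, $prefs) = @_;
    for my $tag (@$prefs) {
        next unless $feat->has_tag($tag);
        my ($value) = $feat->get_tag_values($tag);    # list context, 1st value
        return $value;
    }
    return;
}

# Tag names in order of preference, same list as in the loop above.
my @prefs = qw(gene locus name product accession note);

# Made-up features: one with 'gene', one with only 'product', one with neither.
my @cds_features = (
    Bio::SeqFeature::Generic->new(
        -primary_tag => 'CDS', -start => 1, -end => 90, -strand => 1,
        -tag => { gene => 'rbcL', product => 'RuBisCO large subunit' },
    ),
    Bio::SeqFeature::Generic->new(
        -primary_tag => 'CDS', -start => 100, -end => 190, -strand => -1,
        -tag => { product => 'hypothetical protein' },
    ),
    Bio::SeqFeature::Generic->new(
        -primary_tag => 'CDS', -start => 200, -end => 290, -strand => 1,
        -tag => { codon_start => 1 },
    ),
);

my %strands;
for my $cds (@cds_features) {
    my $name = first_tag_value($cds, \@prefs);
    if ( defined $name ) {
        $strands{$name} = $cds->strand;
    } else {
        warn "no tag found for feature at ", $cds->location->to_FTstring, "\n";
    }
}

print "$_\t$strands{$_}\n" for sort keys %strands;
```

Returning undef for an unnamed feature leaves the decision to warn, skip, or die with the caller.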
> 
> On Feb 24, 2012, at 12:43 PM, Adlai Burman wrote:
> 
>> I have come across a perplexing problem with trying to parse sequence features into hashes from gb records. This is the minimal code which shows my problem:
>> 
>> #!/usr/bin/perl                                                                                                     
>> use strict;
>> use warnings;
>> use IO::String;
>> use Bio::Perl;
>> use Bio::SeqIO;
>> 
>> my @files = </Users/adlai/Dropbox/atrsh/*>;
>> foreach my $file(@files){
>> 
>> 
>> my @cds_features = grep {$_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures;
>> my %strands = map {$_->get_tag_values('gene'), $_->strand} @cds_features; ##This Is The Culprit. 
>> .
>> .
>> .
>> #do nifty stuff
>> }
>> 
>> For some files this approach works just fine.
>> For others the script dies immediately with the error message:
>> 
>> ------------- EXCEPTION -------------
>> MSG: asking for tag value that does not exist gene
>> STACK Bio::SeqFeature::Generic::get_tag_values /Users/adlai/Downloads/BioPerl-1.6.1/Bio/SeqFeature/Generic.pm:517
>> STACK toplevel tosend.pl:16
>> -------------------------------------
>> 
>> The difference in the files that parse and those that don't seems to be that the files that crash have "intron" and "exon" tags. They ALL have "gene" tags.
>> Does anyone know why this is a problem and what can be done to circumvent it?
>> 
>> Thanks,
>> Adlai
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> 
> 




