[Bioperl-l] genbank parsing of multiple 'function' tags within primary tag

Fields, Christopher J cjfields at illinois.edu
Thu Sep 8 12:51:22 EDT 2011


There is no need for that if you use the Bio::SeqFeatureI interface.  Note that get_tag_values always returns a list, so to capture a single tag value in a scalar, force list context on the left-hand side by enclosing the variable in parentheses.
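
A minimal sketch of the list-context point, in plain Perl so it runs without BioPerl. The sub tag_values is a hypothetical stand-in for $feat->get_tag_values('function'), and the assumption that it returns an array (so that scalar context yields a count) is only an illustration:

```perl
use strict;
use warnings;

# Hypothetical stand-in for $feat->get_tag_values('function').
# Assumed here to return an array, as a list-returning method often does.
sub tag_values {
    my @values = ('enzyme', 'transport');
    return @values;
}

my $scalar  = tag_values();    # scalar context: the element count in this sketch
my ($first) = tag_values();    # list context on the LHS: the first value
my @all     = tag_values();    # list context: every value

print "scalar: $scalar\n";
print "first:  $first\n";
print "all:    @all\n";
```

Without the parentheses the variable receives whatever the call yields in scalar context, which is not the first tag value.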

chris

-----------------------------
#!/usr/bin/env perl

use Modern::Perl;
use Bio::SeqIO;

my $in = Bio::SeqIO->new(-format => 'genbank',
                         -file => shift);

while (my $seq = $in->next_seq) {
   for my $feat ($seq->get_SeqFeatures) {
       next unless $feat->primary_tag eq 'CDS';
       my ($locus) = $feat->has_tag('locus_tag') ? 
                     $feat->get_tag_values('locus_tag') : '';
       my @funcs = $feat->has_tag('function') ?
           $feat->get_tag_values('function') : ();
       say join("\t", $locus, join(',', @funcs));
   }
}
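
If the qualifiers do need to be kept for later use rather than printed immediately, one possible sketch is a nested hash keyed by locus tag, then primary tag, then qualifier name. The @features records, the locus names, and the qualifier values below are hypothetical placeholders standing in for parsed features, so that the example runs without BioPerl or an input file:

```perl
use strict;
use warnings;

# Hypothetical hand-written records: [ primary_tag, { qualifier => [values] } ]
my @features = (
    [ gene => { locus_tag => ['LOC_A'] } ],
    [ CDS  => { locus_tag => ['LOC_A'],
                function  => ['function one', 'function two'] } ],
    [ gene => { locus_tag => ['LOC_B'] } ],
    [ CDS  => { locus_tag => ['LOC_B'], function => ['function three'] } ],
);

# locus_tag => { primary_tag => { qualifier => [values] } }
my %genes;
for my $feat (@features) {
    my ($primary, $quals) = @$feat;
    my ($locus) = @{ $quals->{locus_tag} || [] };
    next unless defined $locus;    # skip features without a locus_tag
    for my $tag (keys %$quals) {
        push @{ $genes{$locus}{$primary}{$tag} }, @{ $quals->{$tag} };
    }
}

# Every 'function' value stays attached to its own locus.
for my $locus (sort keys %genes) {
    my @funcs = @{ $genes{$locus}{CDS}{function} || [] };
    print join("\t", $locus, join(',', @funcs)), "\n";
}
```

With real Bio::SeqFeatureI objects, the inner loop would presumably iterate over $feat->get_all_tags and push $feat->get_tag_values($tag) instead of reading from a hash reference.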



On Sep 8, 2011, at 11:28 AM, Surya Saha wrote:

> You might want to explore using a hash of complex records that are very
> similar to structures in C/C++. More info at
> http://perldoc.perl.org/perldsc.html#Declaration-of-a-HASH-OF-COMPLEX-RECORDS
> 
> -Surya
> 
> On Thu, Sep 8, 2011 at 12:14 PM, galeb abu-ali <abualiga2 at gmail.com> wrote:
> 
>> I only had a quick look at your code, so maybe I'm missing something but
>> you are currently pushing all products of all CDSs into the same array,
>> i.e. you do not assign them to a datastructure that links a particular
>> CDS to a list of products. You then use the same index to print out a
>> locus from the @loci array and a product from @products, but the two
>> will not match up because you will have more products than loci.
>> 
>> 
>> 
>> That's right. Products are not the issue in this particular case, as it's
>> E. coli and there's no alternative splicing as far as I know, so there is
>> a single product per gene. But there are plenty more 'function'
>> qualifiers, for example, than loci. And I don't know how to create a data
>> structure that will link a 'gene' (as primary tag) to all other
>> qualifiers, whether they belong to 'CDS', 'Misc_RNA', 'Misc_feature', or
>> other primary tags.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 