[Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
Fields, Christopher J
cjfields at illinois.edu
Thu Sep 8 12:51:22 EDT 2011
There is no need to do that if one is using the Bio::SeqFeatureI interface. Note that get_tag_values always returns a list, so to snag a single value for a tag in a scalar, force list context on the LHS by enclosing the variable in ().
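The context point can be seen without BioPerl installed. The following is only a minimal sketch: fake_tag_values is a hypothetical stand-in for a list-returning method such as get_tag_values, and it assumes the method returns an array. What a plain scalar assignment yields depends on how the method is written, which is why forcing list context is the safer habit.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for $feat->get_tag_values('function');
# it returns an array, as a list-returning method might.
sub fake_tag_values {
    my @values = ('transport', 'binding');
    return @values;
}

my $count   = fake_tag_values();   # scalar context: the element count
my ($first) = fake_tag_values();   # list context: the first value
my @all     = fake_tag_values();   # list context: every value

print "count=$count first=$first all=", join(',', @all), "\n";
```

With this stand-in, $count holds 2 while $first holds 'transport', so dropping the parentheses silently changes what ends up in the variable.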
chris
-----------------------------
#!/usr/bin/env perl
use Modern::Perl;
use Bio::SeqIO;

# The GenBank file name is taken from the first command-line argument
my $in = Bio::SeqIO->new(-format => 'genbank',
                         -file   => shift);

while (my $seq = $in->next_seq) {
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'CDS';
        # Parentheses around $locus force list context
        my ($locus) = $feat->has_tag('locus_tag') ?
            $feat->get_tag_values('locus_tag') : '';
        # Collect every 'function' value for this CDS
        my @funcs = $feat->has_tag('function') ?
            $feat->get_tag_values('function') : ();
        say join("\t", $locus, join(',', @funcs));
    }
}
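For the broader question quoted below (linking one locus to all of its qualifiers across several primary tags), a nested hash along the lines of the perldsc "hash of complex records" is one option. This is only a sketch under stated assumptions: the rows are made-up placeholders standing in for values that a feature loop would supply via primary_tag and get_tag_values, and the names %record and @rows are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up placeholder rows: (locus_tag, primary_tag, qualifier, value).
# In a real script these would come from the feature loop, not a literal list.
my @rows = (
    [ 'locus_A', 'CDS',          'function', 'function one' ],
    [ 'locus_A', 'CDS',          'function', 'function two' ],
    [ 'locus_A', 'CDS',          'product',  'example product' ],
    [ 'locus_B', 'misc_feature', 'note',     'example note' ],
);

# locus_tag => { primary_tag => { qualifier => [ values ] } }
my %record;
for my $row (@rows) {
    my ($locus, $primary, $tag, $value) = @$row;
    # Autovivification creates the intermediate hashes and the array
    push @{ $record{$locus}{$primary}{$tag} }, $value;
}

# Print one line per locus / primary tag / qualifier
for my $locus (sort keys %record) {
    for my $primary (sort keys %{ $record{$locus} }) {
        for my $tag (sort keys %{ $record{$locus}{$primary} }) {
            print join("\t", $locus, $primary, $tag,
                join(',', @{ $record{$locus}{$primary}{$tag} })), "\n";
        }
    }
}
```

Keying on the locus tag first keeps every qualifier for a locus together regardless of which primary tag carried it, and storing values in arrays means multiple 'function' qualifiers no longer need to line up index-for-index with a separate array of loci.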
On Sep 8, 2011, at 11:28 AM, Surya Saha wrote:
> You might want to explore using a hash of complex records that are very
> similar to structures in C/C++. More info at
> http://perldoc.perl.org/perldsc.html#Declaration-of-a-HASH-OF-COMPLEX-RECORDS
>
> -Surya
>
> On Thu, Sep 8, 2011 at 12:14 PM, galeb abu-ali <abualiga2 at gmail.com> wrote:
>
>> I only had a quick look at your code, so maybe I'm missing something but
>> you are currently pushing all products of all CDSs into the same array,
>> i.e. you do not assign them to a data structure that links a particular
>> CDS to a list of products. You then use the same index to print out a
>> locus from the @loci array and a product from @products, but the two
>> will not match up because you will have more products than loci.
>>
>>
>>
>> That's right. Products are not the issue in this particular case, as it's
>> E. coli and there's no alternative splicing as far as I know, so there is a
>> single product per gene. But there are plenty more 'function' qualifiers,
>> for example, than loci. And I don't know how to create a data structure that
>> will link a 'gene' (as primary tag) to all other qualifiers, whether they
>> belong to 'CDS', 'Misc_RNA', 'Misc_feature', or other primary tags.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>