[Bioperl-l] Navigating a genbank file

Adlai Burman adlai at refenestration.com
Fri Feb 24 00:43:41 UTC 2012


I am struggling with Bioperl to do something which I know is simple (I stumbled on it before, but it was just by dumb luck and I forgot how it is done). Basically what I am trying to do is, given a target gene symbol extract the intergenic region between that CDS and the next one 3'. One ugly way I could do this would be:
 (1) Make an array of all of the CDSs in each gb file
(2) Make a second run through the file using either the symbol prior or post the target symbol (depending on the strand of the target symbol).

This is, of course, cumbersome and unnecessary but I can't figure out how it should be done.
Here is a skeletal version of what I understand about getting feature info with bioperl.
Can anyone help me figure out how to access the CDS features on either side?
Please?
Thank you,

Adlai

#!/usr/bin/perl                                                                                                     
use strict;
use warnings;
use IO::String;
use Bio::Perl;
use Bio::SeqIO;
use IO::String;


my $target_sym = shift;
my $file = "../Dropbox/local_gb/*";

my $seqio = Bio::SeqIO-> new(
                             -file     => $file,
                             -format => 'GenBank',
    );

my $seq = $seqio->next_seq;
for my $feats ($seq->get_SeqFeatures){
if ($feats->primary_tag eq "CDS"){
  my $start = $feats->location->start;
my $end = $feats->location->end;
my $strand = $feats->strand;
 if ($feats->has_tag('gene')) {
     for my $val ($feats->get_tag_values('gene')){
if ($val eq $target_sym){
print $start."\n";
print "$val\n";
  }

}
}
}
}



More information about the Bioperl-l mailing list