[Bioperl-l] automation of translation based on alignment

Ross KK Leung ross at cuhk.edu.hk
Mon Mar 22 13:17:05 UTC 2010


Chris,

The following codes are what I use to retrieve sequences from GenBank. I
know that I can use something like:

  for my $feature ($seqobj->get_SeqFeatures){

        if ($feature->primary_tag eq "CDS") {
...

To get features, but how should                         

Bio::Location::SplitLocation                                    

be used? Do you mean something like:

If ($seq->is_circular()) {
  spliced_seq();
}

? But the genome indeed has several such spliced sequences then how can I
specify which is to retrieve? Thanks for your advice again~

#!/usr/bin/perl

use Bio::SeqIO::genbank; use Bio::DB::GenBank;

use Bio::DB::RefSeq;

 

$gb = new Bio::DB::GenBank;

 

my ($acc, $start, $stop) = @ARGV;

 

my $gb = Bio::DB::GenBank->new(-format     => 'Fasta',

        -seq_start  => "$start",

        -seq_stop   => "$stop",

        -strand     => 1);

 

$gbout = $acc;

 

$seq = $gb->get_Seq_by_acc($acc);

print "seq is ", $seq->seq, "\n";

 

$seqio_obj = Bio::SeqIO->new(-file => ">$gbout.fa", -format => 'fasta' );

$seqio_obj->write_seq($seq);

exit;                              


-----Original Message-----
From: Chris Fields [mailto:cjfields at illinois.edu] 
Sent: Monday, March 22, 2010 8:58 PM
To: Ross KK Leung
Cc: 'Florent Angly'; 'bioperl-l List'
Subject: Re: [Bioperl-l] automation of translation based on alignment

On Mar 22, 2010, at 12:30 AM, Ross KK Leung wrote:

> Dear Chris,
> 
> It seems that Bioperl is "clever" enough to "rectify" my start and stop by
> reversing the order.
> 
> e.g.
> start = 2300
> stop = 1600
> 
> It will reverse back to 1600 and then 2300.
> What else to tell that I'm now working on a circular genome?

Reverse it where, the alignment or the feature?  The svn version of BioPerl,
for alignments, retains strand information (this was a bug that was fixed).
For features, start is always less than end, with directionality determined
by strand.  For a circular genome, the feature is split across the origin,
as you have seen in the original sequence you posted:

...
    gene            join(2307..3215,1..1623)
                    /gene="P"
...


This would be represented as a Bio::Location::SplitLocation in the feature;
it would joined based on that order if $seq->is_circular() is true (or at
least it should).  In cases like this, the safe bet is to call spliced_seq()
to get the joined sequence in all cases, then call translate() to get the
protein sequence.

chris




More information about the Bioperl-l mailing list