[Bioperl-l] automation of translation based on alignment
Chris Fields
cjfields at illinois.edu
Mon Mar 22 16:20:24 EDT 2010
On Mar 22, 2010, at 8:17 AM, Ross KK Leung wrote:
> Chris,
>
> The following codes are what I use to retrieve sequences from GenBank. I
> know that I can use something like:
>
> for my $feature ($seqobj->get_SeqFeatures){
>
> if ($feature->primary_tag eq "CDS") {
> ...
>
> To get features, but how should
>
> Bio::Location::SplitLocation
>
> be used? Do you mean something like:
>
> If ($seq->is_circular()) {
> spliced_seq();
> }
You probably won't directly see the SplitLocation itself unless you explicitly request it (it is contained in the sequence feature).
Okay, so if you are trying to retrieve the sequence for a specific feature, you can use $sf->seq() (simple subsequence from start to end corrected for strand of feature). However, in the case where the feature crosses the origin it will contain a split location. In this case, you should call $sf->spliced_seq() to retrieve spliced sequence. For convenience, you could call spliced_seq on all sequence features; for simple locations it will just return the ordinary subseq().
So, if one had a generic sequence feature, one could call:
$sf->spliced_seq->translate;
to get the Bio::Seq object that is the translation of the seq feature region.
> ? But the genome indeed has several such spliced sequences then how can I
> specify which is to retrieve? Thanks for your advice again~
Do you mean alternatively spliced variants? These would be designated as separate features in a GenBank file, so you would check for those. Otherwise you'll have to clarify.
If you haven't read them yet I suggest looking over the HOWTOs, specifically ones covering Seq/SeqIO and Feature/Annotation to get an idea of what is possible.
chris
> #!/usr/bin/perl
>
> use Bio::SeqIO::genbank; use Bio::DB::GenBank;
>
> use Bio::DB::RefSeq;
>
>
>
> $gb = new Bio::DB::GenBank;
>
>
>
> my ($acc, $start, $stop) = @ARGV;
>
>
>
> my $gb = Bio::DB::GenBank->new(-format => 'Fasta',
>
> -seq_start => "$start",
>
> -seq_stop => "$stop",
>
> -strand => 1);
>
>
>
> $gbout = $acc;
>
>
>
> $seq = $gb->get_Seq_by_acc($acc);
>
> print "seq is ", $seq->seq, "\n";
>
>
>
> $seqio_obj = Bio::SeqIO->new(-file => ">$gbout.fa", -format => 'fasta' );
>
> $seqio_obj->write_seq($seq);
>
> exit;
>
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Monday, March 22, 2010 8:58 PM
> To: Ross KK Leung
> Cc: 'Florent Angly'; 'bioperl-l List'
> Subject: Re: [Bioperl-l] automation of translation based on alignment
>
> On Mar 22, 2010, at 12:30 AM, Ross KK Leung wrote:
>
>> Dear Chris,
>>
>> It seems that Bioperl is "clever" enough to "rectify" my start and stop by
>> reversing the order.
>>
>> e.g.
>> start = 2300
>> stop = 1600
>>
>> It will reverse back to 1600 and then 2300.
>> What else to tell that I'm now working on a circular genome?
>
> Reverse it where, the alignment or the feature? The svn version of BioPerl,
> for alignments, retains strand information (this was a bug that was fixed).
> For features, start is always less than end, with directionality determined
> by strand. For a circular genome, the feature is split across the origin,
> as you have seen in the original sequence you posted:
>
> ...
> gene join(2307..3215,1..1623)
> /gene="P"
> ...
>
>
> This would be represented as a Bio::Location::SplitLocation in the feature;
> it would joined based on that order if $seq->is_circular() is true (or at
> least it should). In cases like this, the safe bet is to call spliced_seq()
> to get the joined sequence in all cases, then call translate() to get the
> protein sequence.
>
> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list