[Bioperl-l] automation of translation based on alignment

Chris Fields cjfields at illinois.edu
Tue Mar 23 00:43:03 EDT 2010


On Mar 22, 2010, at 8:32 PM, Ross KK Leung wrote:

> Chris L,
> 
> Your comment is insightful and as a non-virologist, I have never known that
> before. My strategy is just to extract the genomic fragments encoding
> proteins and derive the putative translated sequences. I'll do another round
> of MSA for the protein sequences in order to discover any outliers. There
> may be truncations, but as long as the protease acts post-translationally,
> it's acceptable. 
> 
> Chris F,
> 
> What makes me feel frustrated is the verisimilar data structures and naming
> of Bio objects in Bioperl. If I want to retrieve a genbank file over the
> internet by:
> 
> $gb = new Bio::DB::GenBank;
> 
> $seq = $gb->get_Seq_by_acc('J00522');
> 
> And from:
> http://doc.bioperl.org/releases/bioperl-1.4/Bio/DB/GenBank.html
> 
> it says it returns a Bio::Seq object, but in fact it's a Bio::Seq::RichSeq
> so I can't do something like:

A Bio::Seq::RichSeq is-a Bio::Seq (it inherits from Bio::Seq and extends it). I believe 'Bio::Seq' in the docs reflects the fact that one can retrieve either FASTA sequence data (which returns a simple Bio::Seq) or richer records, such as a GenBank record (which returns a Bio::Seq::RichSeq). To be more accurate it should probably read 'Bio::SeqI', i.e. an object that implements the Bio::SeqI interface.

Beyond the addition of a few accessor methods they are essentially the same, in that they both have annotation, features, etc.
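To see the inheritance for yourself, a small sketch like the following builds a Bio::Seq::RichSeq locally and checks what it is-a. It assumes BioPerl 1.6.x is installed; the sequence and ID are made-up values, so no GenBank access is needed.

```perl
use strict;
use warnings;
use Bio::Seq::RichSeq;

# Made-up sequence and ID; built locally, so no network access is needed.
my $rich = Bio::Seq::RichSeq->new(
    -seq        => 'ATGGCCAAATAA',
    -display_id => 'example',
);

# RichSeq inherits from Bio::Seq, which implements Bio::SeqI,
# so code written for a Bio::Seq accepts a RichSeq as well.
print ref($rich), "\n";
for my $class (qw(Bio::Seq Bio::SeqI)) {
    print "is-a $class: ", ($rich->isa($class) ? 'yes' : 'no'), "\n";
}

# Feature methods such as get_SeqFeatures are inherited from Bio::Seq
# (this toy object simply has none attached).
my @features = $rich->get_SeqFeatures;
print scalar(@features), " features\n";
```

Both isa() checks should answer yes, which is why anything documented for Bio::Seq also applies to the object get_Seq_by_acc() hands back for a GenBank record.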

> my $seqobj = $seq->next_seq;

You're either not reading the demos or the relevant documentation correctly, or there is a spot in the docs that needs to be fixed (if the latter, please let us know). Bio::Seq does not implement a next_seq() method, but sequence *streams* (à la Bio::SeqIO) do. You are probably thinking of something like this:

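use Bio::DB::GenBank;

my $gb     = Bio::DB::GenBank->new;
my @ids    = ('J00522');
my $stream = $gb->get_Stream_by_acc(\@ids);   # takes an array ref; returns a Bio::SeqIO stream

while (my $seqobj = $stream->next_seq) {
   # do stuff here with each sequence object
}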

The above retrieves a stream (specifically, a Bio::SeqIO stream) that yields Bio::Seq objects, and '$stream->next_seq()' iterates through them one at a time. next_seq() only exists on a stream, so calling it on a single sequence, as your code does, will not work. If you instead call the methods below directly on the *sequence* object ($seqobj, retrieved with get_Seq_by_*) rather than on a *stream* object (retrieved with get_Stream_by_*), it should work.
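The feature loop quoted below works unchanged once it is called on a sequence object. Here is a minimal local sketch, assuming BioPerl 1.6.x; the sequence, the CDS coordinates and the gene name 'toyA' are hypothetical stand-ins for what get_Seq_by_acc('J00522') would return, so no network access is needed.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Hypothetical sequence standing in for the object get_Seq_by_acc() returns.
my $seqobj = Bio::Seq->new(-seq => 'ATGAAACCCGGGTTTTAA', -display_id => 'toy');

# Hypothetical CDS feature carrying a 'gene' tag; add_SeqFeature attaches
# it to the sequence so spliced_seq() can pull out the bases.
my $cds = Bio::SeqFeature::Generic->new(
    -start       => 1,
    -end         => 18,
    -strand      => 1,
    -primary_tag => 'CDS',
    -tag         => { gene => 'toyA' },
);
$seqobj->add_SeqFeature($cds);

# The loop from the question, called directly on the sequence object
# (no next_seq() involved).
for my $feat_object ($seqobj->get_SeqFeatures) {
    next unless $feat_object->primary_tag eq 'CDS';
    print $feat_object->spliced_seq->seq, "\n";
    if ($feat_object->has_tag('gene')) {
        print "gene: $_\n" for $feat_object->get_tag_values('gene');
    }
}
```

With a real record you would replace the toy object with my $seqobj = $gb->get_Seq_by_acc('J00522'); and keep the loop as it is.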

>  for my $feat_object ($seqobj->get_SeqFeatures) {
> 
>      if ($feat_object->primary_tag eq "CDS") {
> 
>          print $feat_object->spliced_seq->seq,"\n";
> 
>          if ($feat_object->has_tag('gene')) {
> 
>              for my $val ($feat_object->get_tag_values('gene')){
> 
>                  print "gene: ",$val,"\n";
> 
>              }
> 
>          }
> 
>      }
> 
>  }                                         
> 
> From http://doc.bioperl.org/releases/bioperl-1.4/Bio/Seq/RichSeq.html, the
> methods there mention nothing about how to get the features or inter-convert
> among the object types.

Just a note, but make sure to read up-to-date documentation, particularly if you are using the latest code.  Here is the pdoc for the latest release:

http://doc.bioperl.org/releases/bioperl-1.6.1/Bio/Seq/RichSeqI.html 

This is definitely worth pointing out, and is a good example where we can improve our documentation; I've added some links to classes that would explain more.  In the meantime, the best thing to do in this case is to point you to the online documentation (which I think I did already, but just in case):

http://www.bioperl.org/wiki/HOWTO:Beginners
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

chris

