[Bioperl-l] Help with Bio::SeqIO

Mon Nov 5 00:08:34 UTC 2007

Pass in (-nosort => 1) to spliced_seq:

print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n";

This ensures no sorting of sublocations occurs, if you want for  
instance typical GenBank/EMBL 'join' behavior.

To the other devs: shouldn't -nosort be the default behavior when the  
split location is a 'join'?  In other words, should spliced_seq() be  
modified to take into account the split location type when returning  
sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly  
indicates the order of the sequences is important when joined  
together; the current behavior is more like that for 'order'.

chris

On Nov 4, 2007, at 12:39 PM, download on demand wrote:

> Hi to all.
>
> I have a problem with a simplest script:
>
>
>
>          use Bio::SeqIO;
>          # get command-line arguments, or die with a usage statement
>          my $usage = "x2y.pl infile infileformat outfile  
> outfileformat\n";
>          my $infile = shift or die $usage;
>          my $infileformat = shift or die $usage;
> #         my $outfile = shift or die $usage;
>          my $outfileformat = shift or die $usage;
>
>          # create one SeqIO object to read in,and another to write out
>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>                                       '-format' => $infileformat);
>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>                                        '-format' => $outfileformat);
>
>          # write each entry in the input file to the output file
>          while (my $inseq = $seq_in->next_seq) {
>
> #            $seq_out->write_seq($inseq); # Whole sequence not needed
>
> for my $feat_object ($inseq->get_SeqFeatures)
>     {
>     if ($feat_object->primary_tag eq "CDS")
>         {
>         print $feat_object->get_tag_values('product'),"\n";
>         print
> $feat_object->location->start,"..",$feat_object->location->end,"\n";
>         print $feat_object->spliced_seq->seq,"\n\n";
>         }
>     }
>
>
>
> The result seems OK to me, but in case of first CDS of  
> NC_005213.gbk from
> here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/ 
> > the
> output is wrong:
>
> It is:
> hypothetical protein
> 1..490885
> TAAATGCGATTGCTATTAGAA..................................Truncated
> sequence...................................
>
> Should be:
> hypothetical protein
> 879..490883
> ATGCGATTGCTATTAGAA...................................Truncated
> sequence....................................TAA
>
>
>
> This CDS have an unnatural location string:
> CDS             complement(join(490883..490885,1..879)), but  
> spliced_seq
> should handle these things?
>
> Please help me!
> Best regards, N.
> _______________________________________________
>