[Bioperl-l] Help with Bio::SeqIO

Jason Stajich jason at bioperl.org
Mon Nov 5 17:07:10 UTC 2007


At one point the location order was not respected/saved I believe. I  
guess we will just assume the user will build up a SplitLocation in  
order (i.e. add_SubLocation).  I'll try and remember if there were  
any other particular reasons.


-jason
On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:

> I agree that there should be a meaningful default that results in
> "doing the right thing" in most cases if the user doesn't intervene.
> I'm not sure I understand all the details, but it sounds sorting or
> not sorting should depend on the split location type unless the user
> overrides it by argument. That's what you're suggesting, right?
>
> 	-hilmar
>
> On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
>
>> Pass in (-nosort => 1) to spliced_seq:
>>
>> print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n";
>>
>> This ensures no sorting of sublocations occurs, if you want for
>> instance typical GenBank/EMBL 'join' behavior.
>>
>> To the other devs: shouldn't -nosort be the default behavior when
>> the split location is a 'join'?  In other words, should spliced_seq
>> () be modified to take into account the split location type when
>> returning sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join'
>> explicitly indicates the order of the sequences is important when
>> joined together; the current behavior is more like that for 'order'.
>>
>> chris
>>
>> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>>
>>> Hi to all.
>>>
>>> I have a problem with a simplest script:
>>>
>>>
>>>
>>>          use Bio::SeqIO;
>>>          # get command-line arguments, or die with a usage statement
>>>          my $usage = "x2y.pl infile infileformat outfile
>>> outfileformat\n";
>>>          my $infile = shift or die $usage;
>>>          my $infileformat = shift or die $usage;
>>> #         my $outfile = shift or die $usage;
>>>          my $outfileformat = shift or die $usage;
>>>
>>>          # create one SeqIO object to read in,and another to write
>>> out
>>>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>>                                       '-format' => $infileformat);
>>>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>>                                        '-format' => $outfileformat);
>>>
>>>          # write each entry in the input file to the output file
>>>          while (my $inseq = $seq_in->next_seq) {
>>>
>>> #            $seq_out->write_seq($inseq); # Whole sequence not  
>>> needed
>>>
>>> for my $feat_object ($inseq->get_SeqFeatures)
>>>     {
>>>     if ($feat_object->primary_tag eq "CDS")
>>>         {
>>>         print $feat_object->get_tag_values('product'),"\n";
>>>         print
>>> $feat_object->location->start,"..",$feat_object->location->end,"\n";
>>>         print $feat_object->spliced_seq->seq,"\n\n";
>>>         }
>>>     }
>>>
>>>
>>>
>>> The result seems OK to me, but in case of first CDS of
>>> NC_005213.gbk from
>>> here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/
>>> Nanoarchaeum_equitans/> the
>>> output is wrong:
>>>
>>> It is:
>>> hypothetical protein
>>> 1..490885
>>> TAAATGCGATTGCTATTAGAA..................................Truncated
>>> sequence...................................
>>>
>>> Should be:
>>> hypothetical protein
>>> 879..490883
>>> ATGCGATTGCTATTAGAA...................................Truncated
>>> sequence....................................TAA
>>>
>>>
>>>
>>> This CDS have an unnatural location string:
>>> CDS             complement(join(490883..490885,1..879)), but
>>> spliced_seq
>>> should handle these things?
>>>
>>> Please help me!
>>> Best regards, N.
>>> _______________________________________________
>>>
>>
>>
>>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org




More information about the Bioperl-l mailing list