[Bioperl-l] LocatableSeq::subseq(): bug or not?

Chris Fields cjfields at illinois.edu
Tue Nov 25 12:34:37 EST 2008


Mark,

Your subseq() patch appears to work just fine: no tests are failing and
the API doesn't change, so it will be added for the release.  We may
need to define a new subseq()-like method that takes start/end
coordinates counting only residues and stays consistent with different
coordinate systems (i.e. mapping), or we can add that behavior as a
flag.
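
A rough sketch of what the residue-only variant could look like, in
plain Perl without BioPerl.  The name residue_subseq() and the default
gap characters are assumptions for illustration only; this is not the
patch from bug #2682 and not an existing BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: return residues $start..$end
# (1-based, inclusive), counting only non-gap characters of $aligned.
sub residue_subseq {
    my ($aligned, $start, $end, $gap_chars) = @_;
    $gap_chars = '-.' unless defined $gap_chars;    # assumed gap symbols

    # Copy the aligned string and strip every gap character from the copy.
    (my $residues = $aligned) =~ s/[\Q$gap_chars\E]//g;

    die "bad range $start..$end\n"
        if $start < 1 || $end < $start || $end > length $residues;

    return substr($residues, $start - 1, $end - $start + 1);
}

print residue_subseq('--atg---gta--', 1, 3), "\n";    # atg
print residue_subseq('--atg---gta--', 3, 6), "\n";    # ggta
```

With column coordinates the same calls return '--a' and 'atg-', which
is the difference under discussion.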

Related to this, I have made a few commits defining groups of symbols
for LocatableSeq ($GAP_SYMBOLS, $RESIDUE_SYMBOLS, $FRAMESHIFT_SYMBOLS,
and the catchall $OTHER_SYMBOLS).  I had already started down this
path anyway, so I might as well finish it.  A remaining problem: they
are currently set as class global variables, so there are some odd
scoping issues when using them globally or locally (detailed in the
test suite as a TODO), and changing them does not reset $MATCHPATTERN.
I'll set them up to be object-scoped attributes in a future release.
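
To illustrate the $MATCHPATTERN part of that problem, here is a minimal
sketch using a stand-in package.  My::Seq, its is_valid() method, and
the symbol values are all hypothetical; the actual Bio::LocatableSeq
code may be organized differently.  The point is only that a pattern
built once from a package global does not follow later changes to that
global.

```perl
use strict;
use warnings;

package My::Seq;    # hypothetical stand-in, not Bio::LocatableSeq

our $GAP_SYMBOLS  = '\-\.';
our $MATCHPATTERN = "A-Za-z$GAP_SYMBOLS";    # built once, at load time

# Return 1 if $s contains only characters allowed by $MATCHPATTERN.
sub is_valid {
    my ($class, $s) = @_;
    return $s =~ /^[$MATCHPATTERN]+$/ ? 1 : 0;
}

package main;

print My::Seq->is_valid('atg--'), "\n";    # 1
print My::Seq->is_valid('at~g'),  "\n";    # 0: '~' is not a known symbol
{
    # Temporarily add '~' as a gap symbol for this block only.
    local $My::Seq::GAP_SYMBOLS = '\-\.~';
    # Still 0: $MATCHPATTERN was not rebuilt from the new value.
    print My::Seq->is_valid('at~g'), "\n";
}
```

Keeping the symbol sets as object attributes and deriving the pattern
on demand would avoid this kind of stale state.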

chris

On Nov 24, 2008, at 8:04 AM, Mark A. Jensen wrote:

> Bug #2682 contains a patch that modifies subseq() to strip gaps if  
> desired. It also tries to fix the $replace weirdness.
>
> perldb transcript:
> DB<11> $seq  = new Bio::PrimarySeq(-seq=>'--atg---gta--')
>
> DB<12> x $seq->subseq(1,3)
> 0  '--a'
> DB<13> x $seq->subseq(1,3,NOGAP)
> 0  'a'
> DB<15> x $seq->seq
> 0  '--atg---gta--'
> DB<16> x $seq->subseq(-START=>1, -END=>3, -REPLACE_WITH=>'tga')
> 0  '--a'
> DB<18> x $seq->seq
> 0  'tgatg---gta--'
> ## silly gap-stripper:
> DB<21> x $seq->subseq(-START=>1, -END=>$seq->length,
>                       -REPLACE_WITH=>$seq->subseq(-START=>1,
>                                                   -END=>$seq->length,
>                                                   -NOGAP=>1))
> 0  'tgatg---gta--'
> DB<22> x $seq->seq
> 0  'tgatggta'
>
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu>
> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
> Sent: Sunday, November 23, 2008 7:31 PM
> Subject: [Bioperl-l] LocatableSeq::subseq(): bug or not?
>
>
>> Currently, we have Bio::LocatableSeq use the default (Bio::PrimarySeq)
>> implementation of subseq().  However the returned data apparently
>> clashes with the actual PrimarySeq documentation:
>>
>> Function: returns the subseq from start to end, where the first base
>>           is 1 and the number is inclusive, ie 1-2 are the first two
>>           bases of the sequence
>>
>> So, should the following actually return the indicated range of bases
>> (no gaps)?  Or should we clarify the above documentation to indicate
>> subseq() returns the first x positions/columns (anything) instead of
>> 'bases' (no gaps)?
>>
>> my $seq = Bio::LocatableSeq->new(
>>   -seq => '--atg---gta--',
>>   -strand => 1,
>>   -start => 1,
>>   -end => 6,
>>   -alphabet => 'dna'
>> );
>>
>> # comments indicate current returned val
>> $seq->subseq(1,3);  # returns '--a'
>> $seq->subseq(3,6);  # returns 'atg-'
>> $seq->subseq(1,10); # returns '--atg---gt'
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign





