[Bioperl-l] LocatableSeq::subseq(): bug or not?
Chris Fields
cjfields at illinois.edu
Tue Nov 25 12:34:37 EST 2008
Mark,
Your subseq() patch appears to work just fine; no tests appear to be
failing and the API doesn't change, so it will be added for the release.
We may need to define a new subseq()-like method that works with
start/end coordinates counting only residues (not gaps) and that stays
consistent across different coordinate systems (i.e. mapping), or we
could add that behavior as a flag.
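
To make the residue-only coordinate idea concrete, here is a minimal pure-Perl sketch. It is not BioPerl code: residue_subseq() is a hypothetical helper, it assumes '-' and '.' are the only gap symbols, and it assumes residue numbering starts at 1 (it ignores any -start offset on the object).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl. Returns the columns spanning
# residues $start..$end (1-based, inclusive), counting only non-gap symbols.
# Assumption: '-' and '.' are the only gap symbols.
sub residue_subseq {
    my ($aligned, $start, $end) = @_;
    my ($count, $first_col, $last_col) = (0);
    my @cols = split //, $aligned;
    for my $i (0 .. $#cols) {
        next if $cols[$i] =~ /[-.]/;        # gaps do not advance the residue count
        $count++;
        $first_col = $i if $count == $start;
        if ($count == $end) { $last_col = $i; last; }
    }
    die "residue range $start..$end is outside the sequence\n"
        unless defined $first_col && defined $last_col;
    return substr($aligned, $first_col, $last_col - $first_col + 1);
}

print residue_subseq('--atg---gta--', 1, 3), "\n";   # atg
print residue_subseq('--atg---gta--', 3, 6), "\n";   # g---gta
```

Internal gaps are kept here (residues 3 to 6 come back as 'g---gta'); stripping them would be a separate step, along the lines of the NOGAP option in the patch.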
Related to this, I have made a few commits defining groups of symbols
for LocatableSeq ($GAP_SYMBOLS, $RESIDUE_SYMBOLS, $FRAMESHIFT_SYMBOLS,
and the catchall $OTHER_SYMBOLS). I had already started down this
path anyway, so might as well finish it. A remaining problem: they
are currently set as class global variables, so there are some odd
scoping issues when using them globally or locally (detailed in the
test suite as a TODO), and they do not reset the $MATCHPATTERN. I'll
set them up to be object-scoped attributes in a future release.
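
One way this kind of problem can arise, sketched with a hypothetical Demo::Seq package rather than the real Bio::LocatableSeq internals: if the match pattern is concatenated once from the symbol groups when the package loads, then localizing a symbol group later leaves the pattern stale. The variable names mirror the ones above, but the values and the validate() method are assumptions for illustration only.

```perl
use strict;
use warnings;

{
    package Demo::Seq;    # hypothetical stand-in, not Bio::LocatableSeq

    our $GAP_SYMBOLS     = '\-\.';
    our $RESIDUE_SYMBOLS = 'A-Za-z';
    # Built once, when the package is loaded.
    our $MATCHPATTERN    = $RESIDUE_SYMBOLS . $GAP_SYMBOLS;

    sub validate {
        my ($class, $str) = @_;
        return $str =~ /^[$MATCHPATTERN]+$/ ? 1 : 0;
    }
}

print Demo::Seq->validate('at~g'), "\n";        # 0: '~' is not an allowed symbol

{
    # Localizing the symbol group alone does not touch the derived pattern.
    local $Demo::Seq::GAP_SYMBOLS = '\-\.~';
    print Demo::Seq->validate('at~g'), "\n";    # still 0

    # The pattern has to be rebuilt explicitly for the change to take effect.
    local $Demo::Seq::MATCHPATTERN =
        $Demo::Seq::RESIDUE_SYMBOLS . $Demo::Seq::GAP_SYMBOLS;
    print Demo::Seq->validate('at~g'), "\n";    # 1
}
```

Object-scoped attributes would avoid this, since each object could rebuild its own pattern whenever its symbol groups change.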
chris
On Nov 24, 2008, at 8:04 AM, Mark A. Jensen wrote:
> Bug #2682 contains a patch that modifies subseq() to strip gaps if
> desired. It also tries to fix the $replace weirdness.
>
> perldb transcript:
> DB<11> $seq = new Bio::PrimarySeq(-seq=>'--atg---gta--')
>
> DB<12> x $seq->subseq(1,3)
> 0 '--a'
> DB<13> x $seq->subseq(1,3,NOGAP)
> 0 'a'
> DB<15> x $seq->seq
> 0 '--atg---gta--'
> DB<16> x $seq->subseq(-START=>1, -END=>3, -REPLACE_WITH=>'tga')
> 0 '--a'
> DB<18> x $seq->seq
> 0 'tgatg---gta--'
> ## silly gap-stripper:
> DB<21> x $seq->subseq(-START=>1, -END=>$seq->length,
>            -REPLACE_WITH=>$seq->subseq(-START=>1,
>                                        -END=>$seq->length,
>                                        -NOGAP=>1))
> 0 'tgatg---gta--'
> DB<22> x $seq->seq
> 0 'tgatggta'
>
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu>
> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
> Sent: Sunday, November 23, 2008 7:31 PM
> Subject: [Bioperl-l] LocatableSeq::subseq(): bug or not?
>
>
>> Currently, we have Bio::LocatableSeq use the default
>> (Bio::PrimarySeq) implementation of subseq(). However the
>> returned data apparently clashes with the actual PrimarySeq
>> documentation:
>>
>> Function: returns the subseq from start to end, where the first base
>> is 1 and the number is inclusive, ie 1-2 are the first two
>> bases of the sequence
>>
>> So, should the following actually return the indicated range of
>> bases (no gaps)? Or should we clarify the above documentation to
>> indicate subseq() returns the first x positions/columns (anything)
>> instead of 'bases' (no gaps)?
>>
>> my $seq = Bio::LocatableSeq->new(
>>     -seq      => '--atg---gta--',
>>     -strand   => 1,
>>     -start    => 1,
>>     -end      => 6,
>>     -alphabet => 'dna'
>> );
>>
>> # comments indicate current returned val
>> $seq->subseq(1,3); # returns '--a'
>> $seq->subseq(3,6); # returns 'atg-'
>> $seq->subseq(1,10); # returns '--atg---gt'
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign