[Bioperl-l] LocatableSeq::subseq(): bug or not?
Mark A. Jensen
maj at fortinbras.us
Tue Nov 25 19:00:10 UTC 2008
----- Original Message -----
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Tuesday, November 25, 2008 12:34 PM
Subject: Re: [Bioperl-l] LocatableSeq::subseq(): bug or not?
> Mark,
>
> Your subseq() patch appears to work just fine; no apparent tests are
> failing, API doesn't change, so that will be added for the release.
> We may need to define a new subseq()-like method to work properly
> with start/end coordinates that match only residues and are
> consistent with different coordinate systems (i.e. mapping), or we
> can add that in as a flag.
I'm willing to try my hand at this, if desired-- can you point me to
the modules involved off the top of yer head?
MAJ
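
To make the residue-coordinate idea concrete, here is a minimal standalone sketch of how such a method might map residue numbers onto alignment columns. It is not the patch in bug #2682 and uses no BioPerl code; the names residue_columns and residue_subseq, the $nogap flag, and the gap character set '-.' are all assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers, not BioPerl API. The gap character set is an assumption.
my $GAPS = '-.';

# Return the 1-based alignment column of each residue, in order.
sub residue_columns {
    my ($aligned) = @_;
    return grep { index($GAPS, substr($aligned, $_ - 1, 1)) < 0 }
           1 .. length $aligned;
}

# Return residues $start..$end (1-based, inclusive), counting residues only.
# Internal gap columns are kept unless $nogap is true.
sub residue_subseq {
    my ($aligned, $start, $end, $nogap) = @_;
    my @cols = residue_columns($aligned);
    die "bad residue range $start..$end\n"
        if $start < 1 || $end < $start || $end > @cols;

    # Translate residue coordinates into column coordinates, then slice.
    my ($from, $to) = @cols[$start - 1, $end - 1];
    my $slice = substr($aligned, $from - 1, $to - $from + 1);
    $slice =~ s/[\Q$GAPS\E]//g if $nogap;
    return $slice;
}

print residue_subseq('--atg---gta--', 1, 3), "\n";     # atg
print residue_subseq('--atg---gta--', 3, 6), "\n";     # g---gta
print residue_subseq('--atg---gta--', 3, 6, 1), "\n";  # ggta
```

Whether internal gaps should be returned by default is exactly the open question in this thread, so the sketch leaves it as a flag rather than picking one behaviour.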
>
> Related to this, I have made a few commits defining groups of
> symbols for LocatableSeq ($GAP_SYMBOLS, $RESIDUE_SYMBOLS,
> $FRAMESHIFT_SYMBOLS, and the catchall $OTHER_SYMBOLS). I had
> already started down this path anyway, so might as well finish it.
> A remaining problem: they are currently set as class global
> variables, so there are some odd scoping issues when using them
> globally or locally (detailed in the test suite as a TODO), and
> they do not reset the $MATCHPATTERN. I'll set them up to be
> object-scoped attributes in a future release.
>
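
As a rough illustration of the object-scoped approach described above, the sketch below keeps the gap symbols on the object and rebuilds the compiled pattern whenever they are set, so the pattern cannot go stale. The class My::GappedSeq and its methods are hypothetical and are not how Bio::LocatableSeq is implemented.

```perl
use strict;
use warnings;

package My::GappedSeq;   # hypothetical class, not Bio::LocatableSeq

sub new {
    my ($class, %args) = @_;
    my $self = bless { seq => $args{seq} }, $class;
    # Default gap symbols are an assumption for this sketch.
    $self->gap_symbols(defined $args{gap_symbols} ? $args{gap_symbols} : '-.');
    return $self;
}

# Getter/setter. Setting the symbols recompiles the pattern in the same
# step, so the symbols and the pattern never drift apart.
sub gap_symbols {
    my ($self, $symbols) = @_;
    if (defined $symbols) {
        $self->{gap_symbols} = $symbols;
        $self->{gap_re}      = qr/[\Q$symbols\E]/;
    }
    return $self->{gap_symbols};
}

# Return the sequence with this object's gap symbols removed.
sub ungapped {
    my ($self) = @_;
    (my $s = $self->{seq}) =~ s/$self->{gap_re}//g;
    return $s;
}

package main;

my $x = My::GappedSeq->new(seq => '--atg~~gta..');
my $y = My::GappedSeq->new(seq => '--atg~~gta..', gap_symbols => '-.~');
print $x->ungapped, "\n";   # atg~~gta
print $y->ungapped, "\n";   # atggta
```

Because each object carries its own symbols and pattern, changing them on one sequence does not affect any other, which avoids the global/local scoping problem.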
> chris
>
> On Nov 24, 2008, at 8:04 AM, Mark A. Jensen wrote:
>
>> Bug #2682 contains a patch that modifies subseq() to strip gaps if
>> desired. It also tries to fix the $replace weirdness.
>>
>> perldb transcript:
>> DB<11> $seq = new Bio::PrimarySeq(-seq=>'--atg---gta--')
>>
>> DB<12> x $seq->subseq(1,3)
>> 0 '--a'
>> DB<13> x $seq->subseq(1,3,NOGAP)
>> 0 'a'
>> DB<15> x $seq->seq
>> 0 '--atg---gta--'
>> DB<16> x $seq->subseq(-START=>1, -END=>3, -REPLACE_WITH=>'tga')
>> 0 '--a'
>> DB<18> x $seq->seq
>> 0 'tgatg---gta--'
>> ## silly gap-stripper:
>> DB<21> x $seq->subseq(-START=>1, -END=>$seq->length,
>>          -REPLACE_WITH=>$seq->subseq(-START=>1,
>>                                      -END  =>$seq->length,
>>                                      -NOGAP=>1))
>> 0 'tgatg---gta--'
>> DB<22> x $seq->seq
>> 0 'tgatggta'
>>
>> ----- Original Message -----
>> From: "Chris Fields" <cjfields at illinois.edu>
>> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
>> Sent: Sunday, November 23, 2008 7:31 PM
>> Subject: [Bioperl-l] LocatableSeq::subseq(): bug or not?
>>
>>
>>> Currently, we have Bio::LocatableSeq use the default
>>> (Bio::PrimarySeq) implementation of subseq(). However the
>>> returned data apparently clashes with the actual PrimarySeq
>>> documentation:
>>>
>>> Function: returns the subseq from start to end, where the first base
>>>           is 1 and the number is inclusive, ie 1-2 are the first two
>>>           bases of the sequence
>>>
>>> So, should the following actually return the indicated range of
>>> bases (no gaps)? Or should we clarify the above documentation to
>>> indicate subseq() returns the first x positions/columns
>>> (anything) instead of 'bases' (no gaps)?
>>>
>>> my $seq = Bio::LocatableSeq->new(
>>>     -seq      => '--atg---gta--',
>>>     -strand   => 1,
>>>     -start    => 1,
>>>     -end      => 6,
>>>     -alphabet => 'dna'
>>> );
>>>
>>> # comments indicate current returned val
>>> $seq->subseq(1,3); # returns '--a'
>>> $seq->subseq(3,6); # returns 'atg-'
>>> $seq->subseq(1,10); # returns '--atg---gt'
>>>
>>> chris
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Marie-Claude Hofmann
> College of Veterinary Medicine
> University of Illinois Urbana-Champaign
>