[Bioperl-l] Clear range from Bio::Seq::Quality?

Mark A. Jensen maj at fortinbras.us
Mon Apr 27 15:51:39 UTC 2009


Dan - congrats on your first contribution! Mark
----- Original Message ----- 
From: "Dan Bolser" <dan.bolser at gmail.com>
To: "Heikki Lehvaslaiho" <heikki.lehvaslaiho at gmail.com>
Cc: "Chris Fields" <cjfields at illinois.edu>; <bioperl-l at lists.open-bio.org>
Sent: Monday, April 27, 2009 4:31 AM
Subject: Re: [Bioperl-l] Clear range from Bio::Seq::Quality?


Hi Heikki,

Thanks very much for the advice on how to better implement the clear
range method within the Bio::Seq::Quality object. I can understand the
logic of what you have written, and it all sounds reasonable. The only
problem is that I am very inexperienced with working on object
oriented Perl (my 'one man' projects to date have never really
required me to think beyond scripts, and its been years since I
actually tried to code objects in Perl).

To be specific, when you say, "Lets add a method that sets the
threshold and stores it internally as $self->_threshold", ignoring any
other functionality, what would that method look like? in particular,
how would $self->_threshold be implemented?

I think once I see that detail, I can go ahead and try to code what
you suggested.


Similarly (Chris), where would I put the tests / how would they be implemented?


Thanks again for the feedback.

All the best,
Dan.



2009/4/27 Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>:
> Dan,
>
> It looks like your method does two different things:
>
> 1. Returns the longest subsequence above the threshold
> 2. Analyses the the sequence for the number of ranges the current
> threshold creates.
>
> Why not separate these functions?
>
> Lets add a method that sets the threshold and stores it internally as
> $self->_threshold. Setting it to a new values should trigger emptying
> all the caches (see below.)
>
> Lets have two more public methods:
>
> 1. get_clean_range() - optional argument 'threshold'
>
> It returns the longest clean subseq.
>
> 2. count_clean_ranges() -again optional argument 'threshold'
>
> This returns the number of ranges detected.
>
> Both methods call first the public method threshold if the argument
> has been given and then an internal method _find_clean_ranges(). That
> method calculates all the ranges and stores them internally (as
> $self->_clean_ranges-> [...]). The number of ranges is also stored
> (e.g. $self->_number_of ranges).These internal values form the cache
> that needs to be emptied whenever any of the critical values of the
> object changes: threshold, quality or seq. Create an internal method
> $self->_clear_cache, that does that.
>
> Now the quality new object does not get created until you call
> get_clean_range() which accesses the cached values (or creates them if
> they are not there).
>
> This design allows you to have no extra penalty for adding more
> methods that act on cached values. For example, it might be sensible
> thing to do at some point to look at all the ranges that are longer
> than some length. Then you could write in your program:
>
>
> $qual->threshold(10);
> if ($qual->count_clean_ranges = 1) {
> my $newqual = $qual->get_clean_range()
> # do your analysis
> } elsif ($qual->count_clean_ranges = 0) {
> # do some reporting and logging
> } else { # more than one ranges
> my @quals = $qual->get_all_clean_ranges($min_lenght);
> # do some more work and possibly select the best one(s)
> }
>
>
>
> Yours,
>
> -Heikki
>
> 2009/4/24 Chris Fields <cjfields at illinois.edu>:
>> You could submit this as a diff against Bio::Seq::Quality to bugzilla. If
>> possible, tests don't hurt either!
>>
>> chris
>>
>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:
>>
>>> Its a bit rough and ready, but it does what I need...
>>>
>>>
>>>
>>>
>>> =head2 get_clear_range
>>>
>>> Title : get_clear_range
>>>
>>> Title : subqual
>>> Usage : $subobj = $obj->get_clear_range();
>>> $subobj = $obj->get_clear_range(20);
>>> Function : Get the clear range using the given quality score as a
>>> cutoff or a default value of 13.
>>>
>>> Returns : a new Bio::Seq::Quality object
>>> Args : a minimum quality value, optional, devault = 13
>>>
>>> =cut
>>>
>>> sub get_clear_range
>>> {
>>> my $self = shift;
>>> my $qual = $self->qual;
>>> my $minQual = shift || 13;
>>>
>>> my (@ranges, $rangeFlag);
>>>
>>> for(my $i=0; $i<@$qual; $i++){
>>> ## Are we currently within a clear range or not?
>>> if(defined($rangeFlag)){
>>> ## Did we just leave the clear range?
>>> if($qual->[$i]<$minQual){
>>> ## Log the range
>>> push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>> ## and reset the range flag.
>>> $rangeFlag = undef;
>>> }
>>> ## else nothing changes
>>> }
>>> else{
>>> ## Did we just enter a clear range?
>>> if($qual->[$i]>=$minQual){
>>> ## Better set the range flag!
>>> $rangeFlag = $i;
>>> }
>>> ## else nothing changes
>>> }
>>> }
>>> ## Did we exit the last clear range?
>>> if(defined($rangeFlag)){
>>> my $i = scalar(@$qual);
>>> ## Log the range
>>> push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>> }
>>>
>>> unless(@ranges){
>>> die "There is no clear range... I don't know what to do here!\n";
>>> }
>>>
>>> print "there are ", scalar(@ranges), " clear ranges\n";
>>>
>>> my $sum; map {$sum += $_->[2]} @ranges;
>>>
>>> print "of ", scalar(@$qual), " bases, there are $sum with ".
>>> "quality scores above the given threshold\n";
>>>
>>> for (sort {$b->[2] <=> $a->[2]} @ranges){
>>> if($_->[2]/$sum < 0.5){
>>> warn "not so much a clear range as a clear chunk...\n";
>>> }
>>> print $_->[2], "\t", $_->[2]/$sum, "\n";
>>>
>>> return Bio::Seq::QualityDB->new( -seq => $self->subseq( $_->[0]+1,
>>> $_->[1]+1),
>>> -qual => $self->subqual($_->[0]+1,
>>> $_->[1]+1)
>>> );
>>> }
>>> }
>>>
>>>
>>>
>>>
>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which
>>> is a copy of Bio/Seq/Quality.pm that just has the above method added).
>>> That is why the 'new Bio::Seq::Quality object' is actually a
>>> Bio::Seq::QualityDB object, but other than that it should slot right
>>> in (apart from all the debugging output that I spit out).
>>>
>>>
>>> Cheers,
>>> Dan.
>>>
>>>
>>> 2009/4/24 Dan Bolser <dan.bolser at gmail.com>:
>>>>
>>>> Hi all,
>>>>
>>>> I couldn't find out how to get the 'clear range' from a
>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should
>>>> this method be a part of the Bio::Seq::Quality class?
>>>>
>>>> In the latter case I'm on my way to an implementation, but I am not
>>>> good at navigating the bioperl docs, so I thought I should ask before
>>>> I take the time to finish that off.
>>>>
>>>>
>>>> Cheers,
>>>> Dan.
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> --
> -Heikki
> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> cell: +27 (0)714328090
> Sent from Claremont, WC, South Africa
>

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l





More information about the Bioperl-l mailing list