Bioperl: A greedy consensus builder

Thu, 05 Aug 1999 11:58:57 -0700

interesting. Do the various maintainers have an interest in these?

If so, this suggests that consensus can mean many things. Is there a preference
for making each type a distinct method, or should they be options passed to a
general consensus builder. Since I imagine our 2 techniquies are not mutually
exclusive, it might prove somewhat silly to have them as distinct methods, when
it might be useful to get both effects.

Since it's small I'm just going to append my code.

package Consensus;
use strict;
use vars qw(@ISA $_NO_CONSENSUS_SYMBOL);
use lib '/usr/users/mcariaso/pub/lib/perl5/site_perl';
use Bio::UnivAln;

@ISA = qw(Bio::UnivAln);
$_NO_CONSENSUS_SYMBOL = $Bio::UnivAln::_NO_CONSENSUS_SYMBOL;

sub greedy {
    my ($self, $threshold, $mincol) = @_;
    $threshold = 0.75 unless defined($threshold);
    $mincol = 1 unless defined($mincol);

    my ($row);
    my (@row_start, @row_stop);
    my ($start, $end);

    # measure the bounds for each row

    foreach $row (split(/\n/,$self->layout('raw'))) {
	($start) = $row =~ /^(\.*)/g;
	($end) = $row =~ /^(.*[^.])\.*$/;
	push @row_start, 1+length $start;
	push @row_stop, length $end;
    }

    my $consfunc = &Bio::UnivAln::_c_consensus_of_array($threshold);

    my @consensus;
    my $width = $self->width;
    my $height = $self->height;
    my @inside;
    my $seq;
    foreach my $ncol (1..$width) {
	@inside=();                 #which rows are inside their bounds
	foreach my $i (0..$height-1) {
	    push (@inside, $i+1) if ($ncol >= $row_start[$i]) &&
		($ncol <= $row_stop[$i]);
	}
	if ($#inside < $mincol) {
	    push @consensus, '#';   #my marker for insufficent lines
	} else {
	    $seq = $self->seqs([@inside],[$ncol]);
	    push @consensus, &$consfunc([split(/\n/,$seq)]);
	}
    }
    return wantarray ? @consensus : join("",@consensus);
}

1;

Ruediger Hornig wrote:
> 
> Mike Cariaso (mcariaso@genelogic.com) wrote:
> 
> > To scratch an itch I've built a small object which ISA
> > Bio::UnivAln.
> > It has a single method 'greedy' which constucts a consensus
> > sequence using a
> > somewhat different method. The approach does ignores alignment
> > gaps that are
> > before (or after) the first (last) non-gap of a row. An example
> > may explain it
> > more clearly.
> 
> While we are at collecting methods for the alignment object...
> 
> I have written something which may be useful for working with
> protein sequence alignments. The UnivAln->consensus method
> returns identity, while I was interested in homology (eg.
> V|I|A|L). So if identity of a column does not make the threshold,
> but the residues with a certain property do, the property will be
> returned.
> 
> VSPEF
> ASPEF
> ISPDF
> LTPDF
> 
> !SP~F is the 0.75 consensus with ! aliphatic and ~ acidic. If
> someone is interested in this type of consensus, or in separate
> identity and homology methods in that
> context, I'm happy to contribute that.
> 
> Rüdi
> 
> --
> Dr Rüdiger Hornig
> Computational Molecular Biology And Drug Design
> Department For Biochemistry And Molecular Biology
> Australian National University, Canberra 0200, Australia
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================

-- 
mike cariaso      --------------------     mcariaso@genelogic.com
ph:510-981-3156 -------------------------- fax:510-649-3449
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================