[BioPython] gapped consensus?

Thu Jun 12 11:54:43 EDT 2003

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I would like to have the chance to have "gapped consensus".
Now the dumb_consensus (on Align.AlignInfo) work like this:

AATGGGTCA---GTGACG
AA-GGGTGA---GTCACG
AATGGGTGA--GGTCACG
AATGG-TGAAAGGTCACG

Gives this consensus:

AATGGGTGAAAGGTCACG

But I'm working on a program to design primers based on INDELS, so it
would be usefull for me a result like this:

AA-GG-TGA---GTCACG

Each column with a -, the - will be on the consensus.

Another posibility (closer than what the program actually does) would be:

AATGGGTGA---GTCACG

In this way the - (dashes, gaps) would be weighted as any other character.

I think a little hack on the dumb_consensus function could allow me to
do it. I even comment out the place where it chacks for - or . before
entering on the sum, but I still get letters instead of -.

Here is the original relevant code:

def dumb_consensus(self, threshold = .7, ambiguous = "N",
~                       consensus_alpha = None, require_multiple = 0):
~        """Output a fast consensus sequence of the alignment.

~        This doesn't do anything fancy at all. It will just go through the
~        sequence residue by residue and count up the number of each type
~        of residue (ie. A or G or T or C for DNA) in all sequences in the
~        alignment. If the percentage of the most common residue type is
~        greater then the passed threshold, then we will add that residue
type,
~        otherwise an ambiguous character will be added.

~        This could be made a lot fancier (ie. to take a substitution matrix
~        into account), but it just meant for a quick and dirty consensus.

~        Arguments:
~        o threshold - The threshold value that is required to add a
particular
~        atom.
~        o ambiguous - The ambiguous character to be added when the
threshold is
~        not reached.
~        o consensus_alpha - The alphabet to return for the consensus
sequence.
~        If this is None, then we will try to guess the alphabet.
~        o require_multiple - If set as 1, this will require that more than
~        1 sequence be part of an alignment to put it in the consensus (ie.
~        not just 1 sequence and gaps).
~        """
~        consensus = ''

~        # find the length of the consensus we are creating
~        con_len = self.alignment.get_alignment_length()

~        # go through each seq item
~        for n in range(con_len):
~            # keep track of the counts of the different atoms we get
~            atom_dict = {}
~            num_atoms = 0

~            for record in self.alignment._records:
~                # make sure we haven't run past the end of any sequences
~                # if they are of different lengths
~                if n < len(record.seq):
~                    #if record.seq[n] != '-' and record.seq[n] != '.':
~                        if record.seq[n] not in atom_dict.keys():
~                            atom_dict[record.seq[n]] = 1
~                        else:
~                            atom_dict[record.seq[n]] = \
~                              atom_dict[record.seq[n]] + 1

~                        num_atoms = num_atoms + 1

~            max_atoms = []
~            max_size = 0

~            for atom in atom_dict.keys():
~                if atom_dict[atom] > max_size:
~                    max_atoms = [atom]
~                    max_size = atom_dict[atom]
~                elif atom_dict[atom] == max_size:
~                    max_atoms.append(atom)

~            if require_multiple and num_atoms == 1:
~                consensus = consensus + ambiguous
~            elif (len(max_atoms) == 1) and
((float(max_size)/float(num_atoms))
~                                         >= threshold):
~                consensus = consensus + max_atoms[0]
~            else:
~                consensus = consensus + ambiguous

~        # we need to guess a consensus alphabet if one isn't specified
~        if consensus_alpha is None:
~            consensus_alpha = self._guess_consensus_alphabet()

~        return Seq(consensus, consensus_alpha)

- --
Best regards,

//=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ   //=\
\=// IT Manager Advanta Seeds - Balcarce Research Center -      \=//
//=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\
\=// E-mail: sbassi at genesdigitales.com - ICQ UIN: 3356556 -     \=//

~                http://Bioinformatica.info

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAj7ohiIACgkQ6lc7ixf6gKpjGwCggxTet25TfyqWrvhJRcrUmglt
ONsAn2lv6xQOnDUlKKKKABrJiDVVFG5x
=CvDr
-----END PGP SIGNATURE-----