[BioPython] gapped consensus?
Sebastian Bassi
sbassi at asalup.org
Thu Jun 12 11:54:43 EDT 2003
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi,
I would like to have the chance to have "gapped consensus".
Now the dumb_consensus (on Align.AlignInfo) work like this:
AATGGGTCA---GTGACG
AA-GGGTGA---GTCACG
AATGGGTGA--GGTCACG
AATGG-TGAAAGGTCACG
Gives this consensus:
AATGGGTGAAAGGTCACG
But I'm working on a program to design primers based on INDELS, so it
would be usefull for me a result like this:
AA-GG-TGA---GTCACG
Each column with a -, the - will be on the consensus.
Another posibility (closer than what the program actually does) would be:
AATGGGTGA---GTCACG
In this way the - (dashes, gaps) would be weighted as any other character.
I think a little hack on the dumb_consensus function could allow me to
do it. I even comment out the place where it chacks for - or . before
entering on the sum, but I still get letters instead of -.
Here is the original relevant code:
def dumb_consensus(self, threshold = .7, ambiguous = "N",
~ consensus_alpha = None, require_multiple = 0):
~ """Output a fast consensus sequence of the alignment.
~ This doesn't do anything fancy at all. It will just go through the
~ sequence residue by residue and count up the number of each type
~ of residue (ie. A or G or T or C for DNA) in all sequences in the
~ alignment. If the percentage of the most common residue type is
~ greater then the passed threshold, then we will add that residue
type,
~ otherwise an ambiguous character will be added.
~ This could be made a lot fancier (ie. to take a substitution matrix
~ into account), but it just meant for a quick and dirty consensus.
~ Arguments:
~ o threshold - The threshold value that is required to add a
particular
~ atom.
~ o ambiguous - The ambiguous character to be added when the
threshold is
~ not reached.
~ o consensus_alpha - The alphabet to return for the consensus
sequence.
~ If this is None, then we will try to guess the alphabet.
~ o require_multiple - If set as 1, this will require that more than
~ 1 sequence be part of an alignment to put it in the consensus (ie.
~ not just 1 sequence and gaps).
~ """
~ consensus = ''
~ # find the length of the consensus we are creating
~ con_len = self.alignment.get_alignment_length()
~ # go through each seq item
~ for n in range(con_len):
~ # keep track of the counts of the different atoms we get
~ atom_dict = {}
~ num_atoms = 0
~ for record in self.alignment._records:
~ # make sure we haven't run past the end of any sequences
~ # if they are of different lengths
~ if n < len(record.seq):
~ #if record.seq[n] != '-' and record.seq[n] != '.':
~ if record.seq[n] not in atom_dict.keys():
~ atom_dict[record.seq[n]] = 1
~ else:
~ atom_dict[record.seq[n]] = \
~ atom_dict[record.seq[n]] + 1
~ num_atoms = num_atoms + 1
~ max_atoms = []
~ max_size = 0
~ for atom in atom_dict.keys():
~ if atom_dict[atom] > max_size:
~ max_atoms = [atom]
~ max_size = atom_dict[atom]
~ elif atom_dict[atom] == max_size:
~ max_atoms.append(atom)
~ if require_multiple and num_atoms == 1:
~ consensus = consensus + ambiguous
~ elif (len(max_atoms) == 1) and
((float(max_size)/float(num_atoms))
~ >= threshold):
~ consensus = consensus + max_atoms[0]
~ else:
~ consensus = consensus + ambiguous
~ # we need to guess a consensus alphabet if one isn't specified
~ if consensus_alpha is None:
~ consensus_alpha = self._guess_consensus_alphabet()
~ return Seq(consensus, consensus_alpha)
- --
Best regards,
//=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ //=\
\=// IT Manager Advanta Seeds - Balcarce Research Center - \=//
//=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\
\=// E-mail: sbassi at genesdigitales.com - ICQ UIN: 3356556 - \=//
~ http://Bioinformatica.info
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAj7ohiIACgkQ6lc7ixf6gKpjGwCggxTet25TfyqWrvhJRcrUmglt
ONsAn2lv6xQOnDUlKKKKABrJiDVVFG5x
=CvDr
-----END PGP SIGNATURE-----
More information about the BioPython
mailing list