[BioPython] new version of AlignInfo: Now you can choose to have
gap consensus
Sebastian Bassi
sbassi at asalup.org
Thu Jun 12 16:01:01 EDT 2003
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello,
Here is a new version of AlignInfo. It's based on Biophyton 1.1. I
changed the function dumb_align, now it looks like this:
(my e-mail client cuts the lines, so use the attached version, the
inline text version is just to look at it).
~ def dumb_consensus(self, threshold = .7, ambiguous = "N",
~ consensus_alpha = None, require_multiple = 0, gap
= 0):
~ """Output a fast consensus sequence of the alignment.
~ This doesn't do anything fancy at all. It will just go through the
~ sequence residue by residue and count up the number of each type
~ of residue (ie. A or G or T or C for DNA) in all sequences in the
~ alignment. If the percentage of the most common residue type is
~ greater then the passed threshold, then we will add that residue
type,
~ otherwise an ambiguous character will be added.
~ This could be made a lot fancier (ie. to take a substitution matrix
~ into account), but it just meant for a quick and dirty consensus.
~ Arguments:
~ o threshold - The threshold value that is required to add a
particular
~ atom.
~ o ambiguous - The ambiguous character to be added when the
threshold is
~ not reached.
~ o consensus_alpha - The alphabet to return for the consensus
sequence.
~ If this is None, then we will try to guess the alphabet.
~ o require_multiple - If set as 1, this will require that more than
~ 1 sequence be part of an alignment to put it in the consensus (ie.
~ not just 1 sequence and gaps).
~ o gap - If set as 1, it will allow to display gaps on the
consensus.
~ Useful for indel search.
~ """
~ consensus = ''
~ # find the length of the consensus we are creating
~ con_len = self.alignment.get_alignment_length()
~ # gap=0 doesn't take into account the gaps. This is default
behavior.
~ if gap == 0:
~ # go through each seq item
~ for n in range(con_len):
~ # keep track of the counts of the different atoms we get
~ atom_dict = {}
~ num_atoms = 0
~ for record in self.alignment._records:
~ # make sure we haven't run past the end of any
sequences
~ # if they are of different lengths
~ if n < len(record.seq):
~ if record.seq[n] != '-' and record.seq[n] != '.':
~ if record.seq[n] not in atom_dict.keys():
~ atom_dict[record.seq[n]] = 1
~ else:
~ atom_dict[record.seq[n]] = \
~ atom_dict[record.seq[n]] + 1
~ num_atoms = num_atoms + 1
~ max_atoms = []
~ max_size = 0
~ for atom in atom_dict.keys():
~ if atom_dict[atom] > max_size:
~ max_atoms = [atom]
~ max_size = atom_dict[atom]
~ elif atom_dict[atom] == max_size:
~ max_atoms.append(atom)
~ if require_multiple and num_atoms == 1:
~ consensus = consensus + ambiguous
~ elif (len(max_atoms) == 1) and
((float(max_size)/float(num_atoms))
~ >= threshold):
~ consensus = consensus + max_atoms[0]
~ else:
~ consensus = consensus + ambiguous
~ # gap!=0 Gaps are treated like any other atom.
~ else:
~ # go through each seq item
~ for n in range(con_len):
~ # keep track of the counts of the different atoms we get
~ atom_dict = {}
~ num_atoms = 0
~ for record in self.alignment._records:
~ # make sure we haven't run past the end of any
sequences
~ # if they are of different lengths
~ if n < len(record.seq):
~ if record.seq[n] not in atom_dict.keys():
~ atom_dict[record.seq[n]] = 1
~ else:
~ atom_dict[record.seq[n]] = \
~ atom_dict[record.seq[n]] + 1
~ num_atoms = num_atoms + 1
~ max_atoms = []
~ max_size = 0
~ for atom in atom_dict.keys():
~ if atom_dict[atom] > max_size:
~ max_atoms = [atom]
~ max_size = atom_dict[atom]
~ elif atom_dict[atom] == max_size:
~ max_atoms.append(atom)
~ if require_multiple and num_atoms == 1:
~ consensus = consensus + ambiguous
~ elif (len(max_atoms) == 1) and
((float(max_size)/float(num_atoms))
~ >= threshold):
~ consensus = consensus + max_atoms[0]
~ else:
~ consensus = consensus + ambiguous
- --
Best regards,
//=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ //=\
\=// IT Manager Advanta Seeds - Balcarce Research Center - \=//
//=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\
\=// E-mail: sbassi at genesdigitales.com - ICQ UIN: 3356556 - \=//
~ http://Bioinformatica.info
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAj7ov90ACgkQ6lc7ixf6gKqMyQCfXRp/vw5N73Xj7wy1u0gtYs/3
FPYAoNC4cSihUsQPNOgbwNGq1YwM99Qa
=6cyt
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AlignInfo.zip
Type: application/x-zip-compressed
Size: 6842 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20030612/18daf51a/AlignInfo.bin
More information about the BioPython
mailing list