[BioPython] new version of AlignInfo: Now you can choose to have gap consensus

Sebastian Bassi sbassi at asalup.org
Thu Jun 12 16:01:01 EDT 2003


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Here is a new version of AlignInfo. It's based on Biophyton 1.1. I
changed the function dumb_align, now it looks like this:
(my e-mail client cuts the lines, so use the attached version, the
inline text version is just to look at it).

~    def dumb_consensus(self, threshold = .7, ambiguous = "N",
~                       consensus_alpha = None, require_multiple = 0, gap
= 0):
~        """Output a fast consensus sequence of the alignment.

~        This doesn't do anything fancy at all. It will just go through the
~        sequence residue by residue and count up the number of each type
~        of residue (ie. A or G or T or C for DNA) in all sequences in the
~        alignment. If the percentage of the most common residue type is
~        greater then the passed threshold, then we will add that residue
type,
~        otherwise an ambiguous character will be added.

~        This could be made a lot fancier (ie. to take a substitution matrix
~        into account), but it just meant for a quick and dirty consensus.

~        Arguments:
~        o threshold - The threshold value that is required to add a
particular
~        atom.
~        o ambiguous - The ambiguous character to be added when the
threshold is
~        not reached.
~        o consensus_alpha - The alphabet to return for the consensus
sequence.
~        If this is None, then we will try to guess the alphabet.
~        o require_multiple - If set as 1, this will require that more than
~        1 sequence be part of an alignment to put it in the consensus (ie.
~        not just 1 sequence and gaps).
~        o gap - If set as 1, it will allow to display gaps on the
consensus.
~        Useful for indel search.
~        """
~        consensus = ''

~        # find the length of the consensus we are creating
~        con_len = self.alignment.get_alignment_length()
~        # gap=0 doesn't take into account the gaps. This is default
behavior.
~        if gap == 0:
~            # go through each seq item
~            for n in range(con_len):
~                # keep track of the counts of the different atoms we get
~                atom_dict = {}
~                num_atoms = 0

~                for record in self.alignment._records:
~                    # make sure we haven't run past the end of any
sequences
~                    # if they are of different lengths
~                    if n < len(record.seq):
~                        if record.seq[n] != '-' and record.seq[n] != '.':
~                            if record.seq[n] not in atom_dict.keys():
~                                atom_dict[record.seq[n]] = 1
~                            else:
~                                atom_dict[record.seq[n]] = \
~                                  atom_dict[record.seq[n]] + 1

~                            num_atoms = num_atoms + 1

~                max_atoms = []
~                max_size = 0

~                for atom in atom_dict.keys():
~                    if atom_dict[atom] > max_size:
~                        max_atoms = [atom]
~                        max_size = atom_dict[atom]
~                    elif atom_dict[atom] == max_size:
~                        max_atoms.append(atom)

~                if require_multiple and num_atoms == 1:
~                    consensus = consensus + ambiguous
~                elif (len(max_atoms) == 1) and
((float(max_size)/float(num_atoms))
~                                             >= threshold):
~                    consensus = consensus + max_atoms[0]
~                else:
~                    consensus = consensus + ambiguous
~        # gap!=0 Gaps are treated like any other atom.
~        else:
~            # go through each seq item
~            for n in range(con_len):
~                # keep track of the counts of the different atoms we get
~                atom_dict = {}
~                num_atoms = 0

~                for record in self.alignment._records:
~                    # make sure we haven't run past the end of any
sequences
~                    # if they are of different lengths
~                    if n < len(record.seq):
~                        if record.seq[n] not in atom_dict.keys():
~                            atom_dict[record.seq[n]] = 1
~                        else:
~                            atom_dict[record.seq[n]] = \
~                              atom_dict[record.seq[n]] + 1

~                        num_atoms = num_atoms + 1

~                max_atoms = []
~                max_size = 0

~                for atom in atom_dict.keys():
~                    if atom_dict[atom] > max_size:
~                        max_atoms = [atom]
~                        max_size = atom_dict[atom]
~                    elif atom_dict[atom] == max_size:
~                        max_atoms.append(atom)

~                if require_multiple and num_atoms == 1:
~                    consensus = consensus + ambiguous
~                elif (len(max_atoms) == 1) and
((float(max_size)/float(num_atoms))
~                                             >= threshold):
~                    consensus = consensus + max_atoms[0]
~                else:
~                    consensus = consensus + ambiguous
- --
Best regards,

//=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ   //=\
\=// IT Manager Advanta Seeds - Balcarce Research Center -      \=//
//=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\
\=// E-mail: sbassi at genesdigitales.com - ICQ UIN: 3356556 -     \=//

~                http://Bioinformatica.info

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAj7ov90ACgkQ6lc7ixf6gKqMyQCfXRp/vw5N73Xj7wy1u0gtYs/3
FPYAoNC4cSihUsQPNOgbwNGq1YwM99Qa
=6cyt
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AlignInfo.zip
Type: application/x-zip-compressed
Size: 6842 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20030612/18daf51a/AlignInfo.bin


More information about the BioPython mailing list