[BioPython] From alignment column pos to seq nucleotide pos

Iddo Friedberg idoerg@cc.huji.ac.il
Wed, 10 Oct 2001 11:58:23 +0200 (GMT+0200)


On Mon, 8 Oct 2001, Brad Chapman wrote:

:
: I think it would be nice to have something like this in the
: Alignment class, since this is a general question people should be
: able to ask (in my opinion). Based on Iddo's idea, I coded up a
: function to do this. I changed the algorithm a little bit: instead
: of using the assumption that the original sequence has no gap
: characters, I make the assumption that the user of the function is
: passing in stuff that "makes sense" (i.e. the sequence you want to
: find the position in actually matches the aligned sequence you say
: it matches).

1) Cool!

2) There is something here which I don't get. In this function, you seem
to pass both the sequence (original_seq) and the index in the alignment
(seq_number). So, if the user already knows seq_number, why pass
original_seq as well, except for sanity checking? If this is only a sanity
check, maybe original_seq should be passed as a last, optional argument,
something like:

def original_sequence_pos(self, col_pos, seq_number, original_seq=None):

So that if original_seq is passed, sanity checking will take place.
Otherwise, just count gaps, and subtract the number of gaps from the
length of the sequence (up to the point of the column). Or am I missing
something here completely?
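
A minimal sketch of the gap-counting idea, written as a standalone
function on plain strings rather than as a method of the Alignment class.
The names, the "-" gap character, and the choice to raise ValueError are
assumptions made for illustration, not existing Biopython API.
original_seq defaults to None, and the sanity check only runs when it is
passed.

```python
def original_sequence_pos(aligned_seq, col_pos, original_seq=None, gap_char="-"):
    """Map an alignment column to a position in the ungapped sequence.

    aligned_seq is one row of the alignment as a plain string (assumption).
    """
    if not 0 <= col_pos < len(aligned_seq):
        raise IndexError("col_pos is outside the aligned sequence")
    if aligned_seq[col_pos] == gap_char:
        raise ValueError("column %d is a gap in this sequence" % col_pos)

    # Columns before col_pos, minus the gaps among them, gives the
    # zero-based position in the original (unaligned) sequence.
    seq_pos = col_pos - aligned_seq[:col_pos].count(gap_char)

    # Optional sanity check, only done when the caller supplies the sequence.
    if original_seq is not None:
        if seq_pos >= len(original_seq):
            raise ValueError("original sequence is shorter than expected")
        if original_seq[seq_pos].upper() != aligned_seq[col_pos].upper():
            raise ValueError("original sequence does not match aligned sequence")
    return seq_pos


row = "AC--GT-A"
print(original_sequence_pos(row, 4))           # no sanity check
print(original_sequence_pos(row, 7, "ACGTA"))  # with sanity check
```

Without original_seq the function trusts the alignment row; with it, a
mismatch is reported instead of silently returning a position.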

Also, scroll down for a little nit-picking in the last-minute check.

Iddo

:
: The code is pasted in below. What do people think about this? Would
: it be a good thing to include in the Alignment class (or in
: AlignInfo.SummaryInfo)? Any other ideas/suggestions/comments?
:
:   def original_sequence_pos(self, col_pos, original_seq, seq_number):
:       """Given an alignment position, find the position in a sequence.
:
:       When given col_pos, the number of a column in this alignment, this
:       function finds the corresponding position in the original (unaligned)
:       sequence.
:
:       original_seq is the sequence you want to find the position in
:       and seq_number is the numerical position of the corresponding
:       aligned sequence in this alignment.
:
:       This works by moving along the aligned sequence and correspondingly
:       stepping along the original sequence until we come to col_pos.
:       The position we are at in the sequence is then returned.
:       """
:       aligned_seq = self._records[seq_number].seq
:       seq_pos = 0
:       for index in range(col_pos):
:           # if we hit a residue that was in the original sequence
:           # (not a character from the alignment), then we increment
:           # the position we are at in the original sequence
:           if aligned_seq[index].upper() == original_seq[seq_pos].upper():
:               seq_pos += 1
:           # otherwise we should have hit something in the alignment,
:           # which should be a gap character
:           else:
:               assert aligned_seq[index] == aligned_seq.alphabet.gap_char, \
:                 "Got an unexpected non-aligned character: %s" % \
:                 (aligned_seq[index])
:
:       # a last-minute sanity check to be sure that we were asked
:       # to find a position that was actually in the original sequence
:       assert aligned_seq[col_pos] == original_seq[seq_pos], \

Iddo:
Shouldn't this be:
:       assert aligned_seq[col_pos].upper() == original_seq[seq_pos].upper(), \



:         "Residue in original sequence does not match aligned sequence."
:
:       return seq_pos
:
: Brad
: --
: PGP public key available from http://pgp.mit.edu/
: _______________________________________________
: BioPython mailing list  -  BioPython@biopython.org
: http://biopython.org/mailman/listinfo/biopython
:
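
For comparison, a standalone sketch of the stepping approach with both
comparisons made case-insensitive, including the final one. It assumes
plain strings and "-" as the gap character; the name walk_to_sequence_pos
is hypothetical. Unlike the quoted code, it tests for the gap character
first and raises ValueError instead of using assert, so the checks still
run under python -O.

```python
def walk_to_sequence_pos(aligned_seq, col_pos, original_seq, gap_char="-"):
    """Step along aligned_seq and original_seq together up to col_pos."""
    seq_pos = 0
    for index in range(col_pos):
        if aligned_seq[index] == gap_char:
            # gap columns do not consume a residue of the original sequence
            continue
        if (seq_pos >= len(original_seq)
                or aligned_seq[index].upper() != original_seq[seq_pos].upper()):
            raise ValueError("Got an unexpected non-aligned character: %s"
                             % aligned_seq[index])
        seq_pos += 1

    # last-minute check, case-insensitive like the comparisons in the loop
    if (seq_pos >= len(original_seq)
            or aligned_seq[col_pos].upper() != original_seq[seq_pos].upper()):
        raise ValueError(
            "Residue in original sequence does not match aligned sequence.")
    return seq_pos


# lower-case residues in the alignment, upper-case in the original
print(walk_to_sequence_pos("ac--GT-A", 4, "ACGTA"))
```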

--

Iddo Friedberg                                  | Tel: +972-2-6757374
Dept. of Molecular Genetics and Biotechnology   | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
POB 12272, Jerusalem 91120                      |
Israel                                          |
http://bioinfo.md.huji.ac.il/marg/people-home/iddo/