[Biopython-dev] Bio.AlignIO

Michiel de Hoon mdehoon at c2b2.columbia.edu
Wed Jul 25 10:44:56 EDT 2007


Peter wrote:
> Personally I see an alignment as both an array of characters (i.e. amino 
> acid residues or nucleotides), and a list of sequences.
> 
> In the same way that a Numeric or NumPy array lets you iterate over 
> rows, yet also access individual elements, we could allow iteration of 
> SeqRecords and also allow access to individual letters.

How about the following:

-Iterators iterate over the SeqRecords in the alignment

-An index of the form [xxx] returns the corresponding SeqRecord

-An index of the form [xxx:yyy:zzz] returns an Alignment object 
containing the SeqRecords in rows [xxx:yyy:zzz]
(compare to the current method get_all_seqs()).

-An index of the form [xxx,:] returns the Seq object of the SeqRecord at 
xxx (this is currently done by the get_seq_by_num() method).

-An index of the form [xxx:yyy:zzz,:] returns a list of the Seq objects 
of the SeqRecords in rows xxx:yyy:zzz

-An index of the form [:,www] returns a string containing the characters 
at column www (this is currently done by the get_column() method)

-An index of the form [xxx:yyy:zzz,www] returns a string containing the 
characters at column www using only the rows xxx:yyy:zzz.

-An index of the form [xxx,www] returns a string containing the 
character of the sequence in row xxx at column www.

This is more or less how Numerical Python arrays work, except that we'll 
be returning SeqRecord/Seq/string objects depending on the indices.
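
To make the rules above concrete, here is a rough sketch of how such a 
__getitem__ might dispatch on the index type. This is only an 
illustration, not existing Biopython code: the SketchAlignment class and 
the Record namedtuple are made-up stand-ins (Record plays the role of a 
SeqRecord, and a plain str plays the role of a Seq), and column slices 
other than ":" are rejected because the proposal does not define them.

```python
from collections import namedtuple

# Hypothetical stand-in for a SeqRecord; `seq` is a plain str, not a Seq.
Record = namedtuple("Record", ["id", "seq"])


class SketchAlignment:
    """Illustrative only: dispatches on the index as proposed above."""

    def __init__(self, records):
        self._records = list(records)

    def __iter__(self):
        # Iterating yields the records, one per row.
        return iter(self._records)

    def __len__(self):
        return len(self._records)

    def __getitem__(self, index):
        if isinstance(index, int):
            # [xxx] -> the record in that row
            return self._records[index]
        if isinstance(index, slice):
            # [xxx:yyy:zzz] -> a new alignment with the selected rows
            return SketchAlignment(self._records[index])
        if isinstance(index, tuple) and len(index) == 2:
            row, col = index
            whole_row = isinstance(col, slice) and col == slice(None)
            if isinstance(row, int):
                seq = self._records[row].seq
                if isinstance(col, int):
                    return seq[col]   # [xxx,www] -> one character
                if whole_row:
                    return seq        # [xxx,:] -> the sequence
            elif isinstance(row, slice):
                seqs = [rec.seq for rec in self._records[row]]
                if isinstance(col, int):
                    # [xxx:yyy:zzz,www] and [:,www] -> column as a string
                    return "".join(s[col] for s in seqs)
                if whole_row:
                    return seqs       # [xxx:yyy:zzz,:] -> list of sequences
        raise TypeError("unsupported alignment index: %r" % (index,))


aln = SketchAlignment([
    Record("a", "ACGT"),
    Record("b", "AC-T"),
    Record("c", "TCGT"),
])
first_column = aln[:, 0]      # column 0 over all rows
partial_column = aln[0:2, 2]  # column 2 over rows 0 and 1
```

Note that [:,www] needs no separate branch here, since ":" is just the 
row slice that selects every row.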

--Michiel.

