[BioPython] Bio.AlignIO and the Alignment object

Peter peter at maubp.freeserve.co.uk
Sat Jul 12 13:16:32 UTC 2008


> Finally, it would be great to also have some comments about the
> Alignment object itself.  In particular we have some ideas on Bug 1944
> for making it act more like an array of letters by supporting double
> indexing like a Numeric or numpy array/matrix class.
>
> Right now, the next little step I want to implement is allowing access to
> the rows of the alignment as SeqRecord objects using alignment[index],
> rather than alignment.get_all_seqs()[index] which I find cumbersome.
> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
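
To make the double indexing idea concrete, here is a rough sketch of what
array-like access could look like. This is only an illustration, not the
design proposed on Bug 1944 and not Biopython code: the class name
LetterGridAlignment is made up, plain strings stand in for the aligned
sequences, and only an integer row and column are handled.

```python
class LetterGridAlignment(object):
    """Hypothetical illustration only; not part of Biopython."""

    def __init__(self, sequences):
        # Plain strings stand in for the aligned sequences (an assumption).
        self._sequences = list(sequences)

    def __getitem__(self, key):
        if isinstance(key, tuple):
            # grid[row, col] arrives as a (row, col) tuple, as in numpy.
            row, col = key
            return self._sequences[row][col]
        # A single index returns the whole row.
        return self._sequences[key]


grid = LetterGridAlignment(["ACGT-A", "ACGTTA", "AC-TTA"])
letter = grid[1, 4]   # one letter: row 1, column 4
row = grid[0]         # a whole row
```

Python passes the two indices to __getitem__ as a single tuple, so the same
method can serve both row access and letter access.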

Here is a quick example using one of the test input files found in the
source code under Test/Clustalw:

>>> from Bio import AlignIO
>>> alignment = AlignIO.read(open("opuntia.aln"), "clustal")
>>> print alignment
SingleLetterAlphabet() alignment with 7 rows and 156 columns
TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273285|gb|AF191659.1|AF191
TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273284|gb|AF191658.1|AF191
TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273287|gb|AF191661.1|AF191
TATACATAAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273286|gb|AF191660.1|AF191
TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273290|gb|AF191664.1|AF191
TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273289|gb|AF191663.1|AF191
TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAG...AGA gi|6273291|gb|AF191665.1|AF191

You can iterate over the rows as SeqRecord objects:

>>> for record in alignment:
...     print "%s length %i" % (record.id, len(record))
gi|6273285|gb|AF191659.1|AF191 length 156
gi|6273284|gb|AF191658.1|AF191 length 156
gi|6273287|gb|AF191661.1|AF191 length 156
gi|6273286|gb|AF191660.1|AF191 length 156
gi|6273290|gb|AF191664.1|AF191 length 156
gi|6273289|gb|AF191663.1|AF191 length 156
gi|6273291|gb|AF191665.1|AF191 length 156

As of Biopython 1.47, to get a particular row out as a SeqRecord you
must call the get_all_seqs() method and index the list it returns:

>>> record = alignment.get_all_seqs()[3]
>>> print record.id
gi|6273286|gb|AF191660.1|AF191

I'd like to be able to do this instead:

>>> record = alignment[3]
>>> print record

This would make the get_all_seqs() method effectively obsolete, so it
could be deprecated in a later release.
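
As a rough sketch of how little code the row access needs, __getitem__ can
simply hand back an entry from the existing list of rows. This is not the
actual Bio.Align implementation: the class name RowIndexedAlignment and its
_records attribute are made up for illustration, and (id, sequence) tuples
stand in for SeqRecord objects.

```python
class RowIndexedAlignment(object):
    """Hypothetical stand-in for an alignment; not Biopython's class."""

    def __init__(self, records):
        # Each record is an (id, sequence) tuple standing in for a SeqRecord.
        self._records = list(records)

    def get_all_seqs(self):
        # Mirrors the existing accessor: return the rows as a list.
        return self._records

    def __getitem__(self, index):
        # Only integer row access is sketched here.
        if not isinstance(index, int):
            raise TypeError("row index must be an integer")
        return self._records[index]


rows = [("seq_a", "ACGT-A"), ("seq_b", "ACGTTA"), ("seq_c", "AC-TTA")]
alignment = RowIndexedAlignment(rows)

# Both spellings return the very same row object.
assert alignment[1] is alignment.get_all_seqs()[1]
```

With something like this in place, alignment[3] and
alignment.get_all_seqs()[3] would be interchangeable, which is why the older
spelling could eventually go.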

Peter


