[BioPython] Bio.AlignIO and the Alignment object

Peter peter at maubp.freeserve.co.uk
Sat Jul 12 13:08:25 UTC 2008


Now that Biopython 1.47 is out, I hope some of you have had a chance
to try out the new Bio.AlignIO module.

Let me just point out that all of this concerns multiple sequence
alignments where each (gapped) sequence is the same length, so the
data can be seen as a matrix or grid of letters.  I am not talking
about the related concept of an EST or contig alignment, which doesn't
really fit this picture (although it could if you added long leading
and trailing gap sequences).
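
To make the grid picture concrete, here is a minimal sketch in plain
Python.  It is not Bio.AlignIO code: the sequences and the names
"rows" and "columns" are made up purely for illustration.  Each gapped
sequence is one row, and because every row has the same length,
transposing the rows gives the alignment columns.

```python
# Toy alignment: three gapped sequences of equal length (made-up data).
rows = [
    "ACG-TA",
    "AC--TA",
    "ACGGTA",
]

# The grid view only holds if every row is the same length.
assert len(set(len(row) for row in rows)) == 1

# Transpose the rows to get the alignment columns.
columns = ["".join(column) for column in zip(*rows)]

print(len(rows), "rows x", len(columns), "columns")
print(columns[2])  # letters in the third column, one per sequence
```

An EST or contig alignment would fail the equal-length check above
unless each read were first padded with leading and trailing gaps.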

We've had some useful feedback from Sebastian Bassi, leading me to
suspect that both the Bio.SeqIO and Bio.AlignIO input functions would
benefit from an optional alphabet argument for all those file formats
which do not declare the sequence type (enhancement bug 2443).
http://bugzilla.open-bio.org/show_bug.cgi?id=2443

Any other issues or questions about using the new Bio.AlignIO module?
I've tried to explain how to use it in the Tutorial (Chapter 5).
http://biopython.org/DIST/docs/tutorial/Tutorial.html

Any "beginner's questions" on the mailing list would be useful for
pointing out how the documentation could be improved.  This would also
be useful for the wiki page which is currently a little empty - I
didn't want to just cut and paste the examples from the main tutorial.
http://biopython.org/wiki/AlignIO

I'd also like to hear if there are any alignment file formats you
think Bio.AlignIO should support.

Finally, it would be great to also have some comments about the
Alignment object itself.  In particular we have some ideas on Bug 1944
for making it act more like an array of letters by supporting double
indexing like a Numeric or numpy array/matrix class.  Right now, the
next little step I want to implement is allowing access to the rows of
the alignment as SeqRecord objects using alignment[index], rather than
alignment.get_all_seqs()[index] which I find cumbersome.
http://bugzilla.open-bio.org/show_bug.cgi?id=1944
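
As a rough illustration of what array-like indexing could look like,
here is a sketch using a hypothetical ToyAlignment class.  This is not
the Biopython Alignment object and not a proposed implementation: the
class name is invented, and the rows are plain strings rather than
SeqRecord objects, just to keep the example self-contained.

```python
class ToyAlignment:
    """Hypothetical stand-in for an alignment; rows are plain strings."""

    def __init__(self, rows):
        rows = list(rows)
        if len(set(len(row) for row in rows)) > 1:
            raise ValueError("all gapped sequences must be the same length")
        self._rows = rows

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        if isinstance(index, int):
            # alignment[i] gives a single row
            return self._rows[index]
        if isinstance(index, slice):
            # alignment[i:j] gives a smaller alignment
            return ToyAlignment(self._rows[index])
        if isinstance(index, tuple) and len(index) == 2:
            row_index, col_index = index
            if isinstance(row_index, int):
                # alignment[i, j] gives a letter, alignment[i, j:k] part of a row
                return self._rows[row_index][col_index]
            selected = self._rows[row_index]
            if isinstance(col_index, int):
                # alignment[:, j] gives one column as a string
                return "".join(row[col_index] for row in selected)
            # alignment[i:j, k:l] gives a sub-alignment
            return ToyAlignment(row[col_index] for row in selected)
        raise TypeError("invalid index")


aln = ToyAlignment(["ACG-TA", "AC--TA", "ACGGTA"])
print(aln[1])      # second row
print(aln[0, 3])   # single letter
print(aln[:, 2])   # third column
```

The single index case, aln[1], corresponds to the "next little step"
described above; the tuple cases mimic the double indexing of a numpy
array.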

Thanks,

Peter


