[Biopython-dev] Alignment object

Peter biopython at maubp.freeserve.co.uk
Tue Mar 2 07:25:05 EST 2010


On Wed, Oct 28, 2009 at 12:18 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>Peter wrote:
>> My rough work in progress is on github - at the moment I'm still trying
>> things out, and don't assume anything is set in stone. If you want to
>> have a play with this code, feedback is very welcome - probably best
>> on the dev list rather than here. See:
>>
>> http://github.com/peterjc/biopython/tree/seqrecords
>>
>> (a lot of the alignment things I want to support, like slicing and adding
>> are very closely linked to doing the same operations to SeqRecords)

Here is a new branch implementing a multiple-sequence-alignment
class (living under Bio.Align for now) based on the recent support
for slicing and adding SeqRecord objects:

http://github.com/peterjc/biopython/tree/alignment-obj

This handles most of the basic tasks I want to be able to easily do
with classical alignments, based on previous discussions on the
mailing list and/or bugzilla:

http://bugzilla.open-bio.org/show_bug.cgi?id=2551
http://bugzilla.open-bio.org/show_bug.cgi?id=2552
http://bugzilla.open-bio.org/show_bug.cgi?id=2553
http://bugzilla.open-bio.org/show_bug.cgi?id=2554

At its core, the alignment is still held as a list of SeqRecord objects,
which should mean minimal problems with backwards compatibility.
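
To illustrate the idea of an alignment held as a list of records, with
column slicing and addition delegated to the rows, here is a minimal
sketch. It is a toy stand-in, not the code on the branch: the Record and
ToyAlignment names are hypothetical, and Record only mimics the id and
sequence of a SeqRecord.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a SeqRecord: just an id and a sequence string."""
    id: str
    seq: str


class ToyAlignment:
    """Toy alignment holding a list of equal-length records (an assumption,
    not the class on the alignment-obj branch)."""

    def __init__(self, records):
        records = list(records)
        # Every row of a classical alignment must have the same length.
        if len({len(r.seq) for r in records}) > 1:
            raise ValueError("all rows must have the same length")
        self.records = records

    def __len__(self):
        # Number of rows, as for a plain list of records.
        return len(self.records)

    def __getitem__(self, index):
        if isinstance(index, int):
            return self.records[index]            # a single row
        if isinstance(index, slice):
            return ToyAlignment(self.records[index])  # a subset of rows
        rows, cols = index                         # e.g. aln[:, 2:5]
        if not isinstance(rows, slice) or not isinstance(cols, slice):
            raise TypeError("this sketch only supports pairs of slices")
        return ToyAlignment(
            Record(r.id, r.seq[cols]) for r in self.records[rows]
        )

    def __add__(self, other):
        # Adding alignments joins them column-wise, row by row.
        if len(self) != len(other):
            raise ValueError("alignments must have the same number of rows")
        return ToyAlignment(
            Record(a.id, a.seq + b.seq)
            for a, b in zip(self.records, other.records)
        )


aln = ToyAlignment([Record("alpha", "ACGT-A"), Record("beta", "AC-TTA")])
left = aln[:, :3]        # first three columns of every row
right = aln[:, 3:]       # the remaining columns
rejoined = left + right  # column-wise addition restores the original rows
```

Because each row operation is just a slice or concatenation of one record,
the alignment itself stays a thin wrapper around the list.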

If anyone would like to try out the code, comments would be very
welcome. There are plenty of doctests in the docstrings which
should explain how I expect things to work.

> The bx-python alignment object is nice and goes to/from MAF
> and AXT formats:
>
> http://bitbucket.org/james_taylor/bx-python/src/tip/lib/bx/align/core.py
>
> This supports slicing by alignment coordinates and by reference
> coordinates for a species in the alignment. Some other useful
> features are limiting the alignment to specific species and removing
> all gap columns that can result. The representation is a high level
> Alignment object containing multiple Components.
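
The two operations described above (limiting an alignment to some rows,
then dropping the columns that become all gaps) can be sketched in a few
lines. This is only an illustration on a plain dict of strings, not the
bx-python API; the function names and the row names are made up.

```python
def limit_to_rows(rows, keep):
    """Keep only the named rows of an alignment given as {name: sequence}."""
    return {name: seq for name, seq in rows.items() if name in keep}


def remove_all_gap_columns(rows, gap="-"):
    """Drop every column in which all remaining rows have a gap."""
    seqs = list(rows.values())
    if not seqs:
        return {}
    # Indices of columns where at least one row has a non-gap character.
    keep_cols = [
        i for i in range(len(seqs[0]))
        if any(seq[i] != gap for seq in seqs)
    ]
    return {
        name: "".join(seq[i] for i in keep_cols)
        for name, seq in rows.items()
    }


rows = {"species_a": "AC-GT", "species_b": "A--GT", "species_c": "ACTGT"}
subset = limit_to_rows(rows, {"species_a", "species_b"})
trimmed = remove_all_gap_columns(subset)
```

Here the third column only had a residue in species_c, so it disappears
once that row is removed.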

My code does not (yet) attempt to deal with next-gen sequencing
alignments, which would require padding all the (short) reads with
leading and trailing gaps to ensure all rows of the alignment have
the same length. This could be done in a memory efficient way
with a PaddedSeq object, or with a very different alignment object
(holding the reads and their offsets in memory). I'm not sure what is best,
but the bx-python model looks worth understanding to help decide.
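
As a rough sketch of the first option, a padded row could store only the
read and its offset, and produce the leading and trailing gaps on demand.
Nothing like this exists on the branch; the PaddedRead name, its
constructor arguments and the "-" gap character are all assumptions made
for illustration.

```python
class PaddedRead:
    """Hypothetical padded row: stores a short read plus its offset and
    generates the surrounding gaps lazily instead of holding them in memory."""

    def __init__(self, read, offset, width, gap="-"):
        if offset < 0 or offset + len(read) > width:
            raise ValueError("read does not fit inside the alignment width")
        self.read = read
        self.offset = offset
        self.width = width
        self.gap = gap

    def __len__(self):
        # The row reports the full alignment width, not the read length.
        return self.width

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Build only the requested window of the padded row.
            return "".join(
                self[i] for i in range(*index.indices(self.width))
            )
        if index < 0:
            index += self.width
        if not 0 <= index < self.width:
            raise IndexError("column index out of range")
        pos = index - self.offset
        if 0 <= pos < len(self.read):
            return self.read[pos]
        return self.gap  # outside the read: leading or trailing padding

    def __str__(self):
        trailing = self.width - self.offset - len(self.read)
        return self.gap * self.offset + self.read + self.gap * trailing


row = PaddedRead("ACGT", offset=3, width=10)
full_row = str(row)   # the fully padded row, built only when asked for
window = row[2:5]     # columns 2, 3 and 4 of the padded row
```

The memory cost per row is then the read length plus a few integers,
whatever the width of the alignment.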

Perhaps until this is settled, it would be premature to merge my
alignment class to the trunk. After all, we may need to tweak the
alignment object class hierarchy.

Peter

