[Biopython-dev] [Biopython] Alignment object

Brad Chapman chapmanb at 50mail.com
Wed Oct 28 08:18:33 EDT 2009


Peter and Eric;
[Moving this over to biopython-dev and changing the subject]

> > Here's +1 for Python counting. That would match SeqFeature and the
> > ProteinDomain class in Bio.Tree.PhyloXML.

Agreed. My opinion on the 0/1 mess is that data objects in code
should expose all of the coordinates as 0-based, and that output and
display files meant for biologists should be 1-based.
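
A minimal sketch of that split, assuming intervals are stored half-open as
in Python slicing (the half-open convention and both helper names are
assumptions for illustration, not existing Biopython functions):

```python
def to_display(start, end):
    """Convert a 0-based, half-open interval to 1-based, inclusive.

    Hypothetical helper: only the start shifts, because the exclusive
    0-based end equals the inclusive 1-based end.
    """
    return start + 1, end


def from_display(start, end):
    """Convert a 1-based, inclusive interval back to 0-based, half-open."""
    return start - 1, end


seq = "ACGTACGT"
start, end = 2, 5                # what the data object would expose
fragment = seq[start:end]        # plain Python slicing works directly
shown = to_display(start, end)   # what a report for biologists would print
```

Keeping the conversion in one place at the output boundary means the
objects themselves never have to carry two coordinate systems.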

> > While we're on this topic -- I have some unpublished code for rendering an
> > alignment object in HTML, with plans for colorization, conservation
> > profiles, etc. I rolled my own alignment class since the one in
> > Bio.Align.Generic didn't have the attributes (start, end, selected columns)
> > for a particular file format I was parsing. It's not urgent, but at some
> > point could you publish your plans for the Alignment classes so I (and
> > probably others) can stay/become compatible?
> 
> My rough work in progress is on GitHub - at the moment I'm still trying
> things out, and don't assume anything is set in stone. If you want to
> have a play with this code, feedback is very welcome - probably best
> on the dev list rather than here. See:
> 
> http://github.com/peterjc/biopython/tree/seqrecords
> 
> (a lot of the alignment things I want to support, like slicing and adding
> are very closely linked to doing the same operations to SeqRecords)

The bx-python alignment object is nice and goes to/from MAF and AXT
formats:

http://bitbucket.org/james_taylor/bx-python/src/tip/lib/bx/align/core.py

It supports slicing by alignment coordinates and by reference
coordinates for a species in the alignment. Other useful
features are limiting the alignment to specific species and removing
the all-gap columns that can result. The representation is a high-level
Alignment object containing multiple Components.
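
To make those operations concrete, here is a rough sketch of an
Alignment-of-Components design. This is not the bx-python API: the class
layout, method names, and the 0-based half-open coordinates are all
assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

GAP = "-"


@dataclass
class Component:
    src: str    # species or sequence name (hypothetical field names)
    start: int  # 0-based start of the aligned region in the source
    text: str   # aligned text, including gap characters


@dataclass
class Alignment:
    components: List[Component]

    def slice(self, col_start, col_end):
        """Slice by alignment columns (0-based, half-open)."""
        sliced = []
        for c in self.components:
            # Shift the start by the residues (non-gaps) that were cut off.
            skipped = len(c.text[:col_start].replace(GAP, ""))
            sliced.append(
                Component(c.src, c.start + skipped, c.text[col_start:col_end])
            )
        return Alignment(sliced)

    def coord_to_col(self, src, pos):
        """Map a 0-based position in component `src` to an alignment column."""
        comp = next(c for c in self.components if c.src == src)
        current = comp.start
        for col, ch in enumerate(comp.text):
            if ch != GAP:
                if current == pos:
                    return col
                current += 1
        raise ValueError(f"{pos} is outside the aligned region of {src}")

    def slice_by_ref(self, src, ref_start, ref_end):
        """Slice by coordinates of one species (0-based, half-open)."""
        first = self.coord_to_col(src, ref_start)
        last = self.coord_to_col(src, ref_end - 1)
        return self.slice(first, last + 1)

    def limit_to(self, species):
        """Keep only the components for the named species."""
        return Alignment([c for c in self.components if c.src in species])

    def remove_all_gap_columns(self):
        """Drop columns that are gaps in every remaining component."""
        if not self.components:
            return Alignment([])
        width = len(self.components[0].text)
        keep = [
            i for i in range(width)
            if any(c.text[i] != GAP for c in self.components)
        ]
        # Starts are unchanged: only gap characters are removed.
        return Alignment([
            Component(c.src, c.start, "".join(c.text[i] for i in keep))
            for c in self.components
        ])


aln = Alignment([
    Component("hg18", 100, "ACG-TAC"),
    Component("mm9", 50, "A-G-TTC"),
    Component("canFam2", 7, "ACGGTAC"),
])

# Dropping canFam2 leaves column 3 as a gap in both remaining species.
pair = aln.limit_to({"hg18", "mm9"}).remove_all_gap_columns()

# Slice on hg18 coordinates 101..103 (inclusive), i.e. [101, 104).
region = aln.slice_by_ref("hg18", 101, 104)
```

The example shows why the two features go together: limiting to a subset
of species is what creates the all-gap columns in the first place.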

You can also index the files for quick lookup via range queries:

http://bitbucket.org/james_taylor/bx-python/src/tip/lib/bx/interval_index_file.py
http://bcbio.wordpress.com/2009/07/26/sorting-genomic-alignments-using-python/
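
As a toy illustration of the range-query idea (not the bx-python index
format; the class name, bin size, and the use of file offsets as payloads
are assumptions), a fixed-size binning scheme can be sketched as:

```python
from collections import defaultdict


class BinnedIntervalIndex:
    """Hypothetical in-memory index from intervals to file offsets."""

    def __init__(self, bin_size=1000):
        self.bin_size = bin_size
        self._bins = defaultdict(list)

    def _bin_range(self, start, end):
        # Bins touched by the 0-based, half-open interval [start, end).
        return range(start // self.bin_size, (end - 1) // self.bin_size + 1)

    def add(self, start, end, offset):
        """Register a block covering [start, end) stored at `offset`."""
        for b in self._bin_range(start, end):
            self._bins[b].append((start, end, offset))

    def find(self, start, end):
        """Return sorted (start, end, offset) tuples overlapping [start, end)."""
        hits = set()
        for b in self._bin_range(start, end):
            for s, e, off in self._bins.get(b, ()):
                if s < end and e > start:  # true overlap test
                    hits.add((s, e, off))
        return sorted(hits)


index = BinnedIntervalIndex(bin_size=1000)
index.add(0, 500, 0)
index.add(900, 2100, 128)
index.add(5000, 5200, 256)

overlapping = index.find(400, 950)
```

A query only inspects the bins it overlaps, so lookup cost depends on the
size of the query region rather than on the size of the whole file.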

The bx-python implementation is nice; it would be good to stay compatible
with it and reuse as much as we can of what they've done.

Brad

