[Biopython-dev] MAF Parser/Writer/Indexer

Brad Chapman chapmanb at 50mail.com
Sun May 15 15:39:59 UTC 2011


Andrew and Peter;
Thanks for working on MAF parsing and interval access in general.
A few thoughts below:

> > I'd like to contribute MAF parser/writer classes to Bio.AlignIO.  MAF is an
> > alignment format used for whole genome alignments, as in the 30-way (or
> > more) multiz alignments at UCSC:
[...]
> > The value of this format to most users will come from the ability to
> > extract sequences from an arbitrary number of species that align to
> > a particular sequence range in a particular genome, at random.  We

> I've spoken to Andrew briefly before this, and I'm keen to get
> the core functionality of parsing and writing MAF alignments
> added to AlignIO. His other ideas for indexing these alignments
> are much more interesting - and part of a more general topic
> related to things like Ace alignments, or SAM/BAM alignments.

We may want to take a look at the interval access functionality in
bx-python, and the MAF parsing that is tied in with it:

https://bitbucket.org/james_taylor/bx-python/src/tip/lib/bx/interval_index_file.py
https://bitbucket.org/james_taylor/bx-python/src/tip/lib/bx/align/maf.py

Here is a worked example:

http://bcbio.wordpress.com/2009/07/26/sorting-genomic-alignments-using-python/

It would be useful to have a common API for querying across bx-python
intervals, BAM intervals and other formats.
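
As a very rough sketch of what such a shared query interface might look
like: every name below (Interval, IntervalSource, InMemorySource,
query_all) is hypothetical and does not come from bx-python, pysam or
Biopython. A real backend would wrap an on-disk index instead of the
linear scan used here for illustration, and the 0-based half-open
coordinates are an assumption.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List, NamedTuple


class Interval(NamedTuple):
    """A hit on a reference sequence (assumed 0-based, half-open)."""
    chrom: str
    start: int
    end: int
    payload: object  # e.g. an alignment block or a read; format-specific


class IntervalSource(ABC):
    """Hypothetical common interface each format backend would implement."""

    @abstractmethod
    def fetch(self, chrom: str, start: int, end: int) -> Iterator[Interval]:
        """Yield every interval overlapping chrom:[start, end)."""


class InMemorySource(IntervalSource):
    """Toy backend: a linear scan standing in for a real indexed file."""

    def __init__(self, intervals: Iterable[Interval]) -> None:
        self._intervals = list(intervals)

    def fetch(self, chrom: str, start: int, end: int) -> Iterator[Interval]:
        for iv in self._intervals:
            # Two half-open ranges overlap when each starts before the
            # other ends.
            if iv.chrom == chrom and iv.start < end and iv.end > start:
                yield iv


def query_all(sources: Iterable[IntervalSource],
              chrom: str, start: int, end: int) -> List[Interval]:
    """Run the same region query against every source and merge the hits."""
    hits: List[Interval] = []
    for source in sources:
        hits.extend(source.fetch(chrom, start, end))
    # Sort on coordinates only, so payloads never need to be comparable.
    return sorted(hits, key=lambda iv: (iv.start, iv.end))
```

The point of the abstract base class is that calling code only ever sees
fetch(chrom, start, end), regardless of whether the hits come from a MAF
index, a BAM file or something else.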

Brad


