[Biopython-dev] MUMmer

Peter Cock p.j.a.cock at googlemail.com
Mon May 4 12:02:59 UTC 2009


On Thu, Apr 30, 2009 at 4:23 PM, Marcin Swiatek
<marcin.swiatek at mail.mcgill.ca> wrote:
> Hello,
>
> I guess I should start with a nice 'hi' to everybody, now that I am
> sending my first message to this group. So: Hi, Everybody!

Hi!

> Now that we have the formality out of the way, I will get to the point.
> Recently, I have written some Python code for parsing and processing the
> output of the MUMmer tool (http://mummer.sourceforge.net/). More
> specifically, the code I have manages invocations and handles outputs of
> the nucmer pipeline (alignment of multiple closely related nucleotide
> sequences) and of mummer itself (short exact matches). Obviously, the
> results are ultimately rendered as pairs of Biopython's Seq objects.
>
> I use this stuff only myself, in work on bacterial genomes, but I would
> be more than willing to contribute it to the project. It may be rough
> around the edges at the moment, but I think I could easily give it the
> necessary polish if there is interest in having it included.

Great!  I assume you're OK with our licence, and there are no problems
from your employer/University with a contribution like this?

> Should that be the case, could one of the project leads point me in the
> right direction, please? How should I go about the submission?

In terms of showing us the code, how do you feel about trying out
github (see Bartek's email)?  Alternatively, file an enhancement bug
on our bugzilla and upload your current Python file (or a zip file if this
is split up into several modules).

From your description above it sounds like you have two main lumps
of code: a pairwise alignment parser, and some command line tool
wrappers.

Brad and Bartek have already mentioned returning Alignment objects;
that would let us integrate MUMmer as an input format for Bio.AlignIO,
http://biopython.org/wiki/AlignIO
It may be helpful to have a look at how we parse FASTA output into
pairwise alignments, and also the EMBOSS "pairs" files from needle
and water.
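
As a rough illustration of the parsing half, here is a minimal sketch
that reads mummer-style match lines into simple records using only the
standard library. The Match record and parse_mummer_matches function are
hypothetical names (not Biopython API), and the assumed layout (a
"> name" header followed by lines of reference start, query start and
match length) should be checked against real mummer output.

```python
from collections import namedtuple

# Hypothetical record type for one exact match (not part of Biopython).
Match = namedtuple("Match", "query ref_start query_start length")


def parse_mummer_matches(handle):
    """Yield Match records from mummer-style text.

    Assumed layout: a '> query_name' header line, then lines holding
    three integers: reference start, query start, match length.
    """
    query = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New query block; keep only the first word as the name.
            parts = line[1:].split()
            query = parts[0] if parts else None
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError("Unexpected line: %r" % line)
        ref_start, query_start, length = (int(f) for f in fields)
        yield Match(query, ref_start, query_start, length)
```

Records like these would still need converting into Alignment objects
before they could sit behind Bio.AlignIO.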

Although (as Brad mentioned) this is currently in a little flux,
for the command line wrappers I'd like to use our Bio.Application
framework to represent the command line as an object, giving a string
the user can then invoke as they prefer.  Having the MUMmer wrapper
under Bio.Align.Applications seems sensible at this point.
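
To make the "object that gives a command line string" idea concrete,
here is a minimal standard-library sketch. The NucmerCommandline class
is hypothetical and does not use the real Bio.Application base classes;
it only knows about nucmer's --prefix and --maxmatch options plus the
reference and query file names.

```python
import shlex


class NucmerCommandline:
    """Hypothetical wrapper: holds nucmer arguments, str() gives the command."""

    def __init__(self, reference, query, prefix=None, maxmatch=False,
                 cmd="nucmer"):
        self.cmd = cmd
        self.reference = reference
        self.query = query
        self.prefix = prefix
        self.maxmatch = maxmatch

    def __str__(self):
        parts = [self.cmd]
        if self.prefix is not None:
            parts.append("--prefix=" + self.prefix)
        if self.maxmatch:
            parts.append("--maxmatch")
        # Input files go last, after any options.
        parts.extend([self.reference, self.query])
        # Quote each piece so file names with spaces survive a shell.
        return " ".join(shlex.quote(p) for p in parts)
```

The object itself never runs anything; the caller takes str(cline) and
hands it to subprocess, os.system, or a cluster job script as they
prefer.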

If you have been lurking on the dev mailing list for a while, these
topics may be familiar already.  If not, have a look over the last
month or so in the archives here:
http://lists.open-bio.org/pipermail/biopython-dev/

Thanks,

Peter
