[Biopython-dev] Fwd: New Biopython (sub)module?

Eric Talevich eric.talevich at gmail.com
Fri Sep 13 16:08:12 EDT 2013


On Thu, Aug 22, 2013 at 6:01 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Wed, Aug 21, 2013 at 11:00 PM, Cyrus Maher <michael.maher at ucsf.edu>
> wrote:
> >
> > That said, I was also hoping to get your thoughts on whether this seemed
> > like the type of project that would fit in with Biopython. Peter said that
> > Eric might have some good comments on this matter?
>
> Right - I was thinking Eric and this year's phylogenetic focused GSoC
> students should have some good comments, e.g. about adding
> something like pal2nal into Biopython.
>
> Peter
>

Hi Cyrus,

MOSAIC looks cool; it's always good to see progress in ortholog detection.
Since the core of the program is a single Python module, it shouldn't be
too hard to plug this into Biopython. Keep in mind, though, that once
MOSAIC is in the Biopython source tree it could become less convenient for
you to make major updates and changes to the program, whereas if you
control the packaging yourself you're free to change the API, add
dependencies, etc. however you like. So, for the manuscript/publication at
least, you might find it safer to only state that distributing MOSAIC with
Biopython is planned, rather than committing to a release version number.

Thoughts on the code:

- Zheng Ruan has written a nice codon alignment module as part of his GSoC
project. Once that's merged, you'll be able to drop the pal2nal dependency.

- We haven't merged Chris's MSAProbs wrapper yet (to my knowledge), though
at a glance it looks like it should be straightforward. For Bio.mosaic (I
guess?), we would probably wait until the wrapper is merged and then remove
the conditional in mosaic.

- Does EMBOSS stretcher do anything that couldn't be done with
Bio.pairwise2? If not, you could use pairwise2 instead and avoid another
dependency.

- The use of pandas looks fairly basic and therefore also avoidable. It
looks like with a few more lines of code you could use Python's built-in
csv module to parse a table and store it in a numpy matrix instead.

- MOSAIC does some logging to the console, which is sensible for the
program but isn't done as much in Biopython. Some of these print statements
could be changed to warnings (see the warnings module). The progress
indicators could maybe be toggled at the function level with a keyword
argument, e.g. verbose=True/False.

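To make the logging point above concrete, here is a rough sketch of the
pattern, not MOSAIC's actual code. The function filter_short, the
MosaicWarning class and the min_length parameter are all made up for
illustration; only the warnings module and the verbose keyword come from the
suggestion itself.

```python
import sys
import warnings


class MosaicWarning(UserWarning):
    """Hypothetical warning category; not part of MOSAIC or Biopython."""


def filter_short(seqs, min_length=10, verbose=False):
    """Return the names of sequences with at least min_length residues.

    seqs is a dict mapping sequence name to sequence string (an assumed
    input shape, chosen only to keep the example small).
    """
    kept = []
    for i, (name, seq) in enumerate(seqs.items(), start=1):
        if verbose:
            # Progress output is opt-in and goes to stderr, not stdout.
            print("Checking %d/%d: %s" % (i, len(seqs), name), file=sys.stderr)
        if len(seq) < min_length:
            # A warning instead of a print: callers can filter it,
            # log it, or turn it into an error.
            warnings.warn(
                "Skipping %s: length %d < %d" % (name, len(seq), min_length),
                MosaicWarning,
            )
            continue
        kept.append(name)
    return kept
```

A caller who wants silence can then use
warnings.simplefilter("ignore", MosaicWarning), and one who wants the old
console chatter passes verbose=True.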

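Likewise for the pandas point, a minimal sketch of reading a table with the
csv module and handing it to numpy. The table layout (a header row, with row
labels in the first column) and the name read_score_table are assumptions;
the real MOSAIC tables may look different. The numpy import is guarded only
so the sketch still runs where numpy is missing.

```python
import csv
import io

try:
    import numpy as np
except ImportError:  # keep the sketch usable without numpy
    np = None


def read_score_table(handle):
    """Parse a CSV table with a header row and labels in the first column.

    Returns (row_labels, column_labels, values), where values is a numpy
    array if numpy is installed and a list of lists of floats otherwise.
    """
    reader = csv.reader(handle)
    header = next(reader)
    column_labels = header[1:]
    row_labels = []
    values = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        row_labels.append(row[0])
        values.append([float(x) for x in row[1:]])
    if np is not None:
        values = np.array(values, dtype=float)
    return row_labels, column_labels, values


# Made-up data, purely to show the call.
example = io.StringIO("gene,human,mouse\nBRCA1,0.91,0.85\nTP53,0.97,0.88\n")
rows, cols, scores = read_score_table(example)
```

The labels stay in plain Python lists, so looking up a row or column by name
is an index() call away without needing a DataFrame.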
Cheers,
Eric

