[Biopython-dev] Fwd: New Biopython (sub)module?

Tue Sep 17 19:20:46 UTC 2013

Hi Eric,

We're glad you like MOSAIC! It's exciting to start getting it out there.
Just as a quick update, the latest version of the paper is available on
arXiv <http://arxiv.org/abs/1309.2319>. In addition, updated documentation,
relevant files, etc. can be found here:
<http://pythonhosted.org/bio-MOSAIC/index.html>.

The module has also been uploaded to PyPI, so it can now be installed
with "easy_install bio-mosaic".

Given the importance of ortholog detection to a broad range of
computational biology tasks, we definitely think it's worth putting in a
little extra work and making a few sacrifices to make this tool more
broadly and conveniently available to the community.

So if you're game, we would love to start thinking about timelines for
making any necessary changes. We really appreciate your comments so far.
Below are some initial thoughts/replies:

============

> - Zheng Ruan has written a nice codon alignment module as part of his GSoC
> project. Once that's merged, you'll be able to drop the pal2nal dependency.

This is a great idea and we'd be happy to incorporate it.

> - We haven't merged Chris's MSAprobs wrapper yet (to my knowledge), though
> at a glance it looks like it should be straightforward. For Bio.mosaic (I
> guess?), we would probably wait until the wrapper is merged and then remove
> the conditional in mosaic.

Sounds good!

> - Does EMBOSS stretcher do anything that couldn't be done with
> Bio.pairwise2? If not, you could use pairwise2 instead and avoid another
> dependency.

Pairwise alignment constitutes a significant portion of MOSAIC's run time,
and we chose stretcher for its speed. How about this: we test whether
stretcher is installed and, if it is not, (1) fall back to Bio.pairwise2
and (2) issue a helpful warning about the slowdown, with a direct link to
the latest EMBOSS toolkit. What do you think?
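
Roughly what we have in mind for the check, as a sketch only: the function
name and the link placeholder below are made up for illustration and are not
part of MOSAIC, and shutil.which needs Python 3.3 or later.

```python
import shutil
import warnings

# Placeholder text: the real message would carry a direct link to EMBOSS.
EMBOSS_HINT = "<link to the latest EMBOSS toolkit>"


def choose_pairwise_backend(executable="stretcher"):
    """Return "stretcher" if the binary is on PATH, otherwise "pairwise2"."""
    if shutil.which(executable) is not None:
        return "stretcher"
    # Binary not found: tell the user about the slowdown and fall back.
    warnings.warn(
        "EMBOSS %s was not found; falling back to Bio.pairwise2, which is "
        "slower. See %s to install EMBOSS." % (executable, EMBOSS_HINT),
        RuntimeWarning,
    )
    return "pairwise2"
```

The caller would then dispatch on the returned name, so the alignment code
itself never needs to know whether EMBOSS is present.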

> - The use of pandas looks fairly basic and therefore also avoidable. It
> looks like with a few more lines of code you could use Python's built-in
> csv module to parse a table and store it in a numpy matrix instead.

You're totally right. We can do that.
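
Something along these lines should cover it. This is a sketch under
assumptions: the table layout (a header row plus one label column) and the
name read_table are hypothetical, not MOSAIC's actual format or API.

```python
import csv
import io


def read_table(handle, delimiter=","):
    """Parse a labelled numeric table into (row_names, col_names, rows)."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    col_names = header[1:]  # the first header cell sits above the row labels
    row_names, rows = [], []
    for record in reader:
        if not record:  # skip blank lines
            continue
        row_names.append(record[0])
        rows.append([float(value) for value in record[1:]])
    return row_names, col_names, rows


# Made-up example data, only to show the call.
example = io.StringIO("id,human,mouse\nseqA,0.91,0.85\nseqB,0.40,0.77\n")
row_names, col_names, rows = read_table(example)
```

Passing the nested list to numpy.array(rows) then gives the matrix, with the
labels kept in two plain lists alongside it.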

> - MOSAIC does some logging to the console, which is sensible for the
> program but isn't done as much in Biopython. Some of these print statements
> could be changed to warnings (see the warnings module). The progress
> indicators could maybe be toggled at the function level with a keyword
> argument, e.g. verbose=True/False.

Consider it done!
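
For concreteness, the pattern would look something like the sketch below.
The function filter_candidates and its arguments are invented for
illustration and do not exist in MOSAIC.

```python
import sys
import warnings


def filter_candidates(scores, cutoff=0.5, verbose=False):
    """Hypothetical step: keep the names whose score reaches the cutoff."""
    kept = []
    total = len(scores)
    for count, (name, score) in enumerate(sorted(scores.items()), start=1):
        if verbose:
            # Progress output is opt-in and goes to stderr, not stdout.
            sys.stderr.write("Processed %d/%d\n" % (count, total))
        if score >= cutoff:
            kept.append(name)
    if not kept:
        # Noteworthy conditions become warnings instead of print statements.
        warnings.warn("No candidate reached the cutoff of %s" % cutoff)
    return kept
```

Callers who want silence can then filter the warnings with the standard
warnings machinery and leave verbose at its default.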

============

Thanks again for your feedback! We look forward to further comments and to
hearing about next steps.

Cheers,

-Cyrus

On Fri, Sep 13, 2013 at 1:08 PM, Eric Talevich <eric.talevich at gmail.com> wrote:

> On Thu, Aug 22, 2013 at 6:01 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
>> On Wed, Aug 21, 2013 at 11:00 PM, Cyrus Maher <michael.maher at ucsf.edu>
>> wrote:
>> >
>> > That said, I was also hoping to get your thoughts on whether this seemed
>> > like the type of project that would fit in with Biopython. Peter said that
>> > Eric might have some good comments on this matter?
>>
>> Right - I was thinking Eric and this year's phylogenetic focused GSoC
>> students should have some good comments, e.g. about adding
>> something like pal2nal into Biopython.
>>
>> Peter
>>
>
> Hi Cyrus,
>
> MOSAIC looks cool, it's always good to see progress in ortholog detection.
> Since the core of the program is a single Python module, it shouldn't be
> too hard to plug this into Biopython. Keep in mind, though, that once
> MOSAIC is in the Biopython source tree it could become less convenient for
> you to make major updates and changes to the program, whereas if you
> control the packaging yourself you're free to change the API, add
> dependencies, etc. however you like. So, for the manuscript/publication at
> least, you might find it safer to only state that distributing MOSAIC with
> Biopython is planned, rather than committing to a release version number.
>
> Thoughts on the code:
>
> - Zheng Ruan has written a nice codon alignment module as part of his GSoC
> project. Once that's merged, you'll be able to drop the pal2nal dependency.
>
> - We haven't merged Chris's MSAprobs wrapper yet (to my knowledge), though
> at a glance it looks like it should be straightforward. For Bio.mosaic (I
> guess?), we would probably wait until the wrapper is merged and then remove
> the conditional in mosaic.
>
> - Does EMBOSS stretcher do anything that couldn't be done with
> Bio.pairwise2? If not, you could use pairwise2 instead and avoid another
> dependency.
>
> - The use of pandas looks fairly basic and therefore also avoidable. It
> looks like with a few more lines of code you could use Python's built-in
> csv module to parse a table and store it in a numpy matrix instead.
>
> - MOSAIC does some logging to the console, which is sensible for the
> program but isn't done as much in Biopython. Some of these print statements
> could be changed to warnings (see the warnings module). The progress
> indicators could maybe be toggled at the function level with a keyword
> argument, e.g. verbose=True/False.
>
>
> Cheers,
> Eric
>