[Biopython] HMMER / Pfam support

Peter biopython at maubp.freeserve.co.uk
Fri Dec 3 13:22:40 UTC 2010


On Thu, Dec 2, 2010 at 10:53 PM, Kyle <kellrott at gmail.com> wrote:
> I would like to submit my hmmer branch for merge into the main BioPython
> tree, targeting inclusion in 1.57.
> This branch adds support for HMMER3 file parsing and some Pfam related file
> work. It's adapted from the PfamScan perl code found at
> ftp://ftp.sanger.ac.uk/pub/rdf/PfamScanBeta/
> The code can be found at https://github.com/kellrott/biopython/tree/hmmer
>
> Kyle

Hi Kyle,

I've had a quick look at the HMMER bits (but not the PFAM stuff)
and have some initial comments.

I'm concerned about the apparent use of Jaina Mistry & John Tate's
Perl code which is under the GPL v2+ and thus cannot be included
in Biopython. If you basically copied their code and translated it into
Python, I think to be safe you'll have to ask the original authors'
permission to re-license it for Biopython (MIT/BSD style). If your
code is a fresh implementation using their approach you may be
OK, but the module text should be clarified.

Does hmmscan work on Windows? Would there be much point
writing a Bio.Application style wrapper class for it, either instead
of or for use within your Bio.hmmer.HMMScan function etc? A unit
test for calling the tool would be good, e.g. test_hmmer_tool.py
which can be made conditional on the tool being found.
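
A conditional test of this kind could look roughly like the sketch
below. It is written for current Python 3 rather than the Python 2 of
the time, the find_tool helper and the class name are made up for
illustration, and it assumes that "hmmscan -h" prints usage and exits
with status 0:

```python
import shutil
import subprocess
import unittest


def find_tool(name):
    """Return the full path to an executable on PATH, or None if absent."""
    return shutil.which(name)


# Hypothetical module-level lookup; None when hmmscan is not installed.
HMMSCAN = find_tool("hmmscan")


@unittest.skipUnless(HMMSCAN, "hmmscan not found on PATH")
class HmmscanToolTest(unittest.TestCase):
    """Only runs when the external tool is available."""

    def test_help_runs(self):
        # Assumption: "-h" prints the usage text and exits with status 0.
        result = subprocess.run(
            [HMMSCAN, "-h"], capture_output=True, text=True
        )
        self.assertEqual(result.returncode, 0)
```

Saved as test_hmmer_tool.py, this can be run with
"python -m unittest test_hmmer_tool"; on a machine without HMMER the
test is reported as skipped rather than failed.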

In Bio.hmmer you have two functions, parseHMMER3 and
parseMultiHMMER3 taking file handles, used for a single
record and multiple records (right?). It would match Biopython
usage to call these read (single) and parse (iterator).
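
As a rough illustration of that naming convention (not the code on the
branch), parse can be a generator and read a thin wrapper that insists
on exactly one record. The sketch assumes records are terminated by a
line containing only "//" and represents each record as a plain list of
lines; a real parser would build a proper record object:

```python
import io


def parse(handle):
    """Yield one record (a list of lines) per '//'-terminated block."""
    lines = []
    for line in handle:
        line = line.rstrip("\n")
        if line == "//":
            yield lines
            lines = []
        else:
            lines.append(line)
    if lines:
        # Tolerate a final block that lacks the terminator line.
        yield lines


def read(handle):
    """Return the single record in handle; raise ValueError otherwise."""
    records = parse(handle)
    try:
        record = next(records)
    except StopIteration:
        raise ValueError("No records found in handle") from None
    if next(records, None) is not None:
        raise ValueError("More than one record found in handle")
    return record


# Usage with an in-memory handle holding two made-up records:
handle = io.StringIO("first\n//\nsecond\n//\n")
for rec in parse(handle):
    print(rec)
```

Having read raise ValueError for zero or multiple records mirrors how
the read functions elsewhere in Biopython behave.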

Is there anything here for HMMER2? I saw some apparent
stub entries in the code so I guess not.

A minor thing: In your unit test file test_hmmer.py do you really
need to use the obsolete string module? Can't you use a string
method?
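
For example (I have not checked which string module functions the test
actually uses, so the ones in the comments are guesses), the usual
replacements look like this:

```python
# Made-up sample line; the old string-module calls are shown in comments.
line = "  alpha   beta   gamma  "

fields = line.split()        # was: string.split(line)
joined = "\t".join(fields)   # was: string.join(fields, "\t")
upper = fields[1].upper()    # was: string.upper(fields[1])

print(fields, repr(joined), upper)
```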

Also, since this is quite a long-lived branch, looking over your
changes isn't so simple what with all the merges. I'd find it
easier to review the changes if you could rebase it off the
current master, e.g. assuming you cloned from *your*
repository on github, and added the official one as a remote
named upstream, something like this should do it:

git checkout hmmer
git branch hmmer_dec2010
git checkout hmmer_dec2010
git fetch upstream
git rebase upstream/master
git push origin hmmer_dec2010

(Untested, but something like that).

Is the PFAM stuff on this branch required for your HMMER
code, or could we deal with the HMMER stuff separately
first? If so, a "clean" new branch with just the HMMER
stuff would be preferable (if it makes life easier, you
could do it as a single commit, assuming you don't care
about the history to date).

Peter
