[Biopython] HMMER / Pfam support

Kyle kellrott at gmail.com
Sat Dec 4 01:50:52 UTC 2010


> I'm concerned about the apparent use of Jaina Mistry & John Tate's
> Perl code which is under the GPL v2+ and thus cannot be included
> in Biopython. If you basically copied their code and translated it into
> Python, I think to be safe you'll have to ask the original authors'
> permission to re-license it for Biopython (MIT/BSD style). If your
> code is a fresh implementation using their approach you may be
> OK, but the module text should be clarified.
>

I'll contact them (and CC you) off list to get permission to re-license the
code.


> Does hmmscan work on Windows? Would there be much point
> writing a Bio.Application style wrapper class for it, rather than or
> to be used within your Bio.hmmer.HMMScan function etc? A unit
> test for calling the tool would be good, e.g. test_hmmer_tool.py
> which can be made conditional on the tool being found.
>
I don't have a Windows machine to test HMMER3 on, so somebody else will have
to check that.
I've started work on a wrapper module for it, as well as the unit testing
code for it.
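
For illustration, here is a minimal sketch of a command-line builder plus a test that is skipped when the tool is missing. It assumes Python 3 and only the standard library. The helper name build_hmmscan_cmd and the file names are hypothetical and are not taken from the Bio.hmmer code; the argument order follows the usual "hmmscan [options] <hmmdb> <seqfile>" usage, and --tblout is assumed to be available in the installed HMMER3.

```python
import shutil
import subprocess
import unittest


def build_hmmscan_cmd(hmm_db, seq_file, tblout=None, exe="hmmscan"):
    """Return the argument list for: hmmscan [options] <hmmdb> <seqfile>.

    Hypothetical helper for illustration; not part of Bio.hmmer.
    """
    cmd = [exe]
    if tblout is not None:
        # Assumption: the installed hmmscan supports --tblout <file>.
        cmd += ["--tblout", tblout]
    cmd += [hmm_db, seq_file]
    return cmd


# Full path to the executable, or None if it is not on PATH.
HMMSCAN = shutil.which("hmmscan")


class HmmscanToolTest(unittest.TestCase):
    def test_command_line(self):
        # Pure string handling, so this runs on every platform.
        cmd = build_hmmscan_cmd("Pfam-A.hmm", "query.fasta", tblout="hits.tbl")
        self.assertEqual(
            cmd,
            ["hmmscan", "--tblout", "hits.tbl", "Pfam-A.hmm", "query.fasta"],
        )

    @unittest.skipIf(HMMSCAN is None, "hmmscan not found on PATH")
    def test_tool_runs(self):
        # Only checks that the binary starts and prints its help text.
        result = subprocess.run([HMMSCAN, "-h"], capture_output=True, text=True)
        self.assertEqual(result.returncode, 0)
```

Saved as, say, test_hmmer_tool.py, this could be run with "python -m unittest test_hmmer_tool"; on a machine without HMMER3 the second test is reported as skipped rather than failed.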


> In Bio.hmmer you have two functions, parseHMMER3 and
> parseMultiHMMER3 taking file handles, used for a single
> record and multiple records (right?). It would match Biopython
> usage to call these read (single) and parse (iterator)
>

This is patterned after the original implementation. I've only utilized
parseMultiHMMER3 on the front end, but I've left the other methods intact in
case I need them in the future.
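
For reference, the read/parse convention mentioned above can be sketched as follows. This is only an illustration, assuming records are terminated by a line containing just "//"; the record type here is a plain list of lines, not the result objects used in the actual Bio.hmmer code.

```python
import io


def parse(handle):
    """Yield one record (a list of lines) per '//'-terminated block."""
    record = []
    for line in handle:
        line = line.rstrip("\n")
        if line == "//":
            yield record
            record = []
        else:
            record.append(line)
    if record:
        # Trailing lines without a terminator still form a record.
        yield record


def read(handle):
    """Return the single record in handle; raise ValueError otherwise."""
    records = parse(handle)
    try:
        first = next(records)
    except StopIteration:
        raise ValueError("No records found in handle") from None
    try:
        next(records)
    except StopIteration:
        return first
    raise ValueError("More than one record found in handle")


# Usage: parse() for multi-record output, read() when exactly one is expected.
multi = io.StringIO("query1 line\n//\nquery2 line\n//\n")
for rec in parse(multi):
    print(rec)
```

Building read() on top of parse() keeps a single code path for the real parsing work, so the two entry points cannot drift apart.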


> Is there anything here for HMMER2? I saw some apparent
> stub entries in the code so I guess not.
>

Given that HMMER2's last official release was in 2003, that HMMER3 is much
faster, and that Pfam24 onward requires HMMER3, I haven't put any effort into
it. But I've left those references in case someone demands HMMER2 support. It
wouldn't be too difficult, probably only a few tweaks, to get it to work.
The stub entries are also patterned after the original code, and I just left
them in case I ended up needing them in the future.


> Also since this is quite a long-lived branch, looking over your
> changes isn't so simple what with all the merges. I'd find it
> easier to review the changes if you could rebase it off the
> current master, e.g. assuming you cloned from *your*
> repository on github, and added the official one as remote
> name upstream, something like this should do it:
>

I've reworked it into the hmmer_dec2010 branch, which comes straight out of
master, with only two revisions.

> Is the PFAM stuff on this branch required for your HMMER
> code, or could we deal with the HMMER stuff separately
> first? If so a "clean" new branch with just the HMMER
> stuff would be preferable (if it makes life easier, you
> could do it as a single commit - assuming you don't care
> about the history to date)
>

The HMMER stuff should be independent of the Pfam module, so it can be
integrated by itself. I've already removed the Pfam stuff from the new
hmmer_dec2010 branch.

Kyle


