[BioPython] Questions & suggestions

Mon Mar 22 18:06:54 EST 2004

Hey Jeff and everyone;

Me:
> >Yes, I've not been a fan of HappyDoc for a while. I was pointed to,
> >and really like, epydoc. Please take a look at:
> >
> >http://biopython.org/docs/api/private/trees.html
> 
> This looks very nice to me.  Is there any way to ask it to hide private 
> methods or variables, i.e. those that begin with "_"?  Although knowing 
> what those are is occasionally useful, exposing that extra information 
> may be confusing for people reading the docs and trying to figure out 
> how to use the module.

Good points. The one I linked to is actually the version that
includes private variables. If you substitute public for private in
the URL above (or just click "hide private" at the top), the private
functions are hidden.

The problem I've had so far is that epydoc hides some public modules
by labelling them as private. I hadn't figured out exactly how it
decides what is public and what is private in terms of modules (it
seems to use the _Underscore for classes and functions, which I'm
happy with).

I played around with it a bit since then and it looks like it was
using the __all__ variable to determine what is public and private.
To be honest, I'd like to remove the use of __all__ completely
unless people object. Unless I'm mistaken, it controls what happens
when people do from Bio import * (or from Bio.Whatever import *).
Doing the import * is pretty discouraged now, and for maintenance it
is fairly annoying to have variables you have to make sure are
updated.

Would anyone object if we stopped using __all__? Any reasons to keep
it? I may be missing the point of it completely.
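
For anyone who wants to check the behavior I'm describing, here is a
minimal sketch of what __all__ changes for import *. This is not
Biopython code: the module names demo_with_all and demo_without_all and
the functions in them are made up for illustration, and the modules are
built in memory just so the example is self-contained.

```python
import sys
import types


def make_module(name, source):
    """Build an in-memory module from source text and register it."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


# Hypothetical module body: two public names and one _private name.
SOURCE = """
def parse(): return "parse"
def scan(): return "scan"
def _helper(): return "helper"
"""

# Same body twice; only the first module declares __all__.
make_module("demo_with_all", '__all__ = ["parse"]\n' + SOURCE)
make_module("demo_without_all", SOURCE)


def star_import(name):
    """Return the names that 'from <name> import *' brings in."""
    namespace = {}
    exec("from %s import *" % name, namespace)
    return sorted(key for key in namespace if key != "__builtins__")


with_all = star_import("demo_with_all")
without_all = star_import("demo_without_all")

print(with_all)      # only the names listed in __all__
print(without_all)   # every name that does not start with "_"
```

Without __all__, import * already skips the underscore names, so
dropping __all__ would only widen what import * pulls in to every
non-underscore name in the module.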

> kMeans is superseded by Bio.Cluster, and can be deprecated.  Thomas 
> wrote xkMeans, which is a visualizer for kMeans, and could be rewritten 
> to use Bio.Cluster instead.

Okay. I guess this would involve a couple of steps:

1. Starting to raise a DeprecationWarning for the kMeans module.
2. Trying to write some kind of short document on how to switch from
using kMeans to using Bio.Cluster.kcluster. BioPerl has a document
called DEPRECATED with this kind of info -- that seems like a
reasonable step to follow. Jeff and Michiel, would it be possible to
write something up quick?
3. Thomas needs to decide if he wants to rewrite xkMeans or
deprecate it as well.
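
Step 1 could look something like the sketch below. It only uses the
standard warnings module; the helper name warn_deprecated and the
message wording are my own placeholders, not anything that exists in
Biopython. In the real module the warning call would presumably sit at
the top of the file so that it fires on import.

```python
import warnings


def warn_deprecated(old_name, new_name):
    """Emit a DeprecationWarning that points users at the replacement."""
    warnings.warn(
        "%s is deprecated; please use %s instead." % (old_name, new_name),
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this helper
    )


# Demonstration: capture the warning instead of printing it to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    warn_deprecated("Bio.kMeans", "Bio.Cluster")

for item in caught:
    print(item.category.__name__, "-", item.message)
```

Since it is only a warning, existing code keeps working until the
module is actually removed.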

Also, Thomas did mention the potential usefulness of having both
pure Python and Python/C implementation, in case someone wanted to
use the code for learning purposes. I'm not sure how much this
weighs on people's minds versus maintaining a slimmer code base. It
does seem to me like duplicate versions are bad, both because they
cause confusion and because we have limited developer time to
maintain and document things. Anyways, just a point to bring up.

> MarkovModel is redundant with HMM.  Probably only one of them is 
> necessary.

Okay, I wrote HMM a long time ago and really haven't used it much
since then. I think you wrote MarkovModel. Both have tests and
things. MarkovModel has the serious advantage of having a C module
underlying it, which I think makes it the best candidate for
keeping.

I'd be very happy if we could get a volunteer to look at these and
decide if one has more functionality than the other, and then move
forward on this. Anyone excited about volunteering? If I can't get
someone, I can try to look at this myself (but not real soon).

> SVM is superseded by libsvm.  It should be deprecated.
> 
> kNN, LogisticRegression, MaxEntropy, and NaiveBayes are still useful, 
> but need more documentation.  Also, another idea is that they could be 
> donated to the pyml project.  Currently, no code in Biopython depends 
> on them.  However, they might be useful for a microarray package, in 
> which case donating them would introduce another dependency.

Ah, I didn't know about PyML. It does seem like it would be useful
to try and coordinate with their project -- do you happen to know the
author (Stanford connections and all)? Other candidates for donation
are the recently discussed GA and Neural Network packages.

Lots of thoughts. For the next release (which I'd like to try and
do soon-like), I think we should work on the kMeans code as a
priority and go from there.

Brad