[Biopython-dev] PEP8 lower case module names?

Thu Nov 1 14:46:36 EDT 2012

On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> Since we have no clear consensus, I propose we add Bow's code
>> as Bio.SearchIO (which is how it is written right now), with the new
>> BiopythonExperimentalWarning in place (to alert people that it may
>> change in the next release). We can then rename or move it at a
>> later date. This will make it easier for people to test the code, and
>> also suggest further changes or additions (e.g. Kai's HMMER work).
>>
>> If and when we agree a consolidation of the Bio.SeqXXX
>> modules, then Bio.SearchIO could move too. If this happens
>> before any public release as Bio.SearchIO so much the better.
>>
>> Adopting lower case module names under Python 3 is also a
>> separate issue.
>>
>> Peter
>>
>
> +1
>
> Regarding the "great upheaval" of module renaming and reorganization:
>
> 0. If the only change is to combine the SeqIO, Seq, SeqRecord and
> SeqFeature classes under a single module, we probably can do that
> in a backwards-compatible way. But that means keeping our
> StudlyCaps module names for the most part.

Yes, that is something we could do in a backwards compatible way,
with the old "StudlyCaps" Bio.SeqXXX modules persisting as legacy
imports for at least a year (say). But is it worth it? See below.
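
To make the "legacy imports" idea concrete, here is a minimal sketch of one
way an old module name could forward to a consolidated module while warning
on use. Everything in it is hypothetical: the names legacy_seqio_demo and
hypothetical_seq, the parse stand-in, and the choice of DeprecationWarning
are assumptions for illustration, not existing Biopython code.

```python
import sys
import types
import warnings


class _LegacyModule(types.ModuleType):
    """Stand-in for an old module name that forwards to its new home."""

    def __init__(self, old_name, target):
        super().__init__(old_name)
        self._target = target

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. for forwarded names.
        warnings.warn(
            "%s is a legacy name; use %s instead"
            % (self.__name__, self._target.__name__),
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(self._target, attr)


def install_legacy_alias(old_name, target):
    """Register old_name in sys.modules as a warning alias of target."""
    alias = _LegacyModule(old_name, target)
    sys.modules[old_name] = alias
    return alias


# Hypothetical consolidated module, plus one legacy alias pointing at it.
new_seq = types.ModuleType("hypothetical_seq")
new_seq.parse = lambda handle, fmt: "parsed %s" % fmt
legacy_seqio = install_legacy_alias("legacy_seqio_demo", new_seq)
```

Old code that looks up legacy_seqio.parse keeps working and gets a warning
pointing at the new name; dropping the alias after the grace period is then
a one line change.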

> 1. If we're going to change the API substantially, we might as well "do it
> right". Besides our PEP8 non-compliance, there are some dark, dusty corners
> of Biopython that we ought to clean up while we're at it -- reorganize the
> little historical fiefdoms into a coherent structure. We'd call it Biopython
> 2.

Absolutely, there are things we've lived with for the sake of backwards
compatibility - the Alphabet objects are one example (foremost
the way gaps and stop codons were done with wrapper objects).
I'd also like us to switch the restriction digest module to using
zero-based counting as Guido intended, and simplify some of the
more 'magical' code which has caused trouble porting to the
other Python implementations.
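
As a rough illustration of why zero-based counting is attractive, the sketch
below assumes a hypothetical convention in which a cut site is reported as
the one-based position of the first base after the cut. The function names
are made up for this example and are not the restriction module's API.

```python
def fragments_from_zero_based(seq, cuts):
    """Split seq at zero-based cut indices (index of first base after a cut)."""
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]


def fragments_from_one_based(seq, cuts):
    """Same split, but every one-based position needs an explicit -1 first."""
    return fragments_from_zero_based(seq, [cut - 1 for cut in cuts])


# Toy sequence with a single G^AATTC site; the cut falls after the G.
seq = "AAGAATTCTT"
zero_based = fragments_from_zero_based(seq, [3])
one_based = fragments_from_one_based(seq, [4])
```

With zero-based positions the reported value is directly usable as a slice
boundary, so the off-by-one adjustment disappears from calling code.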

> 2. Observing BioPerl and BioRuby, it could make sense to split the
> distribution into multiple, with a sequence- and data-oriented
> "biopython-core" package and separate packages for, say, 3D structures
> ("biopython-struct") and perhaps other existing components that have ready
> maintainers and which the "core" of Biopython doesn't rely on. I don't think
> we need to fragment the code base much, primarily just extract PDB, SCOP and
> the other parts that depend on NumPy. On GitHub, these repositories would
> still be under the biopython organization name.

A clearer divide would be good - something we have at some level
already, along the lines of with and without NumPy. However, given
the still unclear future for Python packaging, I'm not quite so sure
if we can/should go all the way to separate packages. Perhaps I
am being unduly worried by the concerns in the NumPy/SciPy
community? After all, we have no Fortran code!

> 3. If we've decided to focus on Python 3 for the reorganization, we can take
> advantage of new features in that lineage for packaging, organization and
> distribution. These features could make it easier to have side-by-side
> Biopython 1 and 2 installations (maybe), and also plugging additional
> modules into the main "bio" package (namespace packages, new in Py3.3).

We can and should port the current namespace to Python 3, but
writing "Biopython 2" for Python 3 only (not Python 2) sounds wise.
More on this below.

> 4. Naming: "bio" is clean but might cause problems on Windows? (I wouldn't
> know, nyah); "bio2" is nearly as clean; "biopy" follows the numpy/scipy
> convention.

As noted before, we couldn't use "bio" on the average Mac either - the
default file system is like Windows, case insensitive.
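
A small sketch of the clash: on a case-insensitive file system, two package
directories whose names differ only by case cannot sit side by side. The
helper below (a hypothetical name, not part of any library) groups candidate
names that would collide under that assumption.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that differ only by case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return {
        key: sorted(group) for key, group in groups.items() if len(group) > 1
    }


# "Bio" is the existing package; the others are names floated in this thread.
clashes = case_collisions(["Bio", "bio", "bio2", "biopy"])
```

Here only "Bio" and "bio" end up in the same group, which is why "bio2" and
"biopy" avoid the problem.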

The name biopy is in line with numpy/scipy, which is a plus. I know
not everyone liked this name, but personally it seems fine. Better
than bio2 in my view.

> 5. Porting: I, personally, would keep using the old Biopython for everything
> that's meant to run on Python 2, which is, currently, everything. Biopython2
> running on Python 3 would give me an excuse to start using Python 3 for new
> code. Keeping these separate would be more difficult if the lowercasing were
> done under the same "Bio" namespace.
>
> Thoughts?

As noted above, I'm on board with planning a Biopython 2 requiring Python 3
or later. I would regard this as effectively forking from the current code
base, porting individual modules on a case by case basis (doing a final 2to3
conversion manually as part of this). The code could be shared as a series
of 'alpha' level releases for early testing - assuming we want to make some
releases, particularly for Windows where fewer potential testers would
have all the compilers set up to follow the repository.

However, if we do that, we would still support Biopython 1.xx under
Python 3 as well (via 2to3 as we are now, currently 'beta' level support)
for some time in parallel (although likely not getting major new features -
just bug fixes and, if required, updates for format changes).

Is there enough enthusiasm now to start planning what we'd change for
a (potentially Python 3 only) Biopython 2 yet?

Peter