[Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone

Eric Talevich eric.talevich at gmail.com
Mon Jan 28 05:59:14 UTC 2013


On Sun, Jan 27, 2013 at 5:52 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:

> Hi Michiel, everyone,
>
> >> That's why Bio._utils is a private module - we can
> >> drop/change/etc this without worrying about breaking
> >> other people's code. The issue with Bio.ParserSupport
> >> is it was a public API.
> >
> > Its API being public was not the problem -- we have deprecated and
> > removed lots of public modules over the years.
> >
> > The problem with Bio.ParserSupport was twofold. First, it ended up
> > making parsers more complex and difficult to understand for people
> > not familiar with Bio.ParserSupport, in particular for newcomers and
> > users trying to fix a bug. So Bio.ParserSupport never made us really
> > happy. As a case in point, Bio._utils was created rather than
> > reusing the code in Bio.ParserSupport.
> >
> > The second problem was that many modules were using bits and pieces
> > of Bio.ParserSupport, so we could not drop or change
> > Bio.ParserSupport easily. Bio.ParserSupport has been officially
> > obsolete but not deprecated for years.
> >
> >> That's why Bio._utils is a private module - we can
> >> drop/change/etc this without worrying about breaking
> >> other people's code.
> >
> > Let's drop it.
>
> My initial intention of refactoring and adding some new code to
> Bio._utils was to reduce code repetition. I intended it (and perhaps
> we should make it explicit in its docstrings) to be a collection of
> small, useful functions that may be used in various cases.
>
> Some examples inside include several string-formatting functions, each
> of them independent of the other. There's also a general function for
> running doctests
> (https://github.com/biopython/biopython/blob/master/Bio/_utils.py#L100),
> which was written because there was a lot of repetitive code in
> different submodules basically doing the same thing (looking up the
> test directory, running the test). I feel quite strongly that this
> doctest function is required by many current (and future) modules
> across Biopython, so it makes sense to refactor it out into a root
> namespace.
>

Interesting discussion.

It's worth considering why some functions are being used in multiple parts
of the code base. In some cases there are essentially shortcomings in the
Python standard library or issues with
cross-platform/cross-implementation/backward compatibility that would
require us to use *exactly* the same code each time a certain recurring
problem is encountered. The Bio._py3k and Bio.File modules make sense for
this reason, I think, and before we deprecated Py2.4 it would have been
helpful to have shared code for importing ElementTree (both the uniprot-xml
and phyloXML parsers used the same half-page tangle of attempted imports).
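
For illustration, here is a minimal sketch of what such a shared import
helper might look like. The function name import_elementtree and the exact
order of candidates are my assumptions, not the code that was in either
parser, and it is written for current Python (importlib) rather than the
versions we supported at the time.

```python
import importlib

# Candidate module paths, most preferred first. This ordering is an
# assumption for illustration, not the order either parser actually used.
_CANDIDATES = (
    "xml.etree.cElementTree",   # C accelerator, Python 2.5 through 3.8
    "xml.etree.ElementTree",    # standard library location
    "lxml.etree",               # third-party, ElementTree-compatible
    "cElementTree",             # standalone package for pre-2.5 Pythons
    "elementtree.ElementTree",  # standalone pure-Python package
)


def import_elementtree(candidates=_CANDIDATES):
    """Return the first importable ElementTree-compatible module.

    Hypothetical helper: tries each dotted module path in turn and
    raises ImportError only if none of them can be imported.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("No ElementTree implementation found")


ElementTree = import_elementtree()
```

Each parser would then use the returned module instead of repeating the
chain of try/except blocks.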

So, maybe the doctest helpers should go in a new module specific to that
topic.
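
As a rough sketch of what that module could hold, assuming the helper's job
is the one Wibowo describes (find the test directory, run the doctests
there): the name run_doctests_from and its signature are hypothetical, not
the function currently in Bio/_utils.py.

```python
import doctest
import os
import sys


def run_doctests_from(test_dir, module=None, verbose=False):
    """Run a module's doctests with test_dir as the working directory.

    Hypothetical helper, not the function in Bio/_utils.py. Doctests that
    open example files by relative path need a known working directory,
    so we change into test_dir and always restore the original directory.
    """
    if module is None:
        # Default to the script being run, like doctest.testmod() does.
        module = sys.modules["__main__"]
    original_dir = os.getcwd()
    os.chdir(test_dir)
    try:
        return doctest.testmod(module, verbose=verbose)
    finally:
        os.chdir(original_dir)
```

A submodule could then end with a short `if __name__ == "__main__":` block
that calls this helper, instead of carrying its own copy of the
directory-lookup code.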

In other cases there's a recurring need in separate modules, but (a) the
solution is short and simple enough to write from scratch wherever it's
needed, so the duplication isn't enough of a maintenance concern to offset
the convenience of having all the relevant code in one place; and/or (b) the
needs of different modules aren't exactly the same, merely similar, so the
shared function accumulates options even though a simpler implementation
would have worked for any given module.
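
To make case (b) concrete, here is a made-up example; both functions are
hypothetical and neither exists in Biopython. The shared version has picked
up one keyword argument per caller, while any single module only needed the
few lines in the local version.

```python
def wrap_shared(text, width=60, indent="", upper=False, strip_gaps=False):
    """Hypothetical shared helper that has grown one option per caller."""
    if strip_gaps:
        text = text.replace("-", "")
    if upper:
        text = text.upper()
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    return "\n".join(indent + chunk for chunk in chunks)


def wrap_local(text, width=60):
    """What one module actually needs: plain fixed-width wrapping."""
    return "\n".join(text[i:i + width] for i in range(0, len(text), width))
```

With default options the two give the same result, which is the point: the
extra parameters serve other callers, and every caller pays for reading
past them.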

The point is that just as there's a maintenance cost to having duplicated
code in multiple places, there's a maintenance cost to having dependencies
between multiple modules even within the same project, and the value of a
new module ought to be greater than the cost it imposes.

Best,
Eric


