[Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone

Wibowo Arindrarto w.arindrarto at gmail.com
Sun Jan 27 05:52:15 EST 2013


Hi Michiel, everyone,

>> That's why Bio._utils is a private module - we can
>> drop/change/etc this without worrying about breaking
>> other people's code. The issue with Bio.ParserSupport
>> is it was a public API.
>
> Its API being public was not the problem -- we have deprecated and removed lots of public modules over the years.
>
> The problem with Bio.ParserSupport was twofold. First, it ended up making parsers more complex and difficult to understand for people not familiar with Bio.ParserSupport, in particular for newcomers and users trying to fix a bug. So Bio.ParserSupport never made us really happy. As a case in point, Bio._utils was created rather than reusing the code in Bio.ParserSupport.
>
> The second problem was that many modules were using bits and pieces of Bio.ParserSupport, so we could not drop or change Bio.ParserSupport easily. Bio.ParserSupport has been officially obsolete but not deprecated for years.
>
>> That's why Bio._utils is a private module - we can
>> drop/change/etc this without worrying about breaking
>> other people's code.
>
> Let's drop it.

My initial intention in refactoring Bio._utils and adding some new
code to it was to reduce code repetition. I intended it (and perhaps
we should make this explicit in its docstring) to be a collection of
small, useful functions that may be used in various cases.

Some examples inside include several string-formatting functions, each
of them independent of the others. There's also a general function for
running doctests
(https://github.com/biopython/biopython/blob/master/Bio/_utils.py#L100),
which was written because there was a lot of repetitive code in
different submodules basically doing the same thing (looking up the
test directory, running the test). I feel quite strongly that this
doctest function is required by many current (and future) modules
across Biopython, so it makes sense to refactor it out into a root
namespace.
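
To make the doctest case concrete, here is a rough sketch of the
pattern that was being repeated in each submodule. It is not the
actual code in Bio/_utils.py; the names find_test_dir and
run_doctest_in_test_dir, the "Tests" directory name, and the ELLIPSIS
default are assumptions made for illustration only.

```python
import doctest
import os


def find_test_dir(start_dir):
    """Walk upwards from start_dir until a 'Tests' directory is found.

    Hypothetical helper; the directory name 'Tests' is an assumption.
    """
    current = os.path.abspath(start_dir)
    while True:
        candidate = os.path.join(current, "Tests")
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root without finding the directory.
            raise ValueError("No 'Tests' directory found above %s" % start_dir)
        current = parent


def run_doctest_in_test_dir(module, start_dir, **kwargs):
    """Run a module's doctests from inside the test directory.

    Hypothetical helper: doctests that open example files by relative
    path need the working directory to be the test directory.
    """
    # Assumed default; callers can override it through kwargs.
    kwargs.setdefault("optionflags", doctest.ELLIPSIS)
    original_dir = os.getcwd()
    try:
        os.chdir(find_test_dir(start_dir))
        return doctest.testmod(module, **kwargs)
    finally:
        # Always restore the caller's working directory.
        os.chdir(original_dir)
```

With something like this in one place, a submodule's self-test block
only needs a single call instead of its own copy of the directory
lookup and chdir logic.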

All of this seems different from Bio.ParserSupport, which attempts to
be a single solution for writing new parsers (and only parsers). Given
the wildly inconsistent nature of different file output formats, it's
not surprising that Bio.ParserSupport's code base has to be quite
complicated to accommodate all of them. Naturally it has many related
parts and functions, and understanding them all is much harder than
understanding the small functions in Bio._utils (in my experience).

So for now, I think it is still OK for us to use Bio._utils. Perhaps,
in light of this discussion, we should make it explicitly clear that
it is only meant to hold small, general utility functions rather than
a 'support framework' (e.g. ParserSupport), to avoid future
unhappiness.

Cheers,
Bow


