[Biopython-dev] Ideas for Biopython 2.0

Wibowo Arindrarto w.arindrarto at gmail.com
Fri Jun 30 10:05:48 UTC 2017


Hello everyone,

> So, in theory, you can end up with dozens of extensions that are out of
> sync with the core in terms of 'core' dependencies and it's up to the
> maintainers of those extensions to keep them up-to-date. That's my main
> concern. We have a monolith that works. It is conservative and that's
> exactly why it works. Otherwise, for example, I would install the core and
> your Cython-based lightining fast PopGen code and then you grow bored of
> it, a new version of Cython comes along that has some incompatibilities
> with your code, and now it doesn't work. Worse even, I release a blinding
> fast PDBParser using a new feature in Cython and now you have two
> incompatible extensions.

Yes, that is one concern that a modular setup can introduce. It becomes a
problem when a single application needs two non-core extensions that depend
on incompatible versions of the core module.

However, I would argue that in that case it would be easier for you to push
for a change in the PopGen module (taking your example). You would have
less code to look at (since you only want to change PopGen) and fewer people
to convince that your changes are worth a new release (in this case only
Tiago, the PopGen maintainer). It would also be easier for you to fork
PopGen and maintain it further, if you wish to do so. All of this boils down
to PopGen being smaller and, to a larger degree than it is now, independent
from the rest of Biopython.

Contrast this with what happens in the monolith setup: PDBParser is stuck
using a suboptimal parser because another module has not been updated. Yes,
there is consistency in the sense that everything works together. But this is
consistency by the lowest common denominator: all modules need to adhere to
the oldest (and probably least maintained) module. Not to mention the
increased burden on the core developers when the maintainer of a module
decides to spend less time maintaining it.

If you want to use modules that need incompatible core versions inside one
pipeline (so that each module is used at a different step of the pipeline),
there are various pipeline frameworks and/or containers today that provide
a more granular level of isolation. In other words, you can run incompatible
Biopython versions in the same pipeline.
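
As a rough illustration of that kind of isolation, here is a minimal sketch
that uses plain Python virtual environments rather than any particular
pipeline framework or container tool. The helper names (env_python,
make_env, step_command, run_step) are made up for this example, and the
environment names, package pins and scripts mentioned afterwards are
hypothetical.

```python
import subprocess
import sys
import venv
from pathlib import Path


def env_python(env_dir):
    """Return the path of the interpreter inside a virtual environment."""
    env_dir = Path(env_dir)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def make_env(env_dir, requirements=()):
    """Create an isolated environment and install its own requirements."""
    venv.create(env_dir, with_pip=True)
    if requirements:
        subprocess.run(
            [str(env_python(env_dir)), "-m", "pip", "install", *requirements],
            check=True,
        )


def step_command(env_dir, script, *args):
    """Build the command that runs one step with its environment's Python."""
    return [str(env_python(env_dir)), str(script), *args]


def run_step(env_dir, script, *args):
    """Run one pipeline step in its own environment; raise if it fails."""
    return subprocess.run(step_command(env_dir, script, *args), check=True)
```

Each step would get its own environment, for example
make_env("env_a", ["biopython==<one version>"]) followed by
run_step("env_a", "step_a.py", "input.fasta"), and the same again with a
second environment pinned to a different version. The steps then only
exchange data through files, so neither ever imports the other's copy of
Biopython.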

> I agree that we should 'refresh' our dependencies to be able to do cool
> things. And modularity is incredibly attractive. But the main advantage of
> modularity - lowering the standards of biopython code - is also, in my
> opinion, it's main disadvantage.

I would say it actually encourages people to contribute more :). And yes,
this may come at the cost of some non-core Biopython modules having 'lower
quality' than others. But I think that is not necessarily bad: useful
modules with good documentation and good code quality are more likely to be
used.

> Question: How is scikit handling their different modules?

They have different teams for the core package and the toolkits (modules),
as far as I know [https://www.scipy.org/scikits.html]. It seems that anyone
can register a module in the scikit namespace (at least I could not find
any mention of toolkit development requiring permission from the core
team). Even the license of a toolkit can be different. (Slightly
unrelated, but there is also a scikit-bio package: http://scikit-bio.org/).

(P.S. Just so that everyone is on the same page, we actually do have a
GitHub ticket on this modularization proposal here:
https://github.com/biopython/biopython/issues/349 ~ there has already been
some discussion, as you can see.)

Best regards,
Bow

On Wed, Jun 28, 2017 at 7:10 PM João Rodrigues <
j.p.g.l.m.rodrigues at gmail.com> wrote:

> So, in theory, you can end up with dozens of extensions that are out of
> sync with the core in terms of 'core' dependencies and it's up to the
> maintainers of those extensions to keep them up-to-date. That's my main
> concern. We have a monolith that works. It is conservative and that's
> exactly why it works. Otherwise, for example, I would install the core and
> your Cython-based lightining fast PopGen code and then you grow bored of
> it, a new version of Cython comes along that has some incompatibilities
> with your code, and now it doesn't work. Worse even, I release a blinding
> fast PDBParser using a new feature in Cython and now you have two
> incompatible extensions.
>
> Being very honest, there is no barrier to anyone developing anything new
> here. If you want to use Cython, go ahead and do it and then worst case
> scenario, we have a dependency warning and check that can simply skip
> compilation of that code. It's what happens with Numpy at the moment IIRC.
> You can also bundle the c code, instead of compiling the pyx on install,
> and compile it regularly using GCC, skipping Cython altogether.
>
> I agree that we should 'refresh' our dependencies to be able to do cool
> things. And modularity is incredibly attractive. But the main advantage of
> modularity - lowering the standards of biopython code - is also, in my
> opinion, it's main disadvantage.
>
> Question: How is scikit handling their different modules?
>
> 2017-06-28 8:05 GMT-07:00 Tiago Antão <tiagoantao at gmail.com>:
>
>> It can see plenty of issues where it could help. In my specific case all
>> the PopGen code is stopped for 10 years because I would need to write very
>> fast code (say in Cython). This would be an extension module, not a core
>> module because it would impart a very big dependency on the system.
>>
>> Modules would allow a core with very strict policies and dependencies
>> _but_ extensions that could be way more relaxed.
>>
>> It would also lower the barrier of entry for new content. Everyone could
>> publish an extension. If the extension would survive time (which most do
>> not - creating a maintenance burden in the core) then it could eventually
>> be made a core extension. Now the policy in practice is to add very little
>> innovation out of the fear that it will become stagnant and not-supported
>> by the main author (say after publication). An extension system would
>> accommodate both innovation whereas preserving the core quality.
>>
>> Currently we have a gigantic monolith that in practice imposes very
>> conservative technologies and changes. I suspect that is why we do not see
>> anything really exciting with Biopython for the better part of the last
>> decade,
>>
>> On 28 June 2017 at 04:25, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>>
>>> I agree with Joao here. I don't see an immediate and overriding problem
>>> that modularity would solve, and I can see many drawbacks.
>>>
>>> Best,
>>> -Michiel
>>>
>>>
>>> On Monday, June 26, 2017 11:03 AM, João Rodrigues <
>>> j.p.g.l.m.rodrigues at gmail.com> wrote:
>>>
>>>
>>> Copied from the other thread where I mistakenly posted:
>>>
>>> I think we should focus on other topics such as modularity. What do the
>>> proponents of the said modularity say about it? What are its advantages? I
>>> personally think a big disadvantage is that with one package install you
>>> get a wide array of tools for a variety of subjects. With a constellation
>>> of modules you might end up with an up-to-date core and an out-of-date lone
>>> module somewhere, which makes things much much harder not only to maintain
>>> but also to debug in case of issues.
>>>
>>>
>>> _______________________________________________
>>> Biopython-dev mailing list
>>> Biopython-dev at mailman.open-bio.org
>>> http://mailman.open-bio.org/mailman/listinfo/biopython-dev
>>>
>>>
>>>
>>
>>
>> --
>> Tiago Antao
>> Scientific and HPC programmer
>> http://tiago.org
>> https://github.com/tiagoantao/
>>