[Biopython] Generative AI policy for contributions to Biopython
Michiel de Hoon
mjldehoon at yahoo.com
Mon Apr 27 09:22:39 EDT 2026
> How do you feel about the communication side (eg insisting on human written commit messages and pull request interactions)?
Yes, communication should be by a human. Sometimes the issue with PRs is not so much the code itself, but how it fits in with the overall structure of Biopython, and the direction in which Biopython is heading. Those can be judgement calls, which cannot easily be made by AI.
-Michiel
On Monday, April 27, 2026 at 07:53:50 PM GMT+9, Peter Cock <p.j.a.cock at googlemail.com> wrote:
That sounds similar to the new Linux policy (linked to earlier). I'm
not convinced this is the right choice, but it is a pragmatic stance.
Would you like to try drafting a policy (perhaps as a draft pull
request editing at least the CONTRIBUTING file and the pull request
template)?
How do you feel about the communication side (eg insisting on human
written commit messages and pull request interactions)?
Peter
On Sat, Apr 25, 2026 at 12:44 AM Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> I am in favor of accepting code generated by AI in principle, as long as whoever submits the code is responsible for the contributed code.
> Key points are:
> - Contributed code (whether generated by AI or not) must be understood by the PR submitter;
> - The PR submitter takes responsibility for guaranteeing that the contributed code is free of license restrictions, and therefore can be released under Biopython's license. But the same requirement holds for any code contributed to Biopython, not just AI-generated code.
>
> Best,
> -Michiel
> On Friday, April 24, 2026 at 07:16:23 PM GMT+9, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
>
> Dear Biopythoneers,
>
> We need to set out a generative AI policy for contributions to Biopython.
>
> There are now multiple recent PRs submitted by new contributors which
> are openly using AI tools, more that I suspect are, and now even AI assisted
> PRs from past contributors (where CV padding or other external metrics
> are unlikely to be driving this). These are generally more work to review
> than human written PRs, and that is a growing issue.
>
> I blogged about my views late last year - ending in the line "Right now, I
> still lean very much to saying no any PR using generative AI".
>
> https://blastedbio.blogspot.com/2025/11/thoughts-on-generative-ai-contributions.html
>
> Things will change (both tool capabilities, but also the social and legal
> interpretations) but that post still describes my views today - note I did
> not touch on the topic of communications there (see below).
>
> Recently Linux adopted what has been described as a balanced stance
> treating it as a tool with very clear expectations that usage MUST be declared
> and that the human submitter is responsible for (quoting these four points):
>
> * Reviewing all AI-generated code
> * Ensuring compliance with licensing requirements
> * Adding their own Signed-off-by tag to certify the DCO
> * Taking full responsibility for the contribution
>
> https://docs.kernel.org/process/coding-assistants.html
>
> That is pragmatic but ignores the legal and ethical minefield. We don't
> have a Developer Certificate of Origin (DCO), but I think the other
> points are a bare minimum for any Biopython policy.
>
> Most of my personal open source projects have only had a very small
> number of contributors, and I am comfortable with outright rejecting
> generative AI. I know some of the past/current Biopython contributors
> are more willing to embrace this technology though - so I doubt support
> for a simple ban would be unanimous.
>
> Speaking for a moment as the current Open Bioinformatics Foundation
> president, the board has discussed this and agreed not to try to
> micromanage the member projects. For reference, BioPerl have started
> https://github.com/bioperl/bioperl-live/issues/407 which has some
> excellent points and examples to consider.
>
> In particular, this is not just a code or documentation changes issue - but
> also about the communication around any proposed change: the nature
> of the commit messages, pull request description, and discussion. This
> ties into the maintainers' burden - many of our recent AI generated PRs
> have fairly short code changes but the verbose text is exhausting to read
> and unhelpful. It has sometimes felt like I have been talking to an AI agent
> rather than a human - I actually liked the feeling of mentoring a new
> contributor and guiding them through minor hurdles to getting their
> change accepted, but you lose that with an AI agent in between you.
>
> I therefore very much like this line from the current Codeberg policy:
>
> > All communication, that includes: commit messages, pull request
> > messages, documentation, code comments and issues (and
> > comments on issues/pull requests), that is intended to be read
> > by people to understand your thoughts and work must not have
> > been generated with AI. We exclude machine translation and
> > tooling that helps with grammar and spelling check.
>
> https://codeberg.org/comaps/Governance/src/branch/main/AI_USAGE.md
>
> Would anyone like to speak in defence of accepting AI (assisted) PRs,
> and suggest an existing policy you would be happy we adopt or base
> ours on?
>
> Or should I start drafting a more draconian but likely much shorter one -
> a few lines like this in the CONTRIBUTING file and/or PR template: No
> generative AI to be used in any Biopython contributions, with the exception
> of machine translation to/from English (where you might consider including
> your original language text as well).
>
> Thank you,
>
> Peter