[Biopython] A possibility for speeding up FASTA/FASTQ reading in BioPython
Peter Cock
p.j.a.cock at googlemail.com
Mon Nov 24 04:41:06 EST 2025
Hello Terry,
I just posted a blog about my thoughts on receiving generative AI
contributions as an Open Source project maintainer:
https://blastedbio.blogspot.com/2025/11/thoughts-on-generative-ai-contributions.html
I am sceptical, and in this case adding a Rust dependency to Biopython
seems too much to ask. I think you could get similar performance gains
with C (which we do use) where at least the maintainers have some
experience. However, even there, gains may not make the additional
complexity and maintenance burden worthwhile.
Thank you for writing and asking, rather than surprising everyone with
a large pull request.
Peter
P.S. Cross reference https://github.com/biopython/biopython/pull/5085
On Tue, Nov 11, 2025 at 10:00 PM Jones Kelly, Terence Carleton
<terence.jones at charite.de> wrote:
>
> Hi all
>
> I regularly process reasonably large FASTQ (hundreds of billions of sequencing reads) and FASTA files using BioPython. For some years I've been meaning to implement a FASTQ/FASTA reader in a compiled language and add Python bindings to improve the speed. I could've done this in C but I spent some decades writing C and I wanted to learn something new, so I considered a few languages. Because Rust makes it very easy to create Python bindings, I decided to give it a try. I thought I'd get going by asking the Claude CLI to write me some Rust. That turned out to be a much, much better experience than I had anticipated. With Claude I played with several implementations, keeping track of timing. Claude also wrote some tests. To compare what I was seeing I got Claude to write a pure Python version, a pure C version, Python bindings to the C, and to create a benchmark suite. From what I can tell, the Rust/Python (and the C/Python) FASTA reading is twice as fast as BioPython and FASTQ reading is four times as fast. I didn't write a single line of code. I just did some minimal cleaning up when things were already far along. I've been using the code for the last month or two with no problems.
>
> The repo is at https://github.com/VirologyCharite/prseq (prseq = Python/Rust for sequences). You'll find the benchmark results on that page. There are still some small things I would adjust in the API. BTW, Claude also wrote the README (which should definitely be improved).
>
> I am wondering if there might be interest in incorporating this into BioPython. I don't know if there are any Rust dependencies in BioPython but I know that there are some C extensions. We could use either, as their speeds are comparable. If there's interest, I'd be happy to help (or to do it all, after some discussion and maybe with some guidance).
>
> Thanks very much for all the work on BioPython. It's really been a pleasure to use the code over the last dozen years or so.
>
> Terry Jones
>
>
> _______________________________________________
> Biopython mailing list - Biopython at biopython.org
> https://mailman.open-bio.org/mailman/listinfo/biopython