[Biopython] A possibility for speeding up FASTA/FASTQ reading in BioPython

Dan Bolser dan.bolser at outsee.co.uk
Tue Nov 18 09:37:51 EST 2025


Great work! Sounds fantastic!

Can you (or Claude!) create a PR?

On Tue, Nov 11, 2025, 10:00 PM Jones Kelly, Terence Carleton <
terence.jones at charite.de> wrote:

> Hi all
>
> I regularly process reasonably large FASTQ files (hundreds of billions of
> sequencing reads) and FASTA files using BioPython. For some years I've been
> meaning to implement a FASTQ/FASTA reader in a compiled language and add
> Python bindings to improve the speed. I could've done this in C but I spent
> some decades writing C and I wanted to learn something new, so I considered
> a few languages. Because Rust makes it very easy to create Python bindings,
> I decided to give it a try. I thought I'd get going by asking the Claude
> CLI to write me some Rust. That turned out to be a much, much better
> experience than I had anticipated. With Claude I played with several
> implementations, keeping track of timing. Claude also wrote some tests.
> To compare what I was seeing I got Claude to write a pure Python version, a
> pure C version, Python bindings to the C, and to create a benchmark suite.
> From what I can tell, the Rust/Python (and the C/Python) FASTA reading is
> twice as fast as BioPython and FASTQ reading is four times as fast. I
> didn't write a single line of code. I just did some minimal cleaning up
> when things were already far along. I've been using the code for the last
> month or two with no problems.
>
> The repo is at https://github.com/VirologyCharite/prseq (prseq =
> Python/Rust for sequences). You'll find the benchmark results on that
> page. There are still some small things I would adjust in the API. BTW,
> Claude also wrote the README (which should definitely be improved).
>
> I am wondering if there might be interest in incorporating this into
> BioPython. I don't know if there are any Rust dependencies in BioPython but
> I know that there are some C extensions. We could use either, as their
> speeds are comparable. If there's interest, I'd be happy to help (or to
> do it all, after some discussion and maybe with some guidance).
>
> Thanks very much for all the work on BioPython. It's really been a
> pleasure to use the code over the last dozen years or so.
>
> Terry Jones
>
>
> _______________________________________________
> Biopython mailing list  -  Biopython at biopython.org
> https://mailman.open-bio.org/mailman/listinfo/biopython
>