[Biopython] (no subject)

Fri May 29 10:43:21 EDT 2026

*Subject:* Reproducibility and evaluation instability in clinical ML
pipelines using Biopython-based workflows
------------------------------

Hi Biopython team,

I use Biopython in biomedical data-processing pipelines for clinical
machine learning research.

While developing a reproducible ICU prediction pipeline using
MIMIC-IV-derived datasets (https://github.com/netanelcyber/PenuX), we
made an observation that may be relevant to the broader bioinformatics
community.

Although Biopython is primarily used for sequence and molecular data
workflows, we found it useful as part of a broader preprocessing and
integration pipeline alongside clinical datasets and downstream ML models.
------------------------------
Observation (methodological, not tool-specific)

Across multiple modeling experiments, we observed that:

   - standard performance metrics (e.g., AUROC) remain relatively stable
   across implementations
   - however, model reliability varies significantly under:
      - temporal validation vs random splits
      - different preprocessing strategies
      - subgroup stratification and missing-data regimes

These effects appear to depend on *evaluation design rather than on the
model*, and they raise broader questions about reproducibility in
biomedical ML pipelines.
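
To make the first contrast concrete, here is a minimal sketch of a random
split versus a temporal split. It is illustrative only: the records are
synthetic, and the names `stay_id`, `admit_date`, `random_split`, and
`temporal_split` are hypothetical, not taken from MIMIC-IV or from any
library.

```python
import random
from datetime import date, timedelta


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle with a fixed seed, then hold out the first test_fraction."""
    rng = random.Random(seed)  # local RNG so the split is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(records, test_fraction=0.2, time_key="admit_date"):
    """Train on the earliest records, test on the most recent ones."""
    ordered = sorted(records, key=lambda r: r[time_key])
    n_test = int(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Synthetic stand-in data: one hypothetical admission per day.
start = date(2020, 1, 1)
records = [
    {"stay_id": i, "admit_date": start + timedelta(days=i)}
    for i in range(100)
]

rand_train, rand_test = random_split(records)
temp_train, temp_test = temporal_split(records)

# In the temporal split, every training record precedes every test record.
latest_train = max(r["admit_date"] for r in temp_train)
earliest_test = min(r["admit_date"] for r in temp_test)
```

With a random split, test admissions are generally interleaved in time
with training admissions, so a model can be scored on periods it has
already seen; the temporal split rules that out by construction.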
------------------------------
Why this may be relevant to Biopython users

Even though Biopython is not directly responsible for clinical ML
evaluation, many real-world pipelines combine:

   - biological data processing (where Biopython is used)
   - clinical datasets (e.g., MIMIC-IV)
   - downstream predictive modeling

This creates a gap where upstream reproducibility (sequence/biological
processing) is strong, but downstream evaluation protocols may still
introduce instability.
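
One possible way to check where instability enters is to fingerprint the
output of each stage and compare fingerprints across runs. The sketch
below is an assumption on my part, not an existing Biopython feature: the
`fingerprint` helper and the example record fields are hypothetical, and
it only handles JSON-serializable data.

```python
import hashlib
import json


def fingerprint(obj):
    """Return a SHA-256 hex digest of a JSON-serializable object.

    Keys are sorted and separators fixed so that the digest does not
    depend on dictionary insertion order or on whitespace.
    """
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical output of an upstream sequence-processing stage.
stage_output = {"seq_id": "example_1", "length": 12, "gc_fraction": 0.5}

# Same content, different key order: the fingerprint is unchanged.
reordered = {"gc_fraction": 0.5, "length": 12, "seq_id": "example_1"}

# Changed content: the fingerprint differs.
modified = {"seq_id": "example_1", "length": 13, "gc_fraction": 0.5}

same = fingerprint(stage_output) == fingerprint(reordered)
changed = fingerprint(stage_output) != fingerprint(modified)
```

If the upstream fingerprints match across runs while downstream metrics
still differ, the variation is coming from the evaluation setup rather
than from the sequence-processing stage.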
------------------------------
Question to the community

I would be interested to hear whether others in the Biopython community
have encountered:

   - reproducibility issues when Biopython pipelines are integrated into
   larger ML systems
   - challenges in maintaining consistency across downstream evaluation
   setups
   - best practices for ensuring pipeline-level reproducibility beyond
   sequence processing itself

------------------------------
Context

Project reference (for reproducibility context only):
https://github.com/netanelcyber/PenuX
------------------------------

Thank you for your work on Biopython. It remains a foundational tool in
computational biology and bioinformatics workflows.

Best regards,