[Bioperl-l] BioPerl long-term, was Re: dependencies on perl version

Fields, Christopher J cjfields at illinois.edu
Wed Feb 6 22:11:06 UTC 2013


George,

Should put your post on a pedestal :)

tl;dr version: I completely agree, but we need help in order to do this.

Long(-winded) version:

I agree completely; backwards compatibility is killing us.  But we do need current and new people to get involved and help drive this forward.  We need people on all fronts, from coding and bug fixes to documentation and web site maintenance.  I've been driving this bus for a number of years now.  I'm not getting tired yet, but I am getting substantially busier with my current endeavors, so my time spent working on BioPerl has dwindled considerably.  Any additional support or sharing of responsibilities will help tremendously in keeping up momentum (if someone else wants to take the wheel for a bit, please let me know :).

If we follow the perl release route, we should streamline the release process (think Dist::Zilla), end support of older versions of Perl, and work on a sustainable release schedule.  The fact that we have so many of us so-called 'old folks' speaking up in favor of this is a very good sign.  We do need a bit more than that; we need help.  BioPerl is a very large project.

There is a key point we need to address, one that is very important for the future of BioPerl.  I use Perl quite a bit in my current work (and dabble with Ruby and Python as well when I have to).  BioPerl?  A little, but not as much as I could.

Shocked?  The three main reasons I don't use it 'in anger':  performance, performance, and performance.  It is very important that we make a concerted effort to address this at all levels.  It could be as simple as completely separating parsing from object creation (where the bulk of the performance problems seem to lie, though not all of them).
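
A minimal sketch of what separating parsing from object creation could look like, using only core Perl.  It assumes unwrapped, four-line FASTQ records.  The names fastq_iterator and My::SeqRecord are hypothetical, invented for illustration; they are not existing BioPerl APIs.  The low-level layer hands back plain hash references, and an object is built only when a caller asks for one.

```perl
use strict;
use warnings;

# Low-level layer (hypothetical name): returns a closure that yields one plain
# hash reference per FASTQ record and never builds an object.
# Assumption: records are exactly four lines each (no wrapped sequences).
sub fastq_iterator {
    my ($fh) = @_;
    return sub {
        defined(my $header = <$fh>) or return;   # end of input
        my $seq  = <$fh>;
        my $plus = <$fh>;
        my $qual = <$fh>;
        die "Truncated FASTQ record\n" unless defined $qual;
        chomp($header, $seq, $plus, $qual);
        die "Bad header line: $header\n"
            unless $header =~ /^\@(\S+)\s*(.*)$/;
        my ($id, $desc) = ($1, $2);
        die "Missing '+' separator for $id\n" unless $plus =~ /^\+/;
        return { id => $id, desc => $desc, seq => $seq, qual => $qual };
    };
}

# Optional high-level layer (hypothetical class): wraps a raw record in an
# object, and is only paid for by callers who want it.
{
    package My::SeqRecord;
    sub new        { my ($class, %args) = @_; return bless {%args}, $class; }
    sub id         { return $_[0]{id} }
    sub seq        { return $_[0]{seq} }
    sub seq_length { return length $_[0]{seq} }
}

# Usage: read two records from an in-memory filehandle.
my $data = "\@read1 first\nACGT\n+\nIIII\n\@read2\nGGCC\n+\n!!!!\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my $next  = fastq_iterator($fh);
my $total = 0;
while (my $rec = $next->()) {
    $total += length $rec->{seq};            # fast path: plain hashes only
    my $obj = My::SeqRecord->new(%$rec);     # object layer: only on request
    printf "%s\t%d\n", $obj->id, $obj->seq_length;
}
print "Total bases: $total\n";
```

A caller that only needs counts or lengths can stay on the hash-based path and skip the object layer entirely; whether that recovers enough speed in practice is something only profiling would show.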

A specific example: Heng Li once tested the performance of FASTQ parsing (perl, python, bioperl, biopython, his C code, etc). BioPerl's FASTQ couldn't even be measured; IIRC it went on for many hours until he killed it.  This was with the older version of the parser, but I'm willing to bet the newer one I wrote isn't any better.

This. needs. to. change.

I see no problem in stating that any generic parsing and low-level interfaces are just as much a part of what BioPerl encompasses as the higher-level Bio::* classes themselves.  Steve and Jason were on to something with SearchIO; it's maybe not as performant as we would like, but it certainly is more flexible in terms of what can be done, b/c it separates out low-level parsing from object creation.  That's the general model we should look at.  There is a good reason Biopython is following this model with their SearchIO implementation (Peter C, are you reading this?)

We have a lot of very talented people involved with this project, both on the purely computational and purely biological end as well as the folks like me who straddle the two domains.  There is a lot of good code out there that can be used, wrapped, and taken advantage of, including everything we currently have in BioPerl.  Let's come up with something that both works and works well, that people can use on a regular basis, even at a low level if they choose.  That alone would dissuade new users from writing up (yet another) custom FASTA/FASTQ/BLAST/GenBank/etc parser b/c the BioPerl one takes millennia to finish.

A few examples on this front: Rob Buels created a generic parser for GFF3 (Bio::GFF3::LowLevel) with very few dependencies; we wrap this with the newer Bio::FeatureIO code.  Leon has Bio::SFF.  Lincoln of course wrote Bio::DB::Sam and Bio::DB::BigFile.  I have started a wrapper around Heng's FASTQ/FASTA parsing code (kseq), and it seems to work quite well (~20M FASTQ in 30 sec last I recall?).

So:

If it means targeting performance (using Devel::NYTProf?), backwards compatibility be damned, we do that.

If it means creating a new Bio-NGS repo to focus some of these efforts, so be it.

If it means we get away from the Java-based interface stuff in favor of something more Perl-like (roles anyone?), then I'm all for it.

If it means we modularize BioPerl so this can be done, well, you probably know where I stand (yes).

If it means this is to be BioPerl 2.0, then let's move in that direction, sooner rather than later.

But I can't do it alone.  We (not just me, but we) need to drive the direction we take.

First one who codes gets the gold ring.

chris

On Feb 6, 2013, at 12:47 PM, Aaron Mackey <amackey at virginia.edu> wrote:

> Huzzah!
> 
> --
> Aaron J. Mackey, PhD
> Assistant Professor
> Center for Public Health Genomics
> University of Virginia
> amackey at virginia.edu
> http://www.cphg.virginia.edu/mackey
> 
> 
> On Wed, Feb 6, 2013 at 12:58 PM, George Hartzell <hartzell at alerce.com> wrote:
> Fields, Christopher J writes:
>  > [...]
>  > Right, it took ~8 yrs to go from 5.8 to 5.10.  I'd like to point
>  > out that Python users are in the same boat: the Python version for
>  > CentOS 5 is 2.4.3, and Biopython requires a minimum of python 2.5
>  > (and recommends python 2.7).
>  >
>  > We can always state that perl 5.8 is supported for the upcoming
>  > Bioperl release, but we're dropping v5.8 support for any future
>  > releases.
> 
> Do more than drop support for 5.8.
> 
> The Perl community has put a transparent and predictable process in
> place for releasing [generally] better versions of the language.  It
> means that Perl has a chance of continuing to be relevant, attracting
> new talent and actually *fixing* some of the s&%t that gives Perl a
> bad rap.  It gives people something to plan around; no one should be
> surprised that v 5.X.Y is coming out in mid 20ZZ.
> 
> BioPerl should do the same thing: declare a release policy that trails
> along with the Perl release schedule.  Keep it simple and no one can
> argue with it.  Support Perl releases as long as the releases
> themselves are supported.
> 
> Rather than expending energy supporting out of date platforms, put the
> energy into being modern (or Modern...), better distro building and
> packaging, testing, documentation and releasing so that the process of
> staying current is painless.
> 
> Look forward.  Keep it interesting and fun.
> 
> Everyone running Mac OS 9 on their Pismo, raise your hand.  Anyone
> make their living running sequencing gels in Plexiglas doohickeys on
> their lab bench?
> 
> I'm not suggesting that the BioPerl community is free to make
> arbitrary and capricious changes that make it difficult for *anyone*
> to get anything done.  Churn is a waste of time.
> 
> But why should the all-volunteer BioPerl community be stuck supporting
> code from 12 years ago because it's cost effective for someone else to
> avoid spending *their* $/time/people to stay up to date?
> 
> Those sites that value stability/maturity/stagnation so highly have
> already accepted the cost/difficulty of nailing one of their feet to
> the floor as they try to run forward.  They recognize and depend on
> the benefits of having that stable base but generally they've also
> accepted the costs associated with their restrictive choices.  They
> know how to pull in separate kernel/driver updates so that they can
> actually run on nearly modern hardware.  They know, and live with, the
> fact that they're not going to have access to the shiny new stuff.
> And they know how to stay up to date, when they need to, with the
> software that their users need to be competitive (e.g. BioConductor
> and R).
> 
> As long as (if/when...) updating a BioPerl release is something that
> can reliably happen with a few cpanm invocations then the sites that
> otherwise favor punctuated equilibrium will learn to handle gradual
> change.
> 
> Those folks that are "stuck" on older releases always have the option
> of supporting professional Perl programmers to keep older releases
> going, backport changes, etc....  They're already buying support for
> their platforms (or freeloading and coping), let them put bread on the
> table at one of the bioinformatics consultancies or labs if they have
> something special they need.
> 
> Have fun.  Use sharp tools.  Do cool science.  Build cool things.  No
> one is paying you to be backwards compatible with the previous
> millennium.
> 
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




