[Bioperl-l] BP split progress and rationale

Brian Osborne bosborne11 at verizon.net
Wed Jun 1 12:37:57 UTC 2016


Mark,

I don’t understand. Last year I put Bio::Root* back into bioperl-live, to simplify installation. Now we are splitting again?

IMO Bio::Base/Bio::Root and Bio::Seq*/Bio::SeqIO* should never be separate. Generally people install BioPerl to get IO and basic Sequence object functionality. Why would Bio::Root (always required) be separate from things like Bio::Seq and SeqIO (always requested)?

Simplicity, please. BioPerl has very few people actively engaged these days, and the numbers there are steadily dropping. Everything we do should be geared towards simplicity and efficiency. Another example: SeqFeature and Annotation. Why separate them? They are almost always used together.

Then there’s the maintenance and the documentation. Please don’t take this personally, MAJ, but this business about splitting everything up is an old idea, an unquestioned assumption. Time to reconsider it.

Brian O.



> On Jun 1, 2016, at 1:06 AM, Mark A. Jensen <maj at fortinbras.us> wrote:
> 
> All,
> 
> I've made some significant progress towards a BP split. I know there have been several tries, but I'm willing to take this one to an actionable endpoint with YAPC::NA 2016 as a goal date for action.
> 
> I have built a graph of all the module dependencies (parent-child and horizontal) in Neo4j, and have been using it to design module groupings that each cover a functional area and are arranged hierarchically, so that the dependencies between groups are minimized. I'm calling the groupings "packages".
> 
> I am using the loose convention that "monophyletic" packages (groups of modules that fall within a namespace) are named after the namespace, and "polyphyletic" packages are named "BioPerl::<functional name>". The following packages are currently pretty solid. The descriptions indicate mainly what is encompassed by the contained modules, not rules for membership.
> 
> BioPerl::Base - Bio::Root::*, general design pattern helpers (i.e., many Bio::*I, Bio::Factory::*, and Build helper classes)
> 
> BioPerl::Sequence - Bio::Seq, Bio::SeqIO, and SeqIO drivers that can do without annotations (e.g., fasta)
> 
> BioPerl::Alignment - alignment objects and parsers
> 
> BioPerl::Annotation - most annotation modules
> 
> BioPerl::SeqFeature - most SeqFeature modules
> 
> BioPerl::Tree - most Tree related modules
> 
> BioPerl::DB - most Bio::DB::*, Bio::Das interfaces
> 
> BioPerl::Search - BLAST parsing and tiling
> 
> There are quite a few more. Examples of the logic: BioPerl::Base contains all of its dependencies. BioPerl::Sequence requires only BioPerl::Base to satisfy all its BP dependencies. BioPerl::Alignment requires BioPerl::Base and BioPerl::Sequence. BioPerl::Search requires Base, Sequence, and SeqFeature. And so on.
> 
> With a structure like this, a user who just needs Bio::PrimarySeq and Bio::SeqIO to read some fasta files can get away with installing BioPerl::Base and BioPerl::Sequence, about 141 modules, as opposed to the full 805 modules, including that broadly useful one "Bio::DB::HIV::HIVQueryHelper".
> 
> Once finished, I'll propose setting many of the namespaces free as separate CPAN packages - Bio::Restriction, Bio::DB::HIV, and others. These can be packaged with their appropriate BioPerl::* prerequisites in the metadata. I expect this will allow natural selection to operate much more efficiently on the obsolete modules.
> 
> I will set up CPAN::Meta compliant metadata for everything.
> 
> I have more thoughts but this is already too long.
> 
> MAJ
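
For readers weighing the two positions, the package logic in the quoted message (Base stands alone, Sequence needs Base, Alignment needs Base and Sequence, Search needs Base, Sequence, and SeqFeature) can be sketched as a small prerequisite resolver. This is only an illustration in Python, not part of either proposal: the function name install_order is made up here, and the prerequisites of BioPerl::SeqFeature are not stated in the message, so they are left empty as a placeholder.

```python
# Direct package prerequisites as listed in the quoted message.
# BioPerl::SeqFeature's own prerequisites are NOT given there; the
# empty set below is a placeholder assumption, not a fact about the split.
DIRECT_DEPS = {
    "BioPerl::Base": set(),
    "BioPerl::Sequence": {"BioPerl::Base"},
    "BioPerl::Alignment": {"BioPerl::Base", "BioPerl::Sequence"},
    "BioPerl::SeqFeature": set(),  # placeholder: not stated in the message
    "BioPerl::Search": {
        "BioPerl::Base",
        "BioPerl::Sequence",
        "BioPerl::SeqFeature",
    },
}


def install_order(package, deps=DIRECT_DEPS):
    """Return `package` and all its transitive prerequisites,
    ordered so that every prerequisite comes before its dependents."""
    order = []
    visited = set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        # Sort for a deterministic result across runs.
        for prereq in sorted(deps[name]):
            visit(prereq)
        # Append only after all prerequisites have been placed.
        order.append(name)

    visit(package)
    return order


if __name__ == "__main__":
    for target in ("BioPerl::Sequence", "BioPerl::Alignment", "BioPerl::Search"):
        print(target, "->", ", ".join(install_order(target)))
```

Under these assumptions, a user who only wants Bio::SeqIO for fasta files resolves to two packages (Base, then Sequence). The same walk also shows the cost Brian points at: every extra package boundary is one more prerequisite edge that has to be declared, kept accurate, and documented.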