[Bioperl-l] Priorities for a bioperl-1.6 release

Sendu Bala bix at sendu.me.uk
Tue Mar 18 11:32:25 EDT 2008


aaron.j.mackey at gsk.com wrote:
>> Or is the split intended to be 'core' == "anything and everything
>> that was in 1.4", '????' == "everything else"? In which case,
>> what's a good name for "modules created after 1.4"? 'crust'? ;)
> 
> Nah, "icing".
> 
> a module "use" map might be very useful to help identify "core" vs.
> other layers of mantle/crust/icing.
> 
> http://www.perlmonks.org/?node_id=87329 
> http://search.cpan.org/src/NEILB/pmusage-1.2/

Thanks for those. Neither could quite cope with BioPerl, but I've munged
them together and hacked up 'module_usage.pl' which I've just committed
to the maintenance directory of bioperl-live.

module_usage.pl ../Bio

Produces:
  *warning, may crash your browser; download it and view in a dedicated
image viewer*
http://bix.sendu.me.uk/files/module_usage.jpeg
http://bix.sendu.me.uk/files/module_usage.txt

First I worked out which modules each BioPerl package (aka class, module)
'uses': the modules it loads via 'use' or 'require', or inherits from via
'use base', excluding external (non-BioPerl) modules. I then grouped
together packages that have identical usage. The graph shows every group
with more than one member as a node, with edges pointing from it to the
individual packages that it uses. The individual packages pointed to by
groups form a set; edges are also drawn for the use-relationships between
members of that set (only). Members of the set are shaded red, and the
saturation of the shade indicates how many packages use that package (so
dark red packages are used a lot).
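
The grouping step described above can be sketched roughly as follows.
This is an illustration in Python, not the code in module_usage.pl. The
function names, the line-based regular expressions and the Bio::Toy::*
package names are all made up for the example, and "BioPerl package" is
assumed to mean "any name starting with Bio::".

```python
import re
from collections import defaultdict

# Assumption: anything named Bio::... counts as a BioPerl package;
# every other module name is treated as external and ignored.
USE_LINE = re.compile(r"^\s*(?:use|require)\b(.*)$")
BIO_NAME = re.compile(r"\bBio(?:::\w+)+")


def bioperl_uses(source):
    """Return the Bio::* packages loaded via 'use', 'require' or
    'use base' in a piece of Perl source (a crude line-based scan)."""
    used = set()
    for line in source.splitlines():
        match = USE_LINE.match(line)
        if match:
            used.update(BIO_NAME.findall(match.group(1)))
    return used


def group_by_usage(uses):
    """Group packages with identical usage, keeping only the groups
    that have more than one member."""
    groups = defaultdict(list)
    for package, used in uses.items():
        groups[frozenset(used)].append(package)
    return {used: sorted(members)
            for used, members in groups.items() if len(members) > 1}


# Hypothetical toy sources, invented for this sketch.
sources = {
    "Bio::Toy::A": "use strict;\nuse base qw(Bio::Toy::Base);\nuse Bio::Toy::Util;\n",
    "Bio::Toy::B": "use Bio::Toy::Util;\nrequire Bio::Toy::Base;\nuse Data::Dumper;\n",
    "Bio::Toy::C": "use Bio::Toy::Base;\n",
}
uses = {package: bioperl_uses(text) for package, text in sources.items()}
groups = group_by_usage(uses)
```

Here A and B end up in one group because they use exactly the same two
packages, while C is on its own and so would not appear as a group node.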

(I had to simplify in this way because otherwise GraphViz bailed on me.
If anyone can come up with nicer simplification/visualisation systems,
please do! It's important to note that there is lots of information loss
in my scheme, so you can't rely on the graph alone.)

Getting to the question of how to decide what is 'core' and on what
basis to split things up: first consider the darker red packages. Next,
for each one, consider how many groups point to it. Finally consider the
membership of those groups: are they all highly related, or are they
from different 'parts' of BioPerl?
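
One possible way to turn those three questions into numbers is sketched
below, again in Python and again only as an illustration. It assumes the
groups are available as a mapping from "set of used packages" to "list
of group members", and it assumes that a 'part' of BioPerl can be
approximated by the first two components of a package name. The Toy::*
names and the particular sort order are invented for the example.

```python
from collections import defaultdict


def part_of(package, depth=2):
    """Assumption: the first `depth` name components identify a 'part'."""
    return "::".join(package.split("::")[:depth])


def core_candidates(groups):
    """For each used package, count the group members using it, the
    groups pointing to it, and the distinct parts those members span."""
    stats = defaultdict(lambda: {"users": 0, "groups": 0, "parts": set()})
    for used, members in groups.items():
        for target in used:
            entry = stats[target]
            entry["users"] += len(members)
            entry["groups"] += 1
            entry["parts"].update(part_of(member) for member in members)
    # One arbitrary ordering: widest spread of parts first, then the
    # number of groups, then the number of users, then the name.
    return sorted(
        stats.items(),
        key=lambda item: (-len(item[1]["parts"]), -item[1]["groups"],
                          -item[1]["users"], item[0]),
    )


# Hypothetical toy groups, invented for this sketch.
toy_groups = {
    frozenset({"Toy::Glyph::generic"}): ["Toy::Glyph::a", "Toy::Glyph::b"],
    frozenset({"Toy::Glyph::generic", "Toy::Glyph::Factory"}):
        ["Toy::Glyph::c", "Toy::Glyph::d"],
    frozenset({"Toy::Align"}): ["Toy::AlignIO::x", "Toy::Search::y"],
}
ranked = core_candidates(toy_groups)
```

In this toy data Toy::Glyph::generic is used the most, but only from
within one part, so it ranks below Toy::Align, which has fewer users
spread over two parts. Like the graph, this only sees packages that are
in groups, so it loses the same information.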

For example, Bio::Graphics::Glyph::generic is dark red and has 3 groups
pointing to it, but all the members of those groups are
Bio::Graphics::Glyph*. You could imagine that Bio::Graphics::Glyph (or
Bio::Graphics?) could be split off cleanly if desired and not kept in
core. Bio::SimpleAlign, on the other hand, whilst not being quite as
dark a red, has 7 attached groups with members from Bio::AlignIO,
Bio::Search and Bio::Tools. You could easily argue it is more
fundamental to BioPerl and should be in core. In turn, the things that
Bio::SimpleAlign points to would also have to be in core.
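
The "in turn" step is a transitive closure over the use-relationships.
A minimal Python sketch, assuming a complete package-to-used-packages
map is available (the graph alone would not be enough) and using
made-up Toy::* names:

```python
def core_closure(seeds, uses):
    """Return the seeds plus everything they use, directly or indirectly."""
    core = set()
    pending = list(seeds)
    while pending:
        package = pending.pop()
        if package in core:
            continue  # already handled; also guards against use-cycles
        core.add(package)
        pending.extend(uses.get(package, ()))
    return core


# Hypothetical toy usage map, invented for this sketch.
toy_uses = {
    "Toy::Align": {"Toy::Seq", "Toy::Root"},
    "Toy::Seq": {"Toy::Root"},
    "Toy::Root": set(),
    "Toy::Glyph": {"Toy::Root"},
}
core = core_closure(["Toy::Align"], toy_uses)
```

Putting Toy::Align in core drags in Toy::Seq and Toy::Root, but not
Toy::Glyph, which merely uses something that is in core.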

I haven't done any full analysis along these lines and leave it as an
exercise for the interested reader for now ;)


Chris Fields wrote:
> http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules
> 
> I'm pretty flexible on any of that; it's a proposal only and I think
> some of it may be wrongheaded, but hey, I'm willing to take a few
> rotten tomatoes.  The key issue is we should try to work out what we
> mean by 'core' or the core library.  I have a rather extreme view of
> it as being the bare essentials without external, non-perl core
> dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI
> and required modules for those classes) but I'm sure others would
> lump in parsers, DB functionality, etc.  I basically suggest placing
> those (and any stable but potentially non-core code) in a
> 'bioperl-main', with any unstable or untested code going into a
> 'bioperl-unstable'.

My thoughts are along these lines:
# I agree that core should have no external dependencies
# I agree that it might mostly be interfaces
# It should represent a framework with all the interfaces (that have
   stable APIs), directory structure and base classes that everything
   else relies on
# It might not do much useful bioinformatics, but provides just about
   everything needed for a dev to create a new module that does


> In essence, bioperl-main would require core and resemble a stable
> release; bioperl-unstable would require bioperl-main (and core) and
> resemble a dev release.  Not sure how versioning would go or if this
> is a viable option at all, but it's worth discussing.

# I agree that this 3-way split seems reasonable
# bioperl-main would consist primarily of the 'leaves' of the module
   tree, mostly parsers and the like which, whilst 'stable' and tested,
   should still be split away from core because the data sources they
   parse could change format slightly
# bioperl-unstable (better named bioperl-bleed) would feature brand-new
   stuff, be it new parsers for totally new formats, new APIs that do
   something not thought of before, etc. When they are complete, bug-free
   and have stood the test of time they get moved into bioperl-main.
   (It is not a place for all new commits; bug fixes to something in
   bioperl-main would be committed to bioperl-main)
# The current splits (bioperl-run, bioperl-network etc.) do not get
   their own core and bleed variants. Anything they need for core
   functionality would enter the single bioperl-core, anything new
   would enter the single bioperl-bleed, and anything stable would
   be in their own bioperl-[package]
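
If the split went ahead, the dependency direction (core needs nothing
above it, main may need core, bleed may need main and core) could be
checked mechanically. A rough Python sketch; the layer assignments, the
Toy::* names and the rank table are assumptions for illustration, and
where a bioperl-[package] would sit in such a table is left open:

```python
# Assumption: lower rank means more fundamental. A package may only use
# packages whose layer has the same or a lower rank.
LAYER_RANK = {"core": 0, "main": 1, "bleed": 2}


def layering_violations(layer_of, uses):
    """Return (package, used_package) pairs where a package uses
    something from a less fundamental layer than its own."""
    violations = []
    for package in sorted(uses):
        for target in sorted(uses[package]):
            if LAYER_RANK[layer_of[target]] > LAYER_RANK[layer_of[package]]:
                violations.append((package, target))
    return violations


# Hypothetical toy data, invented for this sketch.
layer_of = {
    "Toy::SeqI": "core",
    "Toy::Parser": "main",
    "Toy::NewFormat": "bleed",
}
toy_uses = {
    "Toy::SeqI": {"Toy::Parser"},                     # core using main: bad
    "Toy::Parser": {"Toy::SeqI"},                     # main using core: fine
    "Toy::NewFormat": {"Toy::Parser", "Toy::SeqI"},   # bleed using both: fine
}
violations = layering_violations(layer_of, toy_uses)
```

Only the core package reaching up into main is reported here.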

Discuss :)

