[Bioperl-l] bioperl reorganization
Scott Cain
cain.cshl at gmail.com
Sat Jul 18 08:23:50 EDT 2009
Hi All,
I don't want to wade in too deeply, but I like the idea of splitting
things up. I think the Bio::Graphics split has gone well and has made
life easier in GBrowse world. I could see Bio::DB::SeqFeature and
Bio::DB::GFF being split and either being kept together or going their
separate ways (though I have a nagging suspicion that SeqFeature code
depends on GFF code in a few places, so it may make sense to just keep
them together).
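If the two did end up as separate distributions, that dependency would
have to be declared explicitly so CPAN can resolve it. A rough sketch of
what a Makefile.PL for a split-out Bio::DB::SeqFeature might look like,
assuming ExtUtils::MakeMaker is used; the distribution layout and all
version numbers here are placeholders, not anything that exists yet:

```perl
# Makefile.PL for a hypothetical split-out Bio-DB-SeqFeature distribution.
# Version numbers are placeholders chosen for illustration only.
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME      => 'Bio::DB::SeqFeature',
    VERSION   => '0.01',                  # placeholder release number
    PREREQ_PM => {
        # Minimum core version this distribution is assumed to need.
        'Bio::Root::Version' => '1.006',
        # SeqFeature code is suspected to use GFF code in a few places,
        # so the GFF distribution is listed as a prerequisite (0 = any version).
        'Bio::DB::GFF'       => 0,
    },
);
```

With something like this in place, installing the SeqFeature distribution
from CPAN would pull in the GFF one automatically, which is roughly the
same as keeping them together from the user's point of view.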
And Chris, if it makes you feel any better, I don't think anything
you've done or not done has held up GBrowse2.
Scott
On Jul 17, 2009, at 11:14 PM, Chris Fields wrote:
> My 2c...
>
> On Jul 17, 2009, at 12:01 PM, Jason Stajich wrote:
>
>> Will try to weigh in more, a little bit of stream of consciousness
>> to let you know I'm thinking about it. Tough summer to focus much
>> on this.
>
> Yes, for me as well. That will change soon (approx two weeks) ;>
>
>> It's too bad we are apparently the laughing stock of Perl gurus,
>> but it would be great to see how to modernize aspects of the
>> development.
>>
>> I'm curious how it will work if we have dozens of separate
>> distros: won't we have a hard time keeping track of what directory
>> things are in? Will there have to be a master list of what version
>> and what modules are in what distro now?
>
> I don't think we're a laughingstock as much as we haven't had the
> time to dedicate towards this (and much of this occurred at a point
> early on, with that whole 'Cathedral and Bazaar' esr-based thingy).
> BTW, those same gurus shouldn't speak: perl core is just as bad and
> riddled with worse bugs, though rgs and co. wouldn't admit it.
>
> In fact, base.pm itself has a nasty one; I'm surprised no one in the
> bioperl community has noticed it yet (it's listed as a bug on RT I
> think):
>
> pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print
> $Bio::SeqIO::VERSION."\n"'
> 1.0069
> pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print
> $Bio::Root::IO::VERSION."\n"'
> -1, set by base.pm
>
> Imported modules do not have VERSION set correctly when it is
> exported. This hasn't become an issue in bioperl yet (it's really
> an edge case), but several devs have run into this. And really, why
> set VERSION to a string like '-1, set by base.pm'?
>
> Anyway, re: versioning, the way I think about it, if we have a small
> very stable core with version X, and a focused very stable module
> group with version Y, other distributions would have a separate
> version and require subgroup version Y (which would in turn require
> core version X). CPAN would take care of it. This isn't much
> different than what occurs every day on CPAN anyway (Jay's Catalyst,
> Moose and MooseX, and so on). In fact, several Moose-requiring
> distributions don't require the latest Moose.
>
>> When I do a SVN (or git) checkout do I need to checkout each of
>> these in its own directory? Or will there be a master packaging
>> script that makes the necessary zip files for CPAN submission?
>
> Not sure; that would be up to us I suppose. I think it would be
> easier to maintain and release if they were separate or packaged up
> as Jay suggests.
>
>> If they are in separate directories are we organizing by conceptual
>> topic (phylogenetics, alignment, database search) or by namespace
>> of the modules?
>
> By topic, retaining namespaces. We have a basic Bio::* directory
> structure already in place for various generic terms (Tools, DB,
> etc), so I see this crossing simple namespaces very easily. And as
> I pointed out to Robert, several of those could possibly go together.
>
>> Do all the 'database' modules live together? Probably not, so do
>> we name them bioperl-db-remote, bioperl-db-local-index,
>> bioperl-db-local-sql, etc.? Really, bioperl-db is somewhat focused
>> on sequences and features, but what about things that integrate
>> multiple data types, like biosql?
>
> I don't see bioperl-db (BioSQL) being split up. I think it's too
> intrinsically linked and cohesive (it's almost a separate core unto
> itself), so it would be counterproductive to do so.
>
> Maybe have bioperl-db become bioperl-biosql. Web-based =
> bioperl-remotedb. Local = bioperl-localdb. OBDA = bioperl-obda.
>
>> If they are in separate directories, what about all the test data
>> that might be shared? Is this replicated among all the
>> sub-directories? How do we do a good job keeping that up to date?
>> Could we have a test-data distro instead, with symlinks within SVN?
>
> We have to see how much is actually shared and proceed from there.
> I would like to eventually resurrect the idea of a separate biodata
> repo that we could just ftp the data from as needed. That would cut
> down on the package size quite a bit, but I'm not sure how feasible
> that is from the testing point of view (would we have to skip all
> tests if there were no network access)?
>
>> For some other obvious modules that can be split off and self-
>> contained, each of these could be a package. I would estimate more
>> than 20 packages depending on how Bio::Tools are carved up.
>> - I think Bio::DB::SeqFeature needs to be split off for sure; this
>> is a nice logical peeling off. Could be another test case since it
>> is a GBrowse dependency.
>> - Bio::DB::GFF as well for the same reasons.
>
> Completely agree (and I think Lincoln would like this as well).
>
>> - Bio::PopGen - self contained for the most part, but depends on
>> Bio::Tree and Bio::Align objects
>
> Could list those as a required dependency.
>
>> - Bio::Variation
>> - Bio::Map and Bio::MapIO
>> - Bio::Cluster and Bio::ClusterIO
>> - Bio::Assembly
>> - Bio::Coordinate
>>
>> My nightmare is that we're going to have to manage a lot of
>> 'use XX 1.01' statements to enforce version requirements on the
>> interface classes, and keep all of these up to date. The version
>> was implicit when they were all part of the same big distro.
>
> Right. But it also becomes a maintenance problem when serious bugs
> in one module impede the needed release of others to CPAN.
>
>> Also the splits need not only include one namespace if need be I
>> guess but we have generally grouped things by namespace.
>>
>> What do you want to do about bioperl-run? Do we make a set of
>> parallel splits from all of these? I think at the outset we need
>> to coordinate the applications supported here in some sort of loose
>> ontology - the namespaces were not consistently applied so we have
>> some alignment tools in different directories, etc. So the
>> namespace sort of classifies them but it could be better. One of
>> the challenges of multiple developers without a totally shared
>> vision on how it should be done.
>
> We could split bp-run and Tools, pairing the wrappers with the
> relevant parser modules. Not sure if this can be done with
> SearchIO as well but it could be tested to see how feasible that
> would be.
>
>> I'm not convinced that the Bio::Graphics splitoff has been painless
>> so we should take stock of how that is working.
>
> Really? Lincoln has made several fixes lately on CPAN, so I thought
> everything was going well. If anything I would think the lack of
> additional 1.6.x bioperl releases has probably held GBrowse 2.0 up
> more due to Bio::DB::SeqFeature (my fault, but as you know life and
> job take precedence sometimes).
>
>> It seems like this split off would be a way to better streamline
>> things in bioperl so that modern versions of bioperl might be able
>> to better interface with things like Ensembl again too.
>>
>> How much of this effort is worth triaging on the current code,
>> which appears to scare so many users (and potential developers)
>> off, versus the efforts we want to make on a cleaner, simpler
>> bioperl system?
>
> I say triage away on a branch, but we need to indicate which ones to
> whittle out first. The reason I believe we went for a larger split
> initially (as indicated on the wiki page) was to push something
> forward and not get too bogged down in the details. But we may as
> well go full throttle and do this right away.
>
>> Okay I rambled, hope that was helpful.
>>
>> -jason
>> --
>> Jason Stajich
>> jason at bioperl.org
>
> Very, very helpful. Now I need a beer.
>
> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research