[Bioperl-l] bioperl reorganization

Fri Jul 24 18:32:48 UTC 2009

Lincoln,

I recall seeing somewhere in the Bio::DB::SeqFeature code a reliance  
on some of the Bio::DB::GFF Utility stuff (rearrange and binning come  
to mind).

Thinking about it, these are pretty commonly used.  Maybe we could  
move some of these to Bio::Root::Utilities and just export/import code  
as needed.  This way both GFF and SeqFeature::Store could use it.
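
To make the export/import idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the package name My::SharedUtil and the functions rearrange_args and smallest_bin are made up, and the binning rule is a simplified stand-in rather than the scheme Bio::DB::GFF actually uses.

```perl
use strict;
use warnings;

# Hypothetical shared utility package; in a real distribution it would live
# in its own .pm file (the suggestion above is Bio::Root::Utilities).
package My::SharedUtil;
use Exporter 'import';
our @EXPORT_OK = qw(rearrange_args smallest_bin);

# Map -name => value pairs onto a fixed positional order.
# A call whose first argument does not start with '-' is treated as
# positional and passed through unchanged.
sub rearrange_args {
    my ($order, @args) = @_;
    return @args unless @args && defined $args[0] && $args[0] =~ /^-/;
    my %named;
    while (my ($key, $value) = splice(@args, 0, 2)) {
        (my $name = lc $key) =~ s/^-//;
        $named{$name} = $value;
    }
    return map { $named{lc $_} } @$order;
}

# Return the smallest width (min_width * 10**n) at which start and end
# fall into the same bin. Simplified stand-in for a real binning scheme.
sub smallest_bin {
    my ($start, $end, $min_width) = @_;
    $min_width ||= 1000;
    my $width = $min_width;
    $width *= 10 while int($start / $width) != int($end / $width);
    return $width;
}

package main;

# Each consumer imports only the helpers it needs. With separate files this
# would simply be: use My::SharedUtil qw(rearrange_args smallest_bin);
My::SharedUtil->import(qw(rearrange_args smallest_bin));

my ($seqid, $start, $end) = rearrange_args(
    [qw(SEQID START END)],
    -start => 100, -end => 2500, -seqid => 'chr1',
);
printf "%s:%d-%d falls in a bin of width %d\n",
    $seqid, $start, $end, smallest_bin($start, $end);
```

Since nothing is exported by default, GFF and SeqFeature::Store could each pull in just the functions they use without polluting their namespaces.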

chris

On Jul 24, 2009, at 8:31 AM, Lincoln Stein wrote:

> My preference would be to split both Bio::DB::SeqFeature and  
> Bio::DB::GFF into their own module. I don't think they depend on  
> each other, but I'm not 100% sure!
>
> Lincoln
>
> On Sat, Jul 18, 2009 at 8:23 AM, Scott Cain <cain.cshl at gmail.com>  
> wrote:
> Hi All,
>
> I don't want to wade in too deeply, but I like the idea of splitting  
> things up.  I think the Bio::Graphics split has gone well and has  
> made life easier in GBrowse world.  I could see Bio::DB::SeqFeature  
> and Bio::DB::GFF being split and either being kept together or going
> their separate ways (though I have a nagging suspicion that
> SeqFeature code depends on GFF code in a few places, so it may make
> sense to just keep them together).
>
> And Chris, if it makes you feel any better, I don't think anything  
> you've done or not done has held up GBrowse2.
>
> Scott
>
>
>
> On Jul 17, 2009, at 11:14 PM, Chris Fields wrote:
>
> My 2c...
>
> On Jul 17, 2009, at 12:01 PM, Jason Stajich wrote:
>
> Will try to weigh in more, a little bit of stream of consciousness  
> to let you know I'm thinking about it.  Tough summer to focus much  
> on this.
>
> Yes, for me as well.  That will change soon (approx two weeks) ;>
>
> It's too bad we are apparently the laughing stock of Perl gurus, but  
> it would be great to see how to modernize aspects of the development.
>
> I'm curious how it will work if we have dozens of separate distros:
> won't we have a hard time keeping track of which directory things
> are in? Will there have to be a master list of what version and what
> modules are in what distro now?
>
> I don't think we're a laughingstock as much as we haven't had the  
> time to dedicate towards this (and much of this occurred at a point  
> early on, with that whole 'Cathedral and Bazaar' esr-based thingy).   
> BTW, those same gurus shouldn't speak: perl core is just as bad and
> riddled with worse bugs, though rgs and co. wouldn't admit it.
>
> In fact, base.pm itself has a nasty one; I'm surprised no one in the  
> bioperl community has noticed it yet (it's listed as a bug on RT I  
> think):
>
> pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print  
> $Bio::SeqIO::VERSION."\n"'
> 1.0069
> pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print  
> $Bio::Root::IO::VERSION."\n"'
> -1, set by base.pm
>
> Imported modules do not have VERSION set correctly when it is  
> exported.  This hasn't become an issue in bioperl yet (it's really  
> an edge case), but several devs have run into this. And really, why  
> set VERSION to a string like '-1, set by base.pm'?
>
> Anyway, re: versioning, the way I think about it, if we have a small  
> very stable core with version X, and a focused very stable module  
> group with version Y, other distributions would have a separate  
> version and require subgroup version Y (which would in turn require  
> core version X).  CPAN would take care of it.  This isn't much  
> different than what occurs every day on CPAN anyway (Jay's Catalyst,
> Moose and MooseX, and so on).  In fact, several Moose-requiring  
> distributions don't require the latest Moose.
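
A rough sketch of how that chain could be declared, assuming ExtUtils::MakeMaker is used for packaging. The distribution names and version numbers are made up for illustration; none of them is an existing BioPerl package.

```perl
# Makefile.PL for a hypothetical topic distribution.
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME      => 'Bio::Example::Topic',    # hypothetical distribution
    VERSION   => '0.01',
    PREREQ_PM => {
        # The focused module group (version Y). Its own Makefile.PL would
        # in turn list the core (version X), so a CPAN client can resolve
        # the whole chain from this one entry.
        'Bio::Example::Group' => '1.20',   # hypothetical name and version
    },
);
```

The version given is a minimum, which is why a distribution declared this way would not force the latest release of its prerequisite.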
>
> When I do an SVN (or git) checkout, do I need to check out each of
> these in its own directory?  Or will there be a master packaging  
> script that makes the necessary zip files for CPAN submission?
>
> Not sure; that would be up to us I suppose.  I think it would be  
> easier to maintain and release if they were separate or packaged up  
> as Jay suggests.
>
> If they are in separate directories are we organizing by conceptual  
> topic (phylogenetics, alignment, database search) or by namespace of  
> the modules?
>
> By topic, retaining namespaces.  We have a basic Bio::* directory  
> structure already in place for various generic terms (Tools, DB,  
> etc), so I see this crossing simple namespaces very easily.  And as  
> I pointed out to Robert, several of those could possibly go together.
>
> Do all the 'database' modules live together? Probably not, so do we
> name them bioperl-db-remote, bioperl-db-local-index,
> bioperl-db-local-sql, etc.? Really, bioperl-db is somewhat focused on
> sequences and features, but what about things that integrate
> multiple data types, like biosql?
>
> I don't see bioperl-db (BioSQL) being split up.  I think it's too  
> intrinsically linked and cohesive (it's almost a separate core unto  
> itself), so it would be counterproductive to do so.
>
> Maybe have bioperl-db become bioperl-biosql.  Web-based =
> bioperl-remotedb.  Local = bioperl-localdb. OBDA = bioperl-obda.
>
> If they are in separate directories, what about all the test data
> that might be shared? Is this replicated among all the
> sub-directories? How do we do a good job keeping that up to date?
> Could we have a test-data distro instead, with symlinks within SVN?
>
> We have to see how much is actually shared and proceed from there.   
> I would like to eventually resurrect the idea of a separate biodata  
> repo that we could just ftp the data from as needed.  That would cut  
> down on the package size quite a bit, but I'm not sure how feasible  
> that is from the testing point of view (would we have to skip all  
> tests if there were no network access)?
>
> For some other obvious modules that can be split off and
> self-contained, each of these could be a package.  I would estimate
> more than 20 packages depending on how Bio::Tools are carved up.
> - I think Bio::DB::SeqFeature needs to be split off for sure; this is
> a nice logical peeling off.  Could be another test case since it is
> a GBrowse dependency.
> -  Bio::DB::GFF as well for the same reasons.
>
> Completely agree (and I think Lincoln would like this as well).
>
> -  Bio::PopGen - self contained for the most part, but depends on  
> Bio::Tree and Bio::Align objects
>
> Could list those as a required dependency.
>
> -  Bio::Variation
> -  Bio::Map and Bio::MapIO
> -  Bio::Cluster and Bio::ClusterIO
> -  Bio::Assembly
> - Bio::Coordinate
>
> My nightmare is that we're going to have to manage a lot of 'use XX
> 1.01' statements enforcing version requirements when dealing with
> the dependencies on the interface classes, and have to keep these
> all up to date.  The version was implicit when they were all part of
> the same big distro.
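
For reference, a small sketch of what a 'use XX 1.01' line enforces. The package My::Interface and the helper meets_requirement are hypothetical, used only so the example runs as a single file.

```perl
use strict;
use warnings;

# Hypothetical interface package standing in for a split-out distribution.
package My::Interface;
our $VERSION = '1.00';

package main;

# 'use My::Interface 1.01;' loads the module and then calls
# My::Interface->VERSION(1.01), which dies if $VERSION is lower than
# the requested minimum. Calling VERSION directly performs the same check
# without needing a separate .pm file.
sub meets_requirement {
    my ($module, $minimum) = @_;
    return eval { $module->VERSION($minimum); 1 } ? 1 : 0;
}

print "needs 1.00: ", meets_requirement('My::Interface', '1.00'), "\n";
print "needs 1.01: ", meets_requirement('My::Interface', '1.01'), "\n";
```

The first check passes and the second fails, so every such minimum scattered through the code is one more number to revisit whenever an interface distro is released.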
>
> Right.  But it also becomes a maintenance problem when serious bugs  
> in one module impede the needed release of others to CPAN.
>
> Also, I guess a split need not be limited to one namespace, though
> we have generally grouped things by namespace.
>
> What do you want to do about bioperl-run?  Do we make a set of
> parallel splits from all of these?  I think at the outset we need to  
> coordinate the applications supported here in some sort of loose  
> ontology - the namespaces were not consistently applied so we have  
> some alignment tools in different directories, etc.  So the  
> namespace sort of classifies them but it could be better.  One of  
> the challenges of multiple developers without a totally shared  
> vision on how it should be done.
>
> We could split bp-run and Tools, pairing the wrappers with the  
> relevant parser modules.  Not sure if this can be done with
> SearchIO as well but it could be tested to see how feasible that  
> would be.
>
> I'm not convinced that the Bio::Graphics splitoff has been painless  
> so we should take stock of how that is working.
>
> Really?  Lincoln has made several fixes lately on CPAN, so I thought  
> everything was going well.  If anything I would think the lack of  
> additional 1.6.x bioperl releases has probably held GBrowse 2.0 up
> more due to Bio::DB::SeqFeature (my fault, but as you know life and  
> job take precedence sometimes).
>
> It seems like this split off would be a way to better streamline  
> things in bioperl so that modern versions of bioperl might be able  
> to better interface with things like Ensembl again too.
>
> How much effort is it worth spending on triaging the current code,
> which appears to scare off so many users (and potential developers),
> versus the effort we want to put into a cleaner, simpler bioperl
> system?
>
> I say triage away on a branch, but we need to indicate which ones to  
> whittle out first.  The reason I believe we went for a larger split  
> initially (as indicated on the wiki page) was to push something  
> forward and not get too bogged down in the details.  But we may as  
> well go full throttle and do this right away.
>
> Okay I rambled, hope that was helpful.
>
> -jason
> --
> Jason Stajich
> jason at bioperl.org
>
> Very, very helpful.  Now I need a beer.
>
> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> -----------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research
>
>
> -- 
> Lincoln D. Stein
> Director, Informatics and Biocomputing Platform
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>