[Bioperl-l] bioperl reorganization

Sat Jul 18 09:48:54 EDT 2009

Scott,

I think keeping the two together is a good idea unless Bio::DB::GFF is  
essentially end-of-life and will no longer be maintained.  Then maybe  
it's a good idea to port all needed methods to Bio::DB::SeqFeature and
release the code separately, then call it a day on Bio::DB::GFF  
maintenance-wise?  Just a thought.

Nice to hear my tardiness on 1.6.whatever has not held up GBrowse2.
Thanks!  I will be setting up my own local instance of GBrowse2 here soon.

chris

On Jul 18, 2009, at 7:23 AM, Scott Cain wrote:

> Hi All,
>
> I don't want to wade in too deeply, but I like the idea of splitting  
> things up.  I think the Bio::Graphics split has gone well and has  
> made life easier in GBrowse world.  I could see Bio::DB::SeqFeature  
> and Bio::DB::GFF being split and either being kept together or going
> their separate ways (though I have a nagging suspicion that
> SeqFeature code depends on GFF code in a few places, so it may make
> sense to just keep them together).
>
> And Chris, if it makes you feel any better, I don't think anything  
> you've done or not done has held up GBrowse2.
>
> Scott
>
>
> On Jul 17, 2009, at 11:14 PM, Chris Fields wrote:
>
>> My 2c...
>>
>> On Jul 17, 2009, at 12:01 PM, Jason Stajich wrote:
>>
>>> Will try to weigh in more, a little bit of stream of consciousness  
>>> to let you know I'm thinking about it.  Tough summer to focus much  
>>> on this.
>>
>> Yes, for me as well.  That will change soon (approx two weeks) ;>
>>
>>> It's too bad we are apparently the laughing stock of Perl gurus,  
>>> but it would be great to see how to modernize aspects of the  
>>> development.
>>>
>>> I'm curious how this will work if we have dozens of separate
>>> distros: won't we have a hard time keeping track of which
>>> directory things are in?  Will there have to be a master list of
>>> which modules, at which versions, are in which distro?
>>
>> I don't think we're a laughingstock as much as we haven't had the  
>> time to dedicate towards this (and much of this occurred at a point  
>> early on, with that whole 'Cathedral and Bazaar' esr-based  
>> thingy).  BTW, those same gurus shouldn't speak: perl core is just
>> as bad and riddled with worse bugs, though rgs and co. wouldn't  
>> admit it.
>>
>> In fact, base.pm itself has a nasty one; I'm surprised no one in  
>> the bioperl community has noticed it yet (it's listed as a bug on  
>> RT I think):
>>
>> pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print  
>> $Bio::SeqIO::VERSION."\n"'
>> 1.0069
>> pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print  
>> $Bio::Root::IO::VERSION."\n"'
>> -1, set by base.pm
>>
>> A base class loaded through base.pm that defines no $VERSION of its
>> own ends up with this placeholder string instead.  This hasn't
>> become an issue in bioperl yet (it's really an edge case), but
>> several devs have run into this.  And really, why set VERSION to a
>> string like '-1, set by base.pm'?
>>
>> Anyway, re: versioning, the way I think about it, if we have a  
>> small very stable core with version X, and a focused very stable  
>> module group with version Y, other distributions would have a  
>> separate version and require subgroup version Y (which would in  
>> turn require core version X).  CPAN would take care of it.  This  
>> isn't much different than what occurs every day on CPAN anyway
>> (Jay's Catalyst, Moose and MooseX, and so on).  In fact, several  
>> Moose-requiring distributions don't require the latest Moose.
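>>
>> A minimal sketch of that dependency chain, assuming Module::Build is
>> used for packaging.  The distribution name and every version number
>> below are hypothetical placeholders, not actual releases:
>>
>> ```perl
>> # Build.PL for a hypothetical split-out distribution.
>> use strict;
>> use warnings;
>> use Module::Build;
>>
>> my $build = Module::Build->new(
>>     dist_name     => 'BioPerl-PopGen',  # hypothetical name
>>     dist_version  => '0.01',            # this distro's own version
>>     dist_abstract => 'Population genetics modules (sketch)',
>>     license       => 'perl',
>>     requires      => {
>>         # Placeholder minimums standing in for "core version X"
>>         # and "subgroup version Y" from the paragraph above.
>>         'Bio::Root::Root'    => '1.006',
>>         'Bio::Tree::Tree'    => '1.006',
>>         'Bio::Align::AlignI' => '1.006',
>>     },
>> );
>>
>> # Writes the ./Build script used to test and install the distro.
>> $build->create_build_script;
>> ```
>>
>> A CPAN client reads the 'requires' entries and installs or upgrades
>> those prerequisites first, so each distribution only has to state
>> the minimum versions it actually needs.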
>>
>>> When I do an SVN (or git) checkout, do I need to check out each of
>>> these in its own directory?  Or will there be a master packaging
>>> script that makes the necessary zip files for CPAN submission?
>>
>> Not sure; that would be up to us I suppose.  I think it would be  
>> easier to maintain and release if they were separate or packaged up  
>> as Jay suggests.
>>
>>> If they are in separate directories are we organizing by  
>>> conceptual topic (phylogenetics, alignment, database search) or by  
>>> namespace of the modules?
>>
>> By topic, retaining namespaces.  We have a basic Bio::* directory  
>> structure already in place for various generic terms (Tools, DB,  
>> etc), so I see this crossing simple namespaces very easily.  And as  
>> I pointed out to Robert, several of those could possibly go together.
>>
>>> Do all the 'database' modules live together?  Probably not, so
>>> do we name them bioperl-db-remote, bioperl-db-local-index,
>>> bioperl-db-local-sql, etc?  Really, bioperl-db is somewhat focused
>>> on sequences and features, but what about things that integrate
>>> multiple data types, like biosql?
>>
>> I don't see bioperl-db (BioSQL) being split up.  I think it's too  
>> intrinsically linked and cohesive (it's almost a separate core unto  
>> itself), so it would be counterproductive to do so.
>>
>> Maybe have bioperl-db become bioperl-biosql.  Web-based =
>> bioperl-remotedb.  Local = bioperl-localdb.  OBDA = bioperl-obda.
>>
>>> If they are in separate directories, what about all the test data
>>> that might be shared?  Is this replicated among all the
>>> sub-directories?  How do we do a good job keeping that up to date?
>>> Could we have a test-data distro instead, with symlinks within SVN?
>>
>> We have to see how much is actually shared and proceed from there.   
>> I would like to eventually resurrect the idea of a separate biodata  
>> repo that we could just ftp the data from as needed.  That would  
>> cut down on the package size quite a bit, but I'm not sure how  
>> feasible that is from the testing point of view (would we have to
>> skip all tests if there were no network access?).
>>
>>> For some other obvious modules that can be split off and are
>>> self-contained, each of these could be a package.  I would estimate
>>> more than 20 packages depending on how Bio::Tools are carved up.
>>> -  I think Bio::DB::SeqFeature needs to be split off for sure; this
>>> is a nice logical peeling off.  Could be another test case since
>>> it is a GBrowse dependency
>>> -  Bio::DB::GFF as well for the same reasons.
>>
>> Completely agree (and I think Lincoln would like this as well).
>>
>>> -  Bio::PopGen - self contained for the most part, but depends on  
>>> Bio::Tree and Bio::Align objects
>>
>> Could list those as a required dependency.
>>
>>> -  Bio::Variation
>>> -  Bio::Map and Bio::MapIO
>>> -  Bio::Cluster and Bio::ClusterIO
>>> -  Bio::Assembly
>>> - Bio::Coordinate
>>>
>>> My nightmare is that we're going to have to manage a lot of 'use
>>> XX 1.01' statements enforcing version requirements when dealing
>>> with the dependencies on the interface classes, and keep them all
>>> up to date.  The version was implicit when they were all part of
>>> the same big distro.
>>
>> Right.  But it also becomes a maintenance problem when serious bugs  
>> in one module impede the needed release of others to CPAN.
>>
>>> Also, the splits need not be limited to a single namespace, I
>>> guess, but we have generally grouped things by namespace.
>>>
>>> What do you want to do about bioperl-run?  Do we make a set of
>>> parallel splits from all of these?  I think at the outset we need  
>>> to coordinate the applications supported here in some sort of  
>>> loose ontology - the namespaces were not consistently applied so  
>>> we have some alignment tools in different directories, etc.  So  
>>> the namespace sort of classifies them but it could be better.  One  
>>> of the challenges of multiple developers without a totally shared  
>>> vision on how it should be done.
>>
>> We could split bp-run and Tools, pairing the wrappers with the  
>> relevant parser modules.  Not sure if this can be done with
>> SearchIO as well but it could be tested to see how feasible that  
>> would be.
>>
>>> I'm not convinced that the Bio::Graphics splitoff has been  
>>> painless so we should take stock of how that is working.
>>
>> Really?  Lincoln has made several fixes lately on CPAN, so I  
>> thought everything was going well.  If anything I would think the  
>> lack of additional 1.6.x bioperl releases has probably held Gbrowse  
>> 2.0 up more due to Bio::DB::SeqFeature (my fault, but as you know  
>> life and job take precedence sometimes).
>>
>>> It seems like this split off would be a way to better streamline  
>>> things in bioperl so that modern versions of bioperl might be able  
>>> to better interface with things like Ensembl again too.
>>>
>>> How much of this effort is worth spending on triaging the current
>>> code, which appears to scare off so many users (and potential
>>> developers), versus the effort we want to put into a cleaner,
>>> simpler bioperl system?
>>
>> I say triage away on a branch, but we need to indicate which ones  
>> to whittle out first.  The reason I believe we went for a larger  
>> split initially (as indicated on the wiki page) was to push  
>> something forward and not get too bogged down in the details.  But  
>> we may as well go full throttle and do this right away.
>>
>>> Okay I rambled, hope that was helpful.
>>>
>>> -jason
>>> --
>>> Jason Stajich
>>> jason at bioperl.org
>>
>> Very, very helpful.  Now I need a beer.
>>
>> chris
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> -----------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research
>