[Biopython-dev] biopython on github

Brad Chapman chapmanb at 50mail.com
Tue Mar 17 08:49:30 EDT 2009


Hi everyone;
Nice to see the discussion around trying out git. Leighton and
Tiago, you both brought up some definite concerns in moving
to a distributed version control system.

Git aims to help solve the problem of a them-versus-us community.
When you read posts critical of Biopython, you will find a lot of
complaints about "they didn't do this." This is confusing, as anyone
using, coding with, interested in, or contributing to Biopython is a
member of the community. CVS can help create this division, since it
appears as a walled-off repository that only the core developers can
access.

Git frees up the source code and lowers this barrier to contributing. Now
instead of saying "why didn't the developers integrate the code I
sent to the mailing list and write tests and documentation for it,"
we can all turn the question back on ourselves and ask why we didn't
create a branch with our new contribution and do it, soliciting help
from others in Biopython.

Solving these problems brings potential concerns of its own. This coincidental
blog post from yesterday intelligently covers a lot of the issues:

http://www.pointy-stick.com/blog/2009/03/16/dark-side-distributed-version-control/

The one we should be most concerned about is fragmentation. The
community of Python coders in bioinformatics is too small to be
split up; surely we are better served by resolving any differences
and producing one high-quality, reusable code base.

Tiago's assessment of how things should work practically looks
exactly right. Hard-working core developers, like Peter and
Michiel, will be maintaining the trunk which we roll releases off
of. Contributors can either submit patches as now, or create short
branches which get merged back in. The advantage of branches is that
others can test and develop the branched code, and that the software
should help deal with some of the pain of merging.

There is a lot of good material in this thread for new potential
developers. Tiago, it would make sense to condense what you've
written and include it with the Contributing guide:

http://biopython.org/wiki/Contributing

We should also create a page on the wiki, linked from the developer
documentation:

http://biopython.org/wiki/Documentation#Documentation_for_Developers

that describes active development branches and their goals
(called, say, ActiveBranches). Tiago, I thought you did a page for PopGen
earlier like this but I can't find it right now. We should keep
communication at a high level to avoid confusing fragmentation.

This is a difficult change in how things work, but we are asking
the right questions to create a good environment for improvement.

Brad

> Hi all,
> 
> This has been an occasionally frustrating thread to read...
> 
> On 17/03/2009 08:46, "Peter" <biopython at maubp.freeserve.co.uk> wrote:
> 
> > On Tue, Mar 17, 2009 at 3:45 AM, Chris Lasher <chris.lasher at gmail.com> wrote:
> >> 2009/3/16 Tiago Antão <tiagoantao at gmail.com>
> >> 
> 
> >>> How is the "official" biopython trunk controlled? Currently what is on
> >>> CVS is the gospel and Peter and Michiel essentially have control of
> >>> what is there and what is labelled as a "biopython distribution". How
> >>> will this work now?
>  
> >> In a distributed workflow, there is no technical official repository. The
> >> "official repository" is socially enforced.
> 
> That was true before.  Unless I misread the Biopython licensing, there was
> no real barrier to putting a branched copy of the code on your own
> server/site, with your own modifications.  What git does is provide tools to
> make merging of that sort of code easier (along with a number of other
> nice features, such as authentication of contributions).  The presence of
> git does not ensure that your changes, or anyone else's, will be merged with
> any other repository, and nor does it ensure the quality of contributed
> code.  Git, while nice, and ideal for a number of tasks, is no magic bullet.
> 
> To an extent, the 'official' repository is, pragmatically, the one that is
> most stable and well-tested.  If my hypothetical branched version had become
> more stable and widely-used than the 'official' trunk, and become the most
> frequently downloaded and implemented, and received new contributions in its
> own right, it might then be considered de facto 'the distribution'; nasty
> online spats with the original authors notwithstanding.  The 'social
> enforcement' of politeness (i.e. *I* don't take credit for *your* work)
> prevents this to an extent, as it ought to under any versioning system.
> 
> There's a competing tendency to consider that the coders who spent the most
> time creating the code understand it the best, and are in the best position
> to maintain it directly.  This is true to a large degree, and entirely
> applicable to Biopython's contributed modules.  git can potentially
> facilitate that sort of contribution to the 'official' trunk in a way that
> CVS can't, due to its permissions bottleneck.  However, the mechanics of
> incorporating that contributed code are more or less the same: the people
> with control of the 'official' trunk review the code and decide whether to
> include it.  This is true whether the code is submitted as a patch to
> Bugzilla, emailed to a developer, put up on public CVS on your site, or in a
> forked git repository.  The same is true of your own git repository - you
> don't have to include someone else's forked code if you don't want to.
> 
> What possibly needs to change is not the version control system, but the way
> in which people think about their contribution.  Contributions can be made
> productively under any versioning system, and the key questions remain the
> same in all cases: Does the new code work (are there tests)? Does the new
> code break any old code?  Is there documentation?  Is the API consistent?
> 
> "What version control system are we using?" is a minor detail, unless it is
> inherently broken, hinders any of the above, or causes some other
> deal-breaking issue (for Linus Torvalds, this included speed issues for
> merges).
> 
> >> I think Michiel and Peter still head the Biopython project--at
> >> least they have the most clout, I would say. Therefore, we will probably
> >> look to one of their branches as the "official" branch of Biopython. When
> >> one of them wants to step down in duty, we will socially pass the torch on
> >> to the next taker.
> 
> It has always been thus.  Now, instead of passing on the user authentication
> to the CVS server at OBF, the user authentication to the biopython github
> account will be passed on, instead:
> 
> > I think it is essential we have a clearly labeled official trunk
> > (perhaps with branches for releases), which will be used for all the
> > official releases (tar balls, zip files and windows installers).  Our
> > main webpage should make this very clear.
> > 
> > We could potentially continue to have a shared official branch (e.g.
> > belonging to the generic github biopython user), and give all the
> > existing CVS contributors write access - and continue to manage this
> > as before.  So for example, if Frank wanted to check in some minor
> > changes to Bio.Nexus he could just do it.  Future contributors
> > patches/branches might get taken up by a developer on a personal
> > branch for testing, before being merged into the official branch.
> > 
> > i.e. We can initially continue as before - right now I don't have a
> > feel for how much work the role of an official branch maintainer would
> > be, and it is difficult to guess without more hands on experience
> > using the new tools.
>  
> Plus ça change (avec git)...
> 
> >>> The second question, related to the first is how will different
> >>> branches (of different persons) be managed? I am seeing people
> >>> starting working on the same code in different directions and then
> >>> having problems merging everything together.
> >> 
> >> People are supposed to work in different directions; this is the point of
> >> distributed workflows.
> 
> I may have a different understanding of 'different directions' than you
> mean, but I don't think that it's good for a community project if people
> work in different directions.  I also don't think that that is the point of
> distributed workflows; on the contrary, I think that they are intended to
> make it easier to work independently towards a common goal.  Even if that is
> by working on loosely- or non-interacting parts of the whole.
> 
> >> Merging tends not to be so difficult, and compared to
> >> centralized models like CVS and SVN, it's a cinch. We will help provide
> >> documentation for proper merging habits (e.g., merge early, merge often, and
> >> no rebasing after pushing, etc.). There are also screencasts popping up (in
> >> particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that
> >> we will link to for educational purposes.
> >> And of course, other developers will be around to help out in tricky merges.
> 
> This characterises one of the frustrating aspects of this thread (not
> getting at you personally, Chris) - the occasional implicit assumption that
> 'things will be inherently *better* if we use git'.  Developers are around
> to help now, even using CVS (which also has clear, long-standing stable
> documentation - and even an O'Reilly book).  Several people don't seem to
> think that that - and the way that code is reviewed and incorporated into
> the main distribution - is good enough, and I don't think that this will
> change just because the version control system has changed.  Nor will
> changing revision control system generate significant free time to write,
> test and document code.  But we may have the recession to do that last one
> for us.
> 
> > Well, yes, in theory we have the same problem now with CVS - and while
> > the tools may make merging easier, some communication is essential
> > when working on the key modules which impact large parts of the code
> > base.
> 
> I would put it more strongly than that: communication is essential in all
> aspects of the project.  A number of related blog posts make statements
> along the lines of "I don't use Biopython, or post to the mailing lists, but
> I think that they're doing *this* wrong", or "I submitted code, but it
> didn't get taken up immediately".  Now, venting and ranting on a blog is
> fine, but it's not really *communicating*, any more than it was when I
> thought that the BioSQL GenBank upload code was broken, fixed it (for my
> purposes) and told no-one.  Git won't change the communication issue (in
> either direction) any more than it changes the code review process.
> 
> FWIW, I think that git looks like a good way to go, and that it could help
> encourage people to make local modifications of Biopython for their own
> benefit and in their own interests and expert area, in a way that is visible
> to the core distribution (unlike the patch submission process that is now
> implemented).  In that way it could facilitate more rapid expansion of the
> core distribution.  However, the bottlenecks of ensuring code quality,
> testing and documentation will only ease if that is taken up by the
> individuals/groups making those contributions, in addition to the core
> developers.
> 
> And yes, I know I'm late with the new GenomeDiagram docs... ;)
> 
> L.
> 
> -- 
> Dr Leighton Pritchard MRSC
> D131, Plant Pathology Programme, SCRI
> Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
> e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
> gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405
> 