[Bioperl-l] question about the nature of bioperl

Tue, 20 Aug 2002 23:02:41 -0700 (PDT)

On Tue, 20 Aug 2002, nkuipers wrote:

> I mean no offense to anyone, especially being new to this mailing list, but I
> am starting to wonder just what people expect from this project.  Is this
> supposed to be a do-it-all bioinformatics kit or a set of basic tools that
> people are free to use, fleshing it out with their own code as per their
> specific application?  It just seems to me that, if not already, the project
> is on its way to being what is referred to as a "bloated monster" in computer
> science classes.  Everyone has their "bug" catches, specific formats, and
> there are multiple versions flying around with varying degrees of
> documentation and testedness.  Whoa horse.  Stop.  Trying to account for every
> single format or user-defined case is in my opinion folly and impossible,
> especially given the nature of bioinformatics.  Define a broad but simple
> suite that is flexible to specifics and leave the rest to the users.  That's
> what Perl was made for by definition: TIMTOWTDI.  This was (is?) probably the
> idea with bioperl also, but in browsing the hierarchy diagrams and reading the
> emails, it sounds like a big confused mess that several(?) people are trying
> so hard to keep in order but the task is too big.  Simplify simplify.  I think
> there comes a point where too many "bugs" (real or not) means more than
> debugging.  Easier said than done I know.  Heh.  Pay me no mind.

bioperl has been incredibly useful to me in a variety of projects - it has
saved massive amounts of time and sanity. I'm not sure what you mean by
multiple versions flying around - if you're a bioperl user you only care
about the release version. i'm also not sure what you mean by "bug" catches.

i think you make an interesting point though. the other bioperlers will
probably disagree with me, but these days i am of the opinion that, given
time, *any* piece of object oriented software starts suffering from bloat;
it's inevitable. that said, bioperl is admirably bloat free, though it could
do with losing a few pounds here and there.

anyway, i think bioperl is moving away from a
force-everyone-to-use-our-grand-unified-object-model approach; for
instance, we (ok, others, i just sit back and watch and pontificate)
will be writing new parsers that use an event/stream based approach. I
think this qualifies as the simple approach you are hankering after - it
tackles the core nitty gritty problem of interpreting some wacky syntax an
insane fortran programmer invented in 1982, and leaves the semantic
interpretation up to the individual programmer/project/use-case. you don't
need to understand or use the sequence object model or the alignment object
model; just catch the events you need and do what you want with them. you
can think of these as xml if you really want to, and use xslt if you
really like (but that is just bloated lisp); or you could just write
simple code that handles the exceptions.
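To make the event/stream idea a bit more concrete, here is a minimal sketch (in Python rather than Perl, and not actual bioperl code): a toy FASTA parser that only deals with syntax and fires events, plus a handler that catches just the two events it cares about. The function parse_fasta_events, the event names start_record/seq_chunk/end_record and the LengthCounter class are all hypothetical, made up for illustration.

```python
import io
from typing import TextIO


def parse_fasta_events(stream: TextIO, handler) -> None:
    """Walk a FASTA stream and fire events on `handler` (hypothetical API).

    The parser only interprets syntax; it never builds sequence objects.
    A handler implements whichever of start_record / seq_chunk / end_record
    it cares about; events with no matching method are silently skipped.
    """
    def fire(event, *args):
        method = getattr(handler, event, None)
        if method is not None:
            method(*args)

    in_record = False
    for line in stream:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if in_record:
                fire("end_record")
            fire("start_record", line[1:].strip())
            in_record = True
        elif line and in_record:
            fire("seq_chunk", line.strip())
    if in_record:
        fire("end_record")


class LengthCounter:
    """A handler that only wants record IDs and residue counts."""

    def __init__(self):
        self.lengths = {}
        self._current = None

    def start_record(self, header):
        self._current = header.split()[0]  # ID = first word of the header
        self.lengths[self._current] = 0

    def seq_chunk(self, chunk):
        self.lengths[self._current] += len(chunk)


# Usage with a small made-up input
data = ">seq1 first\nATGC\nGG\n>seq2\nTTA\n"
counter = LengthCounter()
parse_fasta_events(io.StringIO(data), counter)
print(counter.lengths)  # {'seq1': 6, 'seq2': 3}
```

The parser never decides what a "sequence" is; whether the chunks become objects, database rows or just counts is entirely up to the handler.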

also i think there is a niche for a nice procedural (yes, the horror)
module (or a class Util type module with procedural style calls) for a lot
of core bioinformatics stuff. You know, often I just want to say
$aa = translate("ATG");
rather than have to go through a bunch of bureaucratic object middlemen.
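For what it's worth, the procedural style needs very little machinery. A minimal sketch, again in Python, assuming the standard genetic code (NCBI translation table 1) only, with no alternative codes or reading frames; translate() here is a hypothetical plain function, not a bioperl API.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order
# with the first base varying slowest.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(dna: str) -> str:
    """Translate a nucleotide string in frame 1; '*' marks stop codons.

    RNA (U) is accepted, a trailing partial codon is ignored, and any codon
    containing an unknown base (e.g. N) becomes 'X'.
    """
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


print(translate("ATG"))        # M
print(translate("ATGGCCTAA"))  # MA*
```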

let's get back to your point... the nature of bioperl - the most common
use-case for bioinformatics programmers is dealing with stuff like
sequences, features and sequence analyses. i think bioperl is
astonishingly successful in this niche (if you think otherwise i'd be
interested in hearing more concrete examples).

bioperl is expanding into other bioinformatics areas. as we do that,
we will have to pay heed to your call for a broad and simple suite of
tools. I think the one-unified-object-model approach will break down there;
if we push it anyway, we really will end up with a bloated monster. I mean, it sort of works with
things like Sequences - we can all agree what a sequence is (can't we?...)
but with richer data you'll never come up with some UML that satisfies
everyone - it's absolutely necessary to accommodate different perspectives
on what a Gene/polymorphism/disease/expression/allele is. We can't use
Bio::Annotation for everything.

i'm not sure I have a clear alternative to the object approach; i guess it
involves loosely coupled, autonomous, simpler modules, that can be
understood largely independently of one another or of a single grand
object model. it would involve coexistence of different representations of
the same data (eg a dbSNP conception of a SNP vs the genbank flat file
conception of a SNP as a feature on the genome), with clearly defined
transformations between them where interoperation is required. it would of
course involve a unified syntax, which would be xml, since no one else
would vote for s-expressions. where objects are useful (eg features,
graphs, trees) we would have them, but these would preferably be kept
as simple as possible. gbrowse is an example of where bioperl objects are
fantastically useful, but really it only needs a subset of the whole
object model.
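As a rough sketch of "coexisting representations with clearly defined transformations" (Python again; every class, field and function name here is hypothetical and is not dbSNP's or GenBank's actual schema): two independent views of the same SNP, and explicit functions that convert between them only where interoperation is needed.

```python
from dataclasses import dataclass, field


@dataclass
class DbSnpRecord:
    """Hypothetical dbSNP-style view: the variant itself is the primary thing."""
    rs_id: str
    chrom: str
    pos: int        # 1-based position of the variant base
    alleles: tuple  # e.g. ("A", "G")


@dataclass
class GenomeFeature:
    """Hypothetical flat-file-style view: a located feature on a sequence."""
    seq_id: str
    start: int
    end: int
    primary_tag: str
    tags: dict = field(default_factory=dict)


def snp_to_feature(snp: DbSnpRecord) -> GenomeFeature:
    """Explicit transformation: SNP record -> single-base feature."""
    return GenomeFeature(
        seq_id=snp.chrom,
        start=snp.pos,
        end=snp.pos,
        primary_tag="variation",
        tags={"dbsnp_id": snp.rs_id, "alleles": "/".join(snp.alleles)},
    )


def feature_to_snp(feat: GenomeFeature) -> DbSnpRecord:
    """Inverse transformation, valid for features made by snp_to_feature."""
    return DbSnpRecord(
        rs_id=feat.tags["dbsnp_id"],
        chrom=feat.seq_id,
        pos=feat.start,
        alleles=tuple(feat.tags["alleles"].split("/")),
    )


# Made-up example values
snp = DbSnpRecord("rs123", "chr1", 1000, ("A", "G"))
feat = snp_to_feature(snp)
print(feat.start, feat.end, feat.primary_tag)  # 1000 1000 variation
print(feature_to_snp(feat) == snp)             # True
```

Neither class knows about the other; only the two conversion functions do, so each representation can be understood and changed on its own.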

Ewan will no doubt say I've been in berkeley too long, am out of my
tree, and live in a world where code writes itself and i don't have to come
in to work... but i am a little perturbed by the way the OO technology
bandwagon has led the whole software industry up this cul-de-sac of ever
more complex and unnecessary middleware and convoluted techniques. design
patterns are just formulae to turn programmers into macro processors. at
the end of the day it's just passing data around - maybe bioinformatics
should just ignore the mainstream software industry and get back to
computer science fundamentals.

But realistically speaking this won't happen. I do actually have to build
working large scale bioinformatics software for a living (although you
wouldn't think so from this ill-thought-out rant...) and from that
perspective bioperl has been a lifesaver.

> Best regards,
>
> Nathanael Kuipers
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>