[Bioperl-l] Re: [Root-l] new Bioperl bug tracking system

Ewan Birney birney@ebi.ac.uk
Tue, 22 Oct 2002 15:08:57 +0100 (BST)


> >
> > In what way does it expose its implementation except in the constructor,
> > which - as it is a different implementation -  it has to expose?
> >
>
> I just think it's a bit strange that I have to change my Sequence
> object depending on my input size. The thing that confused me the
> most when starting to use bioperl was the inheritance tree of a
> sequence.
>

Well.. I think there are two things going on here:

  (a) Loading up a very big sequence - no matter how you cut it, a 250MB
string (chr1 in human) is just out of scope for most implementations,
whether it is in memory (do you have that amount of memory?), in a database
(stretches most database implementations) or over the network (let's not
talk about network buffers). Handling this amount of data causes all sorts
of issues - e.g., the loading system has to load in the sequence piece by
piece because it can't keep the whole thing in memory.

   The mere fact that bioperl can handle > 5MB sequences with Bio::LargeSeq
is pretty darn cool in my not so honest opinion - clients only have to
change how they load up the sequence.
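
   To illustrate "clients only have to change how they load up the
sequence", here is a minimal sketch in Python (not Perl, and not bioperl's
actual code). The class and method names (InMemorySeq, FileBackedSeq,
add_chunk, subseq) are hypothetical. Both classes answer the same queries;
only construction differs.

```python
import os
import tempfile


class InMemorySeq:
    """Small sequence: the whole string lives in memory."""

    def __init__(self, seq=""):
        self._seq = seq

    def length(self):
        return len(self._seq)

    def subseq(self, start, end):
        # 1-based, inclusive coordinates.
        return self._seq[start - 1:end]


class FileBackedSeq:
    """Large sequence: chunks are appended to a temp file as they arrive,
    and only the requested slice is ever read back into memory."""

    def __init__(self):
        self._fh = tempfile.TemporaryFile()  # binary read/write
        self._length = 0

    def add_chunk(self, chunk):
        self._fh.seek(0, os.SEEK_END)
        self._fh.write(chunk.encode("ascii"))
        self._length += len(chunk)

    def length(self):
        return self._length

    def subseq(self, start, end):
        # 1-based, inclusive coordinates, same as InMemorySeq.
        self._fh.seek(start - 1)
        return self._fh.read(end - start + 1).decode("ascii")


# Loading differs ...
small = InMemorySeq("ACGTACGTAC")
big = FileBackedSeq()
for piece in ("ACGT", "ACGT", "AC"):  # stand-in for a piecewise parser
    big.add_chunk(piece)

# ... but the rest of the client code is identical.
for seq in (small, big):
    print(seq.length(), seq.subseq(3, 6))
```

   Once the object is built, code that only calls length() and subseq()
cannot tell which implementation it was given.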


   The lack of a system which automatically senses when it should flip
implementations is not a complaint I have heard much of, and writing one
is in some areas... ummm... challenging, because not all formats let you
know how much sequence you are loading. As there are other use cases
(e.g., parsing the whole of the EST database) where people are worried
about speed of implementation (and they are *very* sensitive to it), making
code that's shaped well for all cases will provide another level of
challenge (it could be done, of course).

   Our current approach is to leave the choice to the client writer (ie,
clients have to be slightly aware of what they are doing...).
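
   A rough sketch of the two options, in Python and purely illustrative
(the function names and the 5MB threshold constant are assumptions, not
anything in bioperl): either the client states its choice, or the loader
guesses from the file size, which only works when there is a regular file
to measure.

```python
import os

# Hypothetical cut-off above which a file-backed implementation is used.
LARGE_THRESHOLD_BYTES = 5 * 1024 * 1024


def choose_by_client(large):
    """The client writer decides explicitly."""
    return "file-backed" if large else "in-memory"


def choose_by_file_size(path, threshold=LARGE_THRESHOLD_BYTES):
    """Guess from the size on disk.

    Returns None when the size cannot be determined (for example the
    path does not exist, or the data arrives over a pipe or socket),
    which is exactly the case where automatic switching gets hard.
    """
    try:
        size = os.path.getsize(path)
    except OSError:
        return None
    return "file-backed" if size > threshold else "in-memory"
```

   The file size is only a proxy for the sequence length (headers, line
breaks and multi-record files all inflate it), so a heuristic like this
can still pick the slower implementation for many small records.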


   I'm interested if your problem was "I was reading the documentation and
it seemed abstractly confusing" or "I wrote code which then didn't work
because of this".


   BTW - in my experience, anything that has use cases stretching from
strings of 500 characters to 250MB of characters has interesting
choices to make. That said, we could do something "clever" here. Has this
really affected you or is this a docs thing?


  (b) The layering of Bio::Seq classes is one of those design decisions
which ups the understanding curve but considerably lowers the overhead of
development. My honest and straightforward opinion is that if we had stuck
with one Bio::Seq class we would have been in serious shit by now, with
many different people pushing and pulling that class across different use
cases. The Seq/PrimarySeq split separates "useful biological info" from
"storing sequences" - the Seq/RichSeq split separates all the "database
tracking peculiarities" from "stuff we expect everything to do". Without
these splits I just don't think we would have kept bioperl together.
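
   The shape of that layering can be sketched as follows. This is an
illustrative Python outline, not the bioperl class definitions: the
attribute and method names (display_id, add_feature, accession, version)
and the example values are assumptions chosen for the sketch.

```python
class PrimarySeq:
    """'Storing sequences': just the residues and an identifier."""

    def __init__(self, display_id, seq):
        self.display_id = display_id
        self.seq = seq

    def length(self):
        return len(self.seq)


class Seq:
    """'Useful biological info' layered on top of a PrimarySeq."""

    def __init__(self, primary_seq):
        self.primary_seq = primary_seq  # storage is delegated, not copied
        self.features = []

    def length(self):
        return self.primary_seq.length()

    def add_feature(self, name, start, end):
        self.features.append((name, start, end))


class RichSeq(Seq):
    """'Database tracking peculiarities' kept out of plain Seq."""

    def __init__(self, primary_seq, accession, version):
        super().__init__(primary_seq)
        self.accession = accession
        self.version = version


# Made-up example values, for illustration only.
rich = RichSeq(PrimarySeq("demo_seq", "ACGTAC"), accession="X00001", version=2)
rich.add_feature("exon", 1, 3)
print(rich.length(), rich.features, rich.accession)
```

   Because Seq only talks to its PrimarySeq through a small interface, the
storage layer can be swapped (for example for a large-sequence
implementation) without touching the feature or database-tracking code.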


   My suspicion is that you read the docs and wondered why - even though I
suspect the tutorial tries to suggest that there are good reasons to do
this split - and said to yourself "do these guys really know what they are
doing or is this just a sign of software insanity?". My answer to that is:
we don't know what we are doing any more than any other large software
project trying to produce sensible code, i.e., 25% good design, 25% bad
design, 50% it-does-the-job-at-the-time.


   Having seen the internals of other projects, I put us in the upper
quartile of software sanity, but I'd be happy to hear (sensible)
suggestions about how to do it better...



> But most of my problems might be caused by my lack of
> insight into biology jargon...
>

I doubt that. The biology jargon here is pretty small...

> >
> > ...confused...
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
> >
>
> --
> Mvh|Regards, Lars
> System administrator     | Student
> Bioinformatics Centre    | Department of Computer Science
> University of Copenhagen | University of Copenhagen
> http://www.binf.ku.dk    | http://www.diku.dk
> When's the last time you used duct tape on a duct? -- Larry Wall