[Bioperl-l] Re: [Root-l] new Bioperl bug tracking system

Lars G. T. Jorgensen larsj@diku.dk
22 Oct 2002 16:28:05 +0200


Ewan Birney <birney@ebi.ac.uk> writes:

> > >
> > > In what way does it expose its implementation except in the constructor,
> > > which - as it is a different implementation -  it has to expose?
> > >
> >
> > I just think it's a bit strange that I have to change my Sequence
> > object depending on my input size. The thing that confused me
> > the most when starting to use bioperl was the inheritance tree of a
> > sequence.
> >
> 
> Well.. I think there are two things going on here:
> 
>   (a) Loading up a very big sequence- no matter how you cut it, a 250MB
> string (chr1 in human) is just out of scope for most implementations,
> whether it is in memory (do you have that amount of memory), in a database
> (stretches most database implementations) or network (let's not talk about
> network buffers). Handling this amount of data causes all sorts of issues
> - eg, the loading system has to load in sequence parts piece by piece
> because it can't keep the whole thing in memory.
> 
>    The mere fact that, with Bio::LargeSeq it can handle > 5MB sequences is
> pretty darn cool in my not so honest opinion - clients only have to change
> how they load up the sequence.
> 
> 
>    Writing an implementation system which automatically senses when it
> should flip implementations is not a complaint I have heard much of and in
> some areas is... ummm... challenging because not all formats let you know
> how much sequence you are loading - As there are other use cases (eg,
> parsing the whole of the EST database) where people are worried about
> speed of implementation (and they are *very* sensitive to it), making code
> that's shaped well for all cases will provide another level of challenge.
> (could be done of course)
> 
>    Our current approach is to leave the choice to the client writer (ie,
> clients have to be slightly aware of what they are doing...).
> 
> 
>    I'm interested if your problem was "I was reading the documentation and
> it seemed abstractly confusing" or "I wrote code which then didn't work
> because of this".
> 
> 
>    BTW - in my experience, anything that has use cases stretching from
> strings of 500 characters to 250MB of characters *all* have interesting
> choices to make. That said we could do something "clever" here. Has this
> really affected you or is this a docs thing?
> 
> 
>   (b) The layering of Bio::Seq classes is one of those design decisions
> which ups the understanding curve but considerably lowers the overhead of
> development. My honest and straightforward opinion is that if we stuck
> with one Bio::Seq class we would have been in serious shit by now with
> many different people pushing and pulling that class across different use
> cases. The Seq/PrimarySeq separates "useful biological info" from "storing
> sequences" - the Seq/RichSeq separates all the "database tracking
> peculiarities" from "stuff we expect everything to do". Without these
> splits I just don't think we would have kept bioperl together.
> 
> 
>    my suspicion is that you read the docs and wondered why - even though I
> suspect the tutorial tries to suggest that there are good reasons to do
> this split - you said to yourself "do these guys really know what they are
> doing or is this just signs of software insanity" - my answer to that is -
> we don't know what we are doing any more than any other large software
> project trying to produce sensible code - ie - 25% good design, 25% bad
> design, 50% it-does-the-job-at-the-time.
> 
> 
>    having seen internals of other projects, I put us in the higher
> quartile of software sanity, but I'd be happy to hear (sensible)
> suggestions about how to do it better...

I am not talking about the code quality of bioperl; I too think it's
top. You are the last people to be accused of doing crappy
bioinformatics software.

I understand the need for different implementations for different
jobs. But the point I tried to get across, in a somewhat diffuse manner, is
that I think the *IO interfaces are a lot easier to work with than the
Sequence classes.

So the thing I wanted was this little subtle difference:
my $seq = new Seq(-storage => "disk");

instead of

my $seq = new LargeSequence();
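
To make the difference concrete, here is a minimal sketch of how a single
constructor could pick the storage backend from a -storage argument. The
package names My::Seq, My::Seq::Memory and My::Seq::Disk are hypothetical,
invented for this illustration; they are not bioperl classes, and this is
not how Bio::LargeSeq is implemented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory backend: the whole sequence lives in one string.
package My::Seq::Memory;

sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{-seq} // '' }, $class;
}

sub length { return CORE::length($_[0]{seq}) }

# Coordinates are 1-based and inclusive.
sub subseq {
    my ($self, $start, $end) = @_;
    return substr($self->{seq}, $start - 1, $end - $start + 1);
}

# Hypothetical disk backend: the sequence lives in a temporary file and
# only the requested slice is read back into memory.
package My::Seq::Disk;
use File::Temp ();

sub new {
    my ($class, %args) = @_;
    my $seq = $args{-seq} // '';
    my $fh  = File::Temp->new;    # file is removed when the object is freed
    binmode $fh;
    print {$fh} $seq;
    $fh->flush;
    return bless { fh => $fh, length => CORE::length($seq) }, $class;
}

sub length { return $_[0]{length} }

sub subseq {
    my ($self, $start, $end) = @_;
    my $fh = $self->{fh};
    seek($fh, $start - 1, 0) or die "seek failed: $!";
    my $n = read($fh, my $buf, $end - $start + 1);
    die "read failed: $!" unless defined $n;
    return $buf;
}

# Hypothetical front end: the client names a storage strategy, not a class.
package My::Seq;

my %BACKEND = (
    memory => 'My::Seq::Memory',
    disk   => 'My::Seq::Disk',
);

sub new {
    my ($class, %args) = @_;
    my $storage = delete($args{-storage}) // 'memory';
    my $impl = $BACKEND{$storage} or die "unknown -storage '$storage'\n";
    return $impl->new(%args);
}

package main;

my $small = My::Seq->new(-seq => 'ACGTACGT');
my $large = My::Seq->new(-seq => 'ACGTACGT', -storage => 'disk');

# Both objects answer the same calls; only the constructor argument differs.
for my $seq ($small, $large) {
    printf "%-16s length=%d subseq(3,6)=%s\n",
        ref($seq), $seq->length, $seq->subseq(3, 6);
}
```

The client code after the constructor is identical for both objects, which
is the point: the choice of implementation is still made by the client, but
it is spelled as an argument rather than as a different class name.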

I also once tried to clean up the bioperl.dia class diagram so that
every major part (as stated in your article) would fit on one sheet of A4
paper. But I failed because I only had a glidepad to do it with.



> 
> 
> 
> > But most of my problems might be caused by my lack of
> > insight into biology jargon...
> >
> 
> I doubt that. Biology jargon here is pretty small...
> 
> > >
> > > ...confused...
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l
> > >
> > >
> >
> >
> 
> 
> 

-- 
Mvh|Regards, Lars
System administrator     | Student 
Bioinformatics Centre    | Department of Computer Science  
University of Copenhagen | University of Copenhagen
http://www.binf.ku.dk    | http://www.diku.dk
When's the last time you used duct tape on a duct? -- Larry Wall