[Bioperl-l] SeqIO parsing

Hilmar Lapp hlapp@gnf.org
Tue, 24 Sep 2002 11:36:18 -0700


On Tuesday, September 24, 2002, at 09:01 AM, Ewan Birney wrote:

>
>
>
> I have been getting down into the depths of the parsing, and we are
> horrendously slow on the object creation - there are two main reasons:
>
>
>    (a) a somewhat tortuous path of object creation,

Yes, absolutely.

> which *always*
> travels through at least three functions to build a blessed hash (before
> you have even got to the object-specific parts). I believe this can be
> slimmed down by:
>
>
>        (i) Assuming the object's new function is supplied by the
> implementation hierarchy, and not the interface, getting rid of the jump
> through RootI. RootI's new() would now behave like RootI's
> _create_object()
>
>

I entirely agree. Interfaces should not be instantiated, hence do 
not need a new() to inherit from.


>        (ii) Remove the _create_object line in Root.pm - assuming that
> people who want to make a custom object would inherit from RootI,
> implement ->verbose() and ->new() as they like.
>

I agree.
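
To make the difference concrete, here is a rough sketch of the two construction paths. The package names and the _initialize() step are made up for illustration; this is not the actual Root.pm code, only the general shape of a constructor that hops through helper methods versus one that blesses the hash itself.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical "long path": new() -> _create_object() -> _initialize().
package My::Chained;
sub new {
    my ($class, @args) = @_;
    my $self = $class->_create_object(@args);
    $self->_initialize(@args);
    return $self;
}
sub _create_object {
    my ($class) = @_;
    return bless {}, ref($class) || $class;
}
sub _initialize {
    my ($self, %args) = @_;
    $self->{verbose} = $args{-verbose} || 0;
    return;
}

# Hypothetical "short path": new() blesses the hash itself.
package My::Direct;
sub new {
    my ($class, %args) = @_;
    return bless { verbose => $args{-verbose} || 0 }, ref($class) || $class;
}

package main;

my $chained = My::Chained->new(-verbose => 1);
my $direct  = My::Direct->new(-verbose => 1);

# Pass "bench" as the first argument to compare construction rates.
if (@ARGV && $ARGV[0] eq 'bench') {
    cmpthese(-1, {
        chained => sub { My::Chained->new(-verbose => 1) },
        direct  => sub { My::Direct->new(-verbose => 1) },
    });
}
```

Both constructors yield equivalent objects; whatever speed difference shows up comes from the extra method dispatches, and the real numbers for Bioperl would have to be measured on the real classes.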

>
>        (iii) To prevent heinous errors of RootI compliance without verbose
> being overridden, put in a default implementation of verbose returning 0
> and a warning.
>

Not a warning -- throw_not_implemented() and ideally also throw() 
and warn() should still work without warning about themselves.
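
Roughly what I have in mind, as a sketch only: My::RootI below is a hypothetical stand-in, not the real Bio::Root::RootI, and the "negative verbosity means quiet" rule is an assumption made for the example. The point is that the default verbose() stays silent, so warn() and throw_not_implemented() can rely on it without producing noise of their own.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a RootI-like interface; not the real Bio::Root::RootI.
package My::RootI;

# Silent default: returns 0 and emits nothing, so warn() and throw()
# can call it safely even when a subclass never overrides it.
sub verbose { return 0 }

sub warn {
    my ($self, $msg) = @_;
    return if $self->verbose < 0;      # assumed convention: negative = quiet
    CORE::warn("WARNING: $msg\n");
    return;
}

sub throw {
    my ($self, $msg) = @_;
    die "EXCEPTION: $msg\n";
}

sub throw_not_implemented {
    my ($self) = @_;
    my $method = (caller(1))[3];       # fully qualified name of the calling stub
    $self->throw("$method is not implemented by " . (ref($self) || $self));
}

# Hypothetical custom object: brings its own new(), does not override verbose().
package My::Custom;
our @ISA = ('My::RootI');
sub new { return bless {}, shift }
sub frobnicate { $_[0]->throw_not_implemented }

package main;

# Collect warnings so we can see exactly what was emitted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $obj = My::Custom->new;
$obj->warn("careful");                 # one warning, nothing about verbose()
eval { $obj->frobnicate };             # stub throws through throw()
my $error = $@;
```

With a default verbose() that itself warned, every call to warn() on such an object would produce a second, unrelated message, which is what I would like to avoid.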


>
>   This scheme in my mind has one *SERIOUS* gotcha. People *have* to write
> their @ISA's with their implementation tree *first* and their interface
> inheritance second. Is this ok with people?
>

It should be the style everyone employs already. Otherwise you set 
yourself up for trouble anyway (how would you avoid the interface 
method stub that throws an exception from being invoked if you put 
the interface first?).
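
A small illustration of why the order matters, with made-up package names (none of these are Bioperl classes). Perl's default method lookup walks @ISA depth-first, left to right, so whichever parent comes first and defines the method wins.

```perl
use strict;
use warnings;

# Hypothetical interface: every method is a stub that dies.
package My::ThingI;
sub name { die "name() not implemented by " . (ref($_[0]) || $_[0]) . "\n" }

# Hypothetical implementation base class that supplies new() and name().
package My::ThingBase;
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{-name} }, $class;
}
sub name { return $_[0]->{name} }

# Implementation first, interface second: name() resolves to My::ThingBase.
package My::Good;
our @ISA = ('My::ThingBase', 'My::ThingI');

# Interface first: name() resolves to the stub in My::ThingI.
package My::Bad;
our @ISA = ('My::ThingI', 'My::ThingBase');

package main;

my $good      = My::Good->new(-name => 'ok');
my $good_name = $good->name;            # real implementation is found first

# new() still works here because My::ThingI has no new() of its own.
my $bad       = My::Bad->new(-name => 'oops');
my $bad_name  = eval { $bad->name };    # the interface stub is found first and dies
my $bad_error = $@;
```

Note that under the proposed scheme the interface would no longer carry a usable new(), so an interface-first @ISA would fail in the same way for any method the interface stubs out.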


>
> The nice thing about (a) is that it should give speed ups across the
> entire system, not just SeqIO.
>
>
> Jason/Hilmar - is there a hidden gotcha here somewhere?
>
>

Not that I can see immediately.


>
>
>   (b) Making a new Bio::Seq::SeqFactory with privileged access to
> functions in Bio::Seq and Bio::PrimarySeq to make fast access objects, e.g.,
> not going through a second alphabet on setting seq.
>

I doubt this will have a huge effect but I'm happy to be surprised. 
Also, as for alphabet, you can avoid the second round of alphabet
guessing by saying
$seq->seq(-seq => $seqstr, -alphabet => $seq->alphabet()) or
$seq->seq(-seq => $seqstr, -alphabet => 'protein').

>
> BTW - I think I can cut the object creation time by a factor of 6 in my
> tests if I get this written right ;)
>
>

looking forward to it :-)

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------