[Bioperl-l] SeqIO parsing

Tue, 24 Sep 2002 17:01:37 +0100 (BST)

I have been getting down into the depths of the parsing, and we are
horrendously slow on object creation - there are two main things to fix:

   (a) a somewhat tortuous path of object creation, which *always*
travels through at least three functions to build a blessed hash (before
you have even got to the object-specific parts). I believe this can be 
slimmed down by:

       (i) Assuming the object's new function is supplied by the
implementation hierarchy, and not the interface, getting rid of the jump
through RootI. RootI's new() would now behave like RootI's 
_create_object()

       (ii) Remove the _create_object line in Root.pm - assuming that
people who want to make a custom object would inherit from RootI and
implement ->verbose() and ->new() as they like.

       (iii) To prevent heinous errors of RootI compliance without verbose
being overridden, put in a default implementation of verbose that returns 0
and issues a warning.
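
A rough sketch of what (i)-(iii) might look like, to make the idea concrete.
All the Sketch::* package names are made up for illustration and are not the
real Bio::Root classes; the -verbose argument handling is also just an
assumption about how the constructor would be called.

```perl
use strict;
use warnings;

{
    # Hypothetical interface: no construction logic, only a safety-net verbose().
    package Sketch::RootI;

    sub verbose {
        my ($self) = @_;
        warn ref($self) . " did not override verbose(); defaulting to 0\n";
        return 0;
    }
}

{
    # Hypothetical implementation root: new() blesses the hash itself,
    # with no detour through the interface.
    package Sketch::Root;
    our @ISA = ('Sketch::RootI');

    sub new {
        my ($class, %args) = @_;
        my $self = bless {}, ref($class) || $class;
        $self->{_verbose} = $args{-verbose} || 0;
        return $self;
    }

    sub verbose {
        my ($self, $value) = @_;
        $self->{_verbose} = $value if defined $value;
        return $self->{_verbose};
    }
}

{
    # A custom object that inherits only from the interface and
    # supplies its own new(), but forgets to override verbose().
    package Sketch::Custom;
    our @ISA = ('Sketch::RootI');

    sub new { return bless {}, shift }
}

my $obj    = Sketch::Root->new(-verbose => 1);
my $custom = Sketch::Custom->new;

print "Root verbose: ",   $obj->verbose,    "\n";
print "Custom verbose: ", $custom->verbose, "\n";   # also warns on STDERR
```

The custom object still answers ->verbose(), so callers do not die, but the
warning makes the missing override visible.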

  This scheme, in my mind, has one *SERIOUS* gotcha. People *have* to write
their @ISA's with their implementation tree *first* and their interface
inheritance second. Is this ok with people?
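
To show why the order matters: Perl's default method lookup walks @ISA left
to right, depth first, so whichever parent comes first and has a new() wins.
A small sketch, with made-up Sketch::* packages standing in for an interface
that keeps a minimal new() and an implementation that does real setup:

```perl
use strict;
use warnings;

{
    # Hypothetical interface whose new() only blesses an empty hash.
    package Sketch::IfaceI;
    sub new { return bless {}, shift }
}

{
    # Hypothetical implementation whose new() does the real initialisation.
    package Sketch::Impl;
    sub new {
        my ($class) = @_;
        return bless { initialised => 1 }, $class;
    }
}

{
    # Implementation first: new() resolves to Sketch::Impl::new.
    package Sketch::Good;
    our @ISA = ('Sketch::Impl', 'Sketch::IfaceI');
}

{
    # Interface first: new() resolves to Sketch::IfaceI::new instead.
    package Sketch::Bad;
    our @ISA = ('Sketch::IfaceI', 'Sketch::Impl');
}

my $good = Sketch::Good->new;
my $bad  = Sketch::Bad->new;

print "Good initialised: ", ($good->{initialised} ? "yes" : "no"), "\n";  # yes
print "Bad initialised: ",  ($bad->{initialised}  ? "yes" : "no"), "\n";  # no
```

The second case fails silently, which is what makes it a nasty gotcha.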

The nice thing about (a) is that it should give speedups across the
entire system, not just SeqIO. 

Jason/Hilmar - is there a hidden gotcha here somewhere?

  (b) Making a new Bio::Seq::SeqFactory with privileged access to
functions in Bio::Seq and Bio::PrimarySeq to make fast access objects, e.g.
not going through a second alphabet check on setting seq.

(b) is SeqIO-specific, so I want to do this second.
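
As a very rough illustration of the kind of thing I mean in (b): the packages
below are invented stand-ins (not the real Bio::PrimarySeq or any existing
factory), and the validation in seq() is only a placeholder for whatever the
public setter does. The factory already knows the alphabet from the parser,
so it fills the hash slots directly instead of calling the setter.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for a sequence class with a careful public setter.
    package Sketch::PrimarySeq;

    sub new {
        my ($class, %args) = @_;
        my $self = bless {}, $class;
        $self->seq($args{-seq}) if defined $args{-seq};
        return $self;
    }

    # Public setter: validates the string and works out the alphabet each time.
    sub seq {
        my ($self, $value) = @_;
        if (defined $value) {
            die "invalid characters in sequence\n"
                unless $value =~ /^[A-Za-z*-]+$/;
            $self->{seq}      = $value;
            $self->{alphabet} = ($value =~ /^[ACGTNacgtn]+$/) ? 'dna' : 'protein';
        }
        return $self->{seq};
    }

    sub alphabet { return $_[0]->{alphabet} }
}

{
    # Hypothetical factory with "privileged" knowledge of the object layout.
    package Sketch::SeqFactory;

    sub new {
        my ($class, %args) = @_;
        return bless { alphabet => $args{-alphabet} }, $class;
    }

    # Build the object in one bless, skipping the setter and its checks.
    sub create {
        my ($self, $seq) = @_;
        return bless { seq => $seq, alphabet => $self->{alphabet} },
            'Sketch::PrimarySeq';
    }
}

my $factory = Sketch::SeqFactory->new(-alphabet => 'dna');
my $fast    = $factory->create('ACGTACGT');
my $slow    = Sketch::PrimarySeq->new(-seq => 'ACGTACGT');

print "same sequence\n" if $fast->seq eq $slow->seq;
print "same alphabet\n" if $fast->alphabet eq $slow->alphabet;
```

The price is that the factory is tied to the internal hash layout, which is
why it would have to live close to the sequence classes.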

BTW - I think I can cut the object creation time by a factor of 6 in my 
tests if I get this written right ;)
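
For anyone who wants to measure this sort of thing themselves, something
along these lines with the core Benchmark module should do. The two classes
are invented for the comparison (a three-call chain versus a direct bless)
and are not the actual Bio::Root code, so the numbers will not match mine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

{
    # Hypothetical layered constructor: three calls before the hash is blessed.
    package Sketch::Layered;
    sub new            { my $class = shift; return $class->_new_step(@_) }
    sub _new_step      { my $class = shift; return $class->_create_object(@_) }
    sub _create_object { my $class = shift; return bless {}, ref($class) || $class }
}

{
    # Hypothetical direct constructor: new() blesses the hash itself.
    package Sketch::Direct;
    sub new { my $class = shift; return bless {}, ref($class) || $class }
}

# Run each constructor for at least one CPU second and print a comparison table.
cmpthese(-1, {
    layered => sub { my $o = Sketch::Layered->new },
    direct  => sub { my $o = Sketch::Direct->new },
});
```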

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------