[Bioperl-l] SeqIO parsing

Ewan Birney birney@ebi.ac.uk
Tue, 24 Sep 2002 17:01:37 +0100 (BST)

I have been getting down into the depths of the parsing, and we are 
horrendously slow on the object creation - there are two main reasons:

   (a) a somewhat tortuous path of object creation, which *always*
travels through at least three functions to build a blessed hash (before
you have even got to the object-specific parts). I believe this can be 
slimmed down by:

       (i) Assuming the object's new function is supplied by the 
implementation hierarchy, and not the interface, getting rid of the jump 
through RootI. RootI's new() would now behave like RootI's 

       (ii) Remove the _create_object line in Root.pm, assuming that 
people who want to make a custom object would inherit from RootI and 
implement ->verbose() and ->new() as they like.

       (iii) To prevent heinous errors of RootI compliance without verbose 
being overridden, put in a default implementation of verbose that returns 0 
and issues a warning.
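
A minimal sketch of points (i) to (iii), using hypothetical stand-in 
packages (Sketch::RootI, Sketch::Root, Sketch::Custom) rather than the real 
Bio::Root classes. It assumes the interface supplies only a default 
verbose() and the implementation blesses its hash in a single call:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the interface class (not the real RootI).
package Sketch::RootI;

# Point (iii): a default verbose(), so a class that forgets to override
# it still works but says so.
sub verbose {
    my ($self) = @_;
    warn ref($self) . " does not override verbose(); defaulting to 0\n";
    return 0;
}

# Hypothetical stand-in for the implementation class (not the real Root.pm).
package Sketch::Root;
our @ISA = ('Sketch::RootI');

# Points (i) and (ii): new() blesses the hash directly, with no hop
# through the interface and no separate _create_object step.
sub new {
    my ($class, %args) = @_;
    my $self = bless {}, ref($class) || $class;
    $self->{_verbose} = $args{-verbose} || 0;
    return $self;
}

sub verbose {
    my ($self, $value) = @_;
    $self->{_verbose} = $value if defined $value;
    return $self->{_verbose};
}

# A custom object that inherits only from the interface and supplies
# its own new(), as point (ii) allows.
package Sketch::Custom;
our @ISA = ('Sketch::RootI');

sub new { return bless {}, shift }

package main;

my $obj    = Sketch::Root->new(-verbose => 1);
my $custom = Sketch::Custom->new;
```

Here $obj->verbose returns the stored value, while $custom->verbose falls 
back to the interface default, which returns 0 and warns.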

  This scheme, in my mind, has one *SERIOUS* gotcha. People *have* to write 
their @ISA's with their implementation tree *first* and their interface 
inheritance second. Is this ok with people?
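
The ordering gotcha can be illustrated with a small sketch. All Demo::* 
package names are hypothetical, and the sketch assumes Perl's default 
depth-first, left-to-right search of @ISA. The default verbose() from 
point (iii) lives in the interface root, so an interface listed first 
shadows the implementation's verbose():

```perl
use strict;
use warnings;

package Demo::RootI;          # hypothetical interface root
sub verbose { return 0 }      # default from point (iii), warning omitted

package Demo::Root;           # hypothetical implementation root
our @ISA = ('Demo::RootI');
sub new     { my ($class) = @_; return bless { _verbose => 2 }, $class }
sub verbose { return $_[0]->{_verbose} }

package Demo::SeqI;           # hypothetical interface built on the interface root
our @ISA = ('Demo::RootI');

package Demo::GoodSeq;        # implementation tree first, interface second
our @ISA = ('Demo::Root', 'Demo::SeqI');

package Demo::BadSeq;         # interface first: the gotcha
our @ISA = ('Demo::SeqI', 'Demo::Root');

package main;

my $good = Demo::GoodSeq->new;
my $bad  = Demo::BadSeq->new;

print "good: ", $good->verbose, "\n";   # prints "good: 2"
print "bad:  ", $bad->verbose,  "\n";   # prints "bad:  0"
```

Both classes still find new() in Demo::Root, because nothing on the 
interface side defines it in this sketch. Only the method that exists on 
both sides, verbose(), resolves differently.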

The nice thing about (a) is that it should give speedups across the 
entire system, not just SeqIO.

Jason/Hilmar - is there a hidden gotcha here somewhere?

  (b) Making a new Bio::Seq::SeqFactory with privileged access to 
functions in Bio::Seq and Bio::PrimarySeq to make fast access objects, e.g. 
not going through a second alphabet on setting seq.

(b) is SeqIO specific so I want to do this second.
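
A rough sketch of the idea behind (b), with hypothetical Fast::* packages 
that are not the real Bio::Seq or Bio::PrimarySeq API. It assumes the 
public setter re-derives the alphabet on every call, and that a parser 
already knows the alphabet, so the factory can fill the hash directly:

```perl
use strict;
use warnings;

package Fast::PrimarySeq;     # hypothetical, not the real Bio::PrimarySeq

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->seq($args{-seq}) if defined $args{-seq};
    return $self;
}

# Public setter: re-derives the alphabet every time the sequence is set.
sub seq {
    my ($self, $value) = @_;
    if (defined $value) {
        $self->{seq}      = $value;
        $self->{alphabet} = _guess_alphabet($value);
    }
    return $self->{seq};
}

sub alphabet { return $_[0]->{alphabet} }

# Crude illustrative guess: only ACGTN counts as dna, anything else as protein.
sub _guess_alphabet {
    my ($str) = @_;
    return $str =~ /^[ACGTNacgtn]*$/ ? 'dna' : 'protein';
}

package Fast::SeqFactory;     # hypothetical factory with privileged access

sub new {
    my ($class, %args) = @_;
    return bless { alphabet => $args{-alphabet} }, $class;
}

# Fills the object's hash directly, skipping the setter and its alphabet guess.
sub create {
    my ($self, $str) = @_;
    return bless { seq => $str, alphabet => $self->{alphabet} },
        'Fast::PrimarySeq';
}

package main;

my $factory = Fast::SeqFactory->new(-alphabet => 'dna');
my $seq     = $factory->create('ATGGCC');
```

The trade-off is that the factory depends on the object's internal hash 
layout, which is why the access has to be privileged rather than public.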

BTW - I think I can cut the object creation time by a factor of 6 in my 
tests if I get this written right ;)


Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420