[Bioperl-l] SeqIO profiling

Aaron J Mackey Aaron J. Mackey" <amackey@virginia.edu
Tue, 17 Sep 2002 08:44:17 -0400 (EDT)


  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.
  Send mail to mime@docserver.cac.washington.edu for more info.

--0-1526045603-1032266657=:9055
Content-Type: TEXT/PLAIN; charset=US-ASCII


With my just-committed patches to Root and RootI (BEGIN->INIT and some
small optimizations to _rearrange), things are marginally better
(previously this took around 30 seconds, and _rearrange took more than 7
seconds of the time):

Total Elapsed Time = 23.99110 Seconds
         User Time =    26.62 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 16.0   4.270  4.270 119910   0.0000 0.0000  Bio::Root::RootI::_rearrange
 10.2   2.740  4.700 119910   0.0000 0.0000  Bio::Root::RootI::new
 8.08   2.150 10.480  47962   0.0000 0.0002  Bio::PrimarySeq::new
 7.59   2.020 25.150  23982   0.0001 0.0010  Bio::SeqIO::fasta::next_seq
 6.12   1.630 17.210  23990   0.0001 0.0007  Bio::Seq::BEGIN
 4.92   1.310  1.310 143892   0.0000 0.0000  Bio::Root::Root::verbose
 4.43   1.180  3.580  47962   0.0000 0.0001  Bio::PrimarySeq::seq
 3.08   0.820  0.820  95924   0.0000 0.0000  Bio::PrimarySeq::alphabet
 2.97   0.790  0.790 119910   0.0000 0.0000  Bio::Root::Root::_create_object
 2.89   0.770  3.630  23981   0.0000 0.0002  Bio::Annotation::Collection::new
 2.89   0.770  1.830  23981   0.0000 0.0001  Bio::Annotation::TypeManager::new
 2.85   0.760  1.490  23981   0.0000 0.0001  Bio::PrimarySeq::_guess_alphabet
 2.48   0.660  3.510  47962   0.0000 0.0001  Bio::SeqIO::alphabet
 2.37   0.630 18.190  23981   0.0000 0.0008  Bio::Seq::SeqFactory::create
 2.22   0.590  0.750  23982   0.0000 0.0000  Bio::Root::IO::_readline

I still don't understand why we're getting 6 times as many
Bio::Root::RootI::new() calls (and 2 times as many PrimarySeq::new()
calls) as we are next_seq() calls ... ?!?!

So that folks can play along at home, I've attached the testing script,
and you can download my test protein database (~5 Mb, compressed) at:
http://www.people.virginia.edu/~ajm6q/sp_protein.lseg.gz

% perl -d:DProf testSeqIO.pl sp_protein.lseg
% dprofpp -u

[ ... etc ... ]

-Aaron

On Tue, 17 Sep 2002, Tim Bunce wrote:

> Aaron,
>
> Can you send me the script and anything else I'd need to be able
> to reproduce this data? Inclusing the data file. I'd like to take
> a look at it (performance tuning is a particular interest of mine).
>
> What version of bioperl were you using? 1.0.2, or from the repository?
>
> Tim.
>
> On Thu, Sep 12, 2002 at 03:40:38PM -0400, Aaron J Mackey wrote:
> >
> > here's some more fuel for the fire; this is a profiling run from a
> > super-simple script that invokes a SeqIO::fasta on a 24K sequence protein
> > database, and does nothing else but call next_seq ... 24K calls to
> > next_seq, but 120K calls to _rearrange and BEGIN!!
> >
> > Total Elapsed Time = 30.84184 Seconds
> >          User Time = 30.51116 Seconds
> > Exclusive Times
> > %Time ExclSec CumulS #Calls sec/call Csec/c  Name
> >  23.0   7.040  6.980 119910   0.0001 0.0001  Bio::Root::RootI::_rearrange
> >  11.6   3.539  5.530 119914   0.0000 0.0000  Bio::Root::RootI::BEGIN
> >  9.96   3.039 30.811  23982   0.0001 0.0013  Bio::SeqIO::fasta::next_seq
> >  8.81   2.689 12.976  47969   0.0001 0.0003  Bio::PrimarySeq::BEGIN
> >  5.67   1.730  1.658 143892   0.0000 0.0000  Bio::Root::Root::verbose
> >  5.21   1.589 20.041  23981   0.0001 0.0008  Bio::Seq::new
> >  3.77   1.150  3.434  47962   0.0000 0.0001  Bio::PrimarySeq::seq
> >  3.02   0.920  5.134  47962   0.0000 0.0001  Bio::SeqIO::alphabet
> >  2.92   0.890  2.454  23981   0.0000 0.0001  Bio::Annotation::TypeManager::new
> >  2.88   0.880  4.514  23981   0.0000 0.0002  Bio::Annotation::Collection::new
> >  2.75   0.840  0.792  95924   0.0000 0.0000  Bio::PrimarySeq::alphabet
> >  2.72   0.830  0.850 119917   0.0000 0.0000  Bio::Root::Root::BEGIN
> >  2.65   0.810 21.299  23981   0.0000 0.0009  Bio::Seq::SeqFactory::create
> >  2.52   0.770  0.734  71943   0.0000 0.0000  Bio::Annotation::TypeManager::_add
> >                                              _type_map
> >  2.43   0.740  1.336  23981   0.0000 0.0001  Bio::PrimarySeq::_guess_alphabet
> >
> > here's what the call trace looks like, for each $io->next_seq call:
> >
> > Bio::SeqIO::fasta::next_seq
> >    Bio::Root::IO::_readline
> >       File::Temp::BEGIN
> >    Bio::SeqIO::alphabet
> >    Bio::SeqIO::sequence_factory
> >    Bio::Seq::SeqFactory::create
> >       Bio::Seq::SeqFactory::type
> >       Bio::Root::Root::verbose
> >       Bio::Seq::new
> >          Bio::Root::RootI::new
> >             Bio::Root::Root::BEGIN
> >             Bio::Root::Root::verbose
> >          Bio::PrimarySeq::BEGIN
> >             Bio::Root::RootI::new
> >                Bio::Root::Root::BEGIN
> >                Bio::Root::Root::verbose
> >             Bio::Root::RootI::_rearrange
> >             Bio::PrimarySeq::seq
> >                Bio::Root::RootI::_rearrange
> >                Bio::PrimarySeq::validate_seq
> >                Bio::PrimarySeq::alphabet
> >                Bio::PrimarySeq::_guess_alphabet
> >                   Bio::PrimarySeq::seq
> >                      Bio::Root::RootI::_rearrange
> >                   Bio::PrimarySeq::alphabet
> >             Bio::PrimarySeq::display_id
> >             Bio::PrimarySeq::primary_id
> >             Bio::PrimarySeq::desc
> >          Bio::Root::RootI::_rearrange
> >          Bio::Seq::primary_id
> >          Bio::Annotation::Collection::new
> >             Bio::Root::RootI::new
> >                Bio::Root::Root::BEGIN
> >                Bio::Root::Root::verbose
> >             Bio::Annotation::TypeManager::new
> >                Bio::Root::RootI::new
> >                   Bio::Root::Root::BEGIN
> >                   Bio::Root::Root::verbose
> >                Bio::Annotation::TypeManager::_add_type_map
> >                Bio::Annotation::TypeManager::_add_type_map
> >                Bio::Annotation::TypeManager::_add_type_map
> >             Bio::Annotation::Collection::_typemap
> >          Bio::Seq::annotation
> >          Bio::Seq::primary_seq
> >             UNIVERSAL::isa
> >             Bio::Seq::top_SeqFeatures
> >    Bio::Seq::alphabet
> >       Bio::Seq::primary_seq
> >       Bio::PrimarySeq::alphabet
> >    Bio::SeqIO::alphabet
> >       Bio::PrimarySeq::BEGIN
> >          Bio::Root::RootI::new
> >             Bio::Root::Root::BEGIN
> >             Bio::Root::Root::verbose
> >          Bio::Root::RootI::_rearrange
> >          Bio::PrimarySeq::alphabet
> >       Bio::Root::RootI::DESTROY
> >          Bio::Root::Root::_cleanup_methods
> > Bio::Seq::DESTROY
> > Bio::Root::RootI::DESTROY
> >    Bio::Root::Root::_cleanup_methods
> > Bio::Root::RootI::DESTROY
> >    Bio::Root::Root::_cleanup_methods
> > Bio::Root::RootI::DESTROY
> >    Bio::Root::Root::_cleanup_methods
> >
> >
> > There's a fair amount of handling in Bio::Root::Root BEGIN that
> > involves setting up Error usage; a $DONT_USE_ERROR in my test script
> > improves things a little (down to 25 seconds instead of 30), but doesn't
> > change much of the profile percentages.
> >
> > -Aaron
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
>

-- 
 Aaron J Mackey
 Pearson Laboratory
 University of Virginia
 (434) 924-2821
 amackey@virginia.edu


--0-1526045603-1032266657=:9055
Content-Type: TEXT/PLAIN; charset=US-ASCII; name="testSeqIO.pl"
Content-Transfer-Encoding: BASE64
Content-ID: <Pine.OSF.4.33.0209170844170.9055@alpha10.bioch.virginia.edu>
Content-Description: testSeqIO.pl
Content-Disposition: attachment; filename="testSeqIO.pl"

IyEvdXNyL2Jpbi9wZXJsIC13DQoNCnVzZSBzdHJpY3Q7DQp1c2UgdmFycyBx
dygkRE9OVF9VU0VfRVJST1IpOw0KDQp1c2UgQmlvOjpTZXFJTzsNCg0Kc3Vi
IElOSVQgeyAkRE9OVF9VU0VfRVJST1IgPSAxOyB9Ow0KDQpteSAkZmlsZSA9
IHNoaWZ0IEBBUkdWIG9yIGRpZSAic3VwcGx5IGZpbGVuYW1lIVxuIjsNCg0K
bXkgJGlvID0gbmV3IEJpbzo6U2VxSU8gLWZpbGUgPT4gJGZpbGUsIC1mb3Jt
YXQgPT4gImZhc3RhIjsNCndoaWxlIChteSAkc2VxID0gJGlvLT5uZXh0X3Nl
cSkgew0KICAgICMgbm90aGluZy4NCn0NCg0K
--0-1526045603-1032266657=:9055--