[MOBY-dev] XML parsing takes forever

Kenny Billiau kebil at psb.ugent.be
Tue Nov 4 08:02:49 EST 2008


Hi,

I've run a little testprogram to see where most of the time is spent. I use a stripped down version of the xml file I've sent you. All but a few ElementContainers and no sequence. Just to keep the processing time low enough to debug.

      use MOSES::MOBY::Parser;

      open(FILE, '<', $ARGV[0]);
      my @lines = <FILE>;
      close FILE;

      my $parser = MOSES::MOBY::Parser->new();
      print $parser->parse( method => 'string', data => join(q{}, @lines) );


[davebowman at hal2000 ~]$ /usr/bin/perl -d:DProf mobytest.pl mobysav_small.xml
[davebowman at hal2000 ~]$ dprofpp
Exporter::Heavy::heavy_export has 14 unstacked calls in outer
Config::Simple::import_from has 1 unstacked calls in outer
utf8::AUTOLOAD has -1 unstacked calls in outer
Config::Simple::verbose has 1 unstacked calls in outer
Config::Simple::vars has 1 unstacked calls in outer
AutoLoader::AUTOLOAD has -3 unstacked calls in outer
Fcntl::AUTOLOAD has -6 unstacked calls in outer
Fcntl::__ANON__ has 6 unstacked calls in outer
Exporter::export has -14 unstacked calls in outer
utf8::SWASHNEW has 1 unstacked calls in outer
Total Elapsed Time = 8.470220 Seconds
   User+System Time = 7.750220 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
  68.1   5.280  5.280  69736   0.0001 0.0001  IO::Scalar::print
  6.94   0.538  0.783  64088   0.0000 0.0000  Dumpvalue::stringify
  6.04   0.468  7.173  33755   0.0000 0.0002  Dumpvalue::unwrap
  4.30   0.333  0.333  47091   0.0000 0.0000  File::Spec::Unix::canonpath
  2.43   0.188  7.157  33006   0.0000 0.0002  Dumpvalue::DumpElem
  2.13   0.165  0.165  59776   0.0000 0.0000  overload::mycan
  1.78   0.138  0.224  15990   0.0000 0.0000  MOSES::MOBY::Base::__ANON__
  1.42   0.110  0.708    539   0.0002 0.0013  MOSES::MOBY::Cache::Central::_createDataTypeFromXML
  1.39   0.108  0.108 142023   0.0000 0.0000  UNIVERSAL::isa
  1.29   0.100  0.100    208   0.0005 0.0005  MOSES::MOBY::Cache::Registries::init
  1.20   0.093  0.357  17359   0.0000 0.0000  overload::OverloadedStringify
  1.16   0.090  0.090  30407   0.0000 0.0000  XML::LibXML::Node::nodeName
  1.02   0.079  0.079    539   0.0001 0.0001  MOSES::MOBY::Cache::Central::_clean
  0.99   0.077  0.418    207   0.0004 0.0020  MOSES::MOBY::Generators::Utils::find_file
  0.94   0.073  0.203   3147   0.0000 0.0001  XML::LibXML::Element::getChildrenByTagName

It's fairly obivous now ;)

If I remove the overload operator in MOSES::MOBY::Base, time is reduced 
15fold! But for the 250KB file I've sent, it still takes us a minute to 
parse.

If I simply parse the file with XML::LibXML and no callbacks or custom 
handlers, it's parsed instantly:

[davebowman at hal2000 ~]$ /usr/bin/perl -d:DProf mobytest.pl mobysav_small.xml
[kebil at psbdev01 ~]$ dprofpp
Total Elapsed Time = 0.079034 Seconds
   User+System Time = 0.079034 Seconds


-Kenny, slowly getting to the bottom of this ;)


On Thu, 30 Oct 2008, Edward Kawas wrote:

> Now I can confirm that it is slow parsing and not an infinite loop! The XML
> file took over an hour on my virtual machine to parse. I will see what I can
> do with other SAX parsers for Perl.
>
> Thanks,
>
> Eddie
>
> -----Original Message-----
> From: moby-dev-bounces at lists.open-bio.org
> [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of José María
> Fernández González
> Sent: October-30-08 12:55 PM
> To: Core developer announcements
> Subject: Re: [MOBY-dev] XML parsing takes forever
>
> Hi everybody,
> 	I'm using the XML::LibXML SAX parser for other projects and tasks,
> and it can
> be really, really slow. You don't realize that until you are parsing a large
> file from a pipe, and you see that most of the time the program which is
> parsing is CPU (and not I/O) bounded.
>
> 	Best Regards,
> 		José María
>
> Edward Kawas wrote:
>> I think that you are right about the parser not going into an infinite
> loop.
>> For some reason, the SAX parser seems to be really slow (although, I don't
>> want to say that until I know for sure).
>>
>> Eddie
>>
>> -----Original Message-----
>> From: moby-dev-bounces at lists.open-bio.org
>> [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of Kenny Billiau
>> Sent: October-30-08 11:56 AM
>> To: Core developer announcements
>> Subject: Re: [MOBY-dev] XML parsing takes forever
>>
>> Hi,
>>
>> thx for helping to investigate the problem!
>>
>> I'm pretty sure the parser gets out of it's mess. When you truncate the
>> XML file by removing all but a few ElementContainers, it takes awhile, but
>
>> it actually returns without error.
>>
>> And memory consuption doesn't seem to be a problem either, as mostly 'top'
>
>> reveals that the invoked webservice script uses a steady amount of memory.
>>
>> thx,
>> Kenny
>>
>> On Thu, 30 Oct 2008, Edward Kawas wrote:
>>
>>> So this one might take some time... from what I can see, our parser goes
>>> into an infinite loop (although, I am not certain about that ...).
>>>
>>> I will play around with this XML file and get back to you.
>>>
>>> Thanks,
>>>
>>> Eddie
>>>
>>> -----Original Message-----
>>> From: moby-dev-bounces at lists.open-bio.org
>>> [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of Kenny Billiau
>>> Sent: October-30-08 8:03 AM
>>> To: Core developer announcements
>>> Subject: Re: [MOBY-dev] XML parsing takes forever
>>>
>>> Hi,
>>>
>>> here you go,
>>>
>>> Kenny
>>>
>>> On Thu, 30 Oct 2008, Edward Kawas wrote:
>>>
>>>> Hi,
>>>>
>>>> This is the right list! Could you send me the XML for your second
>> service?
>>>> Thanks,
>>>>
>>>> Eddie
>>> _______________________________________________
>>> MOBY-dev mailing list
>>> MOBY-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/moby-dev
>>>
>>
>
>

-- 


More information about the MOBY-dev mailing list