[Bioperl-l] SearchIO speed up

Chris Fields cjfields at uiuc.edu
Fri Aug 11 16:33:52 UTC 2006


Sendu,

If we go the route of flexibility (so one could use full-blown objects,
hashes, lazy parsing, etc.), maybe we should start by having the Handler
return custom Result*, Hit*, and HSP* Bio::Search objects.  This would allow
you to commit everything and get people testing it on various OSes.  You
could also develop a custom handler, but that isn't absolutely necessary
(see below).

The various Handlers appear to be set up so that one can supply a custom
Factory for each Search object type (such as BLAST*).  The factories are
added to the Handler upon instantiation or by using register_factory().  The
modified Handler can then be attached using SearchIO's attach_EventHandler().
So I guess one could do something like this:

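use strict;
use warnings;

use Bio::SearchIO;
use Bio::Factory::ObjectFactory;
use Bio::SearchIO::SearchResultEventBuilder;

# Report file to parse, taken from the command line
my $file = shift or die "Usage: $0 <report file>\n";

# One factory per Search object type.  The Lazy* classes are placeholders
# for the new objects; each would have to implement the interface named here.
my $resfac = Bio::Factory::ObjectFactory->new(
            -type      => 'Bio::Search::Result::LazyResult',
            -interface => 'Bio::Search::Result::ResultI');

my $hitfac = Bio::Factory::ObjectFactory->new(
            -type      => 'Bio::Search::Hit::LazyHit',
            -interface => 'Bio::Search::Hit::HitI');

my $hspfac = Bio::Factory::ObjectFactory->new(
            -type      => 'Bio::Search::HSP::LazyHSP',
            -interface => 'Bio::Search::HSP::HSPI');

# Hand the factories to the event handler at instantiation
my $handler = Bio::SearchIO::SearchResultEventBuilder->new(
        -result_factory     => $resfac,
        -hit_factory        => $hitfac,
        -hsp_factory        => $hspfac);

# 'lazyblast' stands in for the new parser's format name
my $parser = Bio::SearchIO->new(-file   => $file,
                                -format => 'lazyblast');

# Swap in the modified handler before parsing starts
$parser->attach_EventHandler($handler);

# proceed with parsing...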

Of course I haven't tried this out... ;>  
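
For what it's worth, here is a minimal sketch of what a lazy HSP-like object
could look like, following the "raw blob blessed into a package" idea from
the quoted message below.  It is plain Perl with no BioPerl dependency.  The
package name My::LazyHSP, the accessor names, and the toy "key=value" record
format are all made up for illustration; a real class would have to implement
Bio::Search::HSP::HSPI and parse actual report text.

```perl
use strict;
use warnings;

package My::LazyHSP;    # hypothetical name, not a BioPerl class

# Store only the raw text; nothing is parsed at construction time.
sub new {
    my ($class, $raw) = @_;
    return bless { raw => $raw, parsed => undef, parse_count => 0 }, $class;
}

# Parse the blob on first access and cache the result.
# The "key=value" format is a toy stand-in for a real HSP record.
sub _parse {
    my ($self) = @_;
    return $self->{parsed} if $self->{parsed};
    my %field = $self->{raw} =~ /(\w+)=(\S+)/g;
    $self->{parse_count}++;
    return $self->{parsed} = \%field;
}

# Accessors trigger parsing only when a value is actually requested.
sub evalue      { return $_[0]->_parse->{evalue} }
sub bits        { return $_[0]->_parse->{bits} }

# How many times the blob has been parsed (for illustration only)
sub parse_count { return $_[0]->{parse_count} }

package main;

my $hsp = My::LazyHSP->new('bits=52.4 evalue=3e-10 length=120');

# No parsing has happened yet; the first accessor call triggers it,
# and the second call reuses the cached fields.
print $hsp->evalue, "\n";
print $hsp->bits,   "\n";
```

The cost of parsing is paid only for objects whose methods actually get
called, and only once per object.  Storing a file offset instead of the raw
text would be a variation on the same theme.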

It would be nice to have a parameter that lets one pass in a modified handler
when the SearchIO object is instantiated.  Oh well...

Most users neither know about nor use the various handlers or the Search
objects, which is a shame.  Maybe the HOWTO needs to be written more
explicitly?

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, August 10, 2006 5:29 PM
> To: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] SearchIO speed up
> 
> aaron.j.mackey at gsk.com wrote:
> >> As I understand your description, this is exactly what I do. My
> >> 'chunks' are the hashes that are normally used to create a new
> >> Hit/HSP object.
> >>
> >> The initial parse of the data file results in a small number of objects
> >> (Results) that contain all the data: HSP data nested in Hit data nested
> >> in the Result objects. When you actually want to do something with a
> >> certain hit or HSP it becomes an object, allowing you to call its
> >> methods like normal.
> >>
> >> Or are you suggesting something that would be even better than that? If
> >> so, please elucidate! :)
> >
> > So the only laziness you invoke is the object instantiation (but you've
> > already done all the parsing).
> >
> > My proposal involves the "chunks" being unparsed, raw text "blobs", that
> > are essentially blessed into a package that does the parsing only when
> > necessary (and even then, might choose different parsing strategies,
> > based on what's been asked for).  Thus a potentially large amount of
> > parsing and storage is skipped.  Additionally, you now have the option
> > of not even storing the blobs in memory, just file seek pointers
> > (requiring temp. storage for streaming pipe data sources), and thus can
> > process very large reports without consuming memory (currently a
> > problem).
> 
> Thanks, I might try out something along those lines. The problem I see
> is with piped input; I wouldn't want to require temp. storage because
> the user may deliberately be trying to gain speed by doing as little
> disk I/O as possible. Then you'd have to special-case it: pointers if we
> have a file on disk, stored-in-memory if piped. Maybe that special case
> wouldn't be so bad.



