[Bioperl-l] parsing tblastn using BPlite

Chervitz, Steve Steve_Chervitz@affymetrix.com
Sat, 27 Oct 2001 23:58:03 -0700


Jason Eric Stajich wrote:

> Steve C wrote:
> 
> > A nice feature of the Bio::Search system is that there's a natural framework
> > for segregating the different aspects common to all search-type algorithms
> > (Processor, Result, Hit, and though it's not there yet, Run).
> >
> My thoughts exactly - I am just decoupling Search Parsing from Search
> Result objects - Bio::SearchIO is event based parsing with a
> Bio::SearchIO::SearchResultEventBuilder object handling these 
> events and building the correct Bio::Search objects.

In Aaron's original Bio::Search:: system there is such decoupling, in that
Bio::Search::Processor::ProcessorI specifies the parser and
Bio::Search::Result::ResultI specifies the result objects. But I think the
name "Processor" is too ambiguous and I like "SearchIO" better, since it
fits with the Bioperl convention.
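
To make the decoupling concrete, here is a minimal sketch of event-based
parsing: a parser that only knows the text layout fires events, and a
separate builder turns them into result data. The package names
(My::ToyParser, My::ResultBuilder), the event names, and the toy
"Query=" / ">" line format are all assumptions for illustration, not the
actual SearchIO or SearchResultEventBuilder code.

```perl
use strict;
use warnings;

# Hypothetical event handler: collects what the parser reports.
package My::ResultBuilder;
sub new {
    my $class = shift;
    return bless { query => undef, hits => [] }, $class;
}
sub start_result { my ($self, $name) = @_; $self->{query} = $name }
sub hit          { my ($self, $name) = @_; push @{ $self->{hits} }, $name }
sub hit_names    { my $self = shift; return @{ $self->{hits} } }

# Hypothetical parser: recognizes lines and fires events, nothing more.
package My::ToyParser;
sub parse {
    my ($class, $builder, @lines) = @_;
    for my $line (@lines) {
        if    ($line =~ /^Query=\s*(\S+)/) { $builder->start_result($1) }
        elsif ($line =~ /^>(\S+)/)         { $builder->hit($1) }
    }
    return $builder;
}

package main;
my $builder = My::ToyParser->parse(
    My::ResultBuilder->new,
    'Query= seq1', '>hitA some description', '>hitB',
);
print $builder->{query}, ': ', join(', ', $builder->hit_names), "\n";
```

Swapping in a different builder would change which objects get built
without touching the parser, which is the point of the split.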

What I'd like to try to do is to merge the IO stuff I did in
Bio::Search::Processor.pm into your Bio::SearchIO.pm. The only significant
issue that I noticed is that your SearchIO constructor is going to need an
-algorithm parameter, as in:

  my $searchio = new Bio::SearchIO( -algorithm => 'blast',  # could be 'fasta'
                                    -format    => 'xml',    # could default to 'plain'
                                    ... );

There will also need to be directories within Bio::SearchIO corresponding to
each algorithm, containing modules corresponding to the different formats.
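
As a rough sketch of how the constructor arguments could map onto that
directory layout, the helper below builds a module name from -algorithm
and -format. The function name searchio_module_for and the
Bio::SearchIO::<algorithm>::<format> naming are assumptions made for
illustration; nothing like this is checked in yet.

```perl
use strict;
use warnings;

# Hypothetical helper: turns constructor-style arguments into the name of
# the parser module that would live under Bio/SearchIO/<algorithm>/.
sub searchio_module_for {
    my (%args) = @_;
    defined $args{-algorithm}
        or die "-algorithm is required\n";
    my $algorithm = lc $args{-algorithm};
    # Fall back to 'plain' when no format is given (assumed default).
    my $format = lc( defined $args{-format} ? $args{-format} : 'plain' );
    return join '::', 'Bio', 'SearchIO', $algorithm, $format;
}

print searchio_module_for( -algorithm => 'blast', -format => 'xml' ), "\n";
print searchio_module_for( -algorithm => 'fasta' ), "\n";
```

A real factory would then load the resulting module and hand construction
over to it.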

> > Perhaps we should both check in what we have so we can coordinate our
> > efforts better. (I could also use another pair of eye/hands; my progress of
> > late has been rather choppy as I find myself spending most of my free time
> > chasing after an energetic 9-month old. 8)
> > Steve
> 
> Sorry for not coordinating better with you - had a need - decided to go
> ahead and write the code for it first.  Wasn't really sure where you were
> and I assumed that more important life issues were dominating your time.
> The object work you have done may be a more thorough treatment of the
> necessary pieces for result parsing, I've no objections to chucking my
> current objects if you have a better proposal for any of these (including
> names) - I currently have:
> 
> Interfaces:
> Bio::Search::ReportI
> Bio::Search::SubjectI
> Bio::Search::HSPI
> 
> Implementations:
> Bio::Search::Report
> Bio::Search::Subject
> Bio::Search::HSP
> 
> Perhaps these should be better named and aggregated into the classes that
> are there - oops.

Yes, I'd like to stick with the Search::Result:: and Search::Hit::
namespaces, if you don't mind. It might be good to have algorithm-specific
subdirs in these to help collect modules pertaining to blast vs. fasta vs.
whatever. Your Subject module can then peacefully co-exist with my BlastHit
module. 

We need a good place to put HSP-like objects. I propose adding
Bio::Search::Alignment::. We also need an interface to define these objects,
which have both FeaturePair and SimpleAlign nature. It would be good to have
an interface version of SimilarityPair and/or FeaturePair. I'm thinking of
creating an interface called SimilarityFeatureI that would extend the
proposed FeaturePair interface and add the ability to get a SimpleAlign
object. (The alignment functionality isn't essential, but would be nice to
have in such an object.)
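
A minimal sketch of what such an interface hierarchy might look like
follows. All package and method names here (My::FeaturePairI,
My::SimilarityFeatureI, feature1, feature2, get_aln) are placeholders
chosen for illustration, not an agreed-upon API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed FeaturePair interface.
package My::FeaturePairI;
sub _abstract {
    my ($self, $name) = @_;
    die ref($self) . " must implement $name\n";
}
sub feature1 { $_[0]->_abstract('feature1') }
sub feature2 { $_[0]->_abstract('feature2') }

# Extends the pair interface with access to an alignment object.
package My::SimilarityFeatureI;
our @ISA = ('My::FeaturePairI');
sub get_aln { $_[0]->_abstract('get_aln') }

# Toy HSP-like class; it leaves get_aln unimplemented on purpose.
package My::ToyHSP;
our @ISA = ('My::SimilarityFeatureI');
sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}
sub feature1 { $_[0]{query} }
sub feature2 { $_[0]{hit} }

package main;
my $hsp = My::ToyHSP->new( query => 'q:1-50', hit => 's:101-150' );
print join(' ', $hsp->feature1, $hsp->feature2), "\n";
my $aln_ok = eval { $hsp->get_aln; 1 };
print "get_aln: ", ( $aln_ok ? "ok\n" : $@ );
```

Since the alignment part is optional, an implementation could also return
undef from get_aln rather than dying; which is preferable is open.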

So I'll take a stab at integrating my stuff with your SearchIO system. Do
you mind if your modules get renamed and/or moved into different
directories? 

I also made some changes in Bio/Root that I'll need to check in. The changes
allow for better exception handling via Graham Barr's Error.pm. The
integration is pretty transparent, so that if you don't have Error.pm or
don't want to use it, you don't have to. I could use some feedback on this
stuff as well.
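
To give a feel for the "use it if it's there" idea, here is a small
sketch that throws an Error::Simple object when Error.pm can be loaded
and falls back to a plain die otherwise. The helper name throw_exception
and the message format are made up for this example; this is not the
actual Bio::Root code.

```perl
use strict;
use warnings;

# Detect Error.pm once at load time; everything still works without it.
my $HAVE_ERROR = eval { require Error; 1 } ? 1 : 0;

# Hypothetical helper: one call site, two possible exception styles.
sub throw_exception {
    my ($message) = @_;
    if ($HAVE_ERROR) {
        Error::Simple->throw($message);   # exception object from Error.pm
    }
    die "EXCEPTION: $message\n";          # plain-string fallback
}

eval { throw_exception('could not parse report') };
my $caught = "$@";                        # stringifies in either case
$caught =~ s/\s+\z//;
print "caught: $caught\n";
print "Error.pm available: ", ( $HAVE_ERROR ? 'yes' : 'no' ), "\n";
```

Calling code that only inspects the stringified $@ behaves the same either
way; code that wants try/catch blocks can use them when Error.pm is
installed.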

Steve


> I guess the best thing is to either post your code or commit it with the
> caveat that it is under development and may not work (hence the 'live'
> notion of the head of the dev tree).  Your call as to how to best start to
> integrate our work, I'm more than happy to provide an additional set of
> eyeballs to your code.
> 
> I have more things in the pipeline: Basic set of objects for working with
> Maps and Markers, Phylogenetic Trees, and eventually Populations if anyone
> has interest in working together on these feel free to post to the list.
> 
> -jason
> -- 
> Jason Stajich
> Duke University
> jason@cgt.mc.duke.edu
>