[Bioperl-l] parsing tblastn using BPlite

Jason Eric Stajich jason@cgt.mc.duke.edu
Sun, 28 Oct 2001 12:07:31 -0500 (EST)


On Sat, 27 Oct 2001, Chervitz, Steve wrote:

>
> Jason Eric Stajich wrote:
>
> > Steve C wrote:

> In Aaron's original Bio::Search:: system there is such decoupling, in that
> Bio::Search::Processor::ProcessorI specifies the parser and
> Bio::Search::Result::ResultI specifies the result objects. But I think the
> name "Processor" is too ambiguous and I like "SearchIO" better, since it
> fits with the Bioperl convention.
>

> What I'd like to try to do is to merge the IO stuff I did in
> Bio::Search::Processor.pm into your Bio::SearchIO.pm. The only significant
> issue that I noticed is that your SearchIO constructor is going to need
> an -algorithm parameter, as in:
>
>   my $searchio = new Bio::SearchIO( -algorithm => 'blast', # could be 'fasta'
>                                     -format    => 'xml',   # could default to 'plain'
>                                     ... );
>
This is good, except that in my mind each different XML DTD will really be
a different "FORMAT", unless we remap the DTDs through XSLT.
So I was expecting each format to correspond to a particular report format:
BLAST, BLASTXML, FASTA, HMMER, ...

> There will also need to be directories within Bio::SearchIO corresponding to
> each algorithm, containing modules corresponding to the different formats.
>

Okay, see Bio::SearchIO::blastxml as a first implementation of how I
imagined going about this with event-based parsing.
SearchResultEventBuilder builds Search:: objects from events thrown by the
parser.  This works as long as the format parser throws events that the
SearchResultEventBuilder can map to objects.  I guess if there are events
that don't map into the general Report/Subject/HSP categories, or to an
attribute of one of them, then we'll need a new Handler for those types of
SearchEvents?  Ewan has some thoughts here too, I'm sure.
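
A rough sketch of the event-to-object idea described above. This is not the
actual SearchResultEventBuilder code: the package My::EventBuilder, its
method names, and the event names ('report', 'subject', 'hsp') are all
assumptions made up for illustration. It only shows how a builder might
turn a stream of start/attribute/end events into a nested structure.

```perl
use strict;
use warnings;

# Hypothetical builder (not a Bioperl module): turns parser events into
# nested plain hashes of the form { type, attrs, children }.
package My::EventBuilder;

sub new {
    my ($class) = @_;
    my $self = { stack => [], results => [] };
    return bless $self, $class;
}

# The parser announces that a new element (report, subject, hsp) begins.
sub start_element {
    my ($self, $type) = @_;
    push @{ $self->{stack} }, { type => $type, attrs => {}, children => [] };
    return;
}

# The parser reports a name/value pair belonging to the open element.
sub attribute {
    my ($self, $name, $value) = @_;
    my $top = $self->{stack}[-1]
        or die "attribute event outside any element\n";
    $top->{attrs}{$name} = $value;
    return;
}

# The element is complete: attach it to its parent, or keep it as a
# finished top-level result if it has no parent.
sub end_element {
    my ($self, $type) = @_;
    my $node = pop @{ $self->{stack} };
    die "mismatched end event: $type\n"
        unless $node && $node->{type} eq $type;
    if ( @{ $self->{stack} } ) {
        push @{ $self->{stack}[-1]{children} }, $node;
    }
    else {
        push @{ $self->{results} }, $node;
    }
    return;
}

sub results {
    my ($self) = @_;
    return @{ $self->{results} };
}

package main;

# Stand-in for a format parser: it just throws a fixed series of events.
my $builder = My::EventBuilder->new;
$builder->start_element('report');
$builder->attribute( program => 'TBLASTN' );
$builder->start_element('subject');
$builder->attribute( name => 'seq1' );
$builder->start_element('hsp');
$builder->attribute( score => 42 );
$builder->end_element('hsp');
$builder->end_element('subject');
$builder->end_element('report');

my ($report) = $builder->results;
print "program: $report->{attrs}{program}\n";
print "subjects: ", scalar @{ $report->{children} }, "\n";
```

In this sketch an event that fits none of the known categories would still
be stored as a generic node; a separate handler, as raised above, would be
one way to treat such events differently.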

[snip]
>
> Yes, I'd like to stick with the Search::Result:: and Search::Hit::
> namespaces, if you don't mind. It might be good to have algorithm-specific
> subdirs in these to help collect modules pertaining to blast vs. fasta vs.
> whatever. Your Subject module can then peacefully co-exist with my BlastHit
> module.
>
Okay.  But I am happy to try to aggregate things - I don't want to have
an object jumble for no good reason.

> We need a good place to put HSP-like objects. I propose adding
> Bio::Search::Alignment::. We also need an interface to define these objects,
> which have both FeaturePair and SimpleAlign nature. It would be good to have
> an interface version of SimilarityPair and/or FeaturePair. I'm thinking of
> creating an interface called SimilarityFeatureI that would extend the
> proposed FeaturePair interface and add the ability to get a SimpleAlign
> object. (The alignment functionality isn't essential, but would be nice to
> have in such an object.)
>

That sounds interesting - at some point we want to have an interface for
SimpleAlign, and I think we are close because it seems to have all the
expected methods.  We will want to mesh with the eventual idea that a
Bio::Assembly::AssemblyI interface will be built on SimpleAlign so
that we can comply with the BioCORBA 0.3 spec.  Of course we need more
hands on deck to work on that implementation.

> So I'll take a stab at integrating my stuff with your SearchIO system. Do
> you mind if your modules get renamed and/or moved into different
> directories?
>
Don't mind at all.  I have some last changes I need to check in that allow
for a flexible notion of multiple parameters and statistics - I'm not sure
where we should start to branch off and handle things in an
algorithm-specific manner, and where we should try to have generally
flexible ways of getting data from these objects across all algorithms.
For example, instead of having a lambda method, we could have
get_statistics('lambda') and a method available_statistics() which returns
the names of all the available stats.  Similarly with parameters.
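
A minimal sketch of the generic accessor idea above, assuming statistics
are simply stored by name in a hash. The method names get_statistics and
available_statistics come from the text; the package My::StatisticsHolder,
the add_statistic method, and the example values are made up for
illustration and are not existing Bioperl API.

```perl
use strict;
use warnings;

# Hypothetical class (not a Bioperl module): a generic holder for named
# statistics, so no algorithm-specific method such as lambda() is needed.
package My::StatisticsHolder;

sub new {
    my ($class) = @_;
    return bless { _stats => {} }, $class;
}

# Store a named statistic; names are treated case-insensitively.
sub add_statistic {
    my ($self, $name, $value) = @_;
    $self->{_stats}{ lc $name } = $value;
    return;
}

# Return the value of a named statistic, or undef if it was never set.
sub get_statistics {
    my ($self, $name) = @_;
    return $self->{_stats}{ lc $name };
}

# Return the sorted names of all statistics that have been set.
sub available_statistics {
    my ($self) = @_;
    my @names = sort keys %{ $self->{_stats} };
    return @names;
}

package main;

my $report = My::StatisticsHolder->new;
$report->add_statistic( lambda => 0.318 );   # illustrative values only
$report->add_statistic( kappa  => 0.135 );

for my $name ( $report->available_statistics ) {
    printf "%s = %s\n", $name, $report->get_statistics($name);
}
```

The same pattern would carry over to parameters, with a second hash and a
matching pair of accessors.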

I also need some help structuring the Report object to handle multiple
iterations a la PSI-BLAST - Peter S's model in BPpsilite looks good and I
wanted to co-opt most of that into the new framework.  Do you have ideas
about that as well?

I imagine the current sweet spot of algorithms is
FASTA, SSEARCH
BLAST, PSI-BLAST, RPSBLAST
HMMER

Are there others you plan to support or that people would like?

Do we want to support writing of certain formats back out - HTML-ified
BLAST like your Tools::Blast had done?  XML?  I'm not sure it is really
useful to think of reading in FASTA and writing out BLAST but I guess it
could be done...

> I also made some changes in Bio/Root that I'll need to check in. The changes
> allow for better exception handling via Graham Barr's Error.pm. The
> integration is pretty transparent, so that if you don't have Error.pm or
> don't want to use it, you don't have to. I can use some feedback on this
> stuff as well.
>
Get it checked in there and we can start testing/commenting away.

> Steve
>
>

-jason