[Bioperl-l] arbitrary hashes, blast, statistics, parameters, and java interoperability

Jason Stajich jason at cgt.duhs.duke.edu
Fri May 14 08:23:37 EDT 2004


++ aaron.

Totally agree. A stats object was something I've wanted, but basically I
was too tired to get to it, so we settled for simple hash keys for the
stats as part of the result object.

Would be excellent for someone to write a Statistics object for the Search
Results.

The UML would be welcome, although I shudder every time I see the
multiple inheritance we've created:
HSP->SimilarityPair->FeaturePair->SeqFeatureI
          |----------SimilarityI----^


In the future I think we are going to have to break some of the
inheritance/Perl OO tricks if we want SearchIO to achieve better parsing
speeds (it's the object creation that kills us, not the parsing, AFAIK).


-jason
On Fri, 14 May 2004, Aaron J. Mackey wrote:

>
> I'd be happy to see the statistics and parameters turn into full
> objects, and could even imagine some useful functions that a
> Bio::Search::Statistics::BLAST object might provide:
>
> my $stats = $result->statistics;
>
> # use the report's database size to get a bit score threshold
> # that corresponds to a given expectation threshold:
> my $bitscore_threshold = $stats->E_to_bits(1e-6);
>
> # vice versa:
> my $expect_threshold = $stats->bits_to_E(32.0);
>
> # calculate a bitscore or expectation for a given comparison:
> my $bitscore = $stats->bitscore($rawscore, $querylen, $liblen);
> my $exp = $stats->expect($rawscore, $querylen, $liblen);
>
> # Make Warren Gish happy:
> my $nats = $stats->bits_to_nats($bitscore);
>
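The conversions proposed above follow from the standard Karlin-Altschul
relations: bit score S' = (lambda*S - ln K) / ln 2, and E = m * n * 2^(-S').
A minimal sketch, assuming plain functions rather than methods; the function
names and the explicit $m/$n arguments are hypothetical and not part of
BioPerl, and real BLAST reports use effective (edge-corrected) lengths that
this ignores:

```perl
use strict;
use warnings;

# Hypothetical helpers, NOT an existing BioPerl API.
# $m = query length, $n = database (library) length, uncorrected.

# Raw score to bit score: S' = (lambda * S - ln K) / ln 2
sub raw_to_bits {
    my ($raw, $lambda, $K) = @_;
    return ($lambda * $raw - log($K)) / log(2);
}

# Bit score to expectation: E = m * n * 2**(-S')
sub bits_to_E {
    my ($bits, $m, $n) = @_;
    return $m * $n * 2 ** (-$bits);
}

# Expectation to bit score (inverse of the above): S' = log2(m * n / E)
sub E_to_bits {
    my ($E, $m, $n) = @_;
    return log($m * $n / $E) / log(2);
}

# Bits to nats: multiply by ln 2
sub bits_to_nats {
    my ($bits) = @_;
    return $bits * log(2);
}
```

A stats object as proposed would carry lambda, K and the search-space size
itself, so callers would pass only the score or the threshold.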
> I realize you (and 99.9% of the world) only care about BLAST statistics
> and parameters, but I really do think you should subclass these things
> so that we can plug in others when/if necessary.  I would think that
> all an interface should guarantee are generic data access methods
> (get_param, set_param, etc).
>
> $stats->set_param( Lambda => 0.123 );
> $stats->set_param( K      => 0.002 );
>
> Specific subclasses might include direct parameter access:
>
>    $blaststats->lambda(0.123);
>    $blaststats->K(0.002);
>
> But we shouldn't try to agree on "universal" statistical parameters,
> because they really don't exist.
>
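The layering described above (a generic interface that guarantees only
get_param/set_param, plus program-specific subclasses with direct accessors)
could look roughly like this. A sketch only: the My::Search::Statistics
package names are hypothetical placeholders, not existing BioPerl modules.

```perl
use strict;
use warnings;

# Hypothetical generic base class: only get_param/set_param are guaranteed.
package My::Search::Statistics;

sub new {
    my ($class, %params) = @_;
    return bless { _params => {%params} }, $class;
}

sub set_param {
    my ($self, $name, $value) = @_;
    $self->{_params}{$name} = $value;
    return $value;
}

sub get_param {
    my ($self, $name) = @_;
    return $self->{_params}{$name};
}

# Hypothetical BLAST subclass: adds direct accessors on top of the
# generic storage, so both access styles see the same values.
package My::Search::Statistics::BLAST;
our @ISA = ('My::Search::Statistics');

sub lambda {
    my $self = shift;
    $self->set_param('Lambda', shift @_) if @_;
    return $self->get_param('Lambda');
}

sub K {
    my $self = shift;
    $self->set_param('K', shift @_) if @_;
    return $self->get_param('K');
}

package main;

my $blaststats = My::Search::Statistics::BLAST->new;
$blaststats->lambda(0.123);            # direct accessor
$blaststats->set_param(K => 0.002);    # generic accessor
print join(' ', $blaststats->get_param('Lambda'), $blaststats->K), "\n";
```

Because the subclass accessors delegate to set_param/get_param, code written
against the generic interface keeps working with any subclass.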
> In terms of run-time parameters, I would guess that a
> Bio::Tools::Run::ParameterI kinda thing would be appropriate; that way,
> you could build a runtime parameter object, pass it off to the
> runnable, and get a result object back that included the (possibly
> modified) parameter object.
>
> -Aaron
>
> On May 14, 2004, at 12:52 AM, Chad Matsalla wrote:
>
> >
> > Greetings all,
> >
> > I am writing a web service that provides Bio::Search::Result objects to
> > a Java client. Yes, this does work and yes, it is very kewl.
> >
> > I created UML models for all of the components required to produce a
> > Bio::Search::Result (Bio::Seq, Bio::HitI, etc) and used a code
> > generation system to create Java classes that match. Would you like me
> > to contribute this UML model (XMI format) to the project? I notice that
> > the UML for Bioperl is a bit... dated.
> >
> > Anyway...
> >
> > I tell a Java client to ask for a Bio::Search::Result from a SOAP::Lite
> > service. This works, until...
> >
> > The _statistics and _parameters attributes of a Bio::Search::Result
> > object are hashes.  Although Java has a corresponding Hashtable class,
> > it is not smart enough to deserialize a perl hash in an efficient,
> > hack-free manner.
> >
> > I propose creating a SearchStatistics module that would hold these
> > statistics and a SearchParameters object that would hold the
> > parameters.
> >
> > I understand that hashes are used when you need an arbitrary data
> > structure. At least in the case of Blast we know what the keys in a
> > statistics and parameters hashtable are going to be so why not have
> > objects?
> >
> > At this time, I really only care about Blast results. Does anybody see
> > why I should not change those two attributes to refer to objects rather
> > than hashes in the Blast parts of the SearchIO subsystem?
> >
> > In the case that I create, for example, a SearchStatistics object I
> > think that code based on the fact that _statistics is a hash would not
> > break, because _statistics is still a hash; it is just a blessed hash.
> >
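The compatibility argument above can be checked with a small sketch. The
My::SearchStatistics class and the key names are hypothetical stand-ins for
the proposed module. Direct hash access keeps working on a blessed hash, but
any existing code that tests ref($x) eq 'HASH' would behave differently:

```perl
use strict;
use warnings;

# Hypothetical class standing in for the proposed SearchStatistics module.
package My::SearchStatistics;

sub new {
    my ($class, %stats) = @_;
    return bless {%stats}, $class;
}

sub get_statistic {
    my ($self, $key) = @_;
    return $self->{$key};
}

package main;
use Scalar::Util qw(reftype);

# Key names and values are made up for illustration.
my $stats = My::SearchStatistics->new(lambda => 0.318, kappa => 0.134);

# Old-style code that treats the value as a plain hash reference still works.
my @keys   = sort keys %$stats;
my $lambda = $stats->{lambda};

# ref() now reports the class name; reftype() still reports the
# underlying storage type.
my $class_name = ref $stats;
my $storage    = reftype $stats;

print "keys: @keys\n";
print "ref: $class_name, reftype: $storage\n";
```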
> > Can anybody suggest what package these modules should belong to?
> >
> > I'm very eager to do this so unless there are reasonable objections I
> > will do it this weekend. If it suddenly breaks tests or something I can
> > undo it.
> >
> > I have invested significant time in Java<->BioPerl interoperability
> > over web services and if anybody is interested in my work just give
> > me a shout (ISMB/BOSC?).
> >
> > Thanks!
> >
> > Chad Matsalla
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> --
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania       email:  amackey at pcbi.upenn.edu
> 415 S. University Avenue         office: 215-898-1205
> Philadelphia, PA  19104-6017     fax:    215-746-6697
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

