[Bioperl-l] GenericHSP and SimpleAlign

Jason Stajich jason@cgt.mc.duke.edu
Tue, 15 Jan 2002 12:44:18 -0500 (EST)


I guess so, here is the summary as I see it now.

The new implementation of the Generic objects is the best place to see
these described (i.e. GenericHSP, GenericHit and GenericResult).  I
currently don't use SteveC's GenericDatabase object in my code but we may
work it in eventually.

Result:
	query_name
	query_description
	database_name
	database_letters
	database_entries
	algorithm
	algorithm_version

	parameters (hash of key/value pairs)
	statistics (hash of key/value pairs)

Hit:
	name
	description
	accession
	length
	score
	significance
	algorithm

HSP:
values used to initialize an HSP:
           -algorithm => algorithm used (BLASTP, TBLASTX, FASTX, etc)
           -evalue    => evalue
           -bits      => bit value for HSP
           -score     => score value for HSP (typically z-score but depends on
                                              analysis)
           -identical => # of residues that matched identically
           -conserved => # of residues that matched conservatively
                           (only protein comparisons;
                            conserved == identical in nucleotide comparisons)
           -hsp_gaps   => # of gaps in the HSP
           -query_gaps => # of gaps in the query in the alignment
           -hit_gaps   => # of gaps in the subject in the alignment
           -query_start => HSP Query start (in original query sequence coords)
           -query_end   => HSP Query end (in original query sequence coords)
           -hit_start   => HSP Hit start (in original hit sequence coords)
           -hit_end     => HSP Hit end (in original hit sequence coords)
           -hit_length  => total length of the hit sequence
           -query_length=> total length of the query sequence
           -query_seq   => query sequence portion of the HSP
           -hit_seq     => hit sequence portion of the HSP
           -homology_seq=> homology sequence for the HSP
           -hit_frame   => hit frame (only if hit is translated protein)
           -query_frame => query frame (only if query is translated protein)
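
For anyone mirroring these names outside of Perl, here is a rough sketch of
the HSP fields as a plain Python record. The class name HSPRecord and the
example values are made up for illustration; this is not the GenericHSP
API, just the field names listed above with the leading dash dropped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HSPRecord:
    """Hypothetical container mirroring the HSP initialization names."""
    algorithm: str            # e.g. BLASTP, TBLASTX, FASTX
    evalue: float
    bits: float
    score: float
    identical: int            # residues that matched identically
    conserved: int            # residues that matched conservatively
    hsp_gaps: int             # gaps in the HSP
    query_gaps: int           # gaps in the query in the alignment
    hit_gaps: int             # gaps in the hit in the alignment
    query_start: int          # original query sequence coords
    query_end: int
    hit_start: int            # original hit sequence coords
    hit_end: int
    hit_length: int           # total length of the hit sequence
    query_length: int         # total length of the query sequence
    query_seq: str            # query portion of the HSP (with gaps)
    hit_seq: str              # hit portion of the HSP (with gaps)
    homology_seq: str         # homology line for the HSP
    hit_frame: Optional[int] = None    # only if hit is translated
    query_frame: Optional[int] = None  # only if query is translated


# Invented example values, chosen only to be internally consistent.
example_hsp = HSPRecord(
    algorithm="BLASTP", evalue=1e-5, bits=30.0, score=66.0,
    identical=5, conserved=6,
    hsp_gaps=1, query_gaps=1, hit_gaps=0,
    query_start=10, query_end=15, hit_start=22, hit_end=28,
    hit_length=98, query_length=120,
    query_seq="MKV-LAT", hit_seq="MKIALAT", homology_seq="MK+ LAT",
)
```

The frame fields default to None so that untranslated searches can simply
leave them out.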

The data is accessed slightly differently because we calculate some things
like percent_identity, frac_conserved and frac_identical.
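
To give a feel for what "calculated" means here, a minimal Python sketch of
such derived values follows. It assumes each fraction is the count divided
by the number of alignment columns (the length of the homology line); that
denominator is an assumption for illustration, and GenericHSP itself may
divide by something else, so check its documentation for the real rules.

```python
def _fraction(count: int, alignment_length: int) -> float:
    """Divide a match count by the alignment length, guarding bad input."""
    if alignment_length <= 0:
        raise ValueError("alignment_length must be positive")
    return count / alignment_length


def frac_identical(identical: int, alignment_length: int) -> float:
    """Fraction of alignment columns that matched identically (assumed)."""
    return _fraction(identical, alignment_length)


def frac_conserved(conserved: int, alignment_length: int) -> float:
    """Fraction of alignment columns that matched conservatively (assumed)."""
    return _fraction(conserved, alignment_length)


def percent_identity(identical: int, alignment_length: int) -> float:
    """frac_identical expressed as a percentage."""
    return 100.0 * frac_identical(identical, alignment_length)


# Invented example: a 7-column alignment with 5 identical, 6 conserved.
homology_seq = "MK+ LAT"
alignment_length = len(homology_seq)
fi = frac_identical(5, alignment_length)
fc = frac_conserved(6, alignment_length)
pid = percent_identity(5, alignment_length)
```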

See the implementation and documentation in GenericHSP for more details
http://docs.bioperl.org/bioperl-live (w/ frames - all modules)

(w/o frames - the GenericHSP module)
http://docs.bioperl.org/bioperl-live/Bio/Search/HSP/GenericHSP.html

Are you building something to do with analysis results in biosql?  I've
just built a db to store my analysis results after parsing out with
SearchIO and was thinking of building this into bioperl-db after we
release bioperl 1.0 - would be nice to coordinate a schema.

-jason

On Sun, 13 Jan 2002, Andrew Dalke wrote:

> Hey Jason,
>
>   Does this mean you have settled on variable names I can use?
>
>                     Andrew
>
>

-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu