[Bioperl-l] Questions on Representing Protein Ambiguity

Aaron J. Mackey amackey at pcbi.upenn.edu
Fri Oct 1 16:04:29 EDT 2004


On Sep 30, 2004, at 10:49 PM, James Thompson wrote:

> An alternative would be to borrow an idea from Perl's regex character
> classes and represent multiple residues at a position inside of a set
> of brackets, like this:
>
> M[ES]N[IAP]S

In general, you're always going to lose information moving from a 
profile to a flat pattern.  This option avoids losing as much 
information as flattening to "MENIS" would (although MENIS is a 
reasonable "consensus" in this case), but there's still information 
loss: the brackets record which residues occur at a position, not how 
often.  So in that sense it isn't really a better solution than "just 
take the most probable residue, unless its probability is below some 
threshold, in which case X".
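To make the trade-off concrete, here is a minimal Python sketch (not BioPerl code) comparing the two flattening strategies. It assumes a hypothetical toy profile stored as one dict per column mapping residue to frequency; the names `PROFILE`, `bracket_pattern` and `threshold_consensus`, and the default threshold of 0.5, are illustrative assumptions, not an existing API.

```python
import re

# Hypothetical toy profile (an assumption for illustration): one dict per
# alignment column, mapping each residue to its frequency in that column.
PROFILE = [
    {"M": 1.0},
    {"E": 0.6, "S": 0.4},
    {"N": 1.0},
    {"I": 0.5, "A": 0.3, "P": 0.2},
    {"S": 1.0},
]


def bracket_pattern(profile):
    """Flatten a profile into a regex-style pattern such as M[ES]N[IAP]S.

    Every residue seen in a column is kept (most frequent first), but the
    frequencies themselves are thrown away.
    """
    parts = []
    for column in profile:
        # Sort by descending frequency, breaking ties alphabetically.
        residues = sorted(column, key=lambda r: (-column[r], r))
        if len(residues) == 1:
            parts.append(residues[0])
        else:
            parts.append("[" + "".join(residues) + "]")
    return "".join(parts)


def threshold_consensus(profile, threshold=0.5):
    """Take the most probable residue per column, or 'X' when its
    frequency is below the (hypothetical) threshold."""
    out = []
    for column in profile:
        best = max(column, key=column.get)
        out.append(best if column[best] >= threshold else "X")
    return "".join(out)


pattern = bracket_pattern(PROFILE)
print(pattern)                               # M[ES]N[IAP]S
print(threshold_consensus(PROFILE))          # MENIS
print(threshold_consensus(PROFILE, 0.6))     # MENXS
# The bracket pattern is a valid regex, but it accepts the rare variant
# MSNPS just as readily as the common MENIS: the frequencies are gone.
print(bool(re.fullmatch(pattern, "MSNPS")))  # True
```

Neither output records that E outnumbers S at the second position; only the profile itself keeps that.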

I think the whole idea of a consensus sequence from a profile is a bit 
worthless, to be honest.  What are you supposed to be able to do with 
the consensus, search with it?  That's what the profile is for in the 
first place ... [ speaking of which, I'd love to see 
Bio::Tools::dpAlign make use of these protein profiles ].

-Aaron

--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey at pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697
