[Biojava-dev] Suggestion for Canonical Symbols

Schreiber, Mark mark.schreiber@agresearch.co.nz
Mon, 9 Dec 2002 13:08:44 +1300


Would it be useful to use the Integer/SubInteger model, where the protein
alphabet is a sub-alphabet of protein-term?
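A minimal sketch of the sub-alphabet idea, using hypothetical `Sym` and `SimpleAlphabet` classes rather than the real BioJava `Symbol` and `Alphabet` interfaces. The protein-term alphabet creates and owns one canonical instance per residue, and the protein alphabet is built as a sub-alphabet that reuses those same objects. An independently constructed alphabet is included for contrast, since that is roughly the non-canonical situation described below.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a BioJava Symbol; not the real interface.
final class Sym {
    final String name;
    Sym(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

// Hypothetical stand-in for a BioJava Alphabet; not the real interface.
final class SimpleAlphabet {
    private final String name;
    private final Map<String, Sym> symbols;

    // Parent alphabet: creates and owns one canonical Sym per name.
    SimpleAlphabet(String name, String... symbolNames) {
        Map<String, Sym> m = new LinkedHashMap<>();
        for (String s : symbolNames) {
            m.put(s, new Sym(s));
        }
        this.name = name;
        this.symbols = Collections.unmodifiableMap(m);
    }

    private SimpleAlphabet(String name, Map<String, Sym> symbols) {
        this.name = name;
        this.symbols = symbols;
    }

    // Sub-alphabet: reuses the parent's Sym objects instead of creating new ones.
    SimpleAlphabet subAlphabet(String subName, String... keep) {
        Map<String, Sym> m = new LinkedHashMap<>();
        for (String s : keep) {
            m.put(s, get(s)); // same object as in the parent
        }
        return new SimpleAlphabet(subName, Collections.unmodifiableMap(m));
    }

    Sym get(String s) {
        Sym sym = symbols.get(s);
        if (sym == null) {
            throw new IllegalArgumentException(s + " is not in " + name);
        }
        return sym;
    }

    // Membership by identity, so a look-alike Sym from another alphabet is rejected.
    boolean contains(Sym sym) {
        return symbols.get(sym.name) == sym;
    }
}

public class CanonicalDemo {
    public static void main(String[] args) {
        // A few residues only; "*" stands in for the termination symbol.
        SimpleAlphabet proteinTerm = new SimpleAlphabet("PROTEIN-TERM", "A", "C", "G", "*");
        SimpleAlphabet protein = proteinTerm.subAlphabet("PROTEIN", "A", "C", "G");
        // An independently built alphabet, as in the non-canonical case.
        SimpleAlphabet separate = new SimpleAlphabet("PROTEIN", "A", "C", "G");

        System.out.println(proteinTerm.get("A") == protein.get("A"));  // true
        System.out.println(proteinTerm.get("A") == separate.get("A")); // false
        System.out.println(protein.contains(proteinTerm.get("*")));    // false
    }
}
```

Because the sub-alphabet is only a view onto the parent's symbols, a residue produced via the larger alphabet (for example by translation) compares identical to the same residue looked up in the smaller one.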

> -----Original Message-----
> From: Thomas Down [mailto:td2@sanger.ac.uk] 
> Sent: Monday, 9 December 2002 1:00 p.m.
> To: Schreiber, Mark
> Cc: biojava-dev@biojava.org
> Subject: Re: [Biojava-dev] Suggestion for Canonical Symbols
> 
> 
> On Mon, Dec 09, 2002 at 11:59:01AM +1300, Schreiber, Mark wrote:
> > Hi -
> > 
> > If you translate an RNA SymbolList into Protein, the Symbols in the
> > protein SymbolList come from the alphabet referenced by
> > ProteinTools.getTAlphabet.
> > 
> > The Symbols from the TAlphabet are not canonical with the Symbols from
> > the other protein Alphabet. This has led to some very surprising bugs
> > in some stuff we were developing. Given that Integer Symbols are now
> > canonical even if they come from IntegerAlphabet or one of the
> > Integer.SubAlphabets, could the same happen for the protein Alphabets?
> 
> *sigh*
> 
> That was actually the original behaviour.  I broke it
> (deliberately) a few weeks ago when fixing the knotty 
> question of serializing ambiguous symbols, so now you know
> who to blame.  At the time, requiring that all well-known 
> symbols should be scoped by Alphabet provided a sane way of 
> cleaning up the serialization code without having to write 
> totally new Symbol and Alphabet implementations for all the 
> well-known cases.  At least in the Protein/protein-term case
> it probably does make sense to fix this.  I shall ponder --
> all suggestions welcome.
> 
> The division between protein and protein-term is really
> rather artificial.  As far as I can tell, the termination
> symbol is a bit like the gap symbol, in that it never occurs 
> in "biologically real" sequences, but is a useful convenience 
> for computation.  Maybe we'll be able to build on that idea 
> for BJ2 and get rid of the annoying distinction.
> 
>      Thomas.
> 
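Thomas's idea of treating termination like the gap symbol could look roughly like the following. This is a minimal sketch under stated assumptions: `PSym`, `AminoAcid`, `Special` and `ProteinSeq` are hypothetical names, not BioJava classes. There is a single protein alphabet, and GAP and TERM are out-of-alphabet convenience symbols that any sequence may hold, so no separate protein-term alphabet is needed. Using Java enums makes every symbol a singleton, so canonicality comes for free.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not BioJava: a marker type for anything a sequence can hold.
interface PSym {}

// The single protein alphabet (truncated to three residues for brevity).
enum AminoAcid implements PSym { ALA, CYS, GLY }

// Out-of-alphabet conveniences, in the spirit of the gap symbol.
enum Special implements PSym { GAP, TERM }

final class ProteinSeq {
    private final List<PSym> syms = new ArrayList<>();

    // Residues and special symbols are both accepted.
    ProteinSeq add(PSym s) {
        syms.add(s);
        return this;
    }

    int length() {
        return syms.size();
    }

    // Counts only residues from the protein alphabet, skipping GAP and TERM.
    int residueCount() {
        int n = 0;
        for (PSym s : syms) {
            if (s instanceof AminoAcid) {
                n++;
            }
        }
        return n;
    }
}

public class TermAsSpecialDemo {
    public static void main(String[] args) {
        ProteinSeq seq = new ProteinSeq()
                .add(AminoAcid.ALA)
                .add(AminoAcid.GLY)
                .add(Special.TERM);
        System.out.println(seq.length());       // 3
        System.out.println(seq.residueCount()); // 2
        // Enum constants are singletons, so they are canonical by construction.
        System.out.println(Special.TERM == Special.valueOf("TERM")); // true
    }
}
```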