[Biojava-l] three-letter Protein alphabet names
Richard Holland
richard.holland at ebi.ac.uk
Tue Aug 1 08:19:36 UTC 2006
I'm not sure, but it should simply be a matter of defining an alphabet
where each symbol in the alphabet is a 3-letter combo. Then you can use
the alphabet to tokenize the input string appropriately.
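
As a rough illustration of that idea (a minimal sketch only, in plain Java rather than BioJava's own Alphabet/SymbolTokenization classes), the code below treats each 3-letter code as one symbol and walks the input three characters at a time. The class name ThreeLetterTokenizer and the method toOneLetter are hypothetical, the table covers only the 20 standard amino acids, and stripping whitespace between codes is an assumption about how the input is laid out.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of BioJava.
final class ThreeLetterTokenizer {

    // The 20 standard amino acids: 3-letter code -> 1-letter code.
    private static final String[][] TABLE = {
        {"Ala", "A"}, {"Arg", "R"}, {"Asn", "N"}, {"Asp", "D"}, {"Cys", "C"},
        {"Gln", "Q"}, {"Glu", "E"}, {"Gly", "G"}, {"His", "H"}, {"Ile", "I"},
        {"Leu", "L"}, {"Lys", "K"}, {"Met", "M"}, {"Phe", "F"}, {"Pro", "P"},
        {"Ser", "S"}, {"Thr", "T"}, {"Trp", "W"}, {"Tyr", "Y"}, {"Val", "V"}
    };

    private static final Map<String, Character> CODES = new HashMap<>();

    static {
        for (String[] row : TABLE) {
            CODES.put(row[0], row[1].charAt(0));
        }
    }

    private ThreeLetterTokenizer() {
    }

    // Converts a run of 3-letter codes (optionally separated by whitespace)
    // into the equivalent 1-letter string.
    static String toOneLetter(String input) {
        // Assumption: whitespace between codes carries no meaning.
        String compact = input.replaceAll("\\s+", "");
        if (compact.length() % 3 != 0) {
            throw new IllegalArgumentException(
                "Input length is not a multiple of 3: " + compact.length());
        }
        StringBuilder out = new StringBuilder(compact.length() / 3);
        for (int i = 0; i < compact.length(); i += 3) {
            String token = compact.substring(i, i + 3);
            // Normalise case so "MET", "met" and "Met" all match the table.
            String key = token.substring(0, 1).toUpperCase(Locale.ROOT)
                       + token.substring(1).toLowerCase(Locale.ROOT);
            Character code = CODES.get(key);
            if (code == null) {
                throw new IllegalArgumentException(
                    "Unknown 3-letter code: " + token);
            }
            out.append(code.charValue());
        }
        return out.toString();
    }
}
```

For example, ThreeLetterTokenizer.toOneLetter("Met Ala Gly") returns "MAG". A real BioJava version would presumably map each token to a Symbol in the protein alphabet instead of to a character.
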
Mark will know more about this than me. Mark - comments?
cheers,
Richard
On Tue, 2006-08-01 at 17:41 +1000, Neil Bacon wrote:
> Hi,
> I'm looking at extending biojava sequence io to read sequences from
> patents (initially current US data formats, later perhaps older formats
> and other jurisdictions).
> Has anyone done this already, or is anyone interested?
>
> Protein data uses 3-letter codes. I found an old posting about 3-letter
> codes:
>
> [Biojava-dev] Protein alphabet names
> http://lists.open-bio.org/pipermail/biojava-dev/2002-October/000143.html
>
> > - Add an additional tokenization (probably called "three-letter"
> > unless someone comes up with a better suggestion) for people
> > who actually want 3-letter codes.
>
> Did this happen (I can't find it)?
> I'll try extending WordTokenization to do this unless someone has
> already done it or can advise me better (I'm new here and advice would
> be very welcome).
>
> Cheers,
> Neil Bacon
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416