[Biopython-dev] RNA alphabets; was Bio.PDB enhancements

Wed Jun 2 12:21:43 UTC 2010

Hi Peter,

I'm afraid the matter is more complicated. To date, we have 115 modified
RNA bases, which in practice means that you run out of nice ASCII
characters. Moreover, some people use one-letter symbols in RNA as
wildcards (R for purine, Y for pyrimidine). As a consequence, several sets
of abbreviations have been developed; see
http://modomics.genesilico.pl/modification_list for an overview.

For our own purposes, we've written a class that covers the different
nomenclatures. I think it's incompatible with Bio.Alphabet, but I'd like
to change that.
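
As a rough illustration of the problem, a lookup across several
nomenclatures could be sketched in Python as below. This is not the class
mentioned above and it is not part of Bio.Alphabet: the class name, method
names and the scheme names "short" and "verbose" are hypothetical, and the
symbols are illustrative only and should be checked against the MODOMICS
list.

```python
# Minimal sketch only: all names here are hypothetical, and the symbol
# assignments are illustrative rather than an authoritative table.

# IUPAC ambiguity codes for RNA mentioned in the text (R purine, Y pyrimidine).
IUPAC_WILDCARDS = {"R": "AG", "Y": "CU"}


class RnaNomenclature:
    """Map nucleotides between several abbreviation schemes."""

    def __init__(self):
        self._entries = []  # one dict per nucleotide

    def add(self, name, parent, **symbols):
        """Register a nucleotide, its unmodified parent base, and its
        symbol in each scheme (passed as keyword arguments)."""
        self._entries.append({"name": name, "parent": parent, **symbols})

    def _find(self, symbol, scheme):
        for entry in self._entries:
            if entry.get(scheme) == symbol:
                return entry
        raise KeyError(f"{symbol!r} is not defined in scheme {scheme!r}")

    def translate(self, symbol, source, target):
        """Convert one symbol from the source scheme to the target scheme."""
        return self._find(symbol, source)[target]

    def to_unmodified(self, tokens, scheme):
        """Return the plain AGCU string for a list of symbols. A list is
        used because symbols may be longer than one character."""
        return "".join(self._find(t, scheme)["parent"] for t in tokens)

    def wildcard_clashes(self, scheme):
        """List symbols in a scheme that collide with IUPAC wildcards."""
        return sorted(
            e[scheme] for e in self._entries if e.get(scheme) in IUPAC_WILDCARDS
        )


nom = RnaNomenclature()
for base in "AGCU":
    nom.add(base, base, short=base, verbose=base)
# Illustrative symbols for two modified nucleotides (assumed, not verified).
nom.add("pseudouridine", "U", short="Y", verbose="Psi")
nom.add("dihydrouridine", "U", short="D", verbose="D")

print(nom.translate("Y", "short", "verbose"))            # Psi
print(nom.to_unmodified(["G", "Y", "D", "A"], "short"))  # GUUA
print(nom.wildcard_clashes("short"))                     # ['Y']
```

The last call shows the ambiguity: in this sketch a one-letter "Y" for
pseudouridine collides with the wildcard Y for pyrimidine, which is why a
single alphabet string is not enough.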

Best Regards,
   Kristian

> On Wed, Jun 2, 2010 at 9:17 AM, Kristian Rother <krother at rubor.de> wrote:
>>
>> A potential problem that I'd like to point out early is that we are
>> working with modified RNA nucleotides a lot (up to 20% of residues in
>> every tRNA). This would require extending the RNA Alphabet (which now
>> just is "AGCU") - but I see this as remote from the Bio.XXXX.read()
>> thread.
>>
> What letters are you missing? There is a commented-out ExtendedIUPACRNA
> alphabet that may be relevant in Bio/Alphabet/IUPAC.py
>
> Peter