[Biojava-dev] [BioJava - Bug #3364] JRONN protein disorder fails to run on protein sequences with non-standard characters

Wed Aug 8 16:22:54 UTC 2012

Issue #3364 has been updated by Steven Darnell.

Pyl (O), Monomethylamine methyltransferase mtmB1, UniProt O30642
Sec (U), Selenoprotein V, UniProt P59797

Is this sufficient?
----------------------------------------
Bug #3364: JRONN protein disorder fails to run on protein sequences with non-standard characters
https://redmine.open-bio.org/issues/3364

Author: Steven Darnell
Status: New
Priority: Normal
Assignee: biojava-dev list
Category: Others
Target version: 
URL: 

The JRONN protein disorder predictor fails to run on protein sequences that contain non-standard characters (e.g. ambiguity codes or non-standard residues). I understand that the model is limited to the 20 standard residues, but JRONN should not refuse to process a sequence because of a single 'X' (or other such character). Imprecise predictions in an isolated region of the protein are far preferable to no predictions for the entire protein.

A simple (and imperfect) solution would be to substitute 'BJOUXZ' with reasonable standard characters and leave the alignment mechanism unchanged. For example:

B (Asx) --> D (Asp)
J (Xle) --> L (Leu)
O (Pyl) --> K (Lys)
U (Sec) --> C (Cys)
X (Xaa) --> A (Ala)
Z (Glx) --> E (Glu)

Other solutions are possible. The main goal is to get JRONN to run on any protein sequence, even if that adds uncertainty near non-standard residues and ambiguity codes.
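
For illustration, the substitution table above could be applied as a pre-processing step before the sequence is handed to the predictor. This is only a rough sketch: the class name NonStandardResidueSubstitution and the method toStandardResidues are hypothetical and are not part of BioJava or JRONN, and the code assumes the sequence is an upper-case one-letter string (Java 9+ for Map.of).

```java
import java.util.Map;

// Hypothetical helper, not part of BioJava or JRONN.
final class NonStandardResidueSubstitution {

    // Substitution table proposed in this issue: each ambiguity code or
    // non-standard residue is mapped to one of the 20 standard residues.
    private static final Map<Character, Character> SUBSTITUTIONS = Map.of(
            'B', 'D',   // Asx -> Asp
            'J', 'L',   // Xle -> Leu
            'O', 'K',   // Pyl -> Lys
            'U', 'C',   // Sec -> Cys
            'X', 'A',   // Xaa -> Ala
            'Z', 'E');  // Glx -> Glu

    private NonStandardResidueSubstitution() {
    }

    // Returns a copy of the sequence with B, J, O, U, X and Z replaced.
    // All other characters are copied unchanged, so the length and the
    // residue positions are preserved.
    // Assumption: the input uses upper-case one-letter codes.
    static String toStandardResidues(String sequence) {
        StringBuilder cleaned = new StringBuilder(sequence.length());
        for (int i = 0; i < sequence.length(); i++) {
            char residue = sequence.charAt(i);
            char replacement = SUBSTITUTIONS.getOrDefault(residue, residue);
            cleaned.append(replacement);
        }
        return cleaned.toString();
    }
}
```

Because the substitution is one-to-one, per-residue disorder scores computed on the cleaned string would still line up with positions in the original sequence, so the substituted positions could be flagged afterwards as less reliable.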
