[Biojava-dev] [BioJava - Bug #3364] (Resolved) JRONN protein disorder fails to run on protein sequences with non-standard characters

Thu Aug 9 21:41:16 UTC 2012

Issue #3364 has been updated by Andreas Prlic.

Status changed from New to Resolved
Target version set to BioJava 3.0.5
% Done changed from 0 to 100

Fixed in SVN.
----------------------------------------
Bug #3364: JRONN protein disorder fails to run on protein sequences with non-standard characters
https://redmine.open-bio.org/issues/3364

Author: Steven Darnell
Status: Resolved
Priority: Normal
Assignee: biojava-dev list
Category: Others
Target version: BioJava 3.0.5
URL: 

The JRONN protein disorder predictor fails to run on protein sequences that contain non-standard characters (e.g. ambiguity codes, non-standard residues). I understand that the model is limited to the 20 standard residues, but it is undesirable for JRONN to refuse to process a sequence because of a single 'X' (or other character). Imprecise predictions in an isolated region of the protein are far preferable to no predictions for the entire protein.

A simple (and imperfect) solution would be to substitute each of the characters 'BJOUXZ' with a reasonable standard residue and leave the alignment mechanism unchanged. For example:

B (Asx) --> D (Asp)
J (Xle) --> L (Leu)
O (Pyl) --> K (Lys)
U (Sec) --> C (Cys)
X (Xaa) --> A (Ala)
Z (Glx) --> E (Glu)

Other solutions exist. The main goal is for JRONN to run on any protein sequence, even if that adds uncertainty near non-standard residues and ambiguity codes.
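
The substitution proposed above could be sketched as a small pre-processing step applied to the sequence string before it reaches the predictor. This is only an illustration of the idea, not the change that was committed to SVN; the class name ResidueSubstitution and the method standardize are hypothetical and are not part of BioJava. The sketch assumes upper-case one-letter codes and leaves every other character untouched.

```java
/**
 * Hypothetical helper (not part of BioJava): replaces the non-standard
 * one-letter codes B, J, O, U, X and Z with the standard residues
 * suggested in this issue.
 */
public final class ResidueSubstitution {

    private ResidueSubstitution() {
        // static helper, no instances
    }

    /** Maps a single one-letter code using the table from the issue text. */
    static char substitute(char residue) {
        switch (residue) {
            case 'B': return 'D'; // Asx -> Asp
            case 'J': return 'L'; // Xle -> Leu
            case 'O': return 'K'; // Pyl -> Lys
            case 'U': return 'C'; // Sec -> Cys
            case 'X': return 'A'; // Xaa -> Ala
            case 'Z': return 'E'; // Glx -> Glu
            default:  return residue; // everything else is left unchanged
        }
    }

    /**
     * Returns a copy of the sequence with non-standard codes substituted.
     * The result has the same length as the input, so position i in the
     * output still corresponds to position i in the original sequence.
     */
    public static String standardize(String sequence) {
        StringBuilder out = new StringBuilder(sequence.length());
        for (int i = 0; i < sequence.length(); i++) {
            out.append(substitute(sequence.charAt(i)));
        }
        return out.toString();
    }
}
```

Because the substitution is one character for one character, any per-residue scores computed on the substituted sequence can be reported against the original positions, with the caveat from above that scores near substituted positions are less reliable.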
