[Biojava-dev] [BioJava - Bug #3364] JRONN protein disorder fails to run on protein sequences with non-standard characters

redmine at redmine.open-bio.org redmine at redmine.open-bio.org
Wed Aug 8 01:22:55 UTC 2012


Issue #3364 has been updated by Andreas Prlic.


Hi,

One of the problems is that the disorder module does not currently use the standard Sequence model provided by biojava3-core. I can probably fix this relatively easily. Do you have a UniProt ID as a test case?

Andreas
----------------------------------------
Bug #3364: JRONN protein disorder fails to run on protein sequences with non-standard characters
https://redmine.open-bio.org/issues/3364

Author: Steven Darnell
Status: New
Priority: Normal
Assignee: biojava-dev list
Category: Others
Target version: 
URL: 


The JRONN protein disorder predictor fails to run on protein sequences with non-standard characters (e.g. ambiguity codes, non-standard residues). I understand that the model is limited to the 20 standard residues, but it is undesirable for JRONN to reject an entire sequence because of a single 'X' (or other such character). Imprecise predictions in an isolated region of the protein are far preferable to no predictions for the protein at all.

A simple (and imperfect) solution would be to substitute 'BJOUXZ' with reasonable standard characters and leave the alignment mechanism unchanged. For example:

B (Asx) --> D (Asp)
J (Xle) --> L (Leu)
O (Pyl) --> K (Lys)
U (Sec) --> C (Cys)
X (Xaa) --> A (Ala)
Z (Glx) --> E (Glu)
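
The table above could be applied as a preprocessing step before the sequence is handed to JRONN. The following is only a minimal sketch in plain Java, not a patch against the disorder module: the class name ResidueSubstitution and the method toStandardResidues are hypothetical and do not exist in BioJava or JRONN, and the sketch assumes the sequence is available as a plain String of one-letter codes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of BioJava or JRONN): replaces the
// non-standard one-letter codes B, J, O, U, X, Z with the standard
// residues proposed in the table above.
public class ResidueSubstitution {

    private static final Map<Character, Character> SUBSTITUTIONS = new HashMap<>();
    static {
        SUBSTITUTIONS.put('B', 'D'); // Asx -> Asp
        SUBSTITUTIONS.put('J', 'L'); // Xle -> Leu
        SUBSTITUTIONS.put('O', 'K'); // Pyl -> Lys
        SUBSTITUTIONS.put('U', 'C'); // Sec -> Cys
        SUBSTITUTIONS.put('X', 'A'); // Xaa -> Ala
        SUBSTITUTIONS.put('Z', 'E'); // Glx -> Glu
    }

    // Returns an upper-cased copy of the sequence with the six codes above
    // replaced. All other characters are passed through unchanged, so the
    // length and residue positions are preserved.
    public static String toStandardResidues(String sequence) {
        StringBuilder result = new StringBuilder(sequence.length());
        for (int i = 0; i < sequence.length(); i++) {
            char residue = Character.toUpperCase(sequence.charAt(i));
            Character replacement = SUBSTITUTIONS.get(residue);
            result.append(replacement != null ? replacement.charValue() : residue);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Prints MKADECKL
        System.out.println(toStandardResidues("MKXBZUOJ"));
    }
}
```

Because the substitution is one-for-one, the per-residue disorder scores would still line up with the positions of the original sequence, so predictions near a substituted residue could be flagged as less certain afterwards.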

Other solutions are possible. The main goal is to get JRONN to run on any protein sequence, even if that adds uncertainty near non-standard residues and ambiguity codes.

