BLAST, X vs. U, and EMBOSS
    David Mathog 
    mathog at mendel.bio.caltech.edu
       
    Tue Jun  4 18:34:25 UTC 2002
    
    
  
> In nr there is an entry with gi= 2018236 which ends with:
Sorry, I dropped a character in the cut and paste; the gi is actually 12018236.
Tao Tao <tao at ncbi.nlm.nih.gov> points out that U is the IUPAC one-letter code
for selenocysteine; see:
http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html#AA212
This gets very confusing because Entrez returns GenBank format
with U converted to X, but FASTA (and ASN.1) with U left as U.
Which protein alphabet is EMBOSS supposed to recognize?
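The alphabet question can be made concrete with a small sketch. It is not how EMBOSS or the NCBI tools actually handle sequences. It assumes a protein alphabet of the 20 standard residues plus B, Z, X and U (selenocysteine), and the helper names `normalize_protein` and `same_modulo_sec` are hypothetical. The sketch accepts U, can optionally rewrite it as X the way Entrez's GenBank-format output does, and compares a FASTA-derived copy of a sequence with a GenBank-derived copy by treating U and X as equivalent.

```python
# Hypothetical helpers for illustration only; not part of EMBOSS or NCBI tools.
# ASSUMPTION: the accepted alphabet is the 20 standard amino acids plus
# B (Asx), Z (Glx), X (unknown) and U (selenocysteine). This is not exhaustive.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")
ALPHABET = STANDARD | set("BZXU")


def normalize_protein(seq: str, selenocysteine_as_x: bool = False) -> str:
    """Strip whitespace, upper-case, and validate a protein sequence.

    If selenocysteine_as_x is True, U is rewritten as X, mimicking the
    U -> X conversion seen in Entrez's GenBank-format output.
    """
    seq = "".join(seq.split()).upper()
    bad = set(seq) - ALPHABET
    if bad:
        raise ValueError(f"unrecognized residue codes: {''.join(sorted(bad))}")
    if selenocysteine_as_x:
        seq = seq.replace("U", "X")
    return seq


def same_modulo_sec(fasta_seq: str, genbank_seq: str) -> bool:
    """Return True if two sequences match once every U is treated as X."""
    return (normalize_protein(fasta_seq, selenocysteine_as_x=True)
            == normalize_protein(genbank_seq, selenocysteine_as_x=True))


if __name__ == "__main__":
    print(normalize_protein("mkuv"))                            # MKUV
    print(normalize_protein("mkuv", selenocysteine_as_x=True))  # MKXV
    print(same_modulo_sec("MKUV", "MKXV"))                      # True
```

Treating U as X on both sides is lossy: it also equates a genuine unknown residue with selenocysteine. A real tool would probably want to report such positions rather than silently merge them.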
And all that aside, X vs. X or X vs. U in blastp really does
introduce two unnecessary gaps in the alignment, which
can easily be demonstrated with bl2seq on gi 14250938 vs. itself.
Regards,
David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
    
    