[Biopython] how to validate fasta format
Peter
biopython at maubp.freeserve.co.uk
Tue Oct 27 09:20:58 EDT 2009
On Tue, Oct 27, 2009 at 1:14 PM, Steve Darnell <darnells at dnastar.com> wrote:
>
> Greetings,
>
> This particular thread addresses a topic we've revisited lately,
> ambiguity codes (particularly in the amino acid alphabet). I would like
> to query the group for their opinion of the remaining 6 characters after
> you remove the 20 standard amino acids. Here's our list:
>
> B - Asn or Asp
> J - Ile or Leu
> O - ???
> U - seleno-Cys
> X - Any
> Z - Gln or Glu
Your list covers all six letters; only the meaning of O was missing.
According to the Biopython ExtendedIUPACProtein alphabet docstring,
which is based on the IUPAC recommendations:
B = "Asx"; Aspartic acid (D) or Asparagine (N)
X = "Xxx"; Unknown or 'other' amino acid
Z = "Glx"; Glutamic acid (E) or Glutamine (Q)
J = "Xle"; Leucine (L) or Isoleucine (I), used in mass-spec (NMR)
U = "Sec"; Selenocysteine
O = "Pyl"; Pyrrolysine
In practice, X is often also used to mean any amino acid or a stop
codon (although a stop codon would really benefit from a more explicit
character, in my personal opinion).
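
As a rough illustration of how the six extra letters could be used when
checking protein sequences, here is a minimal sketch in plain Python. The
letter sets are hard-coded rather than imported from Biopython, and the
names STANDARD, EXTENDED and invalid_letters are made up for this example.
Treating "*" as a stop symbol is an assumption (a common convention), not
something the alphabet above defines.

```python
# Hypothetical helper, not part of Biopython: letters are hard-coded here.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids

# The six extra letters discussed above, with their three-letter codes.
EXTENDED = {
    "B": "Asx (Asp or Asn)",
    "X": "Xxx (unknown or other)",
    "Z": "Glx (Glu or Gln)",
    "J": "Xle (Leu or Ile)",
    "U": "Sec (selenocysteine)",
    "O": "Pyl (pyrrolysine)",
}


def invalid_letters(sequence, allow_stop=False):
    """Return a sorted list of characters outside the extended alphabet.

    If allow_stop is true, "*" is accepted as a stop symbol (an assumed
    convention, chosen for this sketch).
    """
    allowed = STANDARD | set(EXTENDED)
    if allow_stop:
        allowed.add("*")
    return sorted(set(sequence.upper()) - allowed)
```

An empty list means every character was recognised. For example,
invalid_letters("MKBJOUXZ") returns an empty list, while a sequence
containing digits or punctuation gets those characters reported back.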
Peter