[Biopython-dev] [Bug 2548] Updating IUPACData and ExtendedIUPACProtein for U and O

Mon Jul 21 07:18:12 EDT 2008

http://bugzilla.open-bio.org/show_bug.cgi?id=2548

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-07-21 07:18 EST -------
Regarding Martin's example (erroneously added to Bugzilla as Bug 2547 comment
1), the protein GI:2983532:

Martin wrote "GenBank format requires official IUPAC amino acid code that
doesn't include Selenocystein and therefore it uses 'X'."

That is out of date - IUPAC and GenBank both accept U for selenocysteine now
(see my notes in comment 4 of this bug).

Looking at these files:
ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Aquifex_aeolicus/AE000657.faa
ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Aquifex_aeolicus/AE000657.gbk
(feature translation)

They both give the same amino acid sequence for GI:2983532, which contains "U"
rather than "X", as I had expected.

>gi|2983532|gb|AAC07107.1| formate dehydrogenase alpha subunit [Aquifex aeolicus VF5]
MNYMDISRRGFLKLSVGSVGAGILGGLGFDLTPAYARVRDLKITKAKVTKSICPYCSVSCGILAYSLSDG
AMNVKERIIHVEGNPDDPINRGTLCPKGATLRDFVNAPDRLTKPLYRPAGSTEWKEISWDEAIEKFARWV
KDTRDRTFIHKDKAGRVVNRCDSIVWAVGSPLGNEEGWLMVKIGIALGLSARETQATIUHAPTVASLAPT
FGRGAMTNNWVDISNSDLVFVMGGNPAENHPCGFKWAIKAREKRGAKIICIDPRFNRTAAVADIFVQIRP
GTDIAFLGGLINYVLQNEKYQKEYVRLHTTGPFIVREDFGFKDGLFTGYDPKTRSYDTTTWDYEFDPATG
YPKMDPEMKHPRCVLNILKEHYSRYTPEVVSQICGCSKEDFLRVAEEVAKCGAPNKFMTILYALGWTHHS
YGTQLIRTACMLQLLLGNIGCPGGGINALRGHSNVQGMTDLAGQNKNLPTYIKPPKPEEQTLAQHLKNRT
PRKLHPTSLNYWANYPKFFISFLKCMWGDAATPENDFAYDYLYKPEGGYNSWDKFIDDMYKGKIEGVVTA
ALNFLNNTPNAKKTVRALKNLKWMVVMDPFMIETAQFWKAEGLDPKEVKTEILVLPTAVFLEKEGSFTNS
ARWVKWKYKATDPPGDAKDEFWIFGRFFMKLKEFYEKEGGAFPEPILNLVWPYKNPYYPTAEEILTEING
YYTRDVDGHKKGERVRLFTDLRDDGSTACGGWLYCGVFPPEGNLAKRTDLSDPLGLGTYPNYAWNWPANR
RVLYNRASCDEKGRPWDPERPLLRWDPERDMWVGDIPDYPATAPPEKGIGAFIMLPEGKGRLFAAKSYVT
FKDGPLPEHYEPYESPVTNILHPNVPHNPVAKVYKSDLDLLGTPDKFPHVATTYRLTEHYHFWTKHLYGP
SLLAPVMFIEIPEELAKEKGIQNGDLVRVSTARASIEAIALVTKRIKPLKVAGKTVYTIGIPIHWGFEGL
VKGAITNFITPNVWDPNSRTPEFKGFLANIEKVKT
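
For anyone wanting to repeat the check, here is a minimal sketch in plain
Python (no Biopython required). The helper name count_residues is my own and
purely illustrative; the fragment is the third sequence line of the FASTA
record above, copied in so the snippet stands alone.

```python
def count_residues(sequence, letters="UX"):
    """Count how often each letter of interest occurs in a protein sequence."""
    sequence = sequence.upper()
    return {letter: sequence.count(letter) for letter in letters}


# Third sequence line of the FASTA record for GI:2983532 quoted above.
fragment = (
    "KDTRDRTFIHKDKAGRVVNRCDSIVWAVGSPLGNEEGWLMVKIGIALGLSARETQATIUHAPTVASLAPT"
)

counts = count_residues(fragment)
print(counts)
```

The same function can be run on the full translation from the .faa or .gbk
file to confirm that selenocysteine is reported as U and never as X.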

It is quite possible that during the transition from X to U for selenocysteine
there were inconsistencies in GenBank - but I hope/expect the NCBI have fixed
them all by now.
