[BioPython] Sequence Annotation: sequence numbering
Iddo Friedberg
idoerg@cc.huji.ac.il
Tue, 26 Jun 2001 12:21:47 +0300 (GMT+0300)
Hi all,
I would like to start a discussion about the annotation of protein
sequence numbering in Biopython. You are probably all aware of the fact
that, given a protein sequence, each position in that sequence can be
overloaded with several ordinal numbering schemes. This usually arises
when doing cross-database work. For example, say I have extracted a SwissProt
sequence and now want to look it up in PDB. Positions along the SP sequence
will then carry two numberings: the SP one and the PDB one. Going on from PDB
to FSSP adds yet a third numbering system.
Example (a dash marks a residue that has no number in that scheme):
Sequence:  A   G   C   V   T   S   L   F
SP:        1   2   3   4   5   6   7   8
PDB:       -   23  24  25  -   26  27  27A
FSSP:      -   1   2   3   -   4   5   6
Note the omission of A1 and T5 from the structural data, and the insertion
code ('27A') at F8. Both are quite typical phenomena.
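To make the example concrete, here is a minimal sketch of one way such overlapping numberings could be held together. It assumes a single full-length sequence (the SP one) as the backbone, with each scheme storing one label per residue, or None where that scheme skips the residue. Labels are strings, so insertion codes like '27A' need no special handling. `NumberingMap`, `add_scheme`, `convert` and `residue` are hypothetical names for illustration, not an existing Biopython API.

```python
from typing import Dict, List, Optional


class NumberingMap:
    """Hypothetical sketch: one residue sequence, several numbering schemes.

    Each scheme assigns a string label to every residue of the full
    sequence, or None when that scheme does not cover the residue.
    """

    def __init__(self, sequence: str) -> None:
        self.sequence = sequence
        self.schemes: Dict[str, List[Optional[str]]] = {}

    def add_scheme(self, name: str, labels: List[Optional[str]]) -> None:
        # One label (or None) per residue keeps all schemes aligned.
        if len(labels) != len(self.sequence):
            raise ValueError("need one label (or None) per residue")
        self.schemes[name] = labels

    def index_of(self, scheme: str, label: str) -> int:
        # Linear scan keeps the sketch short; raises ValueError if absent.
        return self.schemes[scheme].index(label)

    def residue(self, scheme: str, label: str) -> str:
        """Return the residue that `label` denotes in `scheme`."""
        return self.sequence[self.index_of(scheme, label)]

    def convert(self, label: str, src: str, dst: str) -> Optional[str]:
        """Translate a label from scheme `src` to `dst` (None if uncovered)."""
        return self.schemes[dst][self.index_of(src, label)]


# The example above: A1 and T5 have no structural numbers.
numbering = NumberingMap("AGCVTSLF")
numbering.add_scheme("SP", [str(i) for i in range(1, 9)])
numbering.add_scheme("PDB", [None, "23", "24", "25", None, "26", "27", "27A"])
numbering.add_scheme("FSSP", [None, "1", "2", "3", None, "4", "5", "6"])

print(numbering.convert("8", "SP", "PDB"))      # 27A
print(numbering.convert("27A", "PDB", "FSSP"))  # 6
print(numbering.convert("5", "SP", "PDB"))      # None
print(numbering.residue("PDB", "27A"))          # F
```

Anchoring every scheme to the full-length sequence means gaps in the structural data are explicit None entries rather than shifted offsets. A real implementation would probably keep a dict per scheme for constant-time lookups.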
I have addressed proteins here, but I believe there is a similar need for
nucleic acid sequences, the most immediate example being the gDNA <--> cDNA
transition.
Would anybody care to comment in terms of:
o Need?
o Suggested implementation?
o Anything else?
Iddo
--
Iddo Friedberg | Tel: +972-2-6758647
Dept. of Molecular Genetics and Biotechnology | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
POB 12272, Jerusalem 91120 |
Israel |
http://bioinfo.md.huji.ac.il/marg/people-home/iddo/