[Biopython-dev] SwissProt parsing inconsistency between Bio.SeqIO, Bio.SwissProt
Michiel de Hoon
mjldehoon at yahoo.com
Tue Apr 21 07:55:36 EDT 2009
> Have you got a link for the full record in your example?
>
You can find it here:
http://www.uniprot.org/uniprot/Q9XHP0.txt
> For interaction with other Bio.SeqIO formats, I generally
> expect the description to be a single line string (with no
> embedded newlines).
> It looks like the SwissProt format has changed, and we
> should be parsing the new extended DE lines more
> carefully, and splitting these entries up and recording
> them in the SeqRecord.annotations dictionary?
>
That sounds reasonable. The dictionary will have to be nested though. Something like this:
annotations["RecName"] = [{"Full": "11S globulin seed storage protein 2"}]
annotations["AltName"] = [{"Full": "11S globulin seed storage protein II"},
                          {"Full": "Alpha-globulin"}]
annotations["Contains"] = [
    {"RecName": {"Full": "11S globulin seed storage protein 2 acidic chain"},
     "AltName": {"Full": "11S globulin seed storage protein II acidic chain"}},
    {"RecName": {"Full": "11S globulin seed storage protein 2 basic chain"},
     "AltName": {"Full": "11S globulin seed storage protein II basic chain"}},
]
annotations["Flags"] = "Precursor"
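
A rough sketch of how the extended DE lines might be turned into a nested dictionary of this kind. The function name parse_de_lines is hypothetical and not part of Biopython, and the sample DE lines are reconstructed from the names in this message rather than copied from the record. For uniformity, this sketch stores every name category as a list of dictionaries, including those inside "Contains".

```python
def parse_de_lines(lines):
    """Sketch: turn SwissProt DE lines into a nested annotations dict."""
    annotations = {}
    target = annotations  # dict currently receiving RecName/AltName entries
    last = None           # most recent name dict, for continuation lines
    for line in lines:
        # Drop the two-letter "DE" line code and the trailing semicolon.
        text = line[2:].strip().rstrip(";")
        if not text:
            continue
        if text in ("Contains:", "Includes:"):
            # Start a new sub-entry; following names belong to it.
            target = {}
            annotations.setdefault(text[:-1], []).append(target)
            last = None
        elif text.startswith("Flags:"):
            annotations["Flags"] = text.split(":", 1)[1].strip()
        else:
            head, _, value = text.partition("=")
            if ":" in head:
                # e.g. "RecName: Full=..." opens a new name entry.
                category, _, key = head.partition(":")
                last = {}
                target.setdefault(category.strip(), []).append(last)
            else:
                # e.g. "Short=..." continues the previous name entry.
                key = head
            if last is not None:
                last[key.strip()] = value.strip()
    return annotations


# Assumed sample input, reconstructed for illustration only.
sample = [
    "DE   RecName: Full=11S globulin seed storage protein 2;",
    "DE   AltName: Full=11S globulin seed storage protein II;",
    "DE   AltName: Full=Alpha-globulin;",
    "DE   Contains:",
    "DE     RecName: Full=11S globulin seed storage protein 2 acidic chain;",
    "DE     AltName: Full=11S globulin seed storage protein II acidic chain;",
    "DE   Contains:",
    "DE     RecName: Full=11S globulin seed storage protein 2 basic chain;",
    "DE     AltName: Full=11S globulin seed storage protein II basic chain;",
    "DE   Flags: Precursor;",
]
annotations = parse_de_lines(sample)
```

With this layout the single-line description that Bio.SeqIO expects could still be taken from annotations["RecName"][0]["Full"].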
--Michiel