[Biopython-dev] SwissProt parsing inconsistency between Bio.SeqIO, Bio.SwissProt

Michiel de Hoon mjldehoon at yahoo.com
Tue Apr 21 07:55:36 EDT 2009


> Have you got a link for the full record in your example?
> 
You can find it here:

http://www.uniprot.org/uniprot/Q9XHP0.txt

> For interaction with other Bio.SeqIO formats, I generally
> expect the description to be a single line string (with no
> embedded newlines).

> It looks like the SwissProt format has changed, and we
> should be parsing the new extended DE lines more
> carefully, and splitting these entries up and recording
> them in the SeqRecord.annotations dictionary?
> 
That sounds reasonable, though the dictionary will have to be nested. Something like this:

annotations["RecName"] = [{"Full": "11S globulin seed storage protein 2"}]
annotations["AltName"] = [{"Full": "11S globulin seed storage protein II"},
                          {"Full": "Alpha-globulin"}]
annotations["Contains"] = [{"RecName": {"Full": "11S globulin seed storage protein 2 acidic chain"},
                            "AltName": {"Full": "11S globulin seed storage protein II acidic chain"}},
                           {"RecName": {"Full": "11S globulin seed storage protein 2 basic chain"},
                            "AltName": {"Full": "11S globulin seed storage protein II basic chain"}},
                          ]
annotations["Flags"] = "Precursor"
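
To make the nesting concrete, here is a rough sketch of how the extended DE lines could be turned into a dictionary of this shape. It is only an illustration under some assumptions: parse_de_lines is a hypothetical helper, not an existing Biopython function; the sample DE lines are reconstructed from the dictionary above rather than copied from the record; and, unlike the proposal above, it stores RecName/AltName entries as lists at every level (including inside "Contains") so that repeated AltName lines are not lost.

```python
def parse_de_lines(lines):
    """Parse extended SwissProt DE lines into a nested dict.

    Hypothetical helper for illustration; not part of Biopython.
    Simplifications: "Flags" is always stored at the top level as a
    plain string, and a repeated key within one name (e.g. two Short=
    entries) keeps only the last value.
    """
    annotations = {}
    target = annotations  # dict currently receiving RecName/AltName entries
    current = None        # most recent name dict, for Short=/EC= continuations
    for line in lines:
        # Drop the "DE" line code, surrounding whitespace and trailing ";".
        text = line[2:].strip() if line.startswith("DE") else line.strip()
        text = text.rstrip(";")
        if not text:
            continue
        if text in ("Contains:", "Includes:"):
            # Start a new sub-section; the names that follow belong to it.
            target = {}
            annotations.setdefault(text[:-1], []).append(target)
            current = None
            continue
        head, sep, rest = text.partition(": ")
        if sep and head == "Flags":
            annotations["Flags"] = rest
            continue
        if sep and head in ("RecName", "AltName", "SubName"):
            # A new name entry; remember it so continuation lines can extend it.
            current = {}
            target.setdefault(head, []).append(current)
            text = rest
        if current is None:
            raise ValueError("Unexpected DE line: %r" % line)
        key, _, value = text.partition("=")
        current[key] = value
    return annotations


# Sample input reconstructed from the dictionary above (an assumption,
# not copied verbatim from the UniProt record).
sample_de_lines = [
    "DE   RecName: Full=11S globulin seed storage protein 2;",
    "DE   AltName: Full=11S globulin seed storage protein II;",
    "DE   AltName: Full=Alpha-globulin;",
    "DE   Contains:",
    "DE     RecName: Full=11S globulin seed storage protein 2 acidic chain;",
    "DE     AltName: Full=11S globulin seed storage protein II acidic chain;",
    "DE   Contains:",
    "DE     RecName: Full=11S globulin seed storage protein 2 basic chain;",
    "DE     AltName: Full=11S globulin seed storage protein II basic chain;",
    "DE   Flags: Precursor;",
]

if __name__ == "__main__":
    import pprint
    pprint.pprint(parse_de_lines(sample_de_lines))
```

If Bio.SeqIO still needs a single-line description, it could be taken from the "Full" value of the first RecName entry in the parsed result.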


--Michiel





      

