[Biopython-dev] [Bug 3156] New: UniProt XML and SwissProt parsers silently fail to parse all of database references
bugzilla-daemon at portal.open-bio.org
Thu Nov 11 23:09:04 UTC 2010
http://bugzilla.open-bio.org/show_bug.cgi?id=3156
Summary: UniProt XML and SwissProt parsers silently fail to parse all of database references
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: rjalves at igc.gulbenkian.pt
Example code:
from Bio import SeqIO, ExPASy
entry = SeqIO.read(ExPASy.get_sprot_raw('P31946'), 'swiss')
If you then inspect entry.dbxrefs, you can see that it includes:
['Ensembl:ENST00000353703', 'Ensembl:ENST00000372839']
but not
['Ensembl:ENSP00000300161', 'Ensembl:ENSG00000166913',
'Ensembl:ENSP00000361930', 'Ensembl:ENSG00000166913']
which are present in the original file as:
DR Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913.
DR Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913.
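For reference, here is a minimal sketch in plain Python (not Biopython's actual parser code) of how every identifier on a DR line could be kept, not only the first one. The helper name dr_line_to_dbxrefs is hypothetical. It assumes the simple "DB; id; id; ..." layout shown above, with "-" as an empty-field placeholder.

```python
def dr_line_to_dbxrefs(line):
    """Turn one SwissProt DR line into 'DB:identifier' strings for every field.

    Hypothetical helper for illustration only; not part of Biopython.
    Assumes the 'DR   DB; id1; id2; ... .' layout without trailing extras.
    """
    # Drop the 'DR' line code and surrounding whitespace.
    body = line[2:].strip() if line.startswith("DR") else line.strip()
    # Remove the terminating period, then split the semicolon-separated fields.
    if body.endswith("."):
        body = body[:-1]
    fields = [f.strip() for f in body.split(";")]
    database, identifiers = fields[0], fields[1:]
    # '-' marks an empty field in DR lines, so skip it.
    return ["%s:%s" % (database, ident)
            for ident in identifiers if ident and ident != "-"]


lines = [
    "DR   Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913.",
    "DR   Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913.",
]
dbxrefs = []
for line in lines:
    for xref in dr_line_to_dbxrefs(line):
        # Skip repeats such as the gene ID shared by both transcripts.
        if xref not in dbxrefs:
            dbxrefs.append(xref)
print(dbxrefs)
```

Dropping the repeated gene ID is a design choice in this sketch. Keeping duplicates would instead reproduce the list quoted above exactly.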
The same happens with the UniProt XML format and the new uniprot-xml parser, where the
original file contains:
<dbReference type="Ensembl" id="ENST00000353703" key="75">
<property type="protein sequence ID" value="ENSP00000300161" />
<property type="gene ID" value="ENSG00000166913" />
</dbReference>
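A similar sketch for the XML case uses the standard library's xml.etree.ElementTree to collect the identifiers stored in the property children of a dbReference element. The helper names and the choice of which property types count as identifiers are assumptions for illustration. Real UniProt XML files place elements in a namespace, which local_name strips before comparing tag names.

```python
import xml.etree.ElementTree as ET

# Property types treated as extra identifiers. This selection is an
# assumption based only on the Ensembl example in this report.
ID_PROPERTY_TYPES = {"protein sequence ID", "gene ID"}


def local_name(tag):
    """Strip a '{namespace}' prefix, as found in real UniProt XML files."""
    return tag.rsplit("}", 1)[-1]


def dbreference_to_dbxrefs(element):
    """Hypothetical helper: map one <dbReference> element to 'type:id'
    strings, including identifiers stored in <property> children."""
    db_type = element.get("type")
    xrefs = ["%s:%s" % (db_type, element.get("id"))]
    for child in element:
        if (local_name(child.tag) == "property"
                and child.get("type") in ID_PROPERTY_TYPES):
            xrefs.append("%s:%s" % (db_type, child.get("value")))
    return xrefs


snippet = """
<dbReference type="Ensembl" id="ENST00000353703" key="75">
  <property type="protein sequence ID" value="ENSP00000300161" />
  <property type="gene ID" value="ENSG00000166913" />
</dbReference>
"""
print(dbreference_to_dbxrefs(ET.fromstring(snippet.strip())))
```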