[Biopython-dev] [Bug 3156] New: UniProt XML and SwissProt parsers silently fail to parse all of database references

bugzilla-daemon at portal.open-bio.org
Thu Nov 11 18:09:04 EST 2010


http://bugzilla.open-bio.org/show_bug.cgi?id=3156

           Summary: UniProt XML and SwissProt parsers silently fail to parse
                    all of database references
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: rjalves at igc.gulbenkian.pt


Example code:

from Bio import SeqIO, ExPASy
entry = SeqIO.read(ExPASy.get_sprot_raw('P31946'), 'swiss')

If you then inspect entry.dbxrefs, you can see that it includes:

['Ensembl:ENST00000353703', 'Ensembl:ENST00000372839']

but not
['Ensembl:ENSP00000300161', 'Ensembl:ENSG00000166913',
'Ensembl:ENSP00000361930', 'Ensembl:ENSG00000166913']

which are present in the original file as:
DR   Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913.
DR   Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913.
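
For illustration only, here is a minimal sketch of how every identifier on
such a DR line could be collected. The helper name dr_line_to_dbxrefs is
hypothetical and this is not Biopython's actual parser code. It assumes that
every field after the database name is an identifier, which holds for the
Ensembl lines above but probably not for every database (some DR lines carry
status or molecule-type fields instead).

```python
def dr_line_to_dbxrefs(line):
    """Return 'Database:identifier' strings for every field on a DR line.

    Hypothetical helper: it treats all fields after the database name as
    identifiers, which is an assumption that fits the Ensembl lines only.
    """
    # Drop the two-letter 'DR' line code and the trailing period.
    body = line[2:].strip().rstrip(".")
    fields = [field.strip() for field in body.split(";")]
    database, identifiers = fields[0], fields[1:]
    # Skip empty fields and the '-' placeholder used for missing values.
    return ["%s:%s" % (database, ident)
            for ident in identifiers if ident and ident != "-"]


lines = [
    "DR   Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913.",
    "DR   Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913.",
]

# Collect the references in order, keeping the shared gene ID only once.
dbxrefs = []
for line in lines:
    for ref in dr_line_to_dbxrefs(line):
        if ref not in dbxrefs:
            dbxrefs.append(ref)
```

With the two lines above, dbxrefs would hold both transcript IDs, both
protein IDs, and the gene ID they share.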


The same happens with the XML format and the new uniprot-xml parser, where the
original file contains:

<dbReference type="Ensembl" id="ENST00000353703" key="75">
<property type="protein sequence ID" value="ENSP00000300161" />
<property type="gene ID" value="ENSG00000166913" />
</dbReference>
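
As a rough sketch of what reading the nested values might look like, the
fragment above can be walked with the standard library's
xml.etree.ElementTree. This is not the uniprot-xml parser's code. It assumes
the fragment is parsed on its own, without the XML namespace that a full
UniProt file would declare, and that every property value should be reported
as a cross-reference.

```python
import xml.etree.ElementTree as ET

# The dbReference fragment from the report, parsed standalone
# (assumption: no XML namespace, unlike a complete UniProt XML file).
snippet = """<dbReference type="Ensembl" id="ENST00000353703" key="75">
<property type="protein sequence ID" value="ENSP00000300161" />
<property type="gene ID" value="ENSG00000166913" />
</dbReference>"""

element = ET.fromstring(snippet)
database = element.get("type")

# Start with the primary id, then add the value of each child property.
refs = ["%s:%s" % (database, element.get("id"))]
for prop in element.findall("property"):
    refs.append("%s:%s" % (database, prop.get("value")))
```

Here refs ends up with three entries: the transcript ID from the id
attribute, followed by the protein sequence ID and the gene ID from the
property elements.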



