[Biopython-dev] [Bug 1948] uniprot release 49/SProt.Record Parser Problem
bugzilla-daemon at portal.open-bio.org
Mon Feb 13 08:59:53 EST 2006
http://bugzilla.open-bio.org/show_bug.cgi?id=1948
------- Comment #4 from gould at embl.de 2006-02-13 08:59 -------
(In reply to comment #3)
> I'm unclear what you meant in comment 2 Kate.
>
> Your original bug report had the following:
>
> SyntaxError: Line does not start with 'ID':
> <HTML LANG="EN">
>
> This suggests that instead of getting a plain text SProt file
> (which should start 'ID'), you got an HTML file.
>
> One reason for this MIGHT be a temporary problem with the ExPASy
> website - returning an error message in HTML.
>
> If you still get the <HTML LANG="EN"> error message, could you
> attach the raw HTML to this bug (you could use "print results"
> at the Python prompt).
>
> If the HTML problem has gone away on its own (which wouldn't
> surprise me if it was a temporary problem with the server) do you
> see the problem I talked about in comment 1 of the bug?
>
> I have tried this on both Linux and Windows now, both show the
> problem described in comment 1 where the 'DT' lines do not match
> what BioPython is expecting.
>
> Quoting your original bug report:
> > I see from the release notes that some changes were made to the
> > annotation format and suspect this is why the biopython scripts
> > are no longer happy?
>
> Yes - this does explain the 'DT' line problem, BioPython will need
> to be updated to cope with the new format DT lines:
>
> http://ca.expasy.org/sprot/relnotes/sp_news.html#rel7.0
>
> Quoting:
>
> Changes concerning dates and versions numbers (DT lines)
>
> We changed from showing only the dates corresponding to full UniProtKB releases
> in the DT lines to displaying the date of the biweekly release at which an
> entry is integrated or updated. We dropped the information concerning the
> release number and introduced entry and sequence version numbers in the DT
> lines.
>
> The new format of the three DT lines is:
>
> DT DD-MMM-YYYY, integrated into UniProtKB/database_name.
> DT DD-MMM-YYYY, sequence version version_number.
> DT DD-MMM-YYYY, entry version version_number.
>
> Example for UniProtKB/Swiss-Prot:
>
> DT 01-JAN-1998, integrated into UniProtKB/Swiss-Prot.
> DT 15-OCT-2001, sequence version 3.
> DT 01-APR-2004, entry version 14.
>
> Example for UniProtKB/TrEMBL:
>
> DT 01-FEB-1999, integrated into UniProtKB/TrEMBL.
> DT 15-OCT-2000, sequence version 2.
> DT 15-DEC-2004, entry version 5.
>
> The sequence version number of an entry is incremented by one when its amino
> acid sequence is modified. The entry version number is incremented by one
> whenever any data in the flat file representation of the entry is modified.
>
> We retrofitted the entry and sequence version numbers, as well as all dates,
> using archived UniProtKB releases.
>
Yes, I understand what you are saying now. I'm no longer getting the HTML
file; I now get a plain text SProt file, but it is not being parsed correctly.
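
To illustrate the new DT line format quoted above, here is a minimal sketch of
how such lines could be recognised with a regular expression. It is not
Biopython's actual parser: the names DT_PATTERN, MONTHS and parse_dt_line, and
the layout of the returned dictionary, are hypothetical and chosen only for
this example.

```python
import re
from datetime import date

# Month abbreviations as they appear in DT lines; an explicit list avoids
# any dependence on the current locale.
MONTHS = "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split()

# Hypothetical pattern for the three new-style DT lines, for example:
#   DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.
#   DT   15-OCT-2001, sequence version 3.
#   DT   01-APR-2004, entry version 14.
DT_PATTERN = re.compile(
    r"^DT\s+(?P<date>\d{2}-[A-Z]{3}-\d{4}), "
    r"(?:integrated into UniProtKB/(?P<database>[\w-]+)"
    r"|sequence version (?P<sequence_version>\d+)"
    r"|entry version (?P<entry_version>\d+))\.$"
)


def parse_dt_line(line):
    """Return the date plus the database name or version number of a DT line.

    Raises ValueError for anything that is not a new-style DT line.
    """
    match = DT_PATTERN.match(line.rstrip())
    if match is None:
        raise ValueError("Unrecognised DT line: %r" % line)

    day, month, year = match.group("date").split("-")
    result = {"date": date(int(year), MONTHS.index(month) + 1, int(day))}

    # Exactly one of the three alternatives matched; record that one.
    if match.group("database") is not None:
        result["database"] = match.group("database")
    elif match.group("sequence_version") is not None:
        result["sequence_version"] = int(match.group("sequence_version"))
    else:
        result["entry_version"] = int(match.group("entry_version"))
    return result
```

Calling parse_dt_line on each of the three DT lines of an entry gives the
integration date and database, the sequence version, and the entry version.
A line that does not follow the new layout raises ValueError instead of being
silently misread.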