[Biopython-dev] [Bug 1956] New: SwissProt release 49 - Support for new DT lines

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Wed Feb 15 08:01:10 EST 2006


           Summary: SwissProt release 49 - Support for new DT lines
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

See also bug 1948 (which I am marking fixed) where the parser would fail on the
new files.  I am checking in a fix to recognise the new DT lines but ignore them.

This bug is to do something useful with the new format DT lines.


Quote:

Changes concerning dates and version numbers (DT lines)

We changed from showing only the dates corresponding to full UniProtKB releases
in the DT lines to displaying the date of the biweekly release at which an
entry is integrated or updated. We dropped the information concerning the
release number and introduced entry and sequence version numbers in the DT
lines.

The new format of the three DT lines is:

DT   DD-MMM-YYYY, integrated into UniProtKB/database_name.
DT   DD-MMM-YYYY, sequence version version_number.
DT   DD-MMM-YYYY, entry version version_number.

Example for UniProtKB/Swiss-Prot:

DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.
DT   15-OCT-2001, sequence version 3.
DT   01-APR-2004, entry version 14.

Example for UniProtKB/TrEMBL:

DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
DT   15-OCT-2000, sequence version 2.
DT   15-DEC-2004, entry version 5.

The sequence version number of an entry is incremented by one when its amino
acid sequence is modified. The entry version number is incremented by one
whenever any data in the flat file representation of the entry is modified.

We retrofitted the entry and sequence version numbers, as well as all dates,
using archived UniProtKB releases.
End quote.
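
As a rough illustration of the format quoted above, here is a minimal sketch
of recognising one new-style DT line. The names parse_dt_line, DT_LINE and
MONTHS are hypothetical and not part of Biopython; the month table is spelled
out by hand on the assumption that we do not want to depend on the locale.

```python
import datetime
import re

# Month abbreviations as they appear in DT lines, mapped by hand so that
# parsing does not depend on the current locale.
MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

# Matches "DT   DD-MMM-YYYY, <text>." (hypothetical pattern).
DT_LINE = re.compile(r"^DT   (\d{2})-([A-Z]{3})-(\d{4}), (.+)\.$")


def parse_dt_line(line):
    """Return (date, kind, value) for one new-style DT line."""
    match = DT_LINE.match(line.rstrip())
    if match is None:
        raise ValueError("not a new-style DT line: %r" % line)
    day, month, year, text = match.groups()
    date = datetime.date(int(year), MONTHS[month], int(day))
    # The text after the date tells us which of the three lines this is.
    if text.startswith("integrated into "):
        return date, "integrated", text[len("integrated into "):]
    if text.startswith("sequence version "):
        return date, "sequence_version", int(text[len("sequence version "):])
    if text.startswith("entry version "):
        return date, "entry_version", int(text[len("entry version "):])
    raise ValueError("unrecognised DT line: %r" % line)
```

Old-style DT lines do not match the pattern, so a real parser would need to
try the old format as well rather than raise.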

We should expose the three new bits of information:

database_name, e.g. "UniProtKB/Swiss-Prot" or maybe just "Swiss-Prot"
sequence_version, e.g. 3
entry_version, e.g. 14
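
One possible shape for this, as a sketch only: the class DTInfo, the function
read_dt_info and the short_name flag are all hypothetical names, not existing
Biopython API. The flag shows how either form of the database name could be
offered, and the two version numbers are stored as integers.

```python
class DTInfo:
    """Hypothetical holder for the three new pieces of DT information."""

    def __init__(self):
        self.database_name = None
        self.sequence_version = None
        self.entry_version = None


def read_dt_info(lines, short_name=False):
    """Collect database name and version numbers from new-style DT lines."""
    info = DTInfo()
    for line in lines:
        if not line.startswith("DT   "):
            continue
        # Drop the line code and trailing full stop, then keep the text
        # that follows "DD-MMM-YYYY, ".
        _, _, text = line[5:].rstrip().rstrip(".").partition(", ")
        if text.startswith("integrated into "):
            name = text[len("integrated into "):]
            if short_name and name.startswith("UniProtKB/"):
                name = name[len("UniProtKB/"):]
            info.database_name = name
        elif text.startswith("sequence version "):
            info.sequence_version = int(text.split()[-1])
        elif text.startswith("entry version "):
            info.entry_version = int(text.split()[-1])
    return info
```

Lines that do not carry one of the three phrases (including old-style DT
lines) are left alone here, so the corresponding attributes stay None.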

Also, the precise meaning of the three dates has changed: each is now the date
of a biweekly release rather than of a full UniProtKB release.

Finally, as the "release number" is no longer included, perhaps that record
property should be deprecated.
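
If we go that way, one option is to keep the attribute readable but warn on
access. A minimal sketch, assuming a stand-in Record class with a release
property (both names are hypothetical here and not the actual parser classes):

```python
import warnings


class Record(object):
    """Hypothetical stand-in for a parsed Swiss-Prot record."""

    def __init__(self):
        # New-style DT lines carry no release number, so this stays None.
        self._release = None

    @property
    def release(self):
        """Old release number; warns because new files no longer have it."""
        warnings.warn(
            "release numbers are no longer present in DT lines; "
            "use the entry and sequence version numbers instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._release
```

Existing code that reads the property keeps working, but callers are told
that the value will not be filled in for files in the new format.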

