[Biojava-dev] Forthcoming change in the EMBL database

Carola Kanz ckanz at ebi.ac.uk
Wed Apr 26 14:07:25 UTC 2006


Dear colleagues,

We would like to announce the following important change in the EMBL 
database in June this year.

At the time of release 87 (available from JUN-2006) the format of the 
EMBL flat file will undergo a change: the ID line will have a different 
structure (see below) and the SV line will be removed.

The changes affecting the ID line structure are:

     * All tokens will be separated by a semicolon.
     * The entry name will no longer be displayed; the primary accession 
       number will appear in its place.
     * The sequence version will be indicated.
     * The topology will be a separate token and will be indicated for 
       both circular and linear molecules.
     * Both the data class and the taxonomic division will be displayed.

This is an example of the new ID line:

ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP.
        (1)     (2)     (3)      (4)       (5)  (6)   (7)


The tokens represent:

    1. Primary accession number.
    2. 'SV' + sequence version number.
    3. Topology: 'circular' or 'linear'.
    4. Molecule type.
    5. Data class (ANN, CON, PAT, EST, GSS, HTC, HTG, MGA, WGS, TPA, 
       STS, STD; "normal" entries will have STD, for standard).
    6. Taxonomic division (HUM, MUS, ROD, PRO, MAM, VRT, FUN, PLN, ENV, 
       INV, SYN, UNC, VRL, PHG).
    7. Sequence length + 'BP.'.
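
For parser authors, the seven tokens listed above can be read with a 
few lines of code. The following is only a minimal sketch in Python, 
not an official EMBL tool: the names EmblId and parse_new_id_line are 
made up for this illustration, and the checks assume that every 
new-style ID line carries exactly the seven tokens shown in the example.

```python
from typing import NamedTuple


class EmblId(NamedTuple):
    """The seven tokens of a new-style ID line (hypothetical container)."""
    accession: str
    version: int
    topology: str
    molecule_type: str
    data_class: str
    division: str
    length: int


def parse_new_id_line(line: str) -> EmblId:
    """Split a new-style ID line into its semicolon-separated tokens."""
    if not line.startswith("ID"):
        raise ValueError("not an ID line")
    # Drop the 'ID' line code, then split the rest on semicolons.
    tokens = [token.strip() for token in line[2:].strip().split(";")]
    if len(tokens) != 7:
        raise ValueError(f"expected 7 tokens, got {len(tokens)}")
    accession, sv, topology, molecule_type, data_class, division, length = tokens

    # Token 2 is 'SV' followed by the sequence version number.
    sv_parts = sv.split()
    if len(sv_parts) != 2 or sv_parts[0] != "SV":
        raise ValueError(f"malformed sequence version token: {sv!r}")

    # Token 3 is either 'circular' or 'linear'.
    if topology not in ("circular", "linear"):
        raise ValueError(f"unknown topology: {topology!r}")

    # Token 7 is the sequence length followed by 'BP.'.
    length_parts = length.split()
    if len(length_parts) != 2 or length_parts[1] != "BP.":
        raise ValueError(f"malformed length token: {length!r}")

    return EmblId(accession, int(sv_parts[1]), topology, molecule_type,
                  data_class, division, int(length_parts[0]))
```

Calling parse_new_id_line on the example line above gives the accession 
CD789012, version 4, topology "linear", molecule type "genomic DNA", 
data class HTG, division MAM and length 500. Splitting on semicolons 
rather than on whitespace keeps molecule types that contain a space, 
such as "genomic DNA", in one piece.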

The entry name will no longer be displayed in the ID line. Since EMBL 
release 3 (Dec 1983) the stable identifier of an entry has been the 
primary accession number.

A mapping file (entry name to accession number) will be provided with the
next release for those entries where the entry name doesn't coincide with 
the accession number.

To give users a test dataset, one file with new-style ID lines called 
new_id_line.test.gz was provided together with the March release of the 
EMBL database: 
ftp://ftp.ebi.ac.uk/pub/databases/embl/release/new_id_line.test.gz 
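
One way to look at the new ID lines in the test file is to read the 
gzipped file directly and pick out the lines that start with the ID 
line code. This is only a sketch in Python: it assumes the file has 
already been downloaded to a local path, that it is a gzip-compressed 
text flat file, and that ID lines begin with "ID" followed by three 
spaces as in the example above. The function name iter_id_lines is 
made up for this illustration.

```python
import gzip
from typing import Iterator


def iter_id_lines(path: str) -> Iterator[str]:
    """Yield every ID line from a gzipped flat file at a local path."""
    # Open in text mode; undecodable bytes are replaced rather than fatal.
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
        for line in handle:
            if line.startswith("ID   "):
                yield line.rstrip("\n")
```

For example, looping over iter_id_lines("new_id_line.test.gz") and 
printing each line would list the ID line of every entry in the file.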

Feedback from users is sought; please use the "Contact us" link at the 
bottom of the EBI home page and specify "EMBL" in the feedback form.

Note: this information was first made available on our
"Forthcoming changes" page ( 
http://www.ebi.ac.uk/embl/Documentation/forthcomingchanges.html#0606 ) 
and in the EMBL database release notes.

Regards,
Carola Kanz
EMBL database







