[Biojava-dev] Forthcoming change in the EMBL database
mark.schreiber at novartis.com
Tue May 23 02:18:58 UTC 2006
Hi Richard -
Can you be in charge of future-proofing the biojavax EMBL format object to
cope with this?
Thanks.
- Mark
Carola Kanz <ckanz at ebi.ac.uk>
Sent by: biojava-dev-bounces at lists.open-bio.org
04/26/2006 11:00 PM
To: biojava-dev at biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] Forthcoming change in the EMBL database
Dear colleagues,
We would like to announce the following important change in the EMBL
database in June this year.
At the time of release 87 (available from JUN-2006) the format of the
EMBL flat file will undergo a change: the ID line will have a different
structure (see below) and the SV line will be removed.
The changes affecting the ID line structure are:
* All tokens will be separated by a semicolon.
* The entry name will no longer be displayed; the primary accession
number will appear in its place.
* The sequence version will be indicated.
* The topology will be a separate token and will be indicated for
both circular and linear molecules.
* Both the data class and the taxonomic divisions will be displayed.
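Because every token is now separated by a semicolon and the second token carries the sequence version, a reader that must accept files from both before and after release 87 can tell the two ID-line styles apart. Below is a minimal Java sketch of that check. The class name `IdLineStyle` is hypothetical (not part of BioJava), and the rule it applies (the second semicolon-separated token matches `SV <number>`) is an assumed heuristic based on the example line, not an EBI-specified test.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; not part of the BioJava API. */
final class IdLineStyle {
    // Assumed heuristic: in a new-style line the token after the accession is "SV <digits>".
    private static final Pattern SV_TOKEN = Pattern.compile("SV \\d+");

    private IdLineStyle() {}

    /** Returns true if the line looks like a release-87 (new-style) ID line. */
    static boolean isNewStyle(String line) {
        if (line == null || !line.startsWith("ID")) {
            throw new IllegalArgumentException("not an ID line: " + line);
        }
        // Drop the "ID" line code, then split the remainder on semicolons.
        String[] tokens = line.substring(2).trim().split(";");
        return tokens.length >= 2 && SV_TOKEN.matcher(tokens[1].trim()).matches();
    }
}
```

A reader could call `isNewStyle` on the first line of each entry and dispatch to the old or new ID-line parser accordingly.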
This is an example of the new ID line:
ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP.
     (1)       (2)   (3)     (4)          (5)  (6)  (7)
The tokens represent:
1. Primary accession number.
2. 'SV' + sequence version number.
3. Topology: 'circular' or 'linear'.
4. Molecule type.
5. Data class (ANN, CON, PAT, EST, GSS, HTC, HTG, MGA, WGS, TPA,
STS, STD; "normal" entries will have STD, for standard).
6. Taxonomic division (HUM, MUS, ROD, PRO, MAM, VRT, FUN, PLN, ENV,
INV, SYN, UNC, VRL, PHG).
7. Sequence length + 'BP.'.
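The seven-token layout maps naturally onto a single regular expression. The following is a minimal Java sketch, not the biojavax implementation: the class `EmblIdLine` and its field names are hypothetical, and it assumes exactly one space after each semicolon and three-letter data-class and division codes, as in the example line and the code lists above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical value type for a parsed new-style EMBL ID line; not part of BioJava. */
final class EmblIdLine {
    final String accession;    // token 1: primary accession number
    final int version;         // token 2: "SV" + sequence version
    final String topology;     // token 3: "circular" or "linear"
    final String moleculeType; // token 4, e.g. "genomic DNA"
    final String dataClass;    // token 5, e.g. STD, HTG
    final String division;     // token 6, e.g. MAM, HUM
    final long length;         // token 7: "<n> BP."

    // Assumes "; " between tokens, as in the announcement's example line.
    private static final Pattern NEW_ID = Pattern.compile(
        "ID\\s+([^;\\s]+); SV (\\d+); (circular|linear); ([^;]+); ([A-Z]{3}); ([A-Z]{3}); (\\d+) BP\\.");

    private EmblIdLine(String accession, int version, String topology, String moleculeType,
                       String dataClass, String division, long length) {
        this.accession = accession;
        this.version = version;
        this.topology = topology;
        this.moleculeType = moleculeType;
        this.dataClass = dataClass;
        this.division = division;
        this.length = length;
    }

    /** Parses a new-style ID line, throwing IllegalArgumentException if it does not match. */
    static EmblIdLine parse(String line) {
        Matcher m = NEW_ID.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a new-style ID line: " + line);
        }
        return new EmblIdLine(m.group(1), Integer.parseInt(m.group(2)), m.group(3), m.group(4),
                m.group(5), m.group(6), Long.parseLong(m.group(7)));
    }
}
```

For the example line, `EmblIdLine.parse(...)` yields accession `CD789012`, version 4, topology `linear`, molecule type `genomic DNA`, data class `HTG`, division `MAM` and length 500. The pattern only checks that tokens 5 and 6 are three capital letters; a stricter reader could validate them against the lists above.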
The entry name will no longer be displayed in the ID line. Since EMBL
release 3 (Dec 1983) the stable identifier of an entry has been the
primary accession number.
A mapping file (entry name to accession number) will be provided with the
next release for those entries whose entry name does not coincide with
the accession number.
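Where code still needs to resolve old entry names, the mapping file could back a simple lookup. The announcement does not describe the file's layout, so this Java sketch ASSUMES one whitespace-separated `entryname accession` pair per line; the class `EntryNameMap` is hypothetical. Entries missing from the file fall back to their own name, since the file only lists entries whose name differs from the accession number.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical loader; the real mapping file's format may differ from the one assumed here. */
final class EntryNameMap {
    private EntryNameMap() {}

    /**
     * Reads lines of the ASSUMED form "ENTRYNAME ACCESSION" (whitespace-separated)
     * and returns a map from entry name to primary accession number.
     * Blank lines are skipped; lines without exactly two columns are rejected.
     */
    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> map = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                String[] cols = line.split("\\s+");
                if (cols.length != 2) {
                    throw new IOException("unexpected mapping line: " + line);
                }
                map.put(cols[0], cols[1]);
            }
        }
        return map;
    }

    /** Unlisted entries keep their name, because their name already equals the accession. */
    static String accessionFor(Map<String, String> map, String entryName) {
        return map.getOrDefault(entryName, entryName);
    }
}
```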
To give users a test dataset, one file with new-style ID lines called
new_id_line.test.gz was provided together with the March release of the
EMBL database:
ftp://ftp.ebi.ac.uk/pub/databases/embl/release/new_id_line.test.gz
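To exercise a parser against the test dataset, the gzipped file can be streamed with the standard `java.util.zip.GZIPInputStream` and its ID lines collected. A minimal sketch, assuming the file has already been downloaded from the FTP URL above; the class `IdLineExtractor` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper for checking an ID-line parser against the gzipped test file. */
final class IdLineExtractor {
    private IdLineExtractor() {}

    /** Decompresses a gzipped EMBL flat file and returns every line that starts with "ID ". */
    static List<String> idLines(InputStream gzipped) throws IOException {
        List<String> result = new ArrayList<>();
        // ISO-8859-1 maps every byte to a char, so unexpected bytes cannot break decoding.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("ID ")) {
                    result.add(line);
                }
            }
        }
        return result;
    }
}
```

For example, `IdLineExtractor.idLines(new FileInputStream("new_id_line.test.gz"))` returns the lines that a new-style ID parser should accept.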
Feedback from users is sought; please use the "Contact us" link at the
bottom of the EBI home page and specify "EMBL" in the feedback form.
Note: this information was first made available on our
"Forthcoming changes" page (
http://www.ebi.ac.uk/embl/Documentation/forthcomingchanges.html#0606 )
and in the EMBL database release notes.
Regards,
Carola Kanz
EMBL database
_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev