[Biojava-l] Re: Biojava-l digest, Vol 1 #334 - 2 msgs
Keith James
kdj@sanger.ac.uk
12 Jun 2001 11:15:19 +0100
>>>>> "Sarath" == Sarath <sarath@decodon.com> writes:
Sarath> hi there I do think its an occasional bug with the genbank
Sarath> files i have come across it quite a number of times and i
Sarath> even mailed the urls where i found the recent sequences of
Sarath> Staphylococcus aureus(both strains N315 and Mu50)
Sarath> completed sequencing on june 1 in the genebank format are
Sarath> making the same fuss with absence of GI field.You can
Sarath> check the files with the names BA000017.gbk and
Sarath> BA000018.gbk by browsing to the appropriate strain at
Sarath> ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Bacteria/
The README file on this ftp site indicates that these files are the
original submission files from the author(s). However, this doesn't
always seem to be the case.
In cases where these are the originals I would not always expect them
to conform fully to Genbank format. If they undergo a similar process to
our EMBL submissions then certain details are added by the curators
after they receive the file (e.g. versioning).
I suggest that the Staph file is a pre-submission original because of
the apparent y2k date problem on the originator's machine ;)
LOCUS BA000018 2813641 bp DNA circular BCT 21-APR-1901
                                           ^^^^^^^^^^^
DEFINITION Staphylococcus aureus N315, complete genome.
I would guess that these files deviate from the strict definition of
Genbank format because they have not been fully processed.
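As a rough illustration of the two symptoms discussed here (no GI field
and an implausible LOCUS date), a minimal Python sketch follows. It is
not part of BioJava: the function name check_header and the 1982 cutoff
year are hypothetical choices, and it assumes the GI normally appears as
"GI:<digits>" on the VERSION line and that the date is the last token on
the LOCUS line.

```python
import re
from datetime import date

# Assumption: dates earlier than this are treated as implausible.
EARLIEST_PLAUSIBLE_YEAR = 1982

# A DD-MON-YYYY date at the very end of the LOCUS line.
LOCUS_DATE = re.compile(r"(\d{2}-[A-Z]{3}-(\d{4}))\s*$")
# Assumption: the GI is written as GI:<digits> on the VERSION line.
GI_FIELD = re.compile(r"\bGI:\d+")


def check_header(lines):
    """Return a list of problems found in a GenBank flat-file header."""
    problems = []
    has_gi = False
    for line in lines:
        # The header ends where the feature table or the sequence begins.
        if line.startswith(("FEATURES", "ORIGIN")):
            break
        if line.startswith("LOCUS"):
            match = LOCUS_DATE.search(line)
            if match is None:
                problems.append("LOCUS line has no date")
            else:
                year = int(match.group(2))
                if not EARLIEST_PLAUSIBLE_YEAR <= year <= date.today().year:
                    problems.append("implausible LOCUS date " + match.group(1))
        elif line.startswith("VERSION") and GI_FIELD.search(line):
            has_gi = True
    if not has_gi:
        problems.append("no GI field on a VERSION line")
    return problems


if __name__ == "__main__":
    header = [
        "LOCUS BA000018 2813641 bp DNA circular BCT 21-APR-1901",
        "DEFINITION Staphylococcus aureus N315, complete genome.",
    ]
    for problem in check_header(header):
        print(problem)
```

A check like this only flags files that look unprocessed; it does not
validate the rest of the format.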
Keith
--
-= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA