[Biojava-l] Re: Biojava-l digest, Vol 1 #334 - 2 msgs
Sarath
sarath@decodon.com
Tue, 12 Jun 2001 13:37:23 +0200 (MEST)
Hi Keith,
I would have accepted your point readily if the files found at
ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Staphylococcus_aureus/
had parsed, since according to the README you referred to these are
supposed to be in the actual GenBank format. Nevertheless, they failed,
and the README at the URL below
ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/
says that even the so-called better versions have the same sequences:
"At present only Archaeoglobus fulgidus genomes is considered
"reviewed", other genomes are presented as found in the source GenBank
records"
sarath chandra
On 12 Jun 2001, Keith James wrote:
> >>>>> "Sarath" == Sarath <sarath@decodon.com> writes:
>
> Sarath> Hi there. I do think it's an occasional bug with the GenBank
> Sarath> files; I have come across it quite a number of times, and I
> Sarath> even mailed the URLs where I found that the recent sequences
> Sarath> of Staphylococcus aureus (both strains N315 and Mu50,
> Sarath> sequencing completed on June 1) in the GenBank format are
> Sarath> making the same fuss with the absence of the GI field. You can
> Sarath> check the files with the names BA000017.gbk and
> Sarath> BA000018.gbk by browsing to the appropriate strain at
>
> Sarath> ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Bacteria/
>
> The README file on this ftp site indicates that these files are the
> original submission files from the author(s). However, this doesn't
> always seem to be the case.
>
> In cases where these are the originals I would not always expect them
> to conform fully to GenBank format. If they undergo a similar process
> to our EMBL submissions then certain details are added by the curators
> after they receive the file (e.g. versioning).
>
> I suggest that the Staph file is a pre-submission original because of
> the apparent y2k date problem on the originator's machine ;)
>
> LOCUS BA000018 2813641 bp DNA circular BCT 21-APR-1901
>                                             ^^^^^^^^^^^
> DEFINITION Staphylococcus aureus N315, complete genome.
>
> I would guess that these files deviate from the strict definition of
> GenBank format because they have not been fully processed.
>
> Keith
>
> --
>
> -= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
> The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA
>
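
The two symptoms discussed in this thread, a missing GI field and an
implausible LOCUS date, can be checked before a file is handed to a parser.
What follows is a minimal Python sketch, not BioJava code. It assumes that
the GI number, when present, appears on the VERSION line as "GI:" followed
by digits, and that the LOCUS line ends in a DD-MON-YYYY date. The name
inspect_header and the 1982 cutoff (the year GenBank was established) are
illustrative choices, not part of any library.

```python
import re
from datetime import date

# Month abbreviations as they appear in GenBank LOCUS dates.
MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

# Assumption: the LOCUS line ends with a date such as 21-APR-1901.
LOCUS_DATE = re.compile(r"(\d{2})-([A-Z]{3})-(\d{4})\s*$")
# Assumption: the GI number sits on the VERSION line as "GI:<digits>".
GI_FIELD = re.compile(r"\bGI:(\d+)")
# Illustrative cutoff: dates before GenBank existed are treated as suspect.
EARLIEST_PLAUSIBLE_YEAR = 1982


def inspect_header(lines):
    """Scan header lines and report the GI number and the LOCUS date."""
    report = {"gi": None, "locus_date": None, "suspect_date": False}
    for line in lines:
        if line.startswith("LOCUS"):
            match = LOCUS_DATE.search(line)
            if match and match.group(2) in MONTHS:
                day = int(match.group(1))
                month = MONTHS[match.group(2)]
                year = int(match.group(3))
                try:
                    report["locus_date"] = date(year, month, day)
                except ValueError:
                    # Impossible calendar date, e.g. 31-FEB; leave as None.
                    pass
                report["suspect_date"] = year < EARLIEST_PLAUSIBLE_YEAR
        elif line.startswith("VERSION"):
            match = GI_FIELD.search(line)
            if match:
                report["gi"] = match.group(1)
        elif line.startswith(("FEATURES", "ORIGIN")):
            # The header is over; no need to scan the rest of the record.
            break
    return report


if __name__ == "__main__":
    # Header lines as quoted in this thread (column spacing collapsed).
    header = [
        "LOCUS BA000018 2813641 bp DNA circular BCT 21-APR-1901",
        "DEFINITION Staphylococcus aureus N315, complete genome.",
    ]
    print(inspect_header(header))
```

A report with "gi" left as None suggests the record lacks the GI field, and
"suspect_date" set to True flags a date like the 1901 one quoted above. A
pre-check of this kind only identifies such files; it does not make a strict
parser accept them.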