[BioPython] Entrez.efetch

Peter biopython at maubp.freeserve.co.uk
Wed Oct 8 14:02:54 UTC 2008


On Wed, Oct 8, 2008 at 2:48 PM, Stephan <stephan80 at mac.com> wrote:
>
> Hi guys,
>
> OK, there are two different problems here that Brad and Peter independently
> pointed out to me. Peter, you are right that not closing the file actually
> caused the error. Your hint fixes that, thanks.

Great.
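
For anyone reading the archive later, here is a minimal sketch of the
"close the file" fix. The helper name save_handle is hypothetical (it is
not part of Biopython), and the efetch arguments in the trailing comment
are an assumption about a typical call, not Stephan's actual script.

```python
import io
import shutil


def save_handle(handle, filename):
    """Copy a readable text handle into filename, then close both.

    Hypothetical helper: the point is only that the output file is
    closed (and so flushed to disk) before anything tries to parse it.
    """
    try:
        # The with-block closes the output file even if the copy fails.
        with open(filename, "w") as out:
            shutil.copyfileobj(handle, out)
    finally:
        # Always release the input handle (for efetch, a network connection).
        handle.close()


# Assumed usage with Biopython (not executed here, needs network access):
#   from Bio import Entrez
#   Entrez.email = "you@example.org"
#   handle = Entrez.efetch(db="nucleotide", id="NC_004353",
#                          rettype="gb", retmode="text")
#   save_handle(handle, "NC_004353.gbk")
```

Forgetting the close can leave the tail of the record unwritten, so a
parser reading the file straight afterwards may see a truncated record.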

> But that doesn't fix the fact that part of line 3 goes missing in the download,
> and although I updated to the newest CVS version of Biopython as
> Brad suggested (sorry for accidentally not sending my answer to the mailing list),
> that does not fix that line...

This is the issue where you get different GenBank files using
Bio.Entrez.efetch and a "manual download"?  First of all, what did you
mean by "manual download" - for example FTP (what URL), or from a
browser?  Secondly, does this difference in the ACCESSION line (line
3) actually have any ill effects?

To be clear, using Bio.Entrez.efetch as in your script, I get this:

LOCUS       NC_004353            1351857 bp    DNA     linear   INV 14-MAY-2008
DEFINITION  Drosophila melanogaster chromosome 4, complete sequence.
ACCESSION   NC_004353
VERSION     NC_004353.3  GI:116010290
PROJECT     GenomeProject:164
KEYWORDS    .
SOURCE      Drosophila melanogaster (fruit fly)
  ORGANISM  Drosophila melanogaster
...

Using FTP from ftp://ftp.ncbi.nih.gov/genomes/Drosophila_melanogaster/CHR_4/NC_004353.gbk
I get something similar but not identical:

LOCUS       NC_004353            1351857 bp    DNA     linear   INV 14-MAY-2008
DEFINITION  Drosophila melanogaster chromosome 4, complete sequence.
ACCESSION   NC_004353
VERSION     NC_004353.3  GI:116010290
KEYWORDS    .
SOURCE      Drosophila melanogaster (fruit fly)
  ORGANISM  Drosophila melanogaster
...

Notice the FTP file lacks the PROJECT line, and also differs slightly
in its feature table.

Using the NCBI website I suspect you can get other slight variations
(like the different ACCESSION line you reported).
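
A comparison like the one above can be done mechanically rather than by
eye. The following is only a rough sketch using the standard library:
the function names are made up for illustration, and the two strings are
the header excerpts quoted in this message, not complete records.

```python
EFETCH_HEADER = """\
LOCUS       NC_004353            1351857 bp    DNA     linear   INV 14-MAY-2008
DEFINITION  Drosophila melanogaster chromosome 4, complete sequence.
ACCESSION   NC_004353
VERSION     NC_004353.3  GI:116010290
PROJECT     GenomeProject:164
KEYWORDS    .
SOURCE      Drosophila melanogaster (fruit fly)
  ORGANISM  Drosophila melanogaster
"""

FTP_HEADER = """\
LOCUS       NC_004353            1351857 bp    DNA     linear   INV 14-MAY-2008
DEFINITION  Drosophila melanogaster chromosome 4, complete sequence.
ACCESSION   NC_004353
VERSION     NC_004353.3  GI:116010290
KEYWORDS    .
SOURCE      Drosophila melanogaster (fruit fly)
  ORGANISM  Drosophila melanogaster
"""


def header_keywords(genbank_text):
    """List the top-level header keywords in order, stopping at FEATURES."""
    keywords = []
    for line in genbank_text.splitlines():
        if line.startswith(("FEATURES", "ORIGIN")):
            break
        # Top-level keywords start in column 1; sub-keywords such as
        # ORGANISM and continuation lines are indented, so skip those.
        if line and not line[0].isspace():
            keywords.append(line.split()[0])
    return keywords


def missing_keywords(first, second):
    """Keywords present in the first header but absent from the second."""
    present = set(header_keywords(second))
    return [k for k in header_keywords(first) if k not in present]


print(missing_keywords(EFETCH_HEADER, FTP_HEADER))  # prints ['PROJECT']
```

This only compares which header keywords are present. Differences in the
content of a line (such as the ACCESSION line) or in the feature table
would need a line-by-line diff, e.g. with difflib.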

Peter


