[BioPython] Entrez.efetch

Peter biopython at maubp.freeserve.co.uk
Wed Oct 8 13:37:24 UTC 2008


On Wed, Oct 8, 2008 at 12:33 PM, Stephan <stephan80 at mac.com> wrote:
> Hi,
>
> I have been using Biopython for a week or so. The package is amazing; I wonder how I could have overlooked it for so long.
> I am new to Biopython and also new to this mailing list, so forgive me if this is not the right forum for a question like this.
>
> Anyway, here is a weird little problem with the Bio.Entrez.efetch tool:
> (I use Python 2.5 and the latest Biopython 1.48)
> I want to run the following little test code, using efetch to get chromosome 4 of Drosophila melanogaster as a GenBank file:
>
> ---------------------------CODE------------------------------------
> from Bio import Entrez, SeqIO
>
> print Entrez.read(Entrez.esummary(db="genome", id="56"))[0]["Title"]
> handle = Entrez.efetch(db="genome", id="56", rettype="genbank")
> print "downloading to SeqRecord..."
> record = SeqIO.read(handle, "genbank")
> print "...done"

I assume this is just test code - as it would be silly to download the
GenBank file twice in a real script.

> handle = Entrez.efetch(db="genome", id="56", rettype="genbank")
> filehandle = open("NCBI_DroMel", "w")
> print "downloading to file..."
> filehandle.write(handle.read())

You should now close the file, which should ensure it is fully written to disk:
filehandle.close()

> print "...done"
>
> handle = open("NCBI_DroMel")
> print "reading from file..."
> record = SeqIO.read(handle, "genbank")
> ---------------------------END-CODE------------------------------------
>
> In the last line we have a crash,
>  ...
> ValueError: Premature end of file in sequence data

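This is because you started reading the file before you had finished
writing to it. File output is buffered, so until the handle is closed
(or flushed) the last part of the data may not be on disk yet. The
parser could only read part of the record, and is complaining about it
ending prematurely.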
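
To see the effect without downloading anything, here is a minimal sketch.
It makes several assumptions that are not in your script: it uses Python 3
syntax, a made-up string stands in for the text returned by handle.read(),
and the file goes into a temporary directory. How much of the data is
visible before the close depends on the buffer size, so no particular
length should be expected for the partial read.

```python
import os
import tempfile

# Made-up stand-in for handle.read(); this is NOT real GenBank data.
data = "LOCUS       EXAMPLE\nORIGIN\n" + "acgt" * 1000 + "\n//\n"

# Hypothetical location; the original script wrote to "NCBI_DroMel".
path = os.path.join(tempfile.mkdtemp(), "NCBI_DroMel")

# Problem pattern: the output handle is still open when we read the file.
out = open(path, "w")
out.write(data)
with open(path) as check:
    partial = check.read()  # may be missing whatever is still buffered
out.close()  # only now is everything guaranteed to be written out

# Safer pattern: the with-block closes (and so flushes) the file on exit.
with open(path, "w") as out:
    out.write(data)
with open(path) as check:
    complete = check.read()

print("expected:", len(data))
print("read before close:", len(partial))
print("read after close:", len(complete))
```

An explicit filehandle.close() before reopening the file does the same job
as the with-block; the with-block just makes it harder to forget.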

Peter


