[BioPython] Entrez.efetch large files

Wed Oct 8 17:11:25 UTC 2008

Sorry to bring up an Entrez.efetch issue again, but there seems to be a problem with very large files.

When I run the following code using the latest CVS version of Biopython:

------------------------------------CODE-----------------------------------
from Bio import Entrez, SeqIO

id = "57"
print Entrez.read(Entrez.esummary(db="genome", id=id))[0]["Title"]
handle = Entrez.efetch(db="genome", id=id, rettype="genbank")
print "downloading to SeqRecord..."
record = SeqIO.read(handle, "genbank")
print "...done"
------------------------------------END-CODE-----------------------------

it fails with the output:

------------------------------------OUTPUT-----------------------------
Drosophila melanogaster chromosome X, complete sequence
downloading to SeqRecord...
Traceback (most recent call last):
  File "efetch-test.py", line 7, in <module>
    record = SeqIO.read(handle, "genbank")
  File "/NetUsers/stschiff/lib/python/Bio/SeqIO/__init__.py", line 366, in read
    first = iterator.next()
  File "/NetUsers/stschiff/lib/python/Bio/GenBank/Scanner.py", line 410, in parse_records
    record = self.parse(handle)
  File "/NetUsers/stschiff/lib/python/Bio/GenBank/Scanner.py", line 393, in parse
    if self.feed(handle, consumer) :
  File "/NetUsers/stschiff/lib/python/Bio/GenBank/Scanner.py", line 370, in feed
    misc_lines, sequence_string = self.parse_footer()
  File "/NetUsers/stschiff/lib/python/Bio/GenBank/Scanner.py", line 723, in parse_footer
    raise ValueError("Premature end of file in sequence data")
ValueError: Premature end of file in sequence data
------------------------------------END-OUTPUT-----------------------------

If I change the id to "56" (chromosome 4, which is shorter), it works, but for all the other chromosomes (ids 57 - 61) it fails.
If I download the GenBank files manually from the FTP server and then use SeqIO.read(), it works. So my guess is that the download through Entrez.efetch truncates or otherwise corrupts the GenBank files when they are very large (about 35 MB).
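
One way to narrow this down might be to save the efetch response to disk first and check whether the file ends with the "//" record terminator before handing it to the parser. The following is only a rough sketch, written for Python 3: save_handle and looks_complete are hypothetical helper names of my own (not part of Biopython), and the file name in the usage comment is just an example.

```python
import os


def save_handle(handle, path, chunk_size=1024 * 1024):
    """Copy a file-like object to disk in chunks; return the bytes written.

    Accepts handles that yield either str or bytes from read().
    """
    total = 0
    with open(path, "wb") as out:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            if isinstance(chunk, str):
                chunk = chunk.encode("utf-8")
            out.write(chunk)
            total += len(chunk)
    return total


def looks_complete(path, tail_size=1024):
    """Return True if the last non-blank line is the GenBank terminator '//'."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Only the end of the file matters, so read just the tail.
        f.seek(max(0, size - tail_size))
        tail = f.read().decode("utf-8", errors="replace")
    lines = [line.strip() for line in tail.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == "//"


# Hypothetical usage with the script above (needs network access and Biopython):
#   handle = Entrez.efetch(db="genome", id="57", rettype="genbank")
#   save_handle(handle, "chrX.gbk")
#   if looks_complete("chrX.gbk"):
#       record = SeqIO.read(open("chrX.gbk"), "genbank")
```

If looks_complete returns False for the large chromosomes, the data would already be cut short when it arrives, which would point at the download rather than at the GenBank parser.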

Any hints?

Best,
Stephan