[BioPython] Entrez.efetch large files
Stephan
stephan80 at mac.com
Wed Oct 8 17:11:25 UTC 2008
Sorry to raise another Entrez.efetch issue, but there seems to be a problem with very large files.
When I run the following code using the latest CVS version of Biopython:
------------------------------------CODE-----------------------------------
from Bio import Entrez, SeqIO
id = "57"
print Entrez.read(Entrez.esummary(db="genome", id=id))[0]["Title"]
handle = Entrez.efetch(db="genome", id=id, rettype="genbank")
print "downloading to SeqRecord..."
record = SeqIO.read(handle, "genbank")
print "...done"
------------------------------------END-CODE-----------------------------
it fails with the output:
------------------------------------OUTPUT-----------------------------
Drosophila melanogaster chromosome X, complete sequence
downloading to SeqRecord...
Traceback (most recent call last):
  File "efetch-test.py", line 7, in <module>
    record = SeqIO.read(handle, "genbank")
  File "/NetUsers/stschiff/lib/python/Bio/SeqIO/__init__.py", line 366, in read
    first = iterator.next()
  File "/NetUsers/stschiff/lib/python/Bio/GenBank/Scanner.py", line 410, in parse_records
    record = self.parse(handle)
  File "/NetUsers/stschiff/lib/python/Bio/GenBank/Scanner.py", line 393, in parse
    if self.feed(handle, consumer) :
  File "/NetUsers/stschiff/lib/python/Bio/GenBank/Scanner.py", line 370, in feed
    misc_lines, sequence_string = self.parse_footer()
  File "/NetUsers/stschiff/lib/python/Bio/GenBank/Scanner.py", line 723, in parse_footer
    raise ValueError("Premature end of file in sequence data")
ValueError: Premature end of file in sequence data
------------------------------------END-OUTPUT-----------------------------
If I change the id to "56" (chromosome 4, which is shorter), it works, but it fails for all the other chromosomes (ids 57 to 61).
If I download the GenBank files manually from the FTP server and then use SeqIO.read(), it works, so I guess the download through efetch corrupts the GenBank files when they are very large (about 35 MB).
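One way to narrow this down might be to separate the download from the parsing: write whatever the handle delivers to a local file, then check whether that file ends with the GenBank record terminator ("//") before handing it to the parser. The following is only a rough sketch in Python 3 syntax; the helper names save_stream and looks_complete are made up for illustration and are not part of Biopython, and the demo uses in-memory data instead of a real efetch handle so that it runs without network access.

```python
import io
import os
import tempfile


def save_stream(handle, path, chunk_size=1 << 20):
    """Copy a file-like object to disk in chunks; return the number of bytes written.

    Accepts handles that yield either bytes or str (str is encoded as UTF-8).
    """
    total = 0
    with open(path, "wb") as out:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            if isinstance(chunk, str):
                chunk = chunk.encode("utf-8")
            out.write(chunk)
            total += len(chunk)
    return total


def looks_complete(path, tail_bytes=1024):
    """Return True if the last non-blank line of the file is the '//' terminator."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(max(0, size - tail_bytes))
        tail = f.read()
    lines = [line.strip() for line in tail.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == b"//"


if __name__ == "__main__":
    # Stand-in data: a tiny fake record and a copy that is cut off early.
    complete = b"LOCUS       DEMO\nORIGIN\n        1 acgt\n//\n"
    truncated = complete[:-6]

    workdir = tempfile.mkdtemp()
    for name, data in (("complete.gb", complete), ("truncated.gb", truncated)):
        path = os.path.join(workdir, name)
        written = save_stream(io.BytesIO(data), path)
        print(name, written, "bytes, terminator present:", looks_complete(path))
```

With a real download, the handle returned by Entrez.efetch(...) would take the place of io.BytesIO(data), and the saved file could then be passed to SeqIO.read(path, "genbank"). If the saved file is much smaller than the copy from the FTP server, or lacks the final "//" line, the transfer was cut short rather than the parser being at fault.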
Any hints?
Best,
Stephan