[BioPython] Entrez.efetch

Stephan stephan80 at mac.com
Wed Oct 8 13:48:25 UTC 2008


Hi guys,

OK, there are two different problems here, which Brad and Peter pointed out to me independently. Peter, you are right that not closing the file is what actually caused the error. Your hint fixes that, thanks.
But that doesn't fix the other problem: part of line 3 still goes missing during the download. I did update to the newest CVS version of Biopython as Brad suggested (sorry for accidentally sending my reply to him off the mailing list), but that does not fix that line...
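One way to pin down exactly where a saved copy diverges from what efetch returned is to read the handle once into a string, write that string to disk, read the file back, and compare the two texts line by line. The helper below, `first_difference`, is a hypothetical name used only for illustration (it is not part of Biopython). It is a minimal sketch that reports the first mismatching line, comparing lines without their line endings. The strings in the demo are made up.

```python
def first_difference(expected, actual):
    """Return (line_number, expected_line, actual_line) for the first line
    that differs between two texts, or None if they match line for line.

    Line numbers are 1-based; line endings are ignored. If one text is a
    prefix of the other, the missing side of the pair is None.
    """
    exp_lines = expected.splitlines()
    act_lines = actual.splitlines()
    # Walk both texts in step until a pair of lines disagrees.
    for number, (exp, act) in enumerate(zip(exp_lines, act_lines), start=1):
        if exp != act:
            return number, exp, act
    if len(exp_lines) != len(act_lines):
        # All shared lines match, so report the first line present in
        # only one of the two texts.
        number = min(len(exp_lines), len(act_lines)) + 1
        exp = exp_lines[number - 1] if number <= len(exp_lines) else None
        act = act_lines[number - 1] if number <= len(act_lines) else None
        return number, exp, act
    return None


if __name__ == "__main__":
    downloaded = "line 1\nline 2\nline 3 is complete\n"
    saved = "line 1\nline 2\nline 3 is\n"
    print(first_difference(downloaded, saved))
    # prints: (3, 'line 3 is complete', 'line 3 is')
```

If this returns None, the file on disk matches the download, and any missing text was already absent from what the server sent.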

Best,
Stephan

 
On Wednesday, 08 October 2008, at 03:37PM, "Peter" <biopython at maubp.freeserve.co.uk> wrote:
>On Wed, Oct 8, 2008 at 12:33 PM, Stephan <stephan80 at mac.com> wrote:
>> Hi,
>>
>> I have been using Biopython for a week or so. The package is amazing; I wonder how I could have ignored it for so long.
>> Since I am new not only to Biopython but also to this mailing list, please forgive me if this is not the right forum for a question like this.
>>
>> Anyway, here is a weird little problem with the Bio.Entrez.efetch tool:
>> (I use Python 2.5 and the latest Biopython, 1.48.)
>> I want to run the following little test code, which uses efetch to get chromosome 4 of Drosophila melanogaster as a GenBank file:
>>
>> ---------------------------CODE------------------------------------
>> from Bio import Entrez, SeqIO
>>
>> print Entrez.read(Entrez.esummary(db="genome", id="56"))[0]["Title"]
>> handle = Entrez.efetch(db="genome", id="56", rettype="genbank")
>> print "downloading to SeqRecord..."
>> record = SeqIO.read(handle, "genbank")
>> print "...done"
>
>I assume this is just test code, as it would be silly to download the
>GenBank file twice in a real script.
>
>> handle = Entrez.efetch(db="genome", id="56", rettype="genbank")
>> filehandle = open("NCBI_DroMel", "w")
>> print "downloading to file..."
>> filehandle.write(handle.read())
>
>You should now close the file, which should ensure it is fully written to disk:
>filehandle.close()
>
>> print "...done"
>>
>> handle = open("NCBI_DroMel")
>> print "reading from file..."
>> record = SeqIO.read(handle, "genbank")
>> ---------------------------END-CODE------------------------------------
>>
>> The last line crashes with:
>>  ...
>> ValueError: Premature end of file in sequence data
>
>This is because you started reading in the file without finishing
>writing to it - the parser could only read in part of the data, and is
>complaining about it ending prematurely.
>
>Peter
>
>
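Peter's two points (download only once, and close the file before reopening it) can be combined in a small sketch. It assumes a text-mode handle such as the one returned by `Entrez.efetch(db="genome", id="56", rettype="genbank")`; the helper names `save_handle` and `read_back` are hypothetical, and the demo substitutes an `io.StringIO` for the network handle. Leaving the `with` block closes the output file, which flushes the buffered data before the file is reopened for parsing. The sketch is written for Python 3; in Python 2.5, as used in this thread, the `with` statement needs `from __future__ import with_statement`.

```python
import io
import os
import tempfile


def save_handle(handle, path):
    """Read everything from `handle` once and write it to the file at `path`.

    Leaving the `with` block closes the file, which flushes Python's write
    buffer, so a later open(path) sees the complete text rather than a
    truncated prefix. Returns the number of characters written.
    """
    data = handle.read()
    with open(path, "w") as out:
        out.write(data)
    return len(data)


def read_back(path):
    """Return the full contents of the file at `path`, closing it afterwards."""
    with open(path) as infile:
        return infile.read()


if __name__ == "__main__":
    # Stand-in for the network handle; with Biopython this would be
    # Entrez.efetch(db="genome", id="56", rettype="genbank").
    fake_download = io.StringIO("LOCUS       example\nORIGIN\n//\n")
    path = os.path.join(tempfile.mkdtemp(), "NCBI_DroMel")
    save_handle(fake_download, path)
    # Only now, after the file has been closed, is it safe to parse it,
    # e.g. with SeqIO.read(open(path), "genbank") in Biopython.
    print(read_back(path) == "LOCUS       example\nORIGIN\n//\n")
```

Because the network handle is read exactly once, the same saved file can be parsed as often as needed without a second download.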


