[BioPython] Entrez.efetch

Brad Chapman chapmanb at 50mail.com
Wed Oct 8 12:35:33 UTC 2008


Hi Stephan;

> It seems that downloading the file to disk will corrupt the genbank
> file, while downloading directly into Biopython's SeqIO.read() function
> works properly. I don't get it!
>
> When I download this chromosome manually from the NCBI-website,
> I indeed find a difference in one line, namely in line 3 of the
> genbank file. In the manually downloaded file line 3 reads:
> "ACCESSION NC_004353 REGION: 1..1351857", while in the file produced
> from my code I have only: "ACCESSION NC_004353". So without that
> region-information, the biopython parser of course runs to a premature
> end.

This is a tricky problem that I ran into as well, and it is fixed in the
latest CVS version. The issue is that the Biopython reader uses an
UndoHandle instead of a standard Python handle. Some of the code that
reads from the handle appears to assume it can be iterated over, but
UndoHandle did not provide iterator support.

As a result, you can lose the first couple of lines, which were already
read while determining the file type. The fix is to make UndoHandle
a proper iterator. You can either check out the current CVS, or
make the addition manually to Bio/File.py in your current version:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/File.py.diff?r1=1.17&r2=1.18&cvsroot=biopython
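
To illustrate the idea, here is a minimal sketch of an undo-style handle
with iterator support. It is not the actual Bio/File.py code: the class
name SketchUndoHandle and the sample record text are made up for
illustration, and it assumes a current Python 3 interpreter. The point is
that iteration goes through the same readline() that serves pushed-back
lines, so lines examined during file type detection are not skipped:

```python
import io


class SketchUndoHandle:
    """Hypothetical stand-in for an undo-style handle (not Bio/File.py)."""

    def __init__(self, handle):
        self._handle = handle
        self._saved = []  # lines that were read ahead and pushed back

    def readline(self):
        # Serve pushed-back lines first, then fall through to the real handle.
        if self._saved:
            return self._saved.pop(0)
        return self._handle.readline()

    def saveline(self, line):
        # Push a line back so the next reader sees it again.
        self._saved.insert(0, line)

    def peekline(self):
        # Look at the next line without consuming it.
        line = self.readline()
        if line:
            self.saveline(line)
        return line

    def __iter__(self):
        return self

    def __next__(self):
        # Route iteration through readline() so saved lines are included.
        line = self.readline()
        if not line:
            raise StopIteration
        return line


# Made-up three-line sample standing in for the start of a GenBank file.
raw = io.StringIO(
    "LOCUS       example\n"
    "DEFINITION  example\n"
    "ACCESSION   NC_004353 REGION: 1..1351857\n"
)
handle = SketchUndoHandle(raw)
first = handle.peekline()  # e.g. a peek used to guess the file type
lines = list(handle)       # iteration still starts at the peeked line
```

If a wrapper like this had no __iter__/__next__ of its own, a caller that
wanted to loop over lines would have to iterate the underlying handle
directly, and anything sitting in the saved-lines buffer would be missed.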

Hope this helps,
Brad


