[BioPython] Entrez.efetch
Brad Chapman
chapmanb at 50mail.com
Wed Oct 8 12:35:33 UTC 2008
Hi Stephan;
> It seems that downloading the file to disk will corrupt the GenBank
> file, while downloading directly into Biopython's SeqIO.read() function
> works properly. I don't get it!
>
> When I download this chromosome manually from the NCBI-website,
> I indeed find a difference in one line, namely in line 3 of the
> genbank file. In the manually downloaded file line 3 reads:
> "ACCESSION NC_004353 REGION: 1..1351857", while in the file produced
> from my code I have only: "ACCESSION NC_004353". So without that
> region-information, the biopython parser of course runs to a premature
> end.
This is a tricky problem that I ran into as well; it is fixed in the
latest CVS version. The issue is that the Biopython reader uses an
UndoHandle instead of a standard Python handle. Some handle operations
appear to assume the handle is an iterator, but UndoHandle did not
provide the iterator interface.
As a result, you can lose the first couple of lines, which have
already been read ahead to determine the file type. The fix is to make
UndoHandle a proper iterator. You can either check out the current CVS,
or make the addition manually to Bio/File.py in your current version:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/File.py.diff?r1=1.17&r2=1.18&cvsroot=biopython
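To illustrate the idea, here is a rough sketch of a handle that can
peek at lines and still be iterated over without losing them. This is
not the actual Bio/File.py code: the class name PeekableHandle and the
demo() helper are made up for the example, and it uses the Python 3
method name __next__ (in Python 2 the method is called next). The point
is only that iteration goes through the same readline() that knows
about the saved lines.

```python
import io


class PeekableHandle:
    """Hypothetical stand-in for an undo-capable handle (not Biopython's code)."""

    def __init__(self, handle):
        self._handle = handle
        self._saved = []  # lines pushed back after being examined

    def readline(self):
        # Serve pushed-back lines first, then fall through to the real handle.
        if self._saved:
            return self._saved.pop(0)
        return self._handle.readline()

    def saveline(self, line):
        # Push a line back so the next readline() returns it again.
        if line:
            self._saved.insert(0, line)

    def peekline(self):
        # Look at the next line without consuming it.
        line = self.readline()
        self.saveline(line)
        return line

    def __iter__(self):
        return self

    def __next__(self):
        # Iterate via readline() so pushed-back lines are not skipped.
        line = self.readline()
        if not line:
            raise StopIteration
        return line


def demo():
    # Made-up three-line record header, just for the demonstration.
    text = "LOCUS       NC_004353\nDEFINITION  example\nACCESSION   NC_004353\n"
    handle = PeekableHandle(io.StringIO(text))
    first = handle.peekline()  # examine the first line, e.g. to guess the format
    return first, list(handle)  # iteration should still start at that line
```

If the iterator bypassed readline() and read from the underlying
handle directly, the peeked line would be missing from the result.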
Hope this helps,
Brad