[BioPython] Entrez.efetch

Brad Chapman chapmanb at 50mail.com
Wed Oct 8 21:11:25 UTC 2008


Peter and Stephan;
My fault -- sorry about the red herring on this one. I shouldn't
have tried to answer this e-mail in 5 minutes before work this
morning. Sounds like y'all have it resolved with the missing close
so I will keep my mouth shut.
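
For anyone finding this thread in the archive, here is a minimal sketch of
why a missing close can look like a truncated record. It assumes the close
in question was on the output file the downloaded text was written to; the
record text and file name are made up for illustration and no NCBI call is
made.

```python
import os
import tempfile

# Made-up stand-in for the text an Entrez.efetch handle would return.
record_text = "LOCUS       demo\nORIGIN\n        1 acgt\n//\n"

path = os.path.join(tempfile.mkdtemp(), "demo.gb")

out_handle = open(path, "w")
out_handle.write(record_text)
# Without this close, some or all of the text may still sit in Python's
# write buffer, so a reader opening the file can see an incomplete record.
out_handle.close()

in_handle = open(path)
saved = in_handle.read()
in_handle.close()
```

With the close in place the file on disk ends with the "//" terminator, which
a GenBank parser needs in order to see the end of the record.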

Peter, I don't remember my exact problem as it was in some
throw-away script and the fix seemed non-problematic. I was thrown
off by the "line 3" information Stephan mentioned because my issue
was with the first couple of lines missing when iterating with an
UndoHandle. No matter.
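
A rough sketch of that kind of problem, for illustration only: the class
below is a hypothetical stand-in, not the code in Bio/File.py, and the
record lines are invented. It shows how lines that were peeked at get lost
if iteration bypasses the wrapper's buffer, and how giving the wrapper its
own iterator avoids that.

```python
import io


class PeekableHandle:
    """Hypothetical undo-style wrapper; not Biopython's UndoHandle."""

    def __init__(self, handle):
        self._handle = handle
        self._saved = []  # lines read ahead and waiting to be returned

    def peekline(self):
        # Look at the next line without consuming it from the wrapper.
        if not self._saved:
            line = self._handle.readline()
            if line:
                self._saved.append(line)
            return line
        return self._saved[0]

    def readline(self):
        # Return buffered lines first, then fall back to the real handle.
        if self._saved:
            return self._saved.pop(0)
        return self._handle.readline()

    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line


text = "LOCUS demo\nDEFINITION example\nACCESSION X00000\n"  # invented lines

# Iterating the wrapper returns the peeked line as well.
wrapped = PeekableHandle(io.StringIO(text))
first = wrapped.peekline()
lines_via_wrapper = list(wrapped)

# Iterating the underlying handle after a peek skips the buffered line.
raw = io.StringIO(text)
PeekableHandle(raw).peekline()
lines_via_raw = list(raw)
```

Here lines_via_wrapper holds all three lines, while lines_via_raw starts at
the DEFINITION line because the LOCUS line was left behind in the wrapper's
buffer.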

Thanks for coming up with the right fix!
Brad

> Stephan wrote:
> >> When I download this chromosome manually from the NCBI-website,
> >> I indeed find a difference in one line, namely in line 3 of the
> >> genbank file. In the manually downloaded file line 3 reads:
> >> "ACCESSION NC_004353 REGION: 1..1351857", while in the file produced
> >> from my code I have only: "ACCESSION NC_004353". So without that
> >> region-information, the biopython parser of course runs to a premature
> >> end.
> 
> Stephan - when you say manually, do you mean via a web browser?  If so
> it is likely to be using a subtly different URL, which might explain
> the NCBI generating slightly different data on the fly.  Either way,
> this ACCESSION line difference shouldn't trigger the "Premature end of
> file in sequence data" error in the GenBank parser.
> 
> On Wed, Oct 8, 2008 at 1:35 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> > This is a tricky problem that I ran into as well and is fixed in the
> > latest CVS version. The issue is that the Biopython reader is using an
> > UndoHandle instead of a standard Python handle. By default some of these
> > operations appear to be assuming an iterator, but UndoHandle did not
> > provide this.
> 
> Brad, I'm pretty sure the GenBank parser is NOT using the UndoHandle.
> Just adding the close made Stephan's example work for me.  What
> exactly was the problem you ran into (one of the other parsers
> perhaps?).
> 
> > As a result, you can lose the first couple of lines which are
> > previously examined to determine the filetype. The fix is to make
> > this a proper iterator. You can either check out current CVS, or
> > make the addition manually to Bio/File.py in your current version:
> >
> > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/File.py.diff?r1=1.17&r2=1.18&cvsroot=biopython
> 
> Adding this to the UndoHandle seems a sensible improvement - but I
> don't see how it can affect Stephan's script.
> 
> Peter


