[Biopython] Entrez.efetch

Rohan Maddamsetti rohan.maddamsetti at gmail.com
Fri Feb 26 02:33:25 UTC 2010


Hello,

I'm new to Biopython (installed yesterday), so please bear with me. This
problem is similar to one sent to the list on Wed, Oct 8, 2008, with the
same subject line as this email, by a Stephan. Interestingly, though, my
code works in a couple of cases (including the chromosome input used by
Stephan), but not in a third. I wrote the following simple function.

from Bio import Entrez, SeqIO

def parseGenome(genbank_id):
    # Fetch the GenBank record from NCBI and report its feature count.
    handle = Entrez.efetch(db="genome", rettype="gb", id=genbank_id)
    for seq_record in SeqIO.parse(handle, "gb"):
        print("%s with %i features" % (seq_record.id, len(seq_record.features)))
    handle.close()

##Try on E. coli genome:
parseGenome("CP000819.1")
##Try on Drosophila chromosome 4
parseGenome("NC_004353.3")
##Try on Drosophila X chromosome
parseGenome("NC_004354")

And this is the output I get:

CP000819.1 with 8759 features
NC_004353.3 with 1191 features
Traceback (most recent call last):
  File "BiasCalc.py", line 48, in <module>
    parseGenome("NC_004354")
  File "BiasCalc.py", line 38, in parseGenome
    for seq_record in SeqIO.parse(handle,"gb"):
  File "/Library/Frameworks/Python.framework/Versions/6.0.4/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 420, in parse_records
    record = self.parse(handle, do_features)
  File "/Library/Frameworks/Python.framework/Versions/6.0.4/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 403, in parse
    if self.feed(handle, consumer, do_features):
  File "/Library/Frameworks/Python.framework/Versions/6.0.4/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 380, in feed
    misc_lines, sequence_string = self.parse_footer()
  File "/Library/Frameworks/Python.framework/Versions/6.0.4/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 762, in parse_footer
    raise ValueError("Premature end of file in sequence data")
ValueError: Premature end of file in sequence data

Is this a bug, or am I doing something wrong? My eventual goal is to iterate
through the features in the seq_record, and collect GC content statistics
for the coding regions and introns.
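
To make that goal concrete, here is a minimal sketch of the per-region GC
calculation in plain Python, independent of Biopython. The names
gc_fraction and gc_by_region, the (name, start, end) tuple layout, and the
toy sequence are all hypothetical, for illustration only. In real use the
coordinates would presumably come from each feature's location in the
seq_record. The sketch assumes simple contiguous regions and does not
handle multi-exon (joined) locations.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C bases in a sequence string (0.0 if empty)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    gc = sequence.count("G") + sequence.count("C")
    return gc / float(len(sequence))


def gc_by_region(sequence, regions):
    """Map each (name, start, end) region to its GC fraction.

    Coordinates are 0-based and end-exclusive (Python slice style).
    """
    results = {}
    for name, start, end in regions:
        results[name] = gc_fraction(sequence[start:end])
    return results


# Hypothetical toy data; real coordinates would come from the
# features of each seq_record.
toy_sequence = "ATGCGCGCTTAAATATGGGCCC"
toy_regions = [("cds_1", 0, 8), ("intron_1", 8, 14), ("cds_2", 14, 22)]
stats = gc_by_region(toy_sequence, toy_regions)
for name, _, _ in toy_regions:
    print("%s GC fraction: %.2f" % (name, stats[name]))
```

GC content is the same on either strand, so the sketch ignores feature
orientation.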

Thanks,
Rohan


