[Biopython] Entrez.efetch
Rohan Maddamsetti
rohan.maddamsetti at gmail.com
Fri Feb 26 02:33:25 UTC 2010
Hello,
I'm new to Biopython (installed yesterday), so please bear with me. This
problem is similar to one sent to the list on Wed, Oct 8, 2008, with the same
subject line as this email, by a Stephan. Interestingly, though, my code
works in a couple of cases (including the chromosome input used by Stephan),
but not in a third. I wrote the following simple function:
from Bio import Entrez, SeqIO

def parseGenome(genbank_id):
    # Fetch the GenBank record from NCBI and report its feature count.
    handle = Entrez.efetch(db="genome",rettype="gb",id=genbank_id)
    for seq_record in SeqIO.parse(handle,"gb"):
        print "%s with %i features" % (seq_record.id,
                                       len(seq_record.features))
    handle.close()
##Try on E. coli genome:
parseGenome("CP000819.1")
##Try on Drosophila chromosome 4
parseGenome("NC_004353.3")
##Try on Drosophila X chromosome
parseGenome("NC_004354")
And this is the output I get:
CP000819.1 with 8759 features
NC_004353.3 with 1191 features
Traceback (most recent call last):
  File "BiasCalc.py", line 48, in <module>
    parseGenome("NC_004354")
  File "BiasCalc.py", line 38, in parseGenome
    for seq_record in SeqIO.parse(handle,"gb"):
  File "/Library/Frameworks/Python.framework/Versions/6.0.4/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 420, in parse_records
    record = self.parse(handle, do_features)
  File "/Library/Frameworks/Python.framework/Versions/6.0.4/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 403, in parse
    if self.feed(handle, consumer, do_features):
  File "/Library/Frameworks/Python.framework/Versions/6.0.4/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 380, in feed
    misc_lines, sequence_string = self.parse_footer()
  File "/Library/Frameworks/Python.framework/Versions/6.0.4/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 762, in parse_footer
    raise ValueError("Premature end of file in sequence data")
ValueError: Premature end of file in sequence data
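One way to narrow this down might be to save the download to disk first and check whether the file ends with the "//" line that terminates a GenBank record. The sketch below is only a diagnostic under assumptions: save_handle and looks_complete are hypothetical helper names (not part of Biopython), the handle is assumed to return text, and the code is written for a current Python 3 rather than the Python 2.6 shown in the traceback. It does not establish why the third download fails.

```python
import io
import os
import tempfile


def save_handle(handle, path, chunk_size=65536):
    """Copy everything from a text handle into a local file (hypothetical helper)."""
    with open(path, "w") as out:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # empty read means the handle is exhausted
                break
            out.write(chunk)


def looks_complete(path):
    """Return True if the last non-blank line of the file is the '//' terminator."""
    last = ""
    with open(path) as f:
        for line in f:
            if line.strip():
                last = line.strip()
    return last == "//"


if __name__ == "__main__":
    # Offline demonstration with a stand-in handle; no network access needed.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.gb")
    save_handle(io.StringIO("LOCUS       DEMO\nORIGIN\n        1 acgt\n//\n"), demo_path)
    print(looks_complete(demo_path))
```

With Biopython installed, the handle passed to save_handle would be the one returned by Entrez.efetch in the function above; if looks_complete reports False for the saved file, the data arrived truncated before the parser ever saw it.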
Is this a bug, or am I doing something wrong? My eventual goal is to iterate
through the features in the seq_record, and collect GC content statistics
for the coding regions and introns.
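A rough sketch of the coding-region half of that goal, assuming each seq_record is a SeqRecord as yielded by SeqIO.parse and that the installed Biopython provides SeqFeature.extract. The names gc_content and cds_gc_contents are hypothetical helpers for illustration, and introns are not handled here. It is written so that it also runs on current Python 3.

```python
def gc_content(seq):
    """Fraction of G and C letters in a sequence (hypothetical helper)."""
    s = str(seq).upper()
    if not s:
        return 0.0
    return (s.count("G") + s.count("C")) / float(len(s))


def cds_gc_contents(seq_record):
    """Return one GC fraction per CDS feature, in feature order."""
    results = []
    for feature in seq_record.features:
        if feature.type != "CDS":
            continue  # skip genes, tRNAs, source features, and so on
        # extract() pulls out the feature's sequence from the parent
        # sequence, following the feature's location.
        coding_seq = feature.extract(seq_record.seq)
        results.append(gc_content(coding_seq))
    return results
```

Inside the loop in parseGenome, cds_gc_contents(seq_record) would then give a list of per-CDS values to summarize.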
Thanks,
Rohan