[Biopython-dev] [Bug 2825] New: SeqIO does not successfully parse Genbank records related to whole genome sequencing deposits, as Did not recognise the LOCUS line layout
bugzilla-daemon at portal.open-bio.org
Fri May 8 22:14:36 UTC 2009
http://bugzilla.open-bio.org/show_bug.cgi?id=2825
Summary: SeqIO does not successfully parse Genbank records
related to whole genome sequencing deposits, as Did not
recognise the LOCUS line layout
Product: Biopython
Version: 1.49
Platform: All
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: david.wyllie at ndm.ox.ac.uk
Hi,
I'm using Biopython 1.49, installed as a package via the Synaptic package
manager on Ubuntu 9. The problem is described below.
NCBI has a record type which describes the contents of whole-genome sequencing
projects. Unlike most GenBank records, the record does not itself contain
sequence.
This URL gives an example:
http://www.ncbi.nlm.nih.gov/nuccore/162285818
Should the SeqIO parser be able to read this? At present it cannot. Here is an example:
# import modules
from Bio import Entrez
from Bio import SeqIO
# Read the record from NCBI and print out the contents.
handle = Entrez.efetch(db="nucleotide", rettype="gb", id="162285818")
masterrecord = handle.readlines()
for line in masterrecord:
    print line
handle.close()
# Let's read it again, and try to parse it with SeqIO.
handle = Entrez.efetch(db="nucleotide", rettype="gb", id="162285818")
# This line raises the ValueError.
seq_record = SeqIO.read(handle, "genbank")
handle.close()
# This fails. The traceback reads:
"""
Traceback (most recent call last):
  File "bugreport.py", line 25, in <module>
    seq_record = SeqIO.read(handle, "genbank")
  File "/var/lib/python-support/python2.6/Bio/SeqIO/__init__.py", line 435, in read
    first = iterator.next()
  File "/var/lib/python-support/python2.6/Bio/GenBank/Scanner.py", line 410, in parse_records
    record = self.parse(handle)
  File "/var/lib/python-support/python2.6/Bio/GenBank/Scanner.py", line 393, in parse
    if self.feed(handle, consumer) :
  File "/var/lib/python-support/python2.6/Bio/GenBank/Scanner.py", line 360, in feed
    self._feed_first_line(consumer, self.line)
  File "/var/lib/python-support/python2.6/Bio/GenBank/Scanner.py", line 907, in _feed_first_line
    raise ValueError('Did not recognise the LOCUS line layout:\n' + line)
ValueError: Did not recognise the LOCUS line layout:
LOCUS ABIN01000000 353 rc DNA linear BCT 10-DEC-2007
"""
# By contrast, reading one of the constituent GenBank records, such as
# http://www.ncbi.nlm.nih.gov/nuccore/162285817
# works correctly:
handle = Entrez.efetch(db="nucleotide", rettype="gb", id="162285817")
seq_record = SeqIO.read(handle, "genbank")
handle.close()
print "Successfully loaded record GI=162285817"
print seq_record.description
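One visible difference in the failing LOCUS line is the token after the length: it reads `rc`, whereas ordinary nucleotide GenBank records use `bp`. I am assuming, without having confirmed it in the Scanner source, that this is what the layout check rejects. Below is a minimal sketch for spotting such lines before handing the record to SeqIO. The helper name `locus_length_unit` is hypothetical and not part of Biopython, and the whitespace-split layout `LOCUS <name> <length> <unit> ...` is an assumption based on the line in the traceback.

```python
def locus_length_unit(locus_line):
    """Return (length, unit) from a LOCUS line split on whitespace.

    Hypothetical helper, not part of Biopython. Assumes the layout
    'LOCUS <name> <length> <unit> ...' seen in the traceback.
    """
    fields = locus_line.split()
    if len(fields) < 4 or fields[0] != "LOCUS":
        raise ValueError("Not a LOCUS line: %r" % locus_line)
    return int(fields[2]), fields[3]


# The LOCUS line quoted in the traceback for the master record.
master = "LOCUS ABIN01000000 353 rc DNA linear BCT 10-DEC-2007"
length, unit = locus_length_unit(master)

# 'bp' and 'aa' are the usual units; anything else is flagged.
if unit not in ("bp", "aa"):
    print("Unusual LOCUS unit %r (count %d)" % (unit, length))
```

A check like this only identifies the record type; it does not parse the record, so the question of whether SeqIO should accept such records still stands.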