[Biopython-dev] [Biopython (old issues only) - Bug #2825] (Resolved) Parsing whole genome sequencing (WGS) Genbank records

redmine at redmine.open-bio.org redmine at redmine.open-bio.org
Thu Jan 19 17:07:57 UTC 2017


Issue #2825 has been updated by Peter Cock.

Description updated
Status changed from New to Resolved
% Done changed from 0 to 100

I think this was fixed a while back with this commit:

https://github.com/biopython/biopython/commit/82a410fedccc67c4ab01eb76b0d91e6d14edd29b


----------------------------------------
Bug #2825: Parsing whole genome sequencing (WGS) Genbank records
https://redmine.open-bio.org/issues/2825#change-15391

* Author: David Wyllie
* Status: Resolved
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: 1.49
* URL: 
----------------------------------------
Hi

I'm using Biopython 1.49, installed as a package via the Synaptic package manager on Ubuntu 9.  The problem is described below:

NCBI has a record type which describes the contents of whole-genome sequencing projects.  Unlike most GenBank records, the record itself doesn't contain any sequence.

This URL gives an example:
http://www.ncbi.nlm.nih.gov/nuccore/162285818
Should the SeqIO parser be able to read this? Currently it cannot.  Here is an example:

# import modules
from Bio import Entrez
from Bio import SeqIO

# read the record from NCBI, print out the contents.
handle = Entrez.efetch(db="nucleotide", rettype="gb", id="162285818")
masterrecord=handle.readlines()
for line in masterrecord:
	print line
handle.close()

# let's read it again, and try to parse it with SeqIO.
handle = Entrez.efetch(db="nucleotide", rettype="gb", id="162285818")

# this line causes the crash
seq_record = SeqIO.read(handle, "genbank")

handle.close()

# fails.  the traceback reads
"""
Traceback (most recent call last):
  File "bugreport.py", line 25, in <module>
    seq_record = SeqIO.read(handle, "genbank")
  File "/var/lib/python-support/python2.6/Bio/SeqIO/__init__.py", line 435, in read
    first = iterator.next()
  File "/var/lib/python-support/python2.6/Bio/GenBank/Scanner.py", line 410, in parse_records
    record = self.parse(handle)
  File "/var/lib/python-support/python2.6/Bio/GenBank/Scanner.py", line 393, in parse
    if self.feed(handle, consumer) :
  File "/var/lib/python-support/python2.6/Bio/GenBank/Scanner.py", line 360, in feed
    self._feed_first_line(consumer, self.line)
  File "/var/lib/python-support/python2.6/Bio/GenBank/Scanner.py", line 907, in _feed_first_line
    raise ValueError('Did not recognise the LOCUS line layout:\n' + line)
ValueError: Did not recognise the LOCUS line layout:
LOCUS       ABIN01000000             353 rc    DNA     linear   BCT 10-DEC-2007
"""

# by contrast, reading one of the constituent genbank records, like this one
# http://www.ncbi.nlm.nih.gov/nuccore/162285817
# works correctly;

handle = Entrez.efetch(db="nucleotide", rettype="gb", id="162285817")
seq_record = SeqIO.read(handle, "genbank")
handle.close()
print "Successfully loaded record GI=162285817"
print seq_record.description
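
The traceback above suggests the parser rejects the LOCUS line because the length unit is "rc" rather than the usual "bp" or "aa". As a rough illustration only, and not the code from the Biopython commit, the sketch below shows one way a LOCUS line could be tokenised so that "rc" is accepted as a unit. The helper name parse_locus_line, the regular expression and the returned field names are all hypothetical.

```python
import re

# Hypothetical helper, NOT part of Biopython: tokenise a GenBank LOCUS line
# and accept "rc" as a length unit alongside "bp" and "aa".
LOCUS_RE = re.compile(
    r"^LOCUS\s+(?P<name>\S+)\s+(?P<size>\d+)\s+(?P<unit>bp|aa|rc)\s+(?P<rest>.*)$"
)


def parse_locus_line(line):
    """Return the name, size, unit and remaining text of a LOCUS line."""
    match = LOCUS_RE.match(line.rstrip())
    if match is None:
        raise ValueError("Did not recognise the LOCUS line layout:\n" + line)
    fields = match.groupdict()
    fields["size"] = int(fields["size"])
    return fields


# The LOCUS line from the traceback in this report.
wgs_line = "LOCUS       ABIN01000000             353 rc    DNA     linear   BCT 10-DEC-2007"
wgs_fields = parse_locus_line(wgs_line)
print(wgs_fields["name"], wgs_fields["size"], wgs_fields["unit"])
```

Matching on whitespace-separated tokens rather than fixed column positions is an assumption of this sketch; the real scanner may well take a different approach.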


