[Biopython-dev] Notification: incoming/116

biopython-bugs at bioperl.org biopython-bugs at bioperl.org
Tue Jan 14 10:37:26 EST 2003


JitterBug notification

new message incoming/116

Message summary for PR#116
	From: Andreas Kuntzagk <andreas.kuntzagk at mdc-berlin.de>
	Subject: Re: [Biopython-dev] Notification: incoming/109
	Date: 14 Jan 2003 16:38:28 +0100
	0 replies 	0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From andreas.kuntzagk at mdc-berlin.de Tue Jan 14 10:37:25 2003
Received: from server1.bbb2.mdc-berlin.de (root at server1.bbb2.mdc-berlin.de [141.80.34.10])
	by pw600a.bioperl.org (8.12.6/8.12.6) with ESMTP id h0EFbNSQ002360
	for <biopython-bugs at bioperl.org>; Tue, 14 Jan 2003 10:37:24 -0500
Received: from sulawesi.bioinf.mdc-berlin.de (sulawesi.bioinf.mdc-berlin.de [141.80.80.60])
	by server1.bbb2.mdc-berlin.de (8.11.4/8.11.4) with ESMTP id h0EFcAt22512;
	Tue, 14 Jan 2003 16:38:11 +0100 (MET)
Subject: Re: [Biopython-dev] Notification: incoming/109
From: Andreas Kuntzagk <andreas.kuntzagk at mdc-berlin.de>
To: biopython-bugs at bioperl.org
Cc: BioPython Mailing List <biopython at biopython.org>
In-Reply-To: <200212121344.gBCDiYSQ009955 at pw600a.bioperl.org>
References: <200212121344.gBCDiYSQ009955 at pw600a.bioperl.org>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-Mailer: Ximian Evolution 1.0.8 
Date: 14 Jan 2003 16:38:28 +0100
Message-Id: <1042558708.28387.16.camel at sulawesi>
Mime-Version: 1.0
X-Spam-Status: No
X-Scanned-By: MIMEDefang 2.26 (www . roaringpenguin . com / mimedefang)

> > >From andreas.kuntzagk at mdc-berlin.de Tue Dec 10 12:31:46 2002
> [...]
> > 
> > While parsing the recent GenBank release, I got the following error:
> > 
> > 
> > >>> from Bio import GenBank
> > >>> f=file("gbest1.seq")
> > 
> > >>> GenBank.Iterator(f,has_header=1)
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> >   File "/usr/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 171, in
> > __init__
> >     self._reader = RecordReader.StartsWith(handle, "LOCUS")
> >   File "/usr/lib/python2.2/site-packages/Martel/RecordReader.py", line 133, in
> > __init__
> >     self.tagtable)
> >   File "/usr/lib/python2.2/site-packages/Martel/RecordReader.py", line 92, in
> > _find_begin_positions
> >     raise ReaderError("invalid format starting with %s" % repr(text[:50]))
> > Martel.RecordReader.ReaderError: invalid format starting with 'DEFINITION 
> > zd84h07.s1 Soares_fetal_heart_NbHH19W '
> > 
> > The problem seems to be that in this file there is only one empty line
> > after the "...reported sequences" line instead of the expected two.
> > 
> 
> I would suggest the following patch. This reads all text from the handle
> into a string (which can consume quite some memory :-( ) and skips to the
> first LOCUS. All remaining text is then turned into a StringIO (would
> cStringIO be better?)
[patch deleted]

Answering myself again. Here is a better patch (against
biopython-1.10). It uses cStringIO only when the handle has no seek
method; I read up to the first "LOCUS" line and then 'unread' that line.
This also allows more flexibility in the structure of the header.

Is anybody else out there working with full GenBank releases who can
confirm this patch?

---patch---

# diff GenBank/__init__.py ~/biopython-1.10/Bio/GenBank/
162,166d161
<             try:
<                 handle.__getattribute__("seek") #Need seek to place file-position back after reading "LOCUS"
<             except:
<                 import cStringIO #if there is no seek, we read all into a string and use a StringIO
<                 handle=cStringIO.StringIO(handle.read())
169,170c164
<                 if cur_line.startswith("LOCUS") or cur_line=="":
<                     handle.seek(-len(cur_line),1)
---
>                 if cur_line.find("reported sequences") >= 0:
171a166,169
> 
>             # read off two more lines and we are ready to go
>             handle.readline()
>             handle.readline()
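
For illustration, the idea behind the "<" lines above (skip the release
header by reading up to the first "LOCUS" line, then seek back so that line
is read again) could be sketched in present-day Python 3 roughly as follows.
This is only a sketch under assumptions, not the code from the patch: the
function name skip_to_first_locus is made up, io.StringIO stands in for
cStringIO, and the seekable() check replaces the __getattribute__ probe.

```python
import io


def skip_to_first_locus(handle):
    """Return a handle positioned at the start of the first LOCUS line.

    Hypothetical helper, not part of Biopython.
    """
    # Assumption: a handle that cannot seek is buffered fully in memory,
    # which may be expensive for a complete GenBank release file.
    if not (hasattr(handle, "seekable") and handle.seekable()):
        handle = io.StringIO(handle.read())

    while True:
        pos = handle.tell()       # remember where the next line starts
        line = handle.readline()
        if line == "" or line.startswith("LOCUS"):
            handle.seek(pos)      # 'unread' the line (a no-op at end of file)
            return handle
```

Because the loop only looks for a line starting with "LOCUS", it does not
depend on how many blank lines follow the "...reported sequences" line.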





