[Biopython-dev] Notification: incoming/109

biopython-bugs at bioperl.org biopython-bugs at bioperl.org
Thu Dec 12 08:44:34 EST 2002


JitterBug notification

new message incoming/109

Message summary for PR#109
	From: Andreas Kuntzagk <andreas.kuntzagk at mdc-berlin.de>
	Subject: Re: [Biopython-dev] Notification: incoming/108
	Date: 12 Dec 2002 14:45:24 +0100
	0 replies 	0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From andreas.kuntzagk at mdc-berlin.de Thu Dec 12 08:44:33 2002
Received: from guard.edv.mdc-berlin.de (guard.edv.mdc-berlin.de [141.80.8.30])
	by pw600a.bioperl.org (8.12.6/8.12.6) with ESMTP id gBCDiWSQ009950
	for <biopython-bugs at bioperl.org>; Thu, 12 Dec 2002 08:44:33 -0500
Received: from sulawesi.bioinf.mdc-berlin.de (sulawesi.bioinf.mdc-berlin.de [141.80.80.60])
	by guard.edv.mdc-berlin.de (8.11.4/8.11.4) with ESMTP id gBCDiwh07148
	for <biopython-bugs at bioperl.org>; Thu, 12 Dec 2002 14:44:58 +0100 (MET)
Subject: Re: [Biopython-dev] Notification: incoming/108
From: Andreas Kuntzagk <andreas.kuntzagk at mdc-berlin.de>
To: biopython-bugs at bioperl.org
In-Reply-To: <200212101731.gBAHVlAW020429 at pw600a.bioperl.org>
References: <200212101731.gBAHVlAW020429 at pw600a.bioperl.org>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-Mailer: Ximian Evolution 1.0.8 
Date: 12 Dec 2002 14:45:24 +0100
Message-Id: <1039700724.20575.40.camel at sulawesi>
Mime-Version: 1.0
X-Spam-Status: No
X-Scanned-By: MIMEDefang 2.26 (www . roaringpenguin . com / mimedefang)

> >From andreas.kuntzagk at mdc-berlin.de Tue Dec 10 12:31:46 2002
[...]
> 
> While parsing the recent GenBank release, I got the following error:
> 
> 
> >>> from Bio import GenBank
> >>> f=file("gbest1.seq")
> >>> help(GenBank.Iterator)
> 
> >>> GenBank.Iterator(f,has_header=1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 171, in __init__
>     self._reader = RecordReader.StartsWith(handle, "LOCUS")
>   File "/usr/lib/python2.2/site-packages/Martel/RecordReader.py", line 133, in __init__
>     self.tagtable)
>   File "/usr/lib/python2.2/site-packages/Martel/RecordReader.py", line 92, in _find_begin_positions
>     raise ReaderError("invalid format starting with %s" % repr(text[:50]))
> Martel.RecordReader.ReaderError: invalid format starting with 'DEFINITION 
> zd84h07.s1 Soares_fetal_heart_NbHH19W '
> 
> The problem seems to be that in this file there is only one empty line
> after the "...reported sequences" line instead of the expected two.
> 

I would suggest the following patch. It reads all text from the handle
into a string (which can consume quite some memory :-( ) and skips to the
first LOCUS. All remaining text is then turned into a StringIO (would
cStringIO be better?).

--start patch--
RCS file: /home/repository/biopython/biopython/Bio/GenBank/__init__.py,v
retrieving revision 1.34
diff -r1.34 __init__.py
154,156c154
<         GenBank). If so, we'll iterate over the header to get past it, and
<         then the iterator will be set up to return the first record in
<         the file.
---
>         GenBank). If so, we'll skip to the next LOCUS.
159,170c157,161
<             first_line = handle.readline()
<             assert first_line.find("Genetic Sequence Data Bank") >= 0, \
<                    "Doesn't seem to have a GenBank header."
<             while 1:
<                 cur_line = handle.readline()
<                 if cur_line.find("reported sequences") >= 0:
<                     break
< 
<             # read off two more lines and we are ready to go
<             handle.readline()
<             handle.readline()
<             
---
>             re_locus = re.compile(r".*?(?=LOCUS)")
>             split_text = re_locus.split(handle.read(),1)
>             assert len(split_text)==2
>             import StringIO
>             handle=StringIO.StringIO(split_text[1])
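
A possible alternative that avoids holding the whole file in memory: read
line by line until the first line starting with LOCUS, then seek back to the
start of that line so the reader sees the record from its beginning. This is
only a sketch, not part of the patch above. The function name
skip_to_first_locus is made up for illustration, the code is written for
current Python (io.StringIO rather than the StringIO module), and it assumes
the handle is seekable (a regular file, not a pipe or socket).

```python
import io


def skip_to_first_locus(handle):
    """Position a seekable handle at the start of the first LOCUS line.

    Hypothetical helper, not part of Biopython. It does not care how many
    blank lines follow the header, and it reads one line at a time.
    """
    while True:
        pos = handle.tell()          # remember where this line starts
        line = handle.readline()
        if not line:                 # end of file, no record found
            raise ValueError("no LOCUS line found in handle")
        if line.startswith("LOCUS"):
            handle.seek(pos)         # rewind so LOCUS is the next line read
            return handle


# Usage with a made-up header that has only one blank line before LOCUS
sample = (
    "GBEST1.SEQ          Genetic Sequence Data Bank\n"
    "    1 loci,   10 bases, from   1 reported sequences\n"
    "\n"
    "LOCUS       EXAMPLE1\n"
    "DEFINITION  made-up record for illustration.\n"
    "//\n"
)
handle = skip_to_first_locus(io.StringIO(sample))
first_line = handle.readline()
```

The handle returned is the original one, so it could be passed on to the
record reader without an intermediate string copy.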