[Biopython-dev] Notification: incoming/109
biopython-bugs at bioperl.org
Thu Dec 12 08:44:34 EST 2002
JitterBug notification
new message incoming/109
Message summary for PR#109
From: Andreas Kuntzagk <andreas.kuntzagk at mdc-berlin.de>
Subject: Re: [Biopython-dev] Notification: incoming/108
Date: 12 Dec 2002 14:45:24 +0100
0 replies 0 followups
====> ORIGINAL MESSAGE FOLLOWS <====
>From andreas.kuntzagk at mdc-berlin.de Thu Dec 12 08:44:33 2002
Received: from guard.edv.mdc-berlin.de (guard.edv.mdc-berlin.de [141.80.8.30])
by pw600a.bioperl.org (8.12.6/8.12.6) with ESMTP id gBCDiWSQ009950
for <biopython-bugs at bioperl.org>; Thu, 12 Dec 2002 08:44:33 -0500
Received: from sulawesi.bioinf.mdc-berlin.de (sulawesi.bioinf.mdc-berlin.de [141.80.80.60])
by guard.edv.mdc-berlin.de (8.11.4/8.11.4) with ESMTP id gBCDiwh07148
for <biopython-bugs at bioperl.org>; Thu, 12 Dec 2002 14:44:58 +0100 (MET)
Subject: Re: [Biopython-dev] Notification: incoming/108
From: Andreas Kuntzagk <andreas.kuntzagk at mdc-berlin.de>
To: biopython-bugs at bioperl.org
In-Reply-To: <200212101731.gBAHVlAW020429 at pw600a.bioperl.org>
References: <200212101731.gBAHVlAW020429 at pw600a.bioperl.org>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-Mailer: Ximian Evolution 1.0.8
Date: 12 Dec 2002 14:45:24 +0100
Message-Id: <1039700724.20575.40.camel at sulawesi>
Mime-Version: 1.0
X-Spam-Status: No
X-Scanned-By: MIMEDefang 2.26 (www . roaringpenguin . com / mimedefang)
> >From andreas.kuntzagk at mdc-berlin.de Tue Dec 10 12:31:46 2002
[...]
>
> While parsing the recent GenBank release, I got the following error:
>
>
> >>> from Bio import GenBank
> >>> f=file("gbest1.seq")
> >>> help(GenBank.Iterator)
>
> >>> GenBank.Iterator(f,has_header=1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 171, in __init__
>     self._reader = RecordReader.StartsWith(handle, "LOCUS")
>   File "/usr/lib/python2.2/site-packages/Martel/RecordReader.py", line 133, in __init__
>     self.tagtable)
>   File "/usr/lib/python2.2/site-packages/Martel/RecordReader.py", line 92, in _find_begin_positions
>     raise ReaderError("invalid format starting with %s" % repr(text[:50]))
> Martel.RecordReader.ReaderError: invalid format starting with 'DEFINITION  zd84h07.s1 Soares_fetal_heart_NbHH19W '
>
> The problem seems to be that in this file there is only one empty line after the
> "...reported sequences" line instead of the expected two.
>
I would suggest the following patch. It reads all text from the handle
into a string (which can consume quite a lot of memory :-( ) and skips to the
first LOCUS. All remaining text is then turned into a StringIO (would
cStringIO be better?)
--start patch--
RCS file: /home/repository/biopython/biopython/Bio/GenBank/__init__.py,v
retrieving revision 1.34
diff -r1.34 __init__.py
154,156c154
< GenBank). If so, we'll iterate over the header to get past it, and
< then the iterator will be set up to return the first record in
< the file.
---
> GenBank). If so, we'll skip to the next LOCUS.
159,170c157,161
< first_line = handle.readline()
< assert first_line.find("Genetic Sequence Data Bank") >= 0, \
< "Doesn't seem to have a GenBank header."
< while 1:
< cur_line = handle.readline()
< if cur_line.find("reported sequences") >= 0:
< break
<
< # read off two more lines and we are ready to go
< handle.readline()
< handle.readline()
<
---
> re_locus = re.compile(r".*?(?=LOCUS)")
> split_text = re_locus.split(handle.read(),1)
> assert len(split_text)==2
> import StringIO
> handle=StringIO.StringIO(split_text[1])
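
For illustration only, here is a minimal sketch of the same idea in current
Python. It is not part of the patch above. The function names are
hypothetical, and it assumes that a record starts on a line beginning with
"LOCUS" and that the handle is opened in text mode. The first function
follows the patch (read everything, drop the header, wrap the rest in an
in-memory handle; Python 3 provides this as io.StringIO). The second is a
possible lower-memory variant that discards header lines as they are read.

```python
import io
import re

# Assumption: a GenBank record starts on a line that begins with "LOCUS".
_LOCUS_LINE = re.compile(r"^LOCUS", re.MULTILINE)


def skip_to_first_locus(handle):
    """Return an in-memory handle positioned at the first LOCUS line.

    Like the patch, this reads the whole input into memory, so it can be
    expensive for large release files.
    """
    text = handle.read()
    match = _LOCUS_LINE.search(text)
    if match is None:
        raise ValueError("no line starting with LOCUS was found")
    return io.StringIO(text[match.start():])


def iter_from_first_locus(handle):
    """Yield lines from the first LOCUS line onwards, one line at a time.

    Header lines are thrown away as they are read, so the file is never
    held in memory as a whole. Yields nothing if no LOCUS line exists.
    """
    lines = iter(handle)
    for line in lines:
        if line.startswith("LOCUS"):
            yield line
            break
    # Continue with the same iterator, so nothing is read twice.
    for line in lines:
        yield line
```

Neither version depends on how many blank lines follow the
"...reported sequences" line, which is what the original header-skipping
code tripped over.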