[Biopython-dev] [Bug 2454] Iterators can't use file-like objects
bugzilla-daemon at portal.open-bio.org
Wed Jun 18 15:36:48 UTC 2008
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #16 from mdehoon at ims.u-tokyo.ac.jp 2008-06-18 11:36 EST -------
(In reply to comment #15)
> I've removed the strict file-like test in:
>
> Bio/Sequencing/Ace.py revision: 1.12
> Bio/Sequencing/Phd.py revision: 1.6
>
> In these cases, the handle is immediately turned into an UndoHandle which will
> be able to check for a sufficiently file like object.
>
> Hopefully that's what you meant Michiel
Actually, I think we should avoid using an UndoHandle altogether, now that
Python has generator functions.
> - we could go further and introduce a
> parse() function and deprecate the Iterator objects in these modules.
>
That would make things a lot easier. An Iterator class was useful in older
versions of Python, but generator functions provide a cleaner alternative.
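To illustrate the difference, here is a minimal sketch of the same record
splitting written both ways. It is not the actual Biopython code: the names
RecordIterator and parse_records, and the use of ">" as a record marker, are
made up for this example. Note how the class has to remember the line it read
too early (which is what an UndoHandle does for us), while the generator
simply keeps its local variables between yields.

```python
class RecordIterator:
    """Old style: the object stores its state, including a pushed-back line."""

    def __init__(self, handle):
        self._lines = iter(handle)
        self._saved = None  # first line of the next record, read too early

    def __iter__(self):
        return self

    def __next__(self):
        record = [self._saved] if self._saved is not None else []
        self._saved = None
        for line in self._lines:
            if line.startswith(">") and record:
                self._saved = line  # this line belongs to the next record
                return record
            record.append(line)
        if record:
            return record
        raise StopIteration


def parse_records(handle):
    """Generator style: the local variable 'record' survives between yields."""
    record = []
    for line in handle:
        if line.startswith(">") and record:
            yield record
            record = []
        record.append(line)
    if record:
        yield record


lines = [">a", "1", "2", ">b", "3"]
old_style = list(RecordIterator(lines))
new_style = list(parse_records(lines))
```

Both versions split the lines into the same two records; the generator needs
no saved-line bookkeeping at all.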
In Ace.py, we'd need three functions:
1) read(handle), which returns the first record (Contig) read from the handle,
or None if the handle contains no record;
2) parse(handle), a generator function that yields the records one by one;
3) a private helper function _process_line(line, record), which stores the
information from one line in the record.
These functions then look like this:
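def read(handle):
    # Use a single iterator so that both loops below share one position,
    # also when handle is a list of lines rather than a file handle
    handle = iter(handle)
    for line in handle:
        if line[:2]=='CO':
            break
    else:
        # No line starting with CO was found
        return None
    record = Contig()
    # The CO line itself is part of the record
    _process_line(line, record)
    for line in handle:
        if line[:2]=='CO':
            # Start of the next contig; read() returns only the first one
            break
        _process_line(line, record)
    return record
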
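def parse(handle):
    record = None
    for line in handle:
        if line[:2]=='CO':
            if record is not None:
                yield record
            record = Contig()
        # Skip any lines that come before the first CO line
        if record is not None:
            _process_line(line, record)
    # The last record has to be yielded too; a generator cannot return it
    if record is not None:
        yield record
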
The actual work is done in _process_line.
So we don't need to store the read lines explicitly; this is now taken care of
by the generator function. Hence, we don't need to convert the handle to an
UndoHandle. In addition, handle can now also be a list of lines instead of a
file handle. In this respect, I think Zachary was right in comment #11:
> Maybe it's a good idea for any parsers/iterators to just
> use the iterator-like ability of file handles?
In other words, as long as we can pull lines from the handle, we can parse it.
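As a small illustration of that point (a sketch only; contig_names is a
hypothetical helper and the sample text is made up, not a real ACE file), a
function that only iterates over its argument works unchanged on a list of
lines, a StringIO object, or any other iterable of lines:

```python
import io


def contig_names(handle):
    """Yield the second field of every line that starts with 'CO'."""
    for line in handle:
        if line[:2] == "CO":
            yield line.split()[1]


# Made-up sample text, loosely shaped like an ACE file
text = "AS 2 3\nCO Contig1 10 1 1 U\nAGCT\nCO Contig2 8 2 1 U\nTTGA\n"

from_list = list(contig_names(text.splitlines()))
from_stringio = list(contig_names(io.StringIO(text)))
from_generator = list(contig_names(line for line in text.splitlines()))
```

All three calls give the same list of names, because the function never asks
the handle for anything except its next line.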
In Phd.py, it's even simpler. Here, we only need the read() and parse()
functions:
def read(handle):
    record = None
    for line in handle:
        if line.startswith("BEGIN_SEQUENCE"):
            record = Record()
        elif line.startswith("END_SEQUENCE"):
            return record
        else:
            # do the actual processing of the other lines here
            pass
    # Falling off the end means no complete record was found: returns None

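def parse(handle):
    # Use a single iterator so that each read() call continues where the
    # previous one stopped, also when handle is a list of lines
    handle = iter(handle)
    while True:
        record = read(handle)
        if record is None:
            return
        yield record
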
Again, we can process each line as it comes along. No UndoHandle, no
Parser, no Consumer, no Scanner needed.
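For what it's worth, here is a self-contained sketch of the Phd approach that
can be run as is. The Record class below is a hypothetical stand-in that just
collects the lines between the markers, and the sample data is made up; the
real Record would hold parsed values. One detail to watch: parse() calls
read() repeatedly, so all calls must share one iterator. Wrapping the handle
in iter() takes care of that when the handle is a list of lines.

```python
import io


class Record:
    """Hypothetical stand-in for the Phd record; it only collects lines."""

    def __init__(self):
        self.lines = []


def read(handle):
    record = None
    for line in handle:
        if line.startswith("BEGIN_SEQUENCE"):
            record = Record()
        elif line.startswith("END_SEQUENCE"):
            return record
        elif record is not None:
            # Placeholder for the actual processing of the other lines
            record.lines.append(line.rstrip("\n"))
    return None  # reached the end without a complete record


def parse(handle):
    handle = iter(handle)  # one shared iterator for all read() calls
    while True:
        record = read(handle)
        if record is None:
            return
        yield record


# Made-up sample data containing two records
data = (
    "BEGIN_SEQUENCE one\na 10\nc 20\nEND_SEQUENCE\n"
    "BEGIN_SEQUENCE two\ng 30\nEND_SEQUENCE\n"
)
records_from_file = [r.lines for r in parse(io.StringIO(data))]
records_from_list = [r.lines for r in parse(data.splitlines())]
```

Both the StringIO object and the plain list of lines produce the same two
records.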
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.