[Biopython-dev] Bio.SCOP

Michiel de Hoon mjldehoon at yahoo.com
Sat Jun 21 01:11:18 EDT 2008


Bio.SCOP is one of the modules affected by Bug 2454
(http://bugzilla.open-bio.org/show_bug.cgi?id=2454),
which is basically about how Biopython uses file handles.

Bio.SCOP contains parsers for several file
formats used by SCOP. I am using Bio.SCOP.Hie
as an example here, but the same applies to
the other parsers.

The Bio.SCOP parsers define a Parser and an Iterator
class (similar to other older Biopython parsers).
Typical usage is as follows:

>>> from Bio.SCOP import Hie
>>> handle = open("mydatafile.txt")
>>> parser = Hie.Parser()
>>> records = Hie.Iterator(handle, parser)
>>> for record in records:
...     # record is an instance of Bio.SCOP.Hie.Record
...     pass

Now, in the SCOP file formats, each record occupies a single
line of the data file, so we don't need the Iterator:
>>> from Bio.SCOP import Hie
>>> handle = open("mydatafile.txt")
>>> parser = Hie.Parser()
>>> for line in handle:
...     record = parser.parse(line)
...     # record is an instance of Bio.SCOP.Hie.Record
...     pass

This solves Bug #2454 (which occurs in the Iterator
class), and is more general than the Iterator class
(e.g., now we can parse a list of lines).
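To make the "list of lines" point concrete, here is a minimal sketch of a per-line parser fed a plain Python list instead of a file handle. HieRecord and parse_line are hypothetical stand-ins, not the Bio.SCOP API, and the sketch assumes the hierarchy layout of three tab-separated columns (sunid, parent sunid, comma-separated child sunids), with "-" marking an empty field. The example lines are toy values.

```python
from typing import List, NamedTuple, Optional


class HieRecord(NamedTuple):
    """Hypothetical stand-in for Bio.SCOP.Hie.Record."""
    sunid: int
    parent: Optional[int]
    children: List[int]


def parse_line(line: str) -> HieRecord:
    """Parse one line, assuming sunid<TAB>parent<TAB>children with '-' for empty."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 3:
        raise ValueError(f"expected 3 tab-separated columns: {line!r}")
    sunid, parent, children = columns
    return HieRecord(
        int(sunid),
        None if parent == "-" else int(parent),
        [] if children == "-" else [int(c) for c in children.split(",")],
    )


# A per-line parser needs no file handle: a plain list of strings works too.
lines = ["0\t-\t1", "1\t0\t2,3", "2\t1\t-"]
records = [parse_line(line) for line in lines]
```

The same comprehension works unchanged if `lines` is replaced by an open file, since both are just iterables of strings.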

To take this one step further, the Parser class is not
really needed either. Although Parser is a class, we
are not using the functionality of a class (no
inheritance, and the object self is never used). In
essence, the parse() function inside the Parser class
may as well live outside of it.
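A small sketch of why the class adds nothing: the method below never uses self, so its body works unchanged as a module-level function. Parser and parse here are hypothetical illustrations, not Biopython's code, and the "parsing" is a placeholder that only splits the assumed tab-separated columns.

```python
class Parser:
    """Hypothetical old-style parser: parse() never touches self."""

    def parse(self, line: str) -> tuple:
        return tuple(line.rstrip("\n").split("\t"))


def parse(line: str) -> tuple:
    """The same body as a module-level function; no instance required."""
    return tuple(line.rstrip("\n").split("\t"))


line = "1\t0\t2,3\n"
# Both calls return ('1', '0', '2,3'); the Parser instance contributes nothing.
same = Parser().parse(line) == parse(line)
```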

There are several ways to simplify this module; each
of them essentially amounts to moving the parse()
function:

1) Move the parse() function to the Record class initializer:
>>> from Bio.SCOP import Hie
>>> handle = open("mydatafile.txt")
>>> for line in handle:
...     record = Hie.Record(line)
...     # record is an instance of Bio.SCOP.Hie.Record
...     pass

2) Move the parse() function outside of the Parser class,
and rename it read() for consistency with other Biopython
parsers:
>>> from Bio.SCOP import Hie
>>> handle = open("mydatafile.txt")
>>> while True:
...     record = Hie.read(handle)
...     if not record: break
...     # record is an instance of Bio.SCOP.Hie.Record

3) Move the parse() function outside of the Parser class,
and use it as a generator function:
>>> from Bio.SCOP import Hie
>>> handle = open("mydatafile.txt")
>>> records = Hie.parse(handle)
>>> for record in records:
...     # record is an instance of Bio.SCOP.Hie.Record
...     pass
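For comparison, here is a rough sketch of the three options side by side, built on one shared piece of per-line logic. The names Record, read and parse mirror the proposal, but this is a hypothetical illustration, not Biopython's code. It assumes the three tab-separated columns (sunid, parent, children, with "-" for empty), and skipping blank and "#" comment lines is also an assumption. io.StringIO stands in for an open file.

```python
import io
from typing import Iterator, List, Optional, TextIO

# Hypothetical sketch of the three options; not Biopython's implementation.
# Assumed line layout: sunid<TAB>parent<TAB>children, "-" marks an empty field.


class Record:
    """Option 1: the initializer does the parsing."""

    def __init__(self, line: Optional[str] = None):
        self.sunid: Optional[int] = None
        self.parent: Optional[int] = None
        self.children: List[int] = []
        if line is not None:
            columns = line.rstrip("\n").split("\t")
            if len(columns) != 3:
                raise ValueError(f"expected 3 tab-separated columns: {line!r}")
            sunid, parent, children = columns
            self.sunid = int(sunid)
            if parent != "-":
                self.parent = int(parent)
            if children != "-":
                self.children = [int(c) for c in children.split(",")]


def _is_data(line: str) -> bool:
    """Skip blank lines and '#' comment lines (an assumption about the files)."""
    return bool(line.strip()) and not line.startswith("#")


def read(handle: TextIO) -> Optional[Record]:
    """Option 2: return the next record from handle, or None at end of file."""
    while True:
        line = handle.readline()
        if not line:
            return None  # end of file ends the caller's while-loop
        if _is_data(line):
            return Record(line)


def parse(handle: TextIO) -> Iterator[Record]:
    """Option 3: generator yielding one record per data line."""
    for line in handle:
        if _is_data(line):
            yield Record(line)


data = "# comment\n0\t-\t1\n1\t0\t2,3\n2\t1\t-\n"

# Option 1: the caller loops over the lines itself.
records1 = [Record(line) for line in io.StringIO(data) if _is_data(line)]

# Option 2: repeated read() calls until it returns None.
handle = io.StringIO(data)
records2 = []
while True:
    record = read(handle)
    if not record:
        break
    records2.append(record)

# Option 3: the generator hides the loop.
records3 = list(parse(io.StringIO(data)))
```

All three share the same per-line logic; they differ only in who drives the loop. Because parse() merely iterates over its argument, it also accepts a plain list of lines, e.g. `parse(["1\t0\t2,3"])`.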


Comments, suggestions, preferences?

--Michiel.
