[Biopython-dev] Bio.SeqIO

Michiel de Hoon mdehoon at c2b2.columbia.edu
Mon Feb 26 01:28:06 UTC 2007


Peter wrote:
> SequenceIterator(handle, format)
> SequencesToDict(sequences, key_function=None)
> SequencesToAlignment(sequences, ...)
> WriteSequences(sequences, handle, format)
> 
> Does anyone want to suggest different names for these functions?
> 
Instead of
>>> from Bio.SeqIO import SequenceIterator, WriteSequences
>>> SequenceIterator(handle, format)
>>> WriteSequences(sequences, handle, format)

I would prefer
>>> from Bio import SeqIO
>>> SeqIO.read(handle, format)
>>> SeqIO.write(sequences, handle, format)

for the following reasons:

1) Similar functions in the Python standard library use a short verb
that describes what the function does, not what the function returns.
For example:
>>> myfile = open("myfile.txt") # Note: this returns an iterator
>>> myfile.read()
>>> pickle.load(handle)
>>> pickle.dump(object, handle)
>>> xml.sax.parse(source, handler)

2) The lack of symmetry between SequenceIterator and WriteSequences 
makes them harder to remember. Each time I use Bio.SeqIO, I wonder 
whether it is SequenceIterator or ReadSequences.

3) The name SequenceIterator is not accurate: the function iterates
over SeqRecord objects, so strictly it should be called
SeqRecordIterator. But that is even harder to remember, and involves
even more typing.

4) The "Sequence" in SequenceIterator and WriteSequences is redundant. 
As these functions are in the SeqIO module, we already know they handle 
sequences. In addition, new users will probably not know what an 
iterator is.

5) Bio.SeqIO being a new module allows us to correct some design errors
from the past. One thing that has always bothered me about Biopython is
that its usage is hard to guess; I always have to look up in the manual
how to use a particular parser.
Now, "read" and "write" are generic names that can be used by similar 
functions in other Biopython modules. For example, the new Blast XML 
parser tentatively uses NCBIXML.parse. This function returns an 
iterator, with a Blast record for each Blast query, resembling how 
"read" works in Bio.SeqIO. Renaming the NCBIXML parser function to 
NCBIXML.read would give us some internal consistency in Biopython and 
enable us to guess the function name without having to look it up in the 
manual each time.
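
The verb convention from point 1 can be tried directly. The following is
a minimal sketch, assuming in-memory handles from the io module in place
of real files; the record contents are made up for illustration.

```python
import io
import pickle

# Made-up example data; any picklable object would do.
record = {"id": "seq1", "length": 42}

# pickle.dump writes an object to a binary handle ...
binary_handle = io.BytesIO()
pickle.dump(record, binary_handle)

# ... and pickle.load reads it back from the same kind of handle.
binary_handle.seek(0)
restored = pickle.load(binary_handle)

# File-like objects use the same short verbs: write() and read().
text_handle = io.StringIO()
text_handle.write("ACGT\n")
text_handle.seek(0)
contents = text_handle.read()
```

In both cases the function name says what is done (dump, load, write,
read), not what kind of object comes back.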

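The symmetric read/write pair argued for in points 2 and 5 could look
roughly like the sketch below. It is not the Bio.SeqIO implementation:
the two functions are hypothetical stand-ins that handle only a
bare-bones FASTA layout and use plain (name, sequence) tuples instead of
SeqRecord objects. The parameter name "format" follows the signatures
quoted above, although it shadows a Python builtin.

```python
import io


def read(handle, format):
    """Hypothetical reader: yield (name, sequence) tuples from a handle."""
    if format != "fasta":
        raise ValueError("unsupported format: %r" % format)
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    # Emit the last record once the handle is exhausted.
    if name is not None:
        yield name, "".join(chunks)


def write(sequences, handle, format):
    """Hypothetical writer: write tuples to a handle, return the count."""
    if format != "fasta":
        raise ValueError("unsupported format: %r" % format)
    count = 0
    for name, sequence in sequences:
        handle.write(">%s\n%s\n" % (name, sequence))
        count += 1
    return count


# Round trip through in-memory handles standing in for real files.
source = io.StringIO(">alpha\nACGT\nAC\n>beta\nGGCC\n")
records = list(read(source, "fasta"))

target = io.StringIO()
written = write(records, target, "fasta")
```

A caller only has to remember one verb per direction, and read can be
used in a plain for loop without the caller needing to know that it
returns an iterator.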

--Michiel.


