[Biopython-dev] [BioPython] Next release plans; was: what to use for working with fasta sequences and alignments?
Michiel Jan Laurens de Hoon
mdehoon at c2b2.columbia.edu
Tue Jan 16 17:51:23 UTC 2007
Peter wrote:
> Regarding the fix checked in on bug 1970 I still would prefer we call
> the new XML iterator NCBIXML.Iterator(handle) rather than
> NCBIXML.parse(handle) but I'll live ;)
>
I chose "parse" because it is used in the old (Biopython release 1.42)
Blast XML parser:
Old:
>>> from Bio.Blast import NCBIXML
>>> b_parser = NCBIXML.BlastParser()
>>> b_record = b_parser.parse(blast_out)
New:
>>> from Bio.Blast import NCBIXML
>>> b_records = NCBIXML.parse(blast_out)
>>> b_record = b_records.next() # Repeat to get subsequent Blast records
While I am not dead set on "parse", it follows the naming conventions of
similar functions in Python:
1) Function name is a verb, not a noun
2) Function name describes what the function does, not what the function
returns
3) Function names are short, and start with a lower case letter.
For example, to read a file line-by-line in Python:
>>> inputfile = open("somefunnyfile")
# "open"; not "Iterator", nor "FileToLineIterator",
# even though "open" returns an iterator:
>>> for line in inputfile:
... print line
To read an image file with the Python Imaging Library:
>>> import Image
>>> im = Image.open("lena.ppm")
# "open"; not "Image", nor "FileNameToImage".
To read a Python object from a pickled file:
>>> import pickle
>>> inputfile = open("somepickledfile")
>>> myobject = pickle.load(inputfile)
# "load"; not "FileToObject".
>>> inputfile.close()
To parse an XML file with the sax parser framework in Python:
>>> from xml.sax.handler import ContentHandler
>>> from xml import sax
>>> handler = SomeSubclassOfContentHandler()
>>> inputfile = open("myxmlfile.xml")
>>> sax.parse(inputfile, handler)
# "parse", same as in the new Bio.Blast.NCBIXML
>>> inputfile.close()
So, for Bio.Blast.NCBIXML, good names would be "load", "read", "parse",
or something similar. "Iterator" would not be consistent with these
conventions; besides, until recently I didn't know what an iterator was,
so I doubt that new users would know.
What we could do is have two functions in Bio.Blast.NCBIXML, perhaps
one called "read" and the other "iterate", where the former returns a
single Blast record (for an XML file containing only one Blast result),
and the latter returns an iterator over multiple Blast records.
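A minimal sketch of how such a pair could fit together. This is only an
illustration: it assumes a toy XML layout (<results>/<record>/<query>), not
the real NCBI Blast schema, and the function bodies, the dict-based record,
and the error handling are all hypothetical rather than Biopython code.

```python
import io
import xml.etree.ElementTree as ET


def iterate(handle):
    """Yield one toy record (a dict) per <record> element in the handle."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "record":
            yield {"query": elem.findtext("query")}
            elem.clear()  # free the element once the caller is done with it


def read(handle):
    """Return the only record in the handle; fail if there are 0 or 2+."""
    records = iterate(handle)
    try:
        record = next(records)
    except StopIteration:
        raise ValueError("No records found in handle")
    try:
        next(records)
    except StopIteration:
        return record
    raise ValueError("More than one record found in handle")


# Toy input, assumed for illustration only.
single = io.BytesIO(b"<results><record><query>q1</query></record></results>")
many = io.BytesIO(
    b"<results>"
    b"<record><query>q1</query></record>"
    b"<record><query>q2</query></record>"
    b"</results>"
)

record = read(single)                          # exactly one record expected
queries = [r["query"] for r in iterate(many)]  # any number of records
```

With this split, "read" can be written on top of "iterate", so the two
functions share one parsing path and differ only in what they promise the
caller.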
--Michiel.
--
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032