[Biopython-dev] GSoC SearchIO project

Michiel de Hoon mjldehoon at yahoo.com
Sat Apr 7 04:43:56 UTC 2012


--- On Tue, 4/3/12, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> The reason for using SearchIO (despite not being PEP8
> compatible - something I regret in the naming of SeqIO
> and the pattern it set) is to match SeqIO and AlignIO and
> BioPerl. Anyone familiar with BioPerl will immediately see
> what it is for - and some of the student applicants have
> already used BioPerl's SearchIO. Personally I find this
> quite a compelling argument.

Sorry, but I am not convinced. I doubt that somebody familiar with BioPerl's Align and AlignIO modules would have trouble finding the parser in Biopython if Biopython had only a Bio.Align module. Following BioPerl here also means that some modules in Biopython are split into Module and ModuleIO, whereas most others are not. In this particular case, for consistency you would have to create both a Bio.Search and a Bio.SearchIO module. I'd rather have a clean module organization in Biopython than strictly follow what BioPerl did.

> That said, the name SearchIO isn't the clearest in the
> world for a newcomer - however I haven't come up
> with anything significantly better myself. Perhaps there
> is a better name out there, which would justify breaking
> the pattern? I've considered pairwise and palign, but
> neither feels right.

How about including this module as a submodule in Bio.Align? If we think of Bio.Align as a general module for alignments, then pairwise alignments fit in it too. It depends a bit on the exact API, but I expect that we can come up with something elegant.

> Given a clean slate (Biopython 2?), then yes, I would
> agree with consolidating Bio.Align and Bio.AlignIO as
> one namespace, probably "align" (lower case). The
> situation with Bio.Seq, Bio.SeqRecord and Bio.SeqIO
> isn't quite so simple - perhaps "seq" (lower case)?

There are two steps here: consolidation of some modules, and changing the names of modules to comply with PEP8. The consolidation can happen without waiting for a Biopython 2, as long as there are clear deprecation warnings in the modules that will be removed. Compliance with PEP8 is a bit trickier, since it means relearning all module names, and some systems (Windows?) may not distinguish between lower and upper case.
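
As a rough illustration of the deprecation step, a module scheduled for removal could emit a warning when it is imported and point users at its replacement. The sketch below is only an assumption about how this might look: the helper name warn_deprecated_module is hypothetical, the module names passed to it are examples rather than a decided plan, and it uses Python's built-in DeprecationWarning.

```python
import warnings


def warn_deprecated_module(old_name, new_name):
    """Warn that old_name is deprecated in favour of new_name.

    Hypothetical helper: a module that is going away would call this
    once at import time, from its top level.
    """
    warnings.warn(
        "%s is deprecated and will be removed in a future release; "
        "please use %s instead." % (old_name, new_name),
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this helper
    )


if __name__ == "__main__":
    # DeprecationWarning is hidden by default in many contexts,
    # so enable it explicitly for this demonstration.
    warnings.simplefilter("always", DeprecationWarning)
    warn_deprecated_module("Bio.AlignIO", "Bio.Align")
```

The old module would keep working for a few releases while re-exporting the public names from the new location, so existing scripts continue to run but users get advance notice.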

> Then (in the absence of any other ideas), SearchIO
> would become "search" (lower case).

If we already know now that we will drop the IO from SearchIO at some point, then SearchIO doesn't seem to be a good name.

Best,
-Michiel.



