[Biopython-dev] Code review request for phyloxml branch

Mon Jan 11 15:02:46 UTC 2010

--- On Mon, 1/11/10, Peter <biopython at maubp.freeserve.co.uk> wrote:
> What is wrong with leaving the IO functions
> (read, parse, write) as Bio.Phylo.IO.read etc
> e.g.
> 
> >>> from Bio import Phylo
> >>> tree = Phylo.IO.read(open("int_node_labels.nwk"), "newick")
> 
> What is the benefit of having them also exposed under the
> Bio.Phylo namespace, e.g. as Bio.Phylo.read? This means
> there are two ways to access them which is confusing.

If we use Bio.Phylo.IO.read directly, then for consistency we'd have to do the same for all other modules. Otherwise, we'd be guessing each time whether the read() and parse() functions are in Bio.SomeModule, or Bio.SomeModule.IO.

For Bio.Phylo, a simple solution is to move whatever is in Bio/Phylo/IO/__init__.py into Bio/Phylo/__init__.py, and remove Bio/Phylo/IO/__init__.py. Then there is only one way to access read() and the other I/O functions.
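
As a rough sketch of the idea (not the actual Bio.Phylo code), the snippet below builds a hypothetical package called phylo_demo in memory and defines a toy read() directly in its namespace, as if it had been written in the package's __init__.py. The package name and the toy reader are assumptions made up for illustration; the point is only that no IO submodule is left, so there is a single access path.

```python
import io
import sys
import types

# Hypothetical stand-in for a package such as Bio.Phylo; NOT the real package.
phylo_demo = types.ModuleType("phylo_demo")


def read(handle, format):
    """Toy reader (assumption): return the first non-empty line as the tree."""
    if format != "newick":
        raise ValueError("unsupported format: %s" % format)
    for line in handle:
        line = line.strip()
        if line:
            return line
    raise ValueError("no tree found")


# Define read() directly in the package namespace, as if it lived in
# the package's __init__.py rather than in a separate IO submodule.
phylo_demo.read = read
sys.modules["phylo_demo"] = phylo_demo

import phylo_demo as Phylo  # resolved from sys.modules

tree = Phylo.read(io.StringIO("(A,B,(C,D));\n"), "newick")

# There is no Phylo.IO here, so Phylo.read is the only way in.
has_io_submodule = hasattr(Phylo, "IO")
```

With this layout a user never has to guess between Phylo.read and Phylo.IO.read, because only the former exists.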

[About doing the same for Bio.Seq and Bio.Align]
> On the other hand, all that upheaval would cause a
> lot of pain for end users, for relatively little gain.

For new users, it may be confusing to have all those different modules dealing with sequences. At least, it was for me when I started with Biopython. Therefore, as a long-term solution, I'd prefer a single Bio.Seq module that incorporates all of them (Seq, SeqRecord, SeqIO, SeqFeature).

I agree that that may cause a lot of upheaval for end users, but a suitably long transition period may mitigate those concerns. I'd prefer that to being stuck with a less-than-optimal code organization forever.
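
One possible shape for such a transition period, sketched under assumptions: the old entry point stays in place for a while but emits a warning and forwards to the new one. The names read() and old_read() and the toy implementation are hypothetical, and the standard DeprecationWarning is used here only to keep the example self-contained.

```python
import io
import warnings


def read(handle, format):
    """New, preferred entry point (toy implementation, an assumption)."""
    return handle.read().strip()


def old_read(handle, format):
    """Old entry point, kept only for the transition period."""
    warnings.warn(
        "old_read() is deprecated; use read() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return read(handle, format)


# Existing user code keeps working but is told to migrate.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_read(io.StringIO("(A,B);\n"), "newick")
```

After the transition period, old_read() would simply be deleted, leaving read() as the only entry point.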

--Michiel