[Biopython-dev] Deprecating PropertyManager, Encodings and Bio.utils?

Peter biopython at maubp.freeserve.co.uk
Mon Mar 29 08:36:19 EDT 2010


Hi all,

I think we've done pretty well at carefully removing, fixing or
replacing most of the dusty bits of code Biopython had acquired
over the years. There are still things to clean up, though. In
particular, the modules Bio.PropertyManager and Bio.Encodings
seem rather unnecessary.

Bio.Encodings is tied into the old (and now deprecated)
Bio.Translate and Bio.Transcribe code. Once they are
removed (after the next release), we can at least cut a lot
of Bio.Encodings.

Bio.PropertyManager and Bio.Encodings only seem to be
used by Bio.utils, which I would also like to deprecate. This
is an undocumented module with no unit tests. It offers a
few bits of sequence-related functionality which would be
better off in Bio.Seq or Bio.SeqUtils, and some fairly trivial
functions we could just deprecate.

These strike me as the only bits of functionality worth keeping
in Bio.utils:

Function verify_alphabet (which is used by the code in
Bio.NeuralNetwork.Gene) just checks that a Seq object's sequence
uses only the letters of its alphabet. This is essentially something
I think the Seq object should do itself, during initialisation
(Bug 2597). With that done, Bio.utils.verify_alphabet could be
deprecated.
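
A minimal sketch of the check being discussed, assuming a plain
set of allowed letters rather than a Biopython alphabet object.
The names has_only_allowed_letters and CheckedSequence are
hypothetical (they are not part of Bio.utils or Bio.Seq); the
class just shows the idea of validating once, at initialisation,
as suggested for Bug 2597.

```python
def has_only_allowed_letters(data, allowed_letters):
    """Return True if every character in data appears in allowed_letters.

    Hypothetical helper; the comparison is case-sensitive.
    """
    return set(str(data)) <= set(allowed_letters)


class CheckedSequence:
    """Hypothetical sequence wrapper that validates its letters on creation.

    Illustrates checking at initialisation; this is not the Bio.Seq.Seq API.
    """

    def __init__(self, data, allowed_letters):
        data = str(data)
        if not has_only_allowed_letters(data, allowed_letters):
            # Report each offending letter once, in sorted order.
            bad = "".join(sorted(set(data) - set(allowed_letters)))
            raise ValueError("Letters not in alphabet: " + bad)
        self._data = data
        self.allowed_letters = frozenset(allowed_letters)

    def __str__(self):
        return self._data

    def __len__(self):
        return len(self._data)
```

Because the check is case-sensitive, a real version would also
need to decide how to treat lower case letters and gap characters.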

There are a few functions for getting molecular weights via
the IUPAC alphabet objects. These could be reimplemented
by explicitly using the weight tables belonging to the IUPAC
alphabet classes, perhaps exposed as new functions under
Bio.SeqUtils. It would be interesting to look at refinements
like handling the start/end of the sequence explicitly (i.e.
the 5' and 3' ends of a nucleotide sequence, or the N- and
C-termini of a peptide).
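
To illustrate the start/end refinement, here is a hedged sketch
for peptides only. It assumes a table of average free amino acid
masses (only glycine and alanine, with rounded textbook values,
included for demonstration; this is not Biopython's weight table)
and a hypothetical function name, peptide_weight. Each peptide
bond releases one water, so a linear chain of n residues loses
n - 1 waters while its N- and C-termini keep their extra H and OH.

```python
# Illustrative average masses (Da) of two free amino acids; demonstration
# values only, not Biopython's weight tables.
FREE_AMINO_ACID_WEIGHTS = {"G": 75.07, "A": 89.09}
WATER = 18.02  # approximate average mass of H2O in Da


def peptide_weight(sequence, weights=FREE_AMINO_ACID_WEIGHTS, water=WATER):
    """Hypothetical helper: average mass of a linear peptide.

    Sums the free amino acid masses, then removes one water per peptide
    bond (n - 1 bonds for n residues), so the termini are handled explicitly.
    Raises KeyError for letters missing from the weight table.
    """
    if not sequence:
        return 0.0
    total = sum(weights[residue] for residue in sequence.upper())
    return total - (len(sequence) - 1) * water
```

With these values, peptide_weight("GA") gives about 146.14 Da for
the dipeptide Gly-Ala. An equivalent design stores dehydrated
residue masses and adds a single water for the two termini.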

Function reduce_sequence (linked to Bio.Alphabet.Reduced)
is for things like mapping a protein sequence to a simplified
sequence using the Murphy alphabet (e.g. using a single
letter for all the aliphatics: I, L, V). This is perhaps interesting
enough to retain, again perhaps under Bio.SeqUtils. It does
need documentation and unit tests though.
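
A minimal sketch of the mapping idea, assuming a plain dictionary
from each letter to its group's representative. The mapping and
the name reduce_letters are hypothetical and partial: only the
I/L/V grouping comes from the text, the choice of "I" as the
representative is arbitrary, and this is not the Murphy alphabet
as defined in Bio.Alphabet.Reduced.

```python
# Hypothetical, partial mapping: only the I/L/V grouping comes from the
# discussion above; a real reduced alphabet groups more letters.
EXAMPLE_REDUCTION = {"I": "I", "L": "I", "V": "I"}


def reduce_letters(sequence, mapping=EXAMPLE_REDUCTION):
    """Replace each letter by its group's representative letter.

    Input is upper-cased first; letters missing from the mapping are
    passed through unchanged.
    """
    table = str.maketrans(mapping)
    return str(sequence).upper().translate(table)
```

Using str.translate keeps the per-letter lookup in one pass; unit
tests for a real version would mostly be tables of input and
expected reduced sequences like the ones below.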

Is anyone interested in updating, documenting and then
testing the molecular weight and reduced alphabet code?
[I suggest starting a new thread if you are.]

If not, should we consider just deprecating Bio.utils,
Bio.PropertyManager and Bio.Encodings in the next
release?

Peter

