[Biopython-dev] Deprecating Bio.mathfns, Bio.stringfns and their C code?

Bruce Southey bsouthey at gmail.com
Thu Oct 23 12:28:48 EDT 2008


Peter wrote:
> This is about three Biopython "support" modules: Bio.mathfns,
> Bio.listfns, Bio.stringfns, each of which has its own C implementation
> for speed.  These haven't been touched for 6 years (which suggests
> they are stable and well tested), but they are now hardly used in
> Biopython.
>
> By removing these we not only reduce the amount of C code in Biopython
> (although here it is optional) which is a good thing for portability
> and supporting other python variants, but we also can reduce the
> "clutter" under the Bio.* namespace, e.g.
>   
> >>> import Bio
> >>> help(Bio)
>
> On 9th Oct I wrote:
>   
>> Until recently Bio.mathfns was used in Bio/NaiveBayes.py but that now
>> uses numpy more heavily instead.  I think that Bio.mathfns (and its C
>> implementation) are no longer used anywhere in Biopython (and I would
>> be surprised if anyone else is using this module).  I'm suggesting
>> deprecating Bio.mathfns and Bio.cmathfns for the next release.
>>     
>
> Any objections to deprecating Bio.mathfns and Bio.cmathfns?
>   
Nope. The functions used by Bio/NaiveBayes.py are:
mathfns.safe_log  (the module also defines safe_log2). This is not very
                  good because it sets a hard constant (1E-100) as a limit.
mathfns.safe_exp
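
To illustrate the concern about a hard limit, here is a minimal sketch. It is not the Bio.mathfns code: the name clamped_log, its parameters, and the choice to return a placeholder below the cutoff are all assumptions made for illustration. Only the 1E-100 constant comes from the text above.

```python
import math

HARD_LIMIT = 1e-100  # the fixed cutoff mentioned above


def clamped_log(x, limit=HARD_LIMIT, below=None):
    """Return log(x), or `below` when x is smaller than `limit`.

    Hypothetical helper, written only to show the effect of a fixed cutoff.
    """
    if x < limit:
        return below
    return math.log(x)


print(clamped_log(1e-50))          # an ordinary finite log
print(clamped_log(1e-150))         # None, because 1e-150 is under the cutoff
print(math.isfinite(math.log(1e-150)))  # True: the float itself is still usable
```

The last line shows the problem: a value such as 1e-150 is still representable and has a finite logarithm, yet a fixed cutoff treats it as if it were zero.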

The other functions included are:
fcmp       Compare two floating point numbers, up to a specified precision.
intd       Represent a floating point number as an integer.
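
For anyone relying on fcmp, a comparison "up to a specified precision" is short to write by hand. The sketch below is an assumption about the intended behaviour (absolute tolerance, cmp-style return value), not a copy of the Bio.mathfns function, and the name compare_floats is hypothetical.

```python
def compare_floats(a, b, precision=1e-6):
    """Return 0 if a and b differ by less than `precision`, else -1 or 1.

    Hypothetical stand-in; uses an absolute tolerance, which is an assumption.
    """
    if abs(a - b) < precision:
        return 0
    return -1 if a < b else 1


print(compare_floats(0.1 + 0.2, 0.3))           # 0
print(compare_floats(1.0, 1.1))                 # -1
print(compare_floats(2.0, 1.0, precision=0.5))  # 1
```

On newer Python versions (3.5 and later) the standard library function math.isclose covers the equality part of this check.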

I presume that you mean copying mathfns.safe_log and mathfns.safe_exp into
Bio/NaiveBayes.py first, because that module needs them.

Note that the safe_log in Bio/MarkovModel.py is not the same as 
mathfns.safe_log.
> On 9th Oct I wrote:
>   
>> I think Bio.stringfns and its C implementation Bio.cstringfns are also
>> now unused in Biopython, and like Bio.mathfns and Bio.cmathfns
>> should be deprecated for the next release.
>>     
>
> Any objections to deprecating Bio.stringfns and Bio.cstringfns?
>   
Nope, as you say these are not used. But just to be clear, the
functions lost are:
splitany       Split a string using many delimiters.
find_anychar   Find one of a list of characters in a string.
rfind_anychar  Find one of a list of characters in a string, from end to
               start.
starts_with    Check whether a string starts with another string
               [DEPRECATED].
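
If someone does need these after removal, the standard library covers them in a few lines. This is a rough sketch based only on the one-line descriptions above: the function names and signatures are hypothetical, and details such as how the originals treat consecutive delimiters are not known from this post.

```python
import re


def split_any(text, delimiters):
    """Split `text` at any single character found in `delimiters`."""
    pattern = "[" + re.escape(delimiters) + "]"
    return re.split(pattern, text)


def find_any(text, chars):
    """Index of the first character of `text` that is in `chars`, or -1."""
    return next((i for i, ch in enumerate(text) if ch in chars), -1)


def rfind_any(text, chars):
    """Index of the last character of `text` that is in `chars`, or -1."""
    return next((i for i in range(len(text) - 1, -1, -1) if text[i] in chars), -1)


print(split_any("a,b;c d", ",; "))     # ['a', 'b', 'c', 'd']
print(find_any("hello world", "ow"))   # 4
print(rfind_any("hello world", "ow"))  # 7
print("hello".startswith("he"))        # True, the built-in replaces starts_with
```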
> On 9th Oct I wrote:
>   
>> Similarly, Bio.listfns and its C implementation Bio.clistfns might
>> also be deprecated with a little effort ... only three modules
>> currently use Bio.listfns
>>     
>
> We could just label Bio.listfns (and Bio.clistfns) as obsolete for the
> next release, or just add a note in the docstring that this might be
> deprecated shortly.
>   
Used by:
Bio/MaxEntropy.py
Bio/NaiveBayes.py
Bio/MarkovModel.py
Bio/pairwise2.py

Functions directly used:
itemindex     Make an index of the items in the list.
items         Get one of each item in a list.
contents      Calculate percentage each item appears in a list.

Functions indirectly or not used:
asdict        Make the list into a dictionary (for fast testing of
              membership).
count         Count the number of times each item appears.
intersection  Get the items in common between 2 lists.
difference    Get the items in 1 list, but not the other.
indexesof     Get a list of the indexes of some items in a list.
take          Take some items from a list.
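
Most of these have short pure-Python equivalents, which may make a port easier. The sketch below follows the one-line descriptions above; the helper names and the exact return types are assumptions, not the Bio.listfns API.

```python
from collections import Counter


def unique_items(values):
    """One of each item, in first-seen order (compare: items)."""
    return list(dict.fromkeys(values))


def first_index(values):
    """Map each item to the position where it first appears (compare: itemindex)."""
    index = {}
    for position, item in enumerate(values):
        index.setdefault(item, position)
    return index


def fractions(values):
    """Map each item to the fraction of the list it occupies (compare: contents)."""
    total = len(values)
    return {item: n / total for item, n in Counter(values).items()}


states = ["A", "C", "A", "G"]
print(unique_items(states))              # ['A', 'C', 'G']
print(first_index(states))               # {'A': 0, 'C': 1, 'G': 3}
print(fractions(states))                 # {'A': 0.5, 'C': 0.25, 'G': 0.25}
print(Counter(states)["A"])              # 2 (compare: count)
print(sorted(set(states) & {"A", "T"}))  # ['A'] (compare: intersection)
print(sorted(set(states) - {"A", "T"}))  # ['C', 'G'] (compare: difference)
```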

Bio.listfns is also used by pairwise2.py, which has its own C implementation
(cpairwise2) that I would suggest is also a candidate for removal.

At present I do not know enough about Bio/MaxEntropy.py,
Bio/NaiveBayes.py, and Bio/MarkovModel.py to say whether the Bio.listfns
functions are really required, or how to port them to numpy. (I may try
to port them, but not soon.)

In summary, I have no objection to removing the C code associated with
these modules.
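
For reference, a deprecation notice of the kind discussed in this thread can be raised with the standard warnings module. This is only a sketch: the helper name warn_deprecated and the message wording are hypothetical, not what Biopython uses.

```python
import warnings


def warn_deprecated(module_name):
    """Emit a DeprecationWarning naming the module (hypothetical helper)."""
    warnings.warn(
        "%s is deprecated and may be removed in a future release." % module_name,
        DeprecationWarning,
        stacklevel=2,
    )


# Capture the warning so the example can show what was raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_deprecated("Bio.mathfns")

print(caught[0].category.__name__)  # DeprecationWarning
print(caught[0].message)
```

A call like this at the top of a module would fire once on import, so existing users get notice before the module disappears.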

Bruce


