[Biopython-dev] 'testseq' function update

Adil Iqbal aiqbal85 at gmail.com
Wed May 31 14:04:57 UTC 2017


Thank you for all of your suggestions.

I've updated the 'testseq' function and detailed the changes below. You
can view the code on my GitHub:
https://github.com/Adil-Iqbal/Personal-Projects/blob/master/Test%20Sequence/testseq.py

I agree with Peter that Scripts would perhaps be a better location for this
function. All of the utilities in SeqUtils seem to operate on already
existing sequences, so "testseq" seems a bit out of place there.

I have instantiated the Random class as suggested by Andrew, and everything
is working as intended. Because "testseq" now draws from its own Random
instance, it should not interfere with any other code in Biopython.
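
As a minimal sketch of why a private Random instance avoids interference
(the names "_rng" and "random_dna" are hypothetical and are not taken from
testseq.py), seeding the private instance leaves the module-level generator
in the random module untouched:

```python
import random

# Private generator: seeding it does not touch the module-level RNG
# that other code reaches through random.random(), random.choice(), etc.
_rng = random.Random()


def random_dna(size, seed=None):
    """Hypothetical stand-in for testseq: return a random DNA string."""
    if seed is not None:
        _rng.seed(seed)
    return "".join(_rng.choice("ACGT") for _ in range(size))


# Draw once from the global RNG with a known seed.
random.seed(42)
expected = random.random()

# Repeat, but call random_dna (with its own seed) in between.
random.seed(42)
random_dna(30, seed=7)
observed = random.random()

print(expected == observed)  # prints True: the global stream was not disturbed
```

The same seed passed to the private instance still gives a repeatable
sequence, which is what a doctest needs.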

I have also added Biopython warnings to better communicate with the end
user. Thanks for the suggestion, Andrew.
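
For illustration only, a warning of this kind could be raised as sketched
below. BiopythonWarning is the real warning class exported by the Bio
package; the "check_size" helper and the condition it checks are assumptions
made up for this example and are not taken from testseq.py. The fallback
class exists only so the snippet runs without Biopython installed.

```python
import warnings

try:
    from Bio import BiopythonWarning
except ImportError:
    class BiopythonWarning(Warning):
        """Stand-in used only when Biopython is not installed."""


def check_size(size):
    """Hypothetical check: round size down to whole codons and warn."""
    remainder = size % 3
    if remainder:
        warnings.warn(
            "size %d is not a multiple of 3; using %d instead"
            % (size, size - remainder),
            BiopythonWarning,
        )
    return size - remainder
```

Callers who do not want the message can silence it with the standard
warnings filters, e.g. warnings.simplefilter("ignore", BiopythonWarning).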

I ran into some issues when I tried to simplify the seeding code. I made
some notes during testing, which I'll paste below:

My design goal is to write a function that can both produce unique
sequences with each function call AND pass the doctests reliably.
Unfortunately, the random seeding seems to have some odd behavior, which
I detail below:
1. If I have NEVER seeded the RNG, the function will re-seed the RNG
   every time it is called, thereby producing unique sequences with each
   function call. Unfortunately, this approach led to doctest failures
   and could not be used.
2. If I seed the RNG and then REMOVE the seed to try to reproduce the
   earlier behavior, the behavior changes to seeding the RNG from the
   system date and time. (You can read about this in the Python
   documentation under "random.seed(a=None)".) Such behavior can only
   produce a unique sequence once every second. If the function is called
   more than once per second, e.g. in a for-loop, the design goal fails.
3. My solution to the above problems was to add a global variable,
   "shuffle_seed", which increments with each function call. When the
   "shuffle_seed" global seeds the RNG, it results in a different
   sequence every time. And since its use is turned off by default, the
   doctests will always pass. The issue now is that it requires a global
   variable, which may be undesirable. However, in Python 2.7 integers
   have no fixed upper limit: values above sys.maxint, which is (2^31)-1
   on a 32-bit system, are promoted automatically to long, so the counter
   is limited only by available memory. It is difficult to think of a
   use case that could test such a generous boundary. Though, if it is
   not a satisfactory solution, I can try something else. I'm open to
   suggestions.
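
The counter approach from point 3 can be sketched roughly as follows. This
is a simplified illustration under stated assumptions, not the code in
testseq.py: the function name "testseq_sketch", its parameters, and the
default seed of 0 are all hypothetical; only the "shuffle_seed" global is
named in the notes above.

```python
import random

_rng = random.Random()   # private generator, separate from the global RNG
shuffle_seed = 0         # global counter, bumped on each "unique" call


def testseq_sketch(size=30, seed=0, increment=False):
    """Hypothetical sketch: random DNA string with optional counter seeding.

    With increment=False (the default) the given seed is used, so the
    result is repeatable and suitable for a doctest. With increment=True
    the global counter supplies the seed and is then advanced, so
    back-to-back calls differ even within the same second.
    """
    global shuffle_seed
    if increment:
        seed = shuffle_seed
        shuffle_seed += 1
    _rng.seed(seed)
    return "".join(_rng.choice("ACGT") for _ in range(size))


# Default behaviour is repeatable.
repeatable = testseq_sketch(20) == testseq_sketch(20)

# Counter-seeded calls in a tight loop each use a different seed.
batch = [testseq_sketch(50, increment=True) for _ in range(5)]
```

Seeding accepts arbitrarily large integers, e.g. testseq_sketch(10,
seed=2**100), so the counter does not hit a fixed ceiling.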


Best Regards,
Adil Iqbal