[BioPython] (fwd) Python module for DNA to amino acid and reverse complement translation.

Andrew Dalke dalke@acm.org
Sun, 24 Sep 2000 22:33:43 -0600


Okay, so I'm a bit behind in responding to email :(

Jeff:
>Andrew, is there stuff in the C code [for doing translations] that we
>can use, and works well with the stuff we already have?  It would be
>nice to get a quick speed-up, if it's general and easy to integrate.

I benchmarked Alex's code and found it was about 40 times faster than
a pure Python implementation.  (The pure Python version was a modified
version of the biopython code, which should preallocate the output array
but instead uses lots of appends, reallocating space as needed.  The
biopython code itself is 60 times slower than the C code for a megabase
of 'A's.)
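
As a rough illustration of the append-versus-preallocate difference, here
is a minimal sketch.  It is not the biopython code or Alex's C code; the
function names and the three-entry codon table are made up for the example
(a real table has 64 entries).

```python
# Toy codon table, hypothetical and deliberately incomplete.
CODONS = {"AAA": "K", "GCT": "A", "TAA": "*"}

def translate_append(seq, table=CODONS):
    """Grow the output list one residue at a time."""
    out = []
    for i in range(0, len(seq) - 2, 3):
        out.append(table[seq[i:i + 3]])
    return "".join(out)

def translate_prealloc(seq, table=CODONS):
    """Size the output list up front, then fill it in place."""
    n = len(seq) // 3
    out = [""] * n
    for j in range(n):
        out[j] = table[seq[3 * j:3 * j + 3]]
    return "".join(out)
```

Both functions return the same string and ignore a trailing partial codon.
Timing them with the standard timeit module on a megabase of 'A's would
show how much the allocation strategy matters on a given interpreter.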

I have several concerns about the idea:

 - In Python it only takes 4.3 seconds to translate that megabase.  Is
that slow enough to be a problem?  No one has complained about the
performance.

 - It doesn't support using a different codon table.  The standard table
is hard coded into the source.

 - It doesn't allow a stop codon indicator other than '-'.

 - It doesn't support ambiguity codes.

 - None of the biopython code uses a C extension.  Adding C code, even if
there is failover to working Python code, will increase the distribution
complexity.  (Meaning more people will ask questions about how to install
it, esp. for boxes without a C compiler.)
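
To make the flexibility concerns concrete, here is a minimal sketch of a
pure Python translate that takes the codon table and the stop indicator as
arguments.  The names (STANDARD_TABLE, translate, stop_symbol, unknown) are
assumptions for this example, not the biopython API.  Ambiguity codes are
handled only crudely: any codon missing from the table becomes 'X'.

```python
from itertools import product

# Standard genetic code, listed in TCAG order with '*' marking stops.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(seq, table=STANDARD_TABLE, stop_symbol="*", unknown="X"):
    """Translate DNA to protein; the table and stop symbol are pluggable."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # Codons not in the table (e.g. containing 'N') map to `unknown`.
        aa = table.get(seq[i:i + 3].upper(), unknown)
        protein.append(stop_symbol if aa == "*" else aa)
    return "".join(protein)

# A one-codon variant for illustration only: TGA read as Trp, as in the
# vertebrate mitochondrial code (the real table differs in more codons).
VARIANT_TABLE = dict(STANDARD_TABLE, TGA="W")
```

For example, translate("ATGGCCTAA", stop_symbol="-") marks the stop with
'-', and translate("ATGTGA", table=VARIANT_TABLE) reads through the TGA.
A hard-coded C table would need the same kind of parameters to cover
these cases.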

To answer myself:
 - If someone wants to go through the effort of writing a C extension, then
they may already have performance concerns, so why not include working code?

 - We're going to include C extensions at some time in the future, like
for doing sequence alignment.  Might as well start now.  And distutils is
supposed to simplify distribution by supporting the making of binary
distributions.

To readdress my concerns:
 - Some people will implement code in C because, well, they like C, so
the existence of a C module does not imply its need.

 - Two implementations of the same code (in C and Python) increase the
testing costs (though with the right scaffolding the overhead is minimal).
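
One possible shape for that scaffolding, combined with failover to the
Python code, is sketched below.  The module name _fast_translate is
hypothetical and stands in for whatever the C extension would be called;
the toy table and test cases are likewise invented for the example.

```python
def translate_py(seq):
    """Pure Python reference implementation (toy table, illustration only)."""
    table = {"ATG": "M", "GCC": "A", "TAA": "*"}
    return "".join(table[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

try:
    # Hypothetical compiled module; the name is made up for this sketch.
    from _fast_translate import translate as translate_c
except ImportError:
    translate_c = None  # no C extension installed

# Public entry point: use the C version if present, else fail over.
translate = translate_c or translate_py

# One shared list of (input, expected output) pairs for every implementation.
CASES = [("", ""), ("ATG", "M"), ("ATGGCCTAA", "MA*"), ("ATGGC", "M")]

def check(impl):
    """Return the cases on which `impl` disagrees with the expected output."""
    return [(seq, want, impl(seq)) for seq, want in CASES if impl(seq) != want]

implementations = [f for f in (translate_py, translate_c) if f is not None]
failures = {f.__name__: check(f) for f in implementations}
```

Because both implementations run against the same CASES list, adding a
test costs one line no matter how many implementations exist, and a box
without a C compiler still exercises the Python path.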

So to answer Jeff's question: yes, it would be nice to get a speedup, but
I don't think it's worth the complexity yet, especially since the code
isn't general enough to replace anything but the most standard of the
existing 'translate' capabilities.

                    Andrew
                    dalke@acm.org