[Biopython] Sequence object "find" is still case specific?

Peter Cock p.j.a.cock at googlemail.com
Wed Mar 6 10:11:16 UTC 2013


On Wed, Mar 6, 2013 at 7:45 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Mon, 3/4/13, Martin Mokrejs <mmokrejs at fold.natur.cuni.cz> wrote:
>> I do use mixed-casing quite often and I think it is
>> acceptable to ask user to do the
>> .find like:
>>
>> s.tostring().upper().find('ACGTT')
>>
>> and leave the user to slice the mixed-case match out of the
>> original sequence object afterwards.
>
> The problem though is that the call to .upper() will be slow if s is a
> long sequence. Trying this on human chromosome 1 showed that
> the search takes 20,000 times longer, which is unacceptably slow
> if you want to run this search often.
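Martin's workaround, sketched with a plain Python string standing in for the sequence (with a Biopython Seq object, str(s), or s.tostring() as in the quoted example, gives that string). The helper name find_mixed_case is hypothetical. The seq.upper() call builds a full-length upper-case copy, which is the cost Michiel is pointing at.

```python
def find_mixed_case(seq, query):
    """Return the index of the first case-insensitive match of query in seq, or -1."""
    # Both strings are upper-cased first; for a chromosome-sized seq this
    # builds a second full-length copy, which is where the time goes.
    return seq.upper().find(query.upper())


seq = "ttgACGttACGTTcc"  # toy mixed-case sequence
query = "acgtt"
start = find_mixed_case(seq, query)
if start != -1:
    # Slice the hit out of the original, so its mixed case is preserved.
    print(start, seq[start:start + len(query)])
```

This prints `3 ACGtt`: the match is located on the upper-case copy, but the slice taken from the original keeps its lower-case `tt`.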

With the current code, the simple route is to standardise all your
query and target sequences to one case (e.g. upper case).
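A minimal sketch of that route, assuming plain Python strings and made-up toy data: the case conversion is paid once per sequence, and every later search is an ordinary, case-sensitive str.find. The helper name normalise is hypothetical.

```python
def normalise(text):
    # Pick one case (upper) for everything, and do it only once.
    return text.upper()


# Toy example data; in practice target would be a long genome string.
target = normalise("ttgACGttACGTTccGGAtt")
queries = [normalise(q) for q in ["acgtt", "GGA", "tttt"]]

for q in queries:
    # After the one-off normalisation, each search is a plain str.find.
    print(q, target.find(q))
```

The trade-off is that the original mixed-case information is lost unless you keep the unnormalised sequence around for slicing.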

Might an optional case-insensitive search be useful, if we can
make it fast with some optional C code (and a pure Python
fallback for PyPy, Jython, etc.)?
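One way the "optional C code plus pure Python fallback" pattern could look, as a sketch only: the module _fast_seq_search and the function name find_ignore_case are hypothetical, and the chunked scan is just one possible pure Python strategy, not an agreed Biopython design. It still upper-cases every character up to the first hit, so in the worst case it is no faster than the .upper().find() approach; what it avoids is building a full-length copy, and it stops as soon as a match is found.

```python
try:
    # Hypothetical compiled extension; no such module exists in Biopython.
    from _fast_seq_search import find_ignore_case
except ImportError:
    def find_ignore_case(seq, query, start=0, chunk_size=1_000_000):
        """Case-insensitive seq.find(query, start) in pure Python.

        Only chunk_size + len(query) - 1 characters are upper-cased at a
        time, so no full-length upper-case copy of seq is built.
        start is assumed to be non-negative.
        """
        if not query:
            return seq.find(query, start)  # same empty-query rule as str.find
        query = query.upper()
        overlap = len(query) - 1
        pos = start
        while pos < len(seq):
            # Extend each window by len(query) - 1 so a match that starts
            # inside this chunk but ends in the next one is still found.
            window = seq[pos:pos + chunk_size + overlap].upper()
            hit = window.find(query)
            if hit != -1:
                return pos + hit
            pos += chunk_size
        return -1
```

For any non-empty query, find_ignore_case(s, q) should return the same index as s.upper().find(q.upper()). The standard library's re module (re.escape with re.IGNORECASE) is another portable option for such a fallback.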

Peter
