[Biopython] Sequence object "find" is still case specific?

Peter Cock p.j.a.cock at googlemail.com
Sun Mar 3 20:39:22 UTC 2013


On Sun, Mar 3, 2013 at 7:13 PM, hari jayaram <harijay at gmail.com> wrote:
>
>
> On Google Plus Chris Lasher wrote:
>> Hmm, well, lower case nucleotides have often represented "masked regions"
>> of sequences. It seems that Biopython sequences were meant to be
>> case-sensitive (e.g.,
>> http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc22). From the
>> documentation there, it seems like you've discovered a bug in the API; I
>> feel that Seq should raise a ValueError when instantiating with lower-case
>> nucleotides and unambiguous_dna.

Yes, it has some appeal - the trouble is if we suddenly start
enforcing this it will likely break many existing scripts:
https://redmine.open-bio.org/issues/2597

> Would it not make sense to have either of the following behavior
>
> seq = Seq("atgCTCGAGcatcatcat",IUPAC.unambiguous_dna) throws an error since
> mixed case is used which is not allowed

Yes, if we keep IUPAC.unambiguous_dna as upper case only,
then an error makes sense. https://redmine.open-bio.org/issues/2597

Or we could make IUPAC.unambiguous_dna mixed case, and add
new more specific upper only and lower only alphabets? Sadly that
would also probably break some existing usage.

(Whatever change is made will require a transition period with
deprecation warnings in order to move to a strict by default mode)
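
As a rough illustration of what such a transition could look like, here
is a minimal sketch in plain Python. The helper name check_letters and
its strict flag are hypothetical, not part of Biopython; the only thing
taken from the library is that the unambiguous DNA alphabet's letters
are the upper case "GATC". By default it only warns, and it raises a
ValueError once strict mode is requested:

```python
import warnings

# Upper case only, matching the letters of IUPAC.unambiguous_dna.
UNAMBIGUOUS_DNA_LETTERS = "GATC"


def check_letters(sequence, letters=UNAMBIGUOUS_DNA_LETTERS, strict=False):
    """Hypothetical helper: flag letters that are not in the alphabet.

    During a transition period (strict=False) this only issues a
    DeprecationWarning; in strict mode it raises ValueError instead.
    """
    unexpected = sorted(set(sequence) - set(letters))
    if not unexpected:
        return
    message = "Sequence contains letters not in the alphabet: %s" % "".join(
        unexpected
    )
    if strict:
        raise ValueError(message)
    warnings.warn(message, DeprecationWarning, stacklevel=2)


# Mixed case input: warns today, would raise once strict is the default.
check_letters("atgCTCGAGcatcatcat")
```

Existing scripts would keep running during the warning phase, which is
the point of not switching straight to an exception.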

> or
>
> It just silently converts it all to the case of the Unambiguous_DNA
> specification and then all "find" and "search" works regardless of case on
> this internal representation which is just "DNA".

You mean if an all upper case alphabet is used, silently switch
the sequence to upper case? And vice versa for lower case?
That seems too magic/implicit, so I'd not support that.

> *But for now I will just force case to upper when instantiating*
>

Yes, that is the pragmatic solution for the current and recent
versions of Biopython.
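
For example, a small sketch of the workaround, using plain Python
strings as a stand-in for Seq objects (an assumption made so it runs
without Biopython installed; as discussed in this thread, the Seq find
method is case sensitive in the same way as str.find):

```python
raw = "atgCTCGAGcatcatcat"

# A case sensitive search only matches the exact case given.
print(raw.find("CTCGAG"))  # 3
print(raw.find("ctcgag"))  # -1
print(raw.find("CATCAT"))  # -1, that region is lower case in raw

# Force everything to upper case once, up front, then search in upper case.
normalised = raw.upper()
print(normalised.find("CTCGAG"))  # 3
print(normalised.find("CATCAT"))  # 9
```

The same raw.upper() call can be applied to the string before it is
passed to Seq. Note that this throws away any masking information
carried by the lower case letters.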

Peter


