[Biopython] Sequence object "find" is still case specific?
Iddo Friedberg
idoerg at gmail.com
Sun Mar 3 21:10:43 UTC 2013
The thing is, I am a bit unsure of the utility of alphabets associated with
a Seq object in general (and I was one of the original crafters of the Seq
object). It seems that *any* letter is acceptable: there is no strict
alphabet checking. I inserted "Z"s into an unambiguous-DNA Seq object and
nothing complained. I am not sure when this happened, but aren't alphabets
supposed to provide some constraints?
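
As a rough sketch of the kind of constraint meant here (plain Python, not
Biopython's actual API): the helper name check_letters is made up for
illustration, and the letter set "GATC" is assumed to match what
IUPAC.unambiguous_dna declares.

```python
# Hypothetical helper, NOT part of Biopython: strict letter checking.
# Assumption: the unambiguous DNA alphabet is the upper-case letters G, A, T, C.
UNAMBIGUOUS_DNA_LETTERS = "GATC"


def check_letters(sequence, letters=UNAMBIGUOUS_DNA_LETTERS):
    """Return sequence unchanged, or raise ValueError if it uses other letters."""
    unexpected = sorted(set(sequence) - set(letters))
    if unexpected:
        raise ValueError("letters not in alphabet: " + "".join(unexpected))
    return sequence


check_letters("ATGCTCGAG")        # accepted: only G, A, T, C
try:
    check_letters("ATGZZCTCGAG")  # rejected: "Z" is not in the alphabet
except ValueError as err:
    print(err)
```

Under this sketch a lower-case or mixed-case string would be rejected too,
since the assumed alphabet is upper case only.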
On Sun, Mar 3, 2013 at 3:39 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Sun, Mar 3, 2013 at 7:13 PM, hari jayaram <harijay at gmail.com> wrote:
> >
> >
> > On Google Plus Chris Lasher wrote:
> >> Hmm, well, lower case nucleotides have often represented "masked regions"
> >> of sequences. It seems that Biopython sequences were meant to be
> >> case-sensitive (e.g.,
> >> http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc22). From the
> >> documentation there, it seems like you've discovered a bug in the API; I
> >> feel that Seq should raise a ValueError when instantiating with lower-case
> >> nucleotides and unambiguous_dna.
>
> Yes, it has some appeal - the trouble is if we suddenly start
> enforcing this it will likely break many existing scripts:
> https://redmine.open-bio.org/issues/2597
>
> > Would it not make sense to have either of the following behaviors:
> >
> > seq = Seq("atgCTCGAGcatcatcat", IUPAC.unambiguous_dna) throws an error,
> > since mixed case is used, which is not allowed
>
> Yes, if we keep IUPAC.unambiguous_dna as upper case only,
> then an error makes sense. https://redmine.open-bio.org/issues/2597
>
> Or we could make IUPAC.unambiguous_dna mixed case, and add
> new more specific upper only and lower only alphabets? Sadly that
> would also probably break some existing usage.
>
> (Whatever change is made will require a transition period with
> deprecation warnings in order to move to a strict by default mode)
>
> > or
> >
> > It just silently converts it all to the case of the Unambiguous_DNA
> > specification and then all "find" and "search" work regardless of case on
> > this internal representation which is just "DNA".
>
> You mean if an all upper case alphabet is used, silently switch
> the sequence to upper case? And vice versa for lower case?
> That seems too magic/implicit, so I'd not support that.
>
> > *But for now I will just force case to upper when instantiating*
> >
>
> Yes, that is the pragmatic solution for the current and recent
> versions of Biopython.
>
> Peter
>
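
The pragmatic workaround quoted above (force the case to upper when
instantiating) could look roughly like the following. This is only a
sketch: a plain str stands in for the Seq object, and the helper names
normalise and find_ignoring_case are invented for illustration.

```python
# Sketch of the "force upper case" workaround using plain strings.
# Assumption: a str stands in for a Seq; the helper names are hypothetical.
def normalise(sequence):
    """Upper-case the sequence once, up front, before any searching."""
    return sequence.upper()


def find_ignoring_case(sequence, motif):
    """Case-insensitive find: normalise both the sequence and the motif."""
    return normalise(sequence).find(normalise(motif))


masked = "atgCTCGAGcatcatcat"
print(masked.find("ctcgag"))                 # -1, find is case specific
print(find_ignoring_case(masked, "ctcgag"))  # 3, found after normalising
```

Note that upper-casing discards any lower-case masking information, so the
original string should be kept if the masked regions matter.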
--
Iddo Friedberg
http://iddo-friedberg.net/contact.html
++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.>
++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----.
.>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>>
>>----.<--.>++++++.<<<<------------------------------------.