[Biopython-dev] Bio.SeqUtils.MeltingTemp Tm_staluc does not accept sequence object
Markus Piotrowski
Markus.Piotrowski at ruhr-uni-bochum.de
Wed Dec 21 07:22:55 EST 2011
Peter Cock <p.j.a.cock <at> googlemail.com> writes:
>
> On Sun, Dec 18, 2011 at 2:26 PM, Peter Cock <p.j.a.cock <at> googlemail.com>
wrote:
> >> That looks like an oversight in the Seq object, which
> >> is easily fixed (the unit tests need a bit more work...).
> >
> > Actually, I might have been worried about what to do
> > with the existing MutableSeq's index method (which
> > is list/array like not string like).
> >
> > Perhaps in the short term a fix to the TM function is
> > easier.
>
> Done:
> https://github.com/biopython/biopython/commit
/cd01acc2cdfb1a3e9e16e5107924168231514842
> Markus, could you test the latest code from github please?
>
> Thanks,
>
> Peter
OK, worked for me:
>>> from Bio.Seq import Seq
>>> import MeltingTemp #This is the latest code from github
>>> mySeq = Seq("ATGGCGCTCGTCCCAGCACC")
>>> MeltingTemp.Tm_staluc(mySeq)
61.837438978725686
>>> myString = "ATGGCGCTCGTCCCAGCACC"
>>> MeltingTemp.Tm_staluc(myString)
61.837438978725686
I'm still worried about whitespaces, which, in this method, would influence the
calculation by three ways:
1. In line 169 (of the latest code) len(s) is used in a calculation.
2. The method itself consists of taking thermodynamic values of neighboring
bases. Thus "GC" gives a different value than ("G C"), since in the latter case
G and C are not neighbored.
3. Also leading and trailing spaces would give wrong calculations since the
function tercorr (lines 59-101) uses startswith and endswith methods to look for
the first and last base, respectively.
Example (same sequence as above with space):
>>> mySeq = Seq("ATG GCGCTCGTCCCAGCACC")
>>> MeltingTemp.Tm_staluc(mySeq)
58.16633757439382
I would suggest to change line 117 to
sup = str(s).upper().translate(None, string.whitespace)
which requires import of the string module, or simpler:
sup = str(s).upper().replace(" ","")
and changing line 169 to:
ds = ds-0.368*(len(sup)-1)*math.log(saltc/1e3) #instead of len(s)
Putting these changes into MeltingTempN:
>>> import MeltingTempN
>>> mySeq = Seq("ATGGCGCTCGTCCCAGCACC")
>>> MeltingTempN.Tm_staluc(mySeq)
61.837438978725686
>>> mySeq = Seq("ATG GCGCTCGTCCCAGCACC") #space in sequence
>>> MeltingTempN.Tm_staluc(mySeq)
61.837438978725686
>>> mySeq = Seq(" ATGGCGCTCGTCCCAGCACC") #leading space
>>> MeltingTempN.Tm_staluc(mySeq)
61.837438978725686
It is of course a more philosophical question if the method should catch these
problems or if the method can rely on a 'proper' sequence (i.e., without blanks).
Markus
More information about the Biopython-dev
mailing list