[Biopython] Sequence object "find" is still case specific?

hari jayaram harijay at gmail.com
Sun Mar 3 18:34:26 UTC 2013

I am relatively new to Biopython, not having used it for a while. I have the
"bad" habit of storing sequences in an internal database as mixed-case
strings, e.g. "atgCTCGAGcatcatcat", where the upper-case stretch is a
restriction site that I normally use for cloning.

I am interested in using Biopython to write a PDF-based (using reportlab)
utility that draws plasmid vector maps for all the sequences in my database.

I am just getting started and was wondering why the Seq object's "find"
method still behaves like an ordinary Python string find, i.e. it is case
sensitive. For example:

>>> from Bio.Seq import Seq
>>> raw_seq_mixed_case = "atgCTCGAGcatcatcatcatcat"
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq(raw_seq_mixed_case, IUPAC.unambiguous_dna)
>>> my_seq.find("ctcgag")
>>> my_seq.find("CTCGAG")

Along the same lines, searching with a Seq object does not work either:
>>> search_sequence = Seq("ctcgag",IUPAC.unambiguous_dna)
>>> my_seq.find(search_sequence)
>>> my_seq.find(search_sequence.tostring())
>>> my_seq.find(search_sequence.tostring().upper())
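
For comparison, here is a minimal sketch of the workaround I would otherwise
fall back on. It assumes that normalising both the sequence and the motif to
upper case before searching is acceptable. The helper name find_ignore_case is
my own invention and not part of Biopython; it works on plain strings, and I
assume a Seq object can be passed in as well, since str() is applied first.

```python
def find_ignore_case(sequence, motif):
    """Return the index of the first case-insensitive match of motif, or -1.

    Hypothetical helper, not a Biopython function. Both arguments are
    converted with str() and upper-cased before the ordinary string find.
    """
    return str(sequence).upper().find(str(motif).upper())


raw_seq_mixed_case = "atgCTCGAGcatcatcatcatcat"

# The plain string find is case sensitive, so the lower-case motif is missed.
print(raw_seq_mixed_case.find("ctcgag"))               # -1

# After normalising case, both spellings find the site at index 3.
print(find_ignore_case(raw_seq_mixed_case, "ctcgag"))  # 3
print(find_ignore_case(raw_seq_mixed_case, "CTCGAG"))  # 3
```

The original mixed-case string is left untouched, so the upper-case marking of
the restriction site is still available for drawing the map afterwards.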

I wonder if I am doing something wrong.

It seems strange that the Seq object would behave like a Python string
after I have gone through the process of telling it that it is
"unambiguous_dna". I did not want to roll my own solution for handling
sequences and would prefer to play along with Biopython conventions.

Thanks for your help
