[Biopython] Sequence object "find" is still case specific?

hari jayaram harijay at gmail.com
Sun Mar 3 18:34:26 UTC 2013


I am relatively new to Biopython and have not used it for a while. I have the
"bad" habit of storing sequences in an internal database as mixed-case
strings, e.g. "atgCTCGAGcatcatcat", where the upper-case stretch is a
restriction site I normally use for cloning purposes.
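
To make the convention concrete, here is a minimal sketch, using only the
standard library, of how the upper-case stretches could be pulled back out of
such a string. The helper name uppercase_runs is made up for illustration and
is not part of Biopython.

```python
import re


def uppercase_runs(sequence):
    """Return a list of (start, text) pairs, one per run of upper-case letters."""
    # str() is applied so that anything printable as sequence text is accepted.
    return [(m.start(), m.group()) for m in re.finditer(r"[A-Z]+", str(sequence))]


print(uppercase_runs("atgCTCGAGcatcatcat"))  # [(3, 'CTCGAG')]
```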

I am interested in using Biopython to write a PDF-based (using ReportLab)
plasmid vector map drawing utility for all the sequences in my database.


I am just getting started and was wondering why the Seq object's "find"
method still behaves like an ordinary Python string find, i.e. it is
case-sensitive. For example:


>>> from Bio.Seq import Seq
>>> raw_seq_mixed_case = "atgCTCGAGcatcatcatcatcat"
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq(raw_seq_mixed_case, IUPAC.unambiguous_dna)
>>> my_seq.find("ctcgag")
-1
>>> my_seq.find("CTCGAG")
3

Along the same lines, searching with a Seq object does not work either:
>>> search_sequence = Seq("ctcgag",IUPAC.unambiguous_dna)
>>> my_seq.find(search_sequence)
-1
>>> my_seq.find(search_sequence.tostring())
-1
>>> my_seq.find(search_sequence.tostring().upper())
3
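
The only workaround I can see is to normalise the case on both sides before
searching. A minimal sketch, assuming the inputs are plain strings or anything
that str() turns into the sequence text (which I believe includes Seq
objects); find_ignore_case is a hypothetical helper, not a Biopython function:

```python
def find_ignore_case(sequence, motif):
    """Return the index of the first case-insensitive match, or -1 if none."""
    # Upper-case copies are used only for the comparison; the inputs are not modified.
    haystack = str(sequence).upper()
    needle = str(motif).upper()
    return haystack.find(needle)


raw_seq_mixed_case = "atgCTCGAGcatcatcatcatcat"
print(find_ignore_case(raw_seq_mixed_case, "ctcgag"))  # 3
print(find_ignore_case(raw_seq_mixed_case, "GGGGGG"))  # -1
```

The stored mixed-case string is left untouched, so the restriction site
marked by the upper-case letters is not lost.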

I wonder if I am doing something wrong.

It seems strange that the Seq object would still behave like a Python string
after I have gone through the process of telling it that it is
"unambiguous_dna". I did not want to roll my own solution for handling
sequences and would prefer to play along with Biopython conventions.

Thanks for your help
Hari


