[BioPython] Making the Seq object act more like a string

Peter biopython at maubp.freeserve.co.uk
Wed Aug 22 11:53:59 EDT 2007


Sebastian Bassi wrote:
> On 8/22/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> A couple of times (on bugs or the developers mailing list), Michiel de
>> Hoon has previously suggested we could make the Seq class (Bio.Seq.Seq)
>> a subclass of python string. I agree with him - the Seq object should
>> act more like a string.
> 
> I agree. Seq acting more like a str would also lower the entry level
> to use biopython for non OOP seasoned programmers.

Good :)

>> As a simple example, although there are functions in Biopython to
> ...
> 
> Here is another example (from
> http://www.biopython.org/wiki/SeqIO#Using_the_SEGUID_checksum ):

I added that to the wiki recently, although it is perhaps premature 
given your CheckSum code hasn't been officially released yet. This was my 
draft; moving/adding it to the tutorial is on my to-do list.

> Current situation:
> 
> from Bio import SeqIO
> from Bio.SeqUtils.CheckSum import seguid
> seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), "genbank"),
>                             lambda rec : seguid(rec.seq))
> record = seguid_dict["MN/s0q9zDoCVEEc+k/IFwCNF2pY"]
> print record.id
> print record.description
> 
> If seq were more like a string:
> 
> ...
> seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), "genbank"),
>                             seguid(rec.seq))

Nope ;)

You have to give a function to the key_function argument in 
SeqIO.to_dict(), and in your example seguid(rec.seq) would be a string 
(the result of the seguid function acting on a seq object). Or at least, 
it would if you had a rec variable in scope.
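
To make the distinction concrete, here is a minimal sketch in plain Python.
The to_dict() below is a hypothetical stand-in written for this example, not
the real SeqIO.to_dict(), and the "records" are plain strings rather than
SeqRecord objects. It only assumes that the key function is called once per
record:

```python
# Hypothetical stand-in for SeqIO.to_dict(); for illustration only.
def to_dict(records, key_function):
    result = {}
    for record in records:
        key = key_function(record)  # the key function is called per record
        if key in result:
            raise ValueError("Duplicate key %r" % (key,))
        result[key] = record
    return result


records = ["ACGT", "GGCC"]  # plain strings standing in for records

# Passing a callable works: it is applied to each record in turn.
by_lower = to_dict(records, lambda rec: rec.lower())

# Passing the *result* of a call fails: "acgt" is a str, not a function.
try:
    to_dict(records, "ACGT".lower())
except TypeError as err:
    error_message = str(err)
```

The second call fails as soon as the stand-in tries to call the string, which
is the same kind of problem as passing seguid(rec.seq) instead of a function.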

However, if SeqRecord acted more like a Seq (and therefore more like a 
string) then you could do this, which avoids the lambda:

seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"),
                                        "genbank"), seguid)

Or, we could enhance your CheckSum functions to cope with a 
SeqRecord, a Seq or a string - right now they cope with a Seq or a string.
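
As a rough sketch of that second option, a checksum function could first
reduce whatever it is given to a plain string. Everything below is
hypothetical and written for this example: FakeSeq and FakeRecord are
stand-ins for Seq and SeqRecord, and toy_checksum() is a placeholder that is
not the real seguid() from Bio.SeqUtils.CheckSum. The only assumption is that
a record exposes its sequence as a .seq attribute and that the sequence can
be turned into text with str():

```python
import hashlib


class FakeSeq(object):
    """Hypothetical stand-in for a Seq object."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data


class FakeRecord(object):
    """Hypothetical stand-in for a SeqRecord object."""

    def __init__(self, seq, id):
        self.seq = seq
        self.id = id


def as_plain_string(obj):
    """Return the sequence text from a record, a sequence, or a string."""
    # Records carry their sequence in .seq; anything else is used directly.
    seq = getattr(obj, "seq", obj)
    return str(seq)


def toy_checksum(obj):
    """Placeholder checksum (not SEGUID) accepting all three input types."""
    text = as_plain_string(obj).upper()
    return hashlib.sha1(text.encode("ascii")).hexdigest()


seq = FakeSeq("ACGT")
record = FakeRecord(seq, "example")
checksums = [toy_checksum("ACGT"), toy_checksum(seq), toy_checksum(record)]
```

All three calls give the same checksum, so a function written this way could
be passed directly as the key function without a lambda.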

Peter


