[BioPython] about the SeqRecord slicing

Peter biopython at maubp.freeserve.co.uk
Thu Mar 26 08:05:25 EDT 2009


On Thu, Mar 26, 2009 at 11:48 AM, Jose Blanca <jblanca at btc.upv.es> wrote:
> Hi:
> I'm working with the SeqRecord slicing from cvs and I think that the behaviour
> could be slightly changed. In fact that same opinion is written in the
> __getitem__ method:
>
>        if isinstance(index, int) :
>            #NOTE - The sequence level annotation like the id, name, etc
>            #do not really apply to a single character.  However, should
>            #we try and expose any per-letter-annotation here?  If so how?
>            return self.seq[index]
>
> I don't like the fact that the SeqRecord returns different classes depending
> on the index type. I think it is better to always return a SeqRecord because:
> - It simplifies the interface. It's easier to deal with the SeqRecord class if
> its behaviour is simple. Otherwise we have to check in the code that uses the
> SeqRecord if it's returning an str or a SeqRecord.
> - It loses the per-letter annotation. I'm working with qualities and I'm
> interested in keeping them.
> - It's redundant because if we want to slice the seq property we can do it
> with: seqrec.seq[index]
> Best regards,

Hi Jose,

As we are talking about the CVS code, maybe this could have been on
the dev mailing list, but as it's of general interest let's carry on
here for now.

You note that (currently in CVS) the new SeqRecord slicing returns a
SeqRecord for a slice, but a single letter string for a single integer
index.

This isn't so different from the Seq object - it returns a new Seq
object for a slice, but a single letter string for a single integer
index:
>>> from Bio.Seq import Seq
>>> s = Seq("ACGT")
>>> s
Seq('ACGT', Alphabet())
>>> s[0]
'A'
>>> s[0:3]
Seq('ACG', Alphabet())

More generally, consider lists in Python:
>>> x = [1,2,3,4,5]
>>> x[0]
1
>>> x[0:3]
[1, 2, 3]

So I don't agree with this expectation that slicing and indexing a
SeqRecord should automatically both give a SeqRecord.  Do you really
want a SeqRecord for a single character string?
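
If the aim is a single letter together with its quality, a length-one
slice would provide that without changing what an integer index
returns. A minimal sketch of the convention follows. It uses a
hypothetical MiniRecord class as a stand-in: this is not the SeqRecord
code in CVS, and the class and attribute names are assumptions made
for illustration only.

```python
class MiniRecord:
    """Hypothetical stand-in for a record with per-letter qualities (not Bio.SeqRecord)."""

    def __init__(self, seq, qualities):
        if len(seq) != len(qualities):
            raise ValueError("need one quality value per letter")
        self.seq = seq
        self.qualities = list(qualities)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, int):
            # Integer index: a single letter, as for str, list and Seq.
            return self.seq[index]
        if isinstance(index, slice):
            # Slice: a new record, with the qualities sliced in step.
            return MiniRecord(self.seq[index], self.qualities[index])
        raise TypeError("index must be an int or a slice")


record = MiniRecord("ACGT", [30, 25, 12, 40])
letter = record[2]   # a plain one-letter string
sub = record[2:3]    # a one-letter record that keeps its quality
print(letter, sub.seq, sub.qualities)
```

The caller chooses the return type by choosing the index: record[i]
for the bare letter, record[i:i+1] for a record of length one.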

Can you give me an example of where you want to pull out a single
character from a SeqRecord, and its quality?  I would consider things
like this quite elegant:

for letter, quality in zip(record.seq,
                           record.letter_annotations["phred_quality"]):
    # do stuff with each letter and its quality
    pass
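
As a rough illustration of what such a loop might do, the sketch below
lower-cases low-quality letters. A plain string and list stand in for
record.seq and the "phred_quality" entry of the letter annotations;
the data and the threshold of 20 are made up for the example.

```python
# Stand-ins for record.seq and record.letter_annotations["phred_quality"];
# the values and the threshold are made up for illustration.
sequence = "ACGTTG"
phred_quality = [40, 38, 9, 35, 12, 30]
threshold = 20

# Pair each letter with its quality and lower-case the weak calls.
masked = "".join(
    letter if quality >= threshold else letter.lower()
    for letter, quality in zip(sequence, phred_quality)
)
print(masked)
```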

Peter


