[Biopython] SeqRecord substring should return SeqRecord or character?

Peter Cock p.j.a.cock at googlemail.com
Wed Jul 11 15:52:41 UTC 2012


On Wed, Jul 11, 2012 at 4:24 PM, Nick Loman <n.j.loman at bham.ac.uk> wrote:
> On Wed, Jul 11, 2012 at 4:21 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> On Wed, Jul 11, 2012 at 4:02 PM, Nick Loman <n.j.loman at bham.ac.uk> wrote:
>>> Hi there
>>>
>>> I wanted to add the last character of a SeqRecord s1 to another
>>> SeqRecord s2. However s1[-1] + s2 fails because s1[-1] returns a
>>> string rather than a SeqRecord just containing a single base and
>>> associated annotations. I have to do s1[-1:] to get a sliced
>>> SeqRecord.
>>
>> You should be able to do SeqRecord+string, and string+SeqRecord,
>> both of which are specifically tested in the docstring. Have you got
>> any more details? e.g. Version? Mini-example?
>
> Hi Peter,
>
> I was doing this on a FASTQ record, so it's the missing quality
> annotation that causes the problem when trying to do this.

Ah - so the addition should have worked, but you'd lose the
partial quality string. You're stuck with ensuring you have two
SeqRecords, so as you suggested rather than s1[-1]+s2 please
use s1[-1:]+s2 instead. Slightly less clear, but only one character more.
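
As a rough sketch of the difference (the two records and their quality
scores are made up for illustration, and this assumes a Biopython version
where the SeqRecord constructor accepts letter_annotations):

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Two toy FASTQ-like records with made-up PHRED scores (illustrative only).
s1 = SeqRecord(Seq("ACGT"), id="s1",
               letter_annotations={"phred_quality": [30, 31, 32, 33]})
s2 = SeqRecord(Seq("GGA"), id="s2",
               letter_annotations={"phred_quality": [20, 21, 22]})

last = s1[-1]    # integer index: a plain single-letter string, "T"
tail = s1[-1:]   # slice: a one-letter SeqRecord that keeps its quality score

# SeqRecord + SeqRecord keeps per-letter annotation present in both records.
combined = tail + s2
print(combined.seq, combined.letter_annotations["phred_quality"])

# string + SeqRecord also works, but the result has no per-letter annotation.
lossy = last + s2
print(lossy.seq, dict(lossy.letter_annotations))
```

Both additions give the sequence TGGA, but only the sliced version still
carries a phred_quality list, which is what FASTQ output needs.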

This actually reminds me of similar behaviour with the bytes
string in Python 3, where the same trick is required to get a
single letter bytes string.
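
For comparison, a minimal sketch of that bytes behaviour in plain Python 3
(the variable names are just for illustration):

```python
data = b"ACGT"

last_int = data[-1]     # integer index gives an int (the byte value), not bytes
last_bytes = data[-1:]  # a one-element slice gives a length-one bytes object

print(type(last_int).__name__, last_int)
print(type(last_bytes).__name__, last_bytes)

# Concatenation only works with the sliced form; int + bytes is a TypeError.
joined = last_bytes + b"GGA"
print(joined)
```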

>>> Is this behaviour intentional? I kind of assumed I would always get a
>>> SeqRecord from any given slice, and it seems weird to get just a
>>> string back instead, although no doubt there's a good reason for this.
>>
>> For a single base/residue, the whole SeqRecord overhead does
>> seem unnecessary. As to why you get a single letter string, not
>> a single letter Seq, IIRC it was mimicking the Seq object.
>
> Yes, I guessed the overhead was likely to be the reason... not sure
> if there's a satisfactory solution?

Returning a single letter SeqRecord might have been a better
choice, and going back much further in Biopython's history the
Seq object should probably have returned a single letter Seq
(not a single letter string). There is a similar issue with the
columns of an alignment.

Peter


