[Biopython-dev] [BioPython] about the SeqRecord slicing

Peter biopython at maubp.freeserve.co.uk
Fri Mar 27 15:51:53 UTC 2009


On Fri, Mar 27, 2009 at 3:16 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> One way to allow non-default options for adding and slicing is to provide a
> couple of functions at the class or module level (classmethod, staticmethod,
> plain ol' function) that have the necessary keyword arguments. These
> functions would do the same thing by default as the corresponding syntax,
> and the syntax-friendly magic methods would just pass their arguments
> straight to these functions. This makes the syntax pretty for the common
> cases, and makes the nonstandard stuff visually obvious.
>
> Examples:
>
> my_record.slice(10, 50) == my_record[10:50]
> my_record.slice(10, 50, annotation=True) == my_record[10:50] plus updated
> annotations
> ...

I think I understand your idea, but I'm not very keen on adding slice
and add methods as alternatives to __getitem__ and __add__.
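
For anyone following along, here is a minimal sketch of the delegation pattern being proposed, as I understand it. ToyRecord is a hypothetical stand-in holding a plain string and a dict; it is not the SeqRecord class, and the slice method with its annotation keyword is taken from the example above rather than from anything in Biopython.

```python
class ToyRecord:
    """Hypothetical stand-in for a record; not Biopython's SeqRecord."""

    def __init__(self, seq, annotations=None):
        self.seq = seq
        self.annotations = dict(annotations or {})

    def slice(self, start, end, annotation=False):
        # Explicit form: the keyword makes non-default behaviour visible.
        sub = ToyRecord(self.seq[start:end])
        if annotation:
            sub.annotations = self.annotations.copy()
        return sub

    def __getitem__(self, index):
        # Syntax-friendly form: forwards to slice() using the defaults.
        # Inside a method body the bare name "slice" is still the builtin
        # type, because class attributes are not in the method's scope.
        if not isinstance(index, slice) or index.step is not None:
            raise TypeError("this sketch only supports plain start:stop slices")
        return self.slice(index.start, index.stop)


record = ToyRecord("ACGTACGT", {"source": "example"})
plain = record[2:5]                              # default: no annotations
annotated = record.slice(2, 5, annotation=True)  # explicit opt-in
```

The square-bracket form and the method give the same sequence; only the explicit call carries the annotations across.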

As things stand (with CVS after the change an hour ago), if you want
the annotations dictionary copied with a slice you must do this
explicitly:

>>> from Bio import SeqIO
>>> my_record = SeqIO.read(open("NC_005816.gb"),"genbank")
>>> my_record
SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG',
IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816',
description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1,
complete sequence.', dbxrefs=['Project:10638'])
>>> len(my_record)
9609
>>> len(my_record.features)
29
>>> len(my_record.annotations)
11
>>> len(my_record.dbxrefs)
1

Doing a slice will not copy/preserve the annotations dict or dbxrefs list:

>>> sub_record = my_record[1000:2000]
>>> sub_record
SeqRecord(seq=Seq('GAAAAAAGAGTATGACGTGCATCTTGATGAAAATCTGGTGAACTTCGACAAACA...GGA',
IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816',
description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1,
complete sequence.', dbxrefs=[])
>>> len(sub_record)
1000
>>> len(sub_record.features)
2
>>> assert not sub_record.annotations and not sub_record.dbxrefs

You can then choose to blindly reuse the annotations and dbxrefs if you want to:

>>> sub_record.annotations = my_record.annotations #shares the dict
>>> sub_record.dbxrefs = my_record.dbxrefs #shares the list

or take a simple (shallow) copy, so the two records no longer share the same dict and list:

>>> sub_record.annotations = my_record.annotations.copy()
>>> sub_record.dbxrefs = my_record.dbxrefs[:]
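
To make the difference between sharing and copying concrete, here is a small sketch using a plain dict in place of a record's annotations. The keys and values are made up for illustration and are not taken from NC_005816.gb.

```python
# Made-up annotations, standing in for my_record.annotations.
parent_annotations = {"source": "example organism", "taxonomy": ["Bacteria"]}

shared = parent_annotations         # same dict object, as with plain assignment
copied = parent_annotations.copy()  # new dict, but the values are not copied

# Editing through the shared reference changes the parent as well.
shared["source"] = "edited via shared"

# The copy has its own top-level keys, so the edit above did not reach it...
copy_source_after_edit = copied["source"]

# ...but a shallow copy still shares mutable values such as lists.
copied["taxonomy"].append("Proteobacteria")
```

If the annotation values themselves need to be independent, copy.deepcopy from the standard library is the usual alternative to dict.copy().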

The good thing about this is it makes you think about the annotations,
and which (if any) are appropriate to transfer to the sub-record.  As
per my earlier email, maybe we should do the same with the
description?
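
As a rough illustration of picking which annotations to transfer, something like the following would do. The transfer_annotations helper is hypothetical (it is not part of Biopython), and SimpleNamespace objects stand in for the parent record and the sub-record.

```python
import copy
from types import SimpleNamespace


def transfer_annotations(parent, child, keys):
    """Copy only the named annotation keys from parent to child.

    Hypothetical helper: works on any objects with an ``annotations`` dict.
    Values are deep-copied so the child never shares mutable data.
    """
    for key in keys:
        if key in parent.annotations:
            child.annotations[key] = copy.deepcopy(parent.annotations[key])
    return child


# Stand-ins for a full record and a slice of it; the values are made up.
parent = SimpleNamespace(
    annotations={"organism": "example organism", "sequence_version": 1}
)
child = SimpleNamespace(annotations={})

# The organism still applies to a sub-sequence; a version number may not.
transfer_annotations(parent, child, ["organism"])
```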

Peter


