[Biopython] Additions to the SeqRecord

Brad Chapman chapmanb at 50mail.com
Fri Nov 13 08:23:46 EST 2009


Hi Peter;

[...Discussion on what to do with full length features and annotations
 when slicing SeqRecords...]

> Exactly - SeqFeatures entirely within the sliced region are kept. Those
> outside the sliced region (or crossing the boundary) are lost. As a result,
> because GenBank-style source features span the whole sequence, they
> are lost on slicing to a sub-sequence. This is the current behaviour and
> I wasn't suggesting any changes.
>
> General annotation in the SeqRecord's annotation dictionary has no
> location information - it may apply to the whole sequence (e.g. from
> organism X) or just part (e.g. a text note that it contains an XXX domain).
> Likewise the database cross reference list.
> 
> The dbxref list and annotations dict are thus the hardest to handle -
> the only practical automatic actions on slicing are to discard them
> (the current behaviour on Biopython 1.50 to date), or keep them all
> as per my suggestion (which as you stress, is risky).
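To make the slicing rules quoted above concrete, here is a toy model in plain Python. ToyRecord and ToyFeature are hypothetical stand-ins, not Biopython's SeqRecord/SeqFeature classes, and the model encodes only what the quoted text states. Features lying entirely inside the slice are kept, with their coordinates shifted. Features crossing the boundary, including a whole-sequence "source" feature, are lost. The annotations dict and dbxrefs list are discarded. Coordinates are assumed to be half-open [start, end).

```python
from dataclasses import dataclass, field


@dataclass
class ToyFeature:
    # Hypothetical stand-in for a SeqFeature with a half-open [start, end) location
    type: str
    start: int
    end: int


@dataclass
class ToyRecord:
    # Hypothetical stand-in for a SeqRecord; NOT Biopython's implementation
    seq: str
    features: list = field(default_factory=list)
    annotations: dict = field(default_factory=dict)
    dbxrefs: list = field(default_factory=list)

    def __getitem__(self, index):
        if not isinstance(index, slice):
            raise TypeError("this sketch only supports slicing")
        start, stop, step = index.indices(len(self.seq))
        if step != 1:
            raise ValueError("this sketch only supports a step of 1")
        # Keep only features lying entirely inside [start, stop), shifted to new coordinates
        kept = [
            ToyFeature(f.type, f.start - start, f.end - start)
            for f in self.features
            if start <= f.start and f.end <= stop
        ]
        # Annotations and dbxrefs carry no location, so they are discarded
        return ToyRecord(self.seq[start:stop], kept)


rec = ToyRecord(
    "ACGT" * 75,  # 300 bases of illustrative sequence
    features=[
        ToyFeature("source", 0, 300),  # spans the whole sequence
        ToyFeature("gene", 10, 40),    # entirely inside the slice below
        ToyFeature("CDS", 90, 150),    # crosses the slice boundary
    ],
    annotations={"organism": "Example organism"},
    dbxrefs=["Example:123"],
)
sliced = rec[:100]
print([f.type for f in sliced.features])  # ['gene']
print(sliced.annotations, sliced.dbxrefs)  # {} []
```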

Good discussion. Agreed that copying may be confusing. One hybrid
approach is to provide a function that makes copying them easy if
someone does want to keep the annotations, dbxrefs and full-length
source features:

sliced = rec[:100]
sliced.set_full_length_features(rec)

where set_full_length_features would copy over the annotations and
dbxrefs, à la your code example:

deletion_mutant.dbxrefs = record.dbxrefs[:]
deletion_mutant.annotations = record.annotations.copy()

and perhaps also add any whole-sequence features from the
original SeqRecord. This would help discoverability for people
who do want to retain all of the source and other high-level information
when they slice.
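A minimal sketch of the proposed opt-in helper follows, again on hypothetical toy classes rather than Biopython's SeqRecord. set_full_length_features here is only an illustration of the idea in this message, not an existing Biopython method. It copies the dbxrefs and annotations exactly as in the snippet above (shallow copies). It treats a feature as "whole-sequence" if it runs from 0 to the original length, and assumes such a feature should be re-anchored to span the sliced sequence.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFeature:
    # Hypothetical stand-in for a SeqFeature with a half-open [start, end) location
    type: str
    start: int
    end: int


@dataclass
class ToyRecord:
    # Hypothetical stand-in for a SeqRecord; slicing follows the current rules
    seq: str
    features: list = field(default_factory=list)
    annotations: dict = field(default_factory=dict)
    dbxrefs: list = field(default_factory=list)

    def __getitem__(self, index):
        start, stop, _ = index.indices(len(self.seq))  # assumes a step-1 slice
        # Keep only features entirely inside the slice; drop annotations and dbxrefs
        kept = [ToyFeature(f.type, f.start - start, f.end - start)
                for f in self.features
                if start <= f.start and f.end <= stop]
        return ToyRecord(self.seq[start:stop], kept)

    def set_full_length_features(self, original):
        # Hypothetical helper: opt-in copying of whole-record information
        self.dbxrefs = original.dbxrefs[:]
        self.annotations = original.annotations.copy()
        full = len(original.seq)
        for f in original.features:
            if f.start == 0 and f.end == full:
                # Re-anchor the whole-sequence feature so it spans the slice
                copied = ToyFeature(f.type, 0, len(self.seq))
                if copied not in self.features:  # avoid duplicates on a full slice
                    self.features.append(copied)


rec = ToyRecord(
    "ACGT" * 75,  # 300 bases of illustrative sequence
    features=[ToyFeature("source", 0, 300), ToyFeature("CDS", 90, 150)],
    annotations={"organism": "Example organism"},
    dbxrefs=["Example:123"],
)
sliced = rec[:100]
sliced.set_full_length_features(rec)
print([(f.type, f.start, f.end) for f in sliced.features])  # [('source', 0, 100)]
print(sliced.annotations["organism"], sliced.dbxrefs)  # Example organism ['Example:123']
```

Keeping this as an explicit call leaves the default slicing behaviour unchanged, so nobody gets possibly misleading annotations unless they ask for them.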

Brad


More information about the Biopython mailing list