[Biopython] Transferring SeqFeatures between aligned sequences

Peter Cock p.j.a.cock at googlemail.com
Thu Mar 10 22:46:32 UTC 2011


On Thu, Mar 10, 2011 at 10:07 PM, Uri Laserson <laserson at mit.edu> wrote:
> Say I have a SeqRecord called A and a SeqRecord called B.  A has a bunch of
> SeqFeatures associated with it, while B has none.  I perform a gapped
> alignment between the two sequences.  Now I want to copy the SeqFeatures
> from A onto B in a way that respects the coordinates of all the features.
>
> For example (and please use a fixed-width font for this):
> <cut>

I'm not quite sure I followed that figure.

> In sequence A, the coords of Feature 1 and Feature 2 should be (0,7) and
> (10,19), respectively.  Now I want to copy it to sequence B, where the
> feature coords should instead be (3,12) and (15,23).
>
> Is there an easy way to do this in biopython already?

No, but I'm not sure how advisable it is anyway (if I have
understood you right - see below).

> Or are there any ideas for an elegant solution?

I actually wanted to do something similar to this myself.
I had a draft genome I had annotated in GenBank format.
We did some more sequencing and/or I tweaked the
assembly, and I had a new very similar sequence in a
FASTA file, and I wanted to copy the old annotation over.

What I did was look for perfect matches between the regions
spanned by the features (no introns in this case), and that
meant all I needed to do was apply a shift to the SeqFeature
location. There is a (private) method _shift which helped here
(written for use in slicing a SeqRecord).

In my case, that handled most of the annotation, and I did
the nasty cases by hand (since I wanted to examine what
had happened in the new assembly - it was a small genome).

In your case the start and end co-ordinates may be shifted
by different amounts (since you are doing gapped alignments).
This worries me as the length of your features can change.
For any gene or CDS features that is a problem (frame shifts).
Have you thought about that? Perhaps you're dealing with
non-coding features only?

Peter




More information about the Biopython mailing list