[Biopython] sequence coordinate mapping

Peter biopython at maubp.freeserve.co.uk
Fri Jun 18 13:39:04 UTC 2010


On Fri, Jun 18, 2010 at 1:58 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Reece and Peter;
>
> Peter wrote:
>> Something like this? This implements __contains__ on the SeqFeature
>> so that you can check if a simple location (integer) is within a feature.
>> http://github.com/peterjc/biopython/tree/feature-in
>>
>> There is a docstring with examples, just look at the diff here:
>> http://github.com/peterjc/biopython/commit/83c44e8f6ee62a9c5855b603cb3c080d367e23d6
>
> That's nice.

Nice enough to be worth committing in its own right?

> The next part would be remapping the coordinates so
> once you have the feature you can easily address the relative
> position you are interested in.

Perhaps one approach would be to do this in the SeqFeature. If we
define a SeqFeature's length in the natural way, then we have
len(SeqFeature) == len(SeqFeature.extract(parent_seq)).
Now we have two coordinate systems, 0 to len(SeqFeature) and
the regions it describes on the parent sequence. Then we could
discuss a pair of methods on the SeqFeature for converting
between the two coordinate systems. Once you have that, the
special case of amino acid coordinates is much easier to do
(account for where the start codon is, divide by three).
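
To make the idea concrete, here is a rough sketch of what such a pair
of conversions might look like. It is not the Biopython API: the
function names, the codon_start argument, and the representation of a
feature as a list of half-open (start, end) parts are all assumptions
for illustration, and it only handles the forward strand.

```python
# Hypothetical sketch, not Biopython code. A feature is modelled as a list
# of half-open (start, end) parts on the forward strand, in join order.

def feature_to_parent(parts, offset):
    """Map a 0-based offset within the feature to a parent coordinate."""
    if offset < 0:
        raise IndexError("offset must be non-negative")
    for start, end in parts:
        size = end - start
        if offset < size:
            return start + offset
        offset -= size  # skip this part and keep counting in the next one
    raise IndexError("offset is beyond the end of the feature")


def parent_to_feature(parts, position):
    """Map a parent coordinate to a 0-based offset within the feature."""
    consumed = 0  # feature bases covered by the parts already passed
    for start, end in parts:
        if start <= position < end:
            return consumed + (position - start)
        consumed += end - start
    raise ValueError("position is not within the feature")


def parent_to_codon_index(parts, position, codon_start=0):
    """Return the 0-based codon (amino acid) index for a parent coordinate.

    codon_start is the assumed 0-based feature offset of the first base
    of the first codon.
    """
    offset = parent_to_feature(parts, position) - codon_start
    if offset < 0:
        raise ValueError("position lies before the first codon")
    return offset // 3


exons = [(10, 20), (30, 45)]  # made-up two-part feature, 25 bases long
print(feature_to_parent(exons, 10))      # first base of the second part
print(parent_to_feature(exons, 44))      # last base of the feature
print(parent_to_codon_index(exons, 31))  # codon containing parent base 31
```

Reverse strand features and fuzzy locations would need extra care and
are deliberately left out of this sketch.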

I've made another commit on the __contains__ branch to
also implement __len__ for the SeqFeature:
http://github.com/peterjc/biopython/commit/74b264acacd228d64859d28d75e2c30a8030d03f
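
For anyone who does not want to read the diffs, the behaviour is
roughly as in the toy class below. This is a stand-in written for this
email, not the code on the branch: the class name SimpleFeature and
its list of half-open (start, end) parts are assumptions, and strands
and fuzzy positions are ignored.

```python
# Toy stand-in for illustration only; not the SeqFeature implementation.

class SimpleFeature:
    """A forward-strand feature made of half-open (start, end) parts."""

    def __init__(self, parts):
        self.parts = list(parts)

    def __contains__(self, position):
        # True if the integer position falls inside any part.
        return any(start <= position < end for start, end in self.parts)

    def __len__(self):
        # Total number of parent bases the feature covers.
        return sum(end - start for start, end in self.parts)

    def extract(self, parent_seq):
        # Join the described regions of the parent sequence.
        return "".join(parent_seq[start:end] for start, end in self.parts)


parent = "ACGTACGTACGT"
feature = SimpleFeature([(2, 5), (8, 10)])
print(3 in feature)   # position inside the first part
print(6 in feature)   # position in the gap between the parts
print(len(feature) == len(feature.extract(parent)))
```

The last line checks the property described above, that the length of
the feature matches the length of the extracted sequence.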

Peter


