[Biopython] sequence coordinate mapping

Peter biopython at maubp.freeserve.co.uk
Fri Jun 18 18:19:59 UTC 2010


On Fri, Jun 18, 2010 at 7:00 PM, Reece Hart <reece at berkeley.edu> wrote:
> Thanks, all, for feedback. I'm still digesting some of the previous
> comments. For the purposes of discussion, I've attached the crude
> (pre-crude, even) implementation that I mentioned.

Thanks

> Caveats/ToDos:
> * The interface is sufficient for my needs, but for a large number of CDS
> subfeatures, it might make sense to change the implementation to use an
> index rather than a linear search.

It looks like the core idea you are using is the same - loop over the exons
(subfeatures) to keep track of where you are.
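
A minimal sketch of that shared idea, and of the index-based variant mentioned in the caveat. It is not taken from either implementation. It assumes exons are given in transcript order as 0-based, half-open (start, end) pairs on the forward strand, and the function names are hypothetical:

```python
from bisect import bisect_right
from itertools import accumulate


def cds_to_genomic(exons, cds_pos):
    """Map a 0-based CDS offset to a genomic coordinate by linear scan."""
    if cds_pos < 0:
        raise IndexError("CDS position must be non-negative")
    offset = 0  # number of CDS bases covered by the exons seen so far
    for start, end in exons:
        length = end - start
        if cds_pos < offset + length:
            return start + (cds_pos - offset)
        offset += length
    raise IndexError("CDS position %d is beyond the last exon" % cds_pos)


def build_index(exons):
    """Cumulative exon lengths, computed once so lookups can use bisect."""
    return list(accumulate(end - start for start, end in exons))


def cds_to_genomic_indexed(exons, cum_lengths, cds_pos):
    """Same mapping as cds_to_genomic, via binary search on the index."""
    i = bisect_right(cum_lengths, cds_pos)
    if cds_pos < 0 or i == len(exons):
        raise IndexError("CDS position %d is out of range" % cds_pos)
    covered = cum_lengths[i - 1] if i else 0
    return exons[i][0] + (cds_pos - covered)


exons = [(10, 20), (30, 35), (50, 60)]
index = build_index(exons)
print(cds_to_genomic(exons, 10))                 # first base of the second exon
print(cds_to_genomic_indexed(exons, index, 10))  # same answer via the index
```

The linear scan costs O(number of exons) per lookup. The index is built once and each lookup is then O(log n), which probably only matters for features with many subfeatures or many lookups.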

> * I ignore strand for the moment.

That makes life a bit more fun! I haven't tested my code on mixed
strand features yet (e.g. some crazy tRNA annotation I've seen).
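
For illustration only, one way strand could be folded into the same loop. This is a sketch under stated assumptions, not tested against real annotation: each part is a hypothetical (start, end, strand) tuple with 0-based, half-open coordinates, strand is +1 or -1, and parts are listed in transcript order. On the reverse strand the offset is counted back from the end of the part:

```python
def cds_to_genomic_stranded(parts, cds_pos):
    """Map a 0-based CDS offset to a genomic coordinate, honouring strand.

    parts: iterable of (start, end, strand) in transcript order, where
    start/end are 0-based half-open and strand is +1 or -1 (assumed layout).
    """
    if cds_pos < 0:
        raise IndexError("CDS position must be non-negative")
    offset = 0  # CDS bases covered by the parts seen so far
    for start, end, strand in parts:
        length = end - start
        if cds_pos < offset + length:
            within = cds_pos - offset
            if strand >= 0:
                return start + within
            # Reverse strand: the first transcribed base is at end - 1.
            return end - 1 - within
        offset += length
    raise IndexError("CDS position %d is beyond the last part" % cds_pos)


# A mixed-strand feature: forward part followed by a reverse part.
mixed = [(10, 20, +1), (30, 35, -1)]
print(cds_to_genomic_stranded(mixed, 10))
```

Because strand is read per part rather than per feature, the mixed-strand case needs no special handling in this layout.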

> * I don't use SeqFeature.AbstractPosition and friends.

Unfortunately they crop up in lots of real-world GenBank/EMBL files,
so anything we add to the SeqFeature object has to cope with them.
Things like GFF3 files avoid this of course.

Peter


