[Biopython] sequence coordinate mapping

Peter biopython at maubp.freeserve.co.uk
Wed Jun 23 09:16:38 UTC 2010


On Fri, Jun 18, 2010 at 7:19 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Fri, Jun 18, 2010 at 7:00 PM, Reece Hart <reece at berkeley.edu> wrote:
>> Thanks, all, for feedback. I'm still digesting some of the previous
>> comments. For the purposes of discussion, I've attached the crude
>> (pre-crude, even) implementation that I mentioned.
>
> Thanks
>
>> Caveats/ToDos:
>> * The interface is sufficient for my needs, but for a large number of CDS
>> subfeatures, it might make sense to change the implementation to use an
>> index rather than a linear search.
>
> It looks like the core idea you are using is the same - loop over the exons
> (subfeatures) to keep track of where you are.
>
>> * I ignore strand for the moment.
>
> That makes life a bit more fun! I haven't tested my code on mixed
> strand features yet (e.g. some crazy tRNA annotation I've seen).
>
>> * I don't use SeqFeature.AbstractPosition and friends.
>
> Unfortunately they crop up in lots of real world GenBank/EMBL files,
> so anything we add to the SeqFeature object has to cope with them.
> Things like GFF3 files avoid this of course.

I should also point out that if you access location positions in your own
code, nofuzzy_start and nofuzzy_end are the better choice, since they give
the appropriate integer values. That is, change this:

sf.location.start.position,sf.location.end.position

to:

sf.location.nofuzzy_start,sf.location.nofuzzy_end

That should then handle the fuzzy locations as well as possible.
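
To illustrate why plain integers are convenient here, the following is a
minimal sketch of the loop-over-the-exons idea discussed above. It is not
Reece's attached code and not a Biopython API: the function name
cds_to_parent and the example exon boundaries are hypothetical. It assumes
forward strand only, 0-based half-open (start, end) integer pairs given in
transcript order, and a plain linear search.

```python
# Hypothetical helper, not part of Biopython: map a 0-based offset within
# the CDS to a 0-based coordinate on the parent sequence by walking the
# exons in order and tracking how many CDS bases have been passed.
def cds_to_parent(exons, offset):
    """exons: list of (start, end) integer pairs, 0-based half-open,
    forward strand, in transcript order."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    consumed = 0  # CDS bases covered by the exons seen so far
    for start, end in exons:
        length = end - start
        if offset < consumed + length:
            # The offset falls inside this exon.
            return start + (offset - consumed)
        consumed += length
    raise ValueError("offset lies beyond the end of the CDS")


# Assumption: with a SeqFeature sf, the integer pairs would be built from
# nofuzzy_start and nofuzzy_end of each subfeature's location.
exons = [(10, 20), (30, 45), (60, 70)]
print(cds_to_parent(exons, 0))   # first base of the first exon
print(cds_to_parent(exons, 12))  # third base of the second exon
print(cds_to_parent(exons, 25))  # first base of the third exon
```

Strand handling and fuzzy positions are deliberately left out of this
sketch. For many subfeatures, the linear scan could be replaced by a
precomputed list of cumulative lengths searched with bisect.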

Peter


