[Biopython] sequence coordinate mapping
Peter
biopython at maubp.freeserve.co.uk
Wed Jun 23 05:16:38 EDT 2010
On Fri, Jun 18, 2010 at 7:19 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Fri, Jun 18, 2010 at 7:00 PM, Reece Hart <reece at berkeley.edu> wrote:
>> Thanks, all, for feedback. I'm still digesting some of the previous
>> comments. For the purposes of discussion, I've attached the crude
>> (pre-crude, even) implementation that I mentioned.
>
> Thanks
>
>> Caveats/ToDos:
>> * The interface is sufficient for my needs, but for a large number of CDS
>> subfeatures, it might make sense to change the implementation to use an
>> index rather than a linear search.
>
> It looks like the core idea you are using is the same - loop over the exons
> (subfeatures) to keep track of where you are.
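
To make the "loop over the exons" idea concrete, here is a minimal sketch. It is not the attached implementation or any Biopython API: the function name cds_to_genome is hypothetical, and it assumes forward-strand exons given as 0-based, half-open (start, end) tuples in transcript order.

```python
def cds_to_genome(exons, cds_pos):
    """Map a 0-based CDS offset to a 0-based coordinate on the parent sequence.

    exons is a list of (start, end) half-open intervals, forward strand only,
    listed in transcript order (an assumption of this sketch).
    """
    if cds_pos < 0:
        raise IndexError("CDS position must be non-negative")
    offset = 0  # number of CDS bases covered by the exons already passed
    for start, end in exons:
        length = end - start
        if cds_pos < offset + length:
            # The position falls inside this exon.
            return start + (cds_pos - offset)
        offset += length
    raise IndexError("CDS position %d is beyond the last exon" % cds_pos)


exons = [(100, 150), (200, 260), (300, 310)]
print(cds_to_genome(exons, 0))    # 100
print(cds_to_genome(exons, 50))   # 200
print(cds_to_genome(exons, 115))  # 305
```

For a large number of subfeatures, the cumulative offsets could be precomputed once and searched with the standard library bisect module instead of scanning every time.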
>
>> * I ignore strand for the moment.
>
> That makes life a bit more fun! I haven't tested my code on mixed
> strand features yet (e.g. some crazy tRNA annotation I've seen).
>
>> * I don't use SeqFeature.AbstractPosition and friends.
>
> Unfortunately they crop up in lots of real world GenBank/EMBL files,
> so anything we add to the SeqFeature object has to cope with them.
> Things like GFF3 files avoid this of course.
I should also point out that if you are accessing location positions in
your own code, using nofuzzy_start and nofuzzy_end is better, since they
give the appropriate integer values. That is, change this:

    sf.location.start.position, sf.location.end.position

to:

    sf.location.nofuzzy_start, sf.location.nofuzzy_end

That should then take care of the fuzzy locations as best as possible.
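
As a rough illustration of why this matters, here is a toy model. ToyPosition and ToyLocation are hypothetical stand-ins written for this example, not the Biopython classes, and the rule used (take the lowest coordinate for the start and the highest for the end) is an assumption about what "as best as possible" means for a position that covers a range.

```python
class ToyPosition:
    """Hypothetical stand-in for a possibly fuzzy position (not Biopython)."""

    def __init__(self, position, extension=0):
        self.position = position    # lowest coordinate the position can take
        self.extension = extension  # width of the uncertain range, 0 if exact


class ToyLocation:
    """Hypothetical stand-in for a feature location (not Biopython)."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    @property
    def nofuzzy_start(self):
        # Assumption: use the lowest possible start coordinate.
        return self.start.position

    @property
    def nofuzzy_end(self):
        # Assumption: use the highest possible end coordinate.
        return self.end.position + self.end.extension


# An exact start at 5 and an end somewhere within 8..10.
loc = ToyLocation(ToyPosition(5), ToyPosition(8, extension=2))
naive = (loc.start.position, loc.end.position)
safe = (loc.nofuzzy_start, loc.nofuzzy_end)
print(naive)  # (5, 8)
print(safe)   # (5, 10)
```

In this toy case, reading the raw position of the end would silently drop the last two bases of the feature.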
Peter