[Biopython] SeqIO feature.location.start and end for genes spanning origin

Peter Cock p.j.a.cock at googlemail.com
Thu May 8 17:19:23 UTC 2014


On Thu, May 8, 2014 at 5:02 PM, Richard Llewellyn <llewelr at gmail.com> wrote:
> I was surprised to find that, for a gene spanning the origin, the
> start of a SeqIO record.feature.location was given as 0 and the
> end as the length of the circular chromosome (see below).
>
> I know it is difficult to deal with features spanning the origin, and
> imagine that there are issues if the start location is given as greater
> than the end.

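Yes. The Biopython model has start <= end regardless of
strand, as in GFF etc. For a join, the overall start and end
are the minimum and maximum over its parts, which is why you
see 0 and the sequence length here.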
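
To illustrate where the 0 and length values come from, here is a
minimal sketch in plain Python rather than Biopython itself. It assumes
each part of a join is a (start, end) pair with start <= end, in
0-based, end-exclusive coordinates. The name overall_span and the
genome length of 1000 are made up for this example.

```python
# Illustrative model only: each part is a (start, end) pair with start <= end,
# using 0-based, end-exclusive coordinates (the same convention as Python slices).
def overall_span(parts):
    """Return (minimum start, maximum end) over all parts of a join."""
    starts = [start for start, _ in parts]
    ends = [end for _, end in parts]
    return min(starts), max(ends)


SEQ_LENGTH = 1000  # hypothetical circular genome length

# A gene wrapping the origin, i.e. join(991..1000,1..30) in GenBank notation.
wrapping_gene = [(990, SEQ_LENGTH), (0, 30)]

span = overall_span(wrapping_gene)
print(span)  # (0, 1000)
```

So any feature that touches both sides of the origin ends up reporting
the whole sequence as its span.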

> I wonder if you have a suggested work around.
>
> Off the top of my head, I could test whether the feature.location is of
> type CompoundLocation, and if so, determine whether it spans the origin
> (for instance, test if the end of one location is the chromosome length
> and the start of another is 0), and then take the minimum of the former
> and the max of the latter.  Since I am currently working with prokaryotic
> sequence this would just add the type test to each parse, a relatively
> small overhead.

In theory there could be some (trans-)splicing going on, but yes,
in most origin-wrapping features the join point is at 0/length.
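
A rough sketch of the test you describe might look like the following.
It is duck-typed: it only assumes the location object has a .parts
list whose items have .start and .end, which is how I would expect a
CompoundLocation to be used. Part and Location below are stand-ins so
the example runs on its own, and origin_wrap_bounds is a hypothetical
helper name, not part of Biopython.

```python
from collections import namedtuple

# Stand-ins for location objects, used only so this sketch is self-contained.
Part = namedtuple("Part", ["start", "end"])
Location = namedtuple("Location", ["parts"])


def origin_wrap_bounds(location, seq_length):
    """Return (start, end) with start > end for an origin-wrapping location.

    Returns None if the location does not look like it wraps the origin,
    i.e. it lacks a part ending at seq_length or a part starting at 0.
    """
    parts = list(location.parts)
    if len(parts) < 2:
        return None  # a single part cannot wrap in the start <= end model
    before_origin = [p for p in parts if int(p.end) == seq_length]
    after_origin = [p for p in parts if int(p.start) == 0]
    if not before_origin or not after_origin:
        return None
    start = min(int(p.start) for p in before_origin)
    end = max(int(p.end) for p in after_origin)
    return start, end


wrapping = Location([Part(990, 1000), Part(0, 30)])
spliced = Location([Part(5, 10), Part(20, 30)])

print(origin_wrap_bounds(wrapping, 1000))  # (990, 30)
print(origin_wrap_bounds(spliced, 1000))   # None
```

Note this treats any join with a part ending at the length and another
starting at 0 as wrapping, so the odd trans-spliced case would need
its own check.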

It depends what the goal of your code is. If you just want the
sequence described, the extract method does all the hard
work. In general, though, you will have to special-case
features wrapping the origin, however the parser/object
model handles them.
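
For the sequence case, the idea behind extracting a join is simply to
concatenate the parts in the order listed. Here is a toy version in
plain Python, forward strand only, with a made-up eight-base "genome";
extract_forward is a hypothetical name. With a real record you would
call feature.extract(record.seq) instead of doing this by hand, and
that also takes care of the strand.

```python
def extract_forward(seq, parts):
    """Concatenate the slices of seq given by (start, end) parts, in order.

    Forward strand only; coordinates are 0-based and end-exclusive.
    """
    return "".join(seq[start:end] for start, end in parts)


genome = "ATGCCGTA"          # toy circular sequence of length 8
wrapping = [(6, 8), (0, 3)]  # last two bases, then the first three

extracted = extract_forward(genome, wrapping)
print(extracted)  # TAATG
```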

What numbers are you hoping to get out of this location?

> Thanks for the great work.
>
> #####################################
>
>
> I ran into this problem with Nanoarchaeum equitans Kin4-M,
> http://www.ncbi.nlm.nih.gov/nuccore/38349555,
>
> where parsing the first CDS, location.start is 0 and location.end is
> 490885.
>
> ...

This is one of my favourite test cases for features wrapping
the origin :)

Peter


