[Biopython] Start and end of feature locations in circular sequences

Peter Cock p.j.a.cock at googlemail.com
Thu Aug 31 05:46:09 EDT 2023


Hello Jan,

Yes, we have talked about this - see e.g.
https://github.com/biopython/biopython/issues/897

That issue lists a couple of workarounds, but perhaps you'd like to comment
there on whether we need biological start/end properties as well, and how
you would name them?

This probably depends on what you want to use the values for - the main use
case I can think of is extracting the described sequence, which the feature's
extract method handles for you.
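
As a rough illustration of what extraction means for a location like
join(139629..139738,1..196), here is a minimal pure-Python sketch. It is not
Biopython's implementation: the helper name extract_parts and the toy genome
are made up for this example, and each part is assumed to be a 0-based,
end-exclusive (start, end) pair on the forward strand.

```python
def extract_parts(sequence, parts):
    """Concatenate slices of sequence in the order the parts are listed.

    parts is a list of (start, end) pairs, 0-based and end-exclusive,
    all on the forward strand (an assumption of this sketch).
    """
    return "".join(sequence[start:end] for start, end in parts)


# Toy circular genome of 20 bases, linearised at an arbitrary cut point.
genome = "ACGTACGTACGTACGTTTGA"

# Analogue of join(17..20,1..3): the feature runs across the cut point,
# so the part near the end of the linear sequence is listed first.
parts = [(16, 20), (0, 3)]

print(extract_parts(genome, parts))
```

The point is only that the parts are joined in their listed order, so the
result does not depend on the minimum and maximum coordinates.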

For other uses, like drawing and finding overlaps, I think the current
left/right style start/end (the minimum and maximum coordinates) are more
useful.
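
To make the difference between the two views concrete, here is a small
heuristic sketch in plain Python. The function names are hypothetical (nothing
here is Biopython API), the parts are assumed to be exact, forward-strand
(start, end) pairs in biological order, such as could be read off
location.parts, and a feature whose parts are simply listed out of order
would be misreported as crossing the origin.

```python
def left_right_span(parts):
    """Minimum start and maximum end over all parts (left/right view)."""
    return min(start for start, _ in parts), max(end for _, end in parts)


def biological_span(parts):
    """Start of the first part and end of the last part (biological view)."""
    return parts[0][0], parts[-1][1]


def crosses_origin(parts):
    """Guess whether the feature wraps around the cut point.

    Heuristic: on the forward strand, a later part that starts to the
    left of the part before it suggests the feature crossed the origin.
    """
    return any(nxt[0] < prev[0] for prev, nxt in zip(parts, parts[1:]))


# join(139629..139738,1..196) as 0-based, end-exclusive pairs.
wrapped = [(139628, 139738), (0, 196)]
# An ordinary two-part feature for comparison.
ordinary = [(100, 200), (300, 400)]

print(left_right_span(wrapped), biological_span(wrapped), crosses_origin(wrapped))
print(left_right_span(ordinary), biological_span(ordinary), crosses_origin(ordinary))
```

Reverse-strand features and fuzzy positions would need extra care and are
left out of this sketch.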

Peter


On Thu, Aug 31, 2023 at 6:42 AM Jan T. Kim <jttkim at googlemail.com> wrote:

> Hi All,
>
> I've recently encountered features in circular sequences that start near
> the end of the (probably arbitrarily) linearised sequence and end near
> its start. For an example see the first CDS feature in [1] (locus tag
> "X600_gp001"):
>
>     join(139629..139738,1..196)
>
> To my surprise, the start attribute of this feature's location is 0,
> and its end attribute is the end of the sequence:
>
>     >>> f1.location.start
>     ExactPosition(0)
>     >>> f1.location.end
>     ExactPosition(139738)
>
> So when using only the start and end positions of the feature, without
> checking whether its location is compound and going through the parts in
> that case, it appears that the feature spans the entire sequence (!!).
>
> Technically, the findings above are consistent with the documentation,
> which states that start and end give the minimal and maximal positions
> occurring in a feature, respectively.
>
> This behaviour is not quite consistent with my expectations in this case,
> however. Is there any way (attribute, method or whatever) to detect whether
> a feature straddles the cut point of a circular sequence? I realise that
> when taking non-exact positions into account and when making no assumptions
> about the ordering of parts, such a check can be difficult and may not
> have a well defined result in all cases, but on the other hand I don't
> think it's likely that I'm the first person requiring such a check...?
>
> My main objective with this post is to find out whether there's anything
> in Biopython that does this type of job already. If there isn't, I'll
> code up some heuristic.
>
> Best regards, Jan
>
>
> [1] https://www.ncbi.nlm.nih.gov/nuccore/NC_022920.1/
>
> _______________________________________________
> Biopython mailing list  -  Biopython at biopython.org
> https://mailman.open-bio.org/mailman/listinfo/biopython
>