[Biopython] gff3: feature.location.end problem

Brad Chapman chapmanb at 50mail.com
Wed May 29 16:32:29 UTC 2013


Mic;

>> ##gff-version 3
>> ##sequence-region ID1 1 20
>> ID1     prediction      gene    1       20      10.0    +       .
>>  other=Some,annotations;ID=gene1
[...]
>> I get the following output:
>> ID1 1 20
>> gene1 0 20
>>
>> Why is it not "gene1 0 19" and "ID1 0 19"?

> That looks correct, just like when parsing a GenBank/EMBL
> feature with a location string 1..20 you'd get the start as 0
> and the end as 20 in Biopython. This is using Python style
> slice notation - the start is inclusive and the end is exclusive
> meaning sequence[0:20] will give the first 20 bases as you
> would expect for this location.

Peter is right on with the conversion information: you should expect
this to be 0, 20. Biopython uses Python 0-based indexing, so you convert
from GFF 1-based coordinates by subtracting one from the start base; the
end stays the same.
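
As a minimal sketch of that conversion (the helper name and the sequence
are made up for illustration; this is not code from Biopython or the GFF
parser):

```python
def gff_to_python_slice(start, end):
    """Convert a 1-based, inclusive GFF interval to a 0-based, half-open one."""
    # Only the start shifts; the inclusive 1-based end equals the
    # exclusive 0-based end.
    return start - 1, end


sequence = "ACGTACGTACGTACGTACGTNNNN"  # made-up 24-base sequence
start, end = gff_to_python_slice(1, 20)
feature_seq = sequence[start:end]
print(start, end, len(feature_seq))  # prints: 0 20 20
```

The length of the feature is simply end - start, which is one reason the
half-open convention is convenient.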

The code wasn't doing anything special with the sequence-region
directive, which is why its coordinates stay as a raw parse of the
text: 1 20. I agree it would be useful to convert these to 0-based for
consistency. I pushed a fix which handles this as well:

https://github.com/chapmanb/bcbb/commit/51e7f2742059608f98d948fca5b342a9edf9e7a8
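
For illustration only, here is a rough sketch of what converting a
sequence-region directive could look like. It is not the code from the
commit above; parse_sequence_region is a hypothetical helper that assumes
a well-formed, whitespace-separated directive line:

```python
def parse_sequence_region(line):
    """Parse a '##sequence-region seqid start end' directive line.

    Hypothetical helper: returns (seqid, start, end) with the coordinates
    converted to 0-based, end-exclusive form.
    """
    directive, seqid, start, end = line.strip().split()
    if directive != "##sequence-region":
        raise ValueError("not a sequence-region directive: %r" % line)
    # Same conversion as for features: shift only the start.
    return seqid, int(start) - 1, int(end)


print(parse_sequence_region("##sequence-region ID1 1 20"))
# prints: ('ID1', 0, 20)
```

With this conversion the directive and the gene feature from the example
file would both report 0 20.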

Thanks for the feedback,
Brad



