[Biopython] gff3: feature.location.end problem
Brad Chapman
chapmanb at 50mail.com
Wed May 29 12:32:29 EDT 2013
Mic;
>> ##gff-version 3
>> ##sequence-region ID1 1 20
>> ID1 prediction gene 1 20 10.0 + . other=Some,annotations;ID=gene1
[...]
>> I get the following output:
>> ID1 1 20
>> gene1 0 20
>>
>> Why is it not "gene1 0 19" and "ID1 0 19"?
> That looks correct, just like when parsing a GenBank/EMBL
> feature with a location string 1..20 you'd get the start as 0
> and the end as 20 in Biopython. This is using Python style
> slice notation - the start is inclusive and the end is exclusive
> meaning sequence[0:20] will give the first 20 bases as you
> would expect for this location.
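The slice behaviour Peter describes can be checked in plain Python. This sketch uses a made-up 20-base string rather than the ID1 sequence from the question, and no Biopython objects:

```python
# A made-up 20-base sequence standing in for ID1 (illustrative only).
seq = "ACGT" * 5

start, end = 0, 20  # Python style: start inclusive, end exclusive

region = seq[start:end]
print(len(region))    # 20, i.e. end - start bases
print(region == seq)  # True, the slice covers the whole sequence

# Half-open intervals tile without gaps or overlaps:
# [0, 10) followed by [10, 20) is exactly [0, 20).
left, right = seq[0:10], seq[10:20]
print(left + right == region)  # True
```

So GFF's 1..20 and Python's 0:20 describe the same 20 bases; only the convention differs.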
Peter is right on with the conversion information: you should expect 0, 20
here. Biopython uses Python's 0-based, end-exclusive indexing, so you convert
from GFF's 1-based, end-inclusive coordinates by subtracting 1 from the start;
the end stays the same.
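A minimal sketch of that conversion rule, assuming plain integer coordinates. The function names `gff_to_python` and `python_to_gff` are hypothetical helpers for illustration, not part of the Biopython API or the GFF parser:

```python
from typing import Tuple


def gff_to_python(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based, end-inclusive GFF interval to 0-based, end-exclusive."""
    if start < 1 or end < start:
        raise ValueError(f"invalid GFF interval: {start}..{end}")
    # Only the start shifts: the inclusive 1-based end is numerically
    # the same as the exclusive 0-based end.
    return start - 1, end


def python_to_gff(start: int, end: int) -> Tuple[int, int]:
    """Inverse conversion, back to 1-based, end-inclusive coordinates."""
    if start < 0 or end <= start:
        raise ValueError(f"invalid 0-based interval: {start}:{end}")
    return start + 1, end


print(gff_to_python(1, 20))  # (0, 20)
print(python_to_gff(0, 20))  # (1, 20)
```

The length is preserved either way: 20 - 0 in Python terms equals 20 - 1 + 1 in GFF terms.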
The code wasn't doing anything special with the sequence-region
directive, which is why those coordinates came through as a raw parse of
the text: 1 20. I agree it would be useful to convert these to 0-based for
consistency. I pushed a fix which handles this as well:
https://github.com/chapmanb/bcbb/commit/51e7f2742059608f98d948fca5b342a9edf9e7a8
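For illustration only, here is a minimal sketch of converting a `##sequence-region` directive to 0-based coordinates. The helper `parse_sequence_region` is hypothetical and is not the code in the commit above; it assumes whitespace-separated `seqid start end` fields as laid out in the GFF3 spec:

```python
from typing import Tuple


def parse_sequence_region(line: str) -> Tuple[str, int, int]:
    """Parse '##sequence-region seqid start end' into 0-based coordinates.

    Hypothetical helper for illustration, not the parser's actual code.
    """
    parts = line.split()
    if len(parts) != 4 or parts[0] != "##sequence-region":
        raise ValueError(f"not a sequence-region directive: {line!r}")
    seqid = parts[1]
    start, end = int(parts[2]), int(parts[3])
    # Same rule as for features: subtract 1 from the start, keep the end.
    return seqid, start - 1, end


print(parse_sequence_region("##sequence-region ID1 1 20"))  # ('ID1', 0, 20)
```

With this rule the directive and the gene1 feature from the example report the same 0, 20 span.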
Thanks for the feedback,
Brad