[Biopython] gff3: feature.location.end problem
Peter Cock
p.j.a.cock at googlemail.com
Tue May 28 04:41:58 EDT 2013
On Tue, May 28, 2013 at 2:49 AM, Mic <mictadlo at gmail.com> wrote:
> Hi,
> When parsing this gff3 file:
>
> ##gff-version 3
> ##sequence-region ID1 1 20
> ID1 prediction gene 1 20 10.0 + . other=Some,annotations;ID=gene1
> ID1 prediction exon 1 5 . + . Parent=gene1
> ID1 prediction exon 16 20 . + . Parent=gene1
>
>
> with this code:
>
> from BCBio import GFF  # handles GFF files
>
> with open("test.gff3") as file:
>     for rec in GFF.parse(file):
>         annotations = rec.annotations['sequence-region'][0]
>         id = annotations[0]
>         start = int(annotations[1])
>         end = int(annotations[2])
>         print id, start, end
>
>         for feature in rec.features:
>             contig_id = feature.qualifiers['ID'][0]
>             print contig_id, int(feature.location.start), int(feature.location.end)
>
> I get the following output:
> ID1 1 20
> gene1 0 20
>
>
> Why is it not "gene1 0 19" and "ID1 0 19"?
>
> Thank you in advance.
>
> Mic
Hi Mic,
That looks correct. Just as when parsing a GenBank/EMBL
feature with the location string 1..20, Biopython gives a start
of 0 and an end of 20. This is Python-style slice notation: the
start is inclusive and the end is exclusive, so sequence[0:20]
gives the first 20 bases, as you would expect for this location.
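
As a rough illustration of that convention, here is a minimal pure-Python
sketch. It does not use Biopython or BCBio; the helper name gff_to_slice
and the 20-base sequence are made up for the example. It only shows how
1-based inclusive GFF coordinates map onto a 0-based, end-exclusive slice.

```python
def gff_to_slice(start, end):
    """Hypothetical helper: convert 1-based inclusive GFF coordinates
    to a 0-based, end-exclusive (start, end) pair for Python slicing."""
    return start - 1, end


# Made-up 20-base sequence standing in for ID1 (an assumption for the demo).
sequence = "ACGTACGTACGTACGTACGT"

gene = gff_to_slice(1, 20)    # the gene line: 1..20
exon1 = gff_to_slice(1, 5)    # first exon: 1..5
exon2 = gff_to_slice(16, 20)  # second exon: 16..20

print(gene)                               # (0, 20)
print(len(sequence[gene[0]:gene[1]]))     # 20
print(sequence[exon1[0]:exon1[1]])        # ACGTA
print(sequence[exon2[0]:exon2[1]])        # TACGT
```

Note that with this convention the feature length is simply end - start,
with no "+ 1" correction needed.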
Peter