[Biopython] gff3: feature.location.end problem

Peter Cock p.j.a.cock at googlemail.com
Tue May 28 08:41:58 UTC 2013


On Tue, May 28, 2013 at 2:49 AM, Mic <mictadlo at gmail.com> wrote:
> Hi,
> When parsing this gff3 file:
>
> ##gff-version 3
> ##sequence-region ID1 1 20
> ID1     prediction      gene    1       20      10.0    +       .       other=Some,annotations;ID=gene1
> ID1     prediction      exon    1       5       .       +       .       Parent=gene1
> ID1     prediction      exon    16      20      .       +       .       Parent=gene1
>
>
> with this code:
>
> from BCBio import GFF  # handles GFF files
>
> with open("test.gff3") as file:
>     for rec in GFF.parse(file):
>         annotations = rec.annotations['sequence-region'][0]
>         id = annotations[0]
>         start = int(annotations[1])
>         end = int(annotations[2])
>         print id, start, end
>
>         for feature in rec.features:
>             contig_id = feature.qualifiers['ID'][0]
>             print contig_id, int(feature.location.start), int(feature.location.end)
>
> I get the following output:
> ID1 1 20
> gene1 0 20
>
>
> Why is it not "gene1 0 19" and "ID1 0 19"?
>
> Thank you in advance.
>
> Mic

Hi Mic,

That looks correct. It is the same as parsing a GenBank/EMBL
feature with the location string 1..20: in Biopython you get a
start of 0 and an end of 20. Locations use Python-style slice
notation, where the start is inclusive and the end is exclusive,
so sequence[0:20] gives the first 20 bases, as you would expect
for this location.
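
As a rough illustration of the convention, here is a minimal sketch in
plain Python. It does not use the BCBio or Biopython API; the helper
name gff_to_slice and the 20-base sequence are made up for this example.
It only shows how 1-based, inclusive GFF coordinates map onto a 0-based,
end-exclusive pair that can be used directly as a slice.

```python
def gff_to_slice(start, end):
    """Map 1-based inclusive GFF coordinates to a 0-based, end-exclusive pair.

    Hypothetical helper for illustration only, not a library function.
    """
    # The start moves down by one; the end stays the same because
    # "inclusive 20" (1-based) and "exclusive 20" (0-based) name the same boundary.
    return start - 1, end


# 20 made-up bases standing in for the ID1 sequence region.
sequence = "ACGTACGTACGTACGTACGT"

# The gene line in the GFF3 file: start 1, end 20.
gene_start, gene_end = gff_to_slice(1, 20)
gene = sequence[gene_start:gene_end]

# The second exon line: start 16, end 20.
exon2_start, exon2_end = gff_to_slice(16, 20)
exon2 = sequence[exon2_start:exon2_end]

print(gene_start, gene_end, len(gene))
print(exon2_start, exon2_end, exon2)
```

With this convention end - start is the feature length, so no +1
correction is needed; an end of 19 would drop the last base.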

Peter


