[Biopython] Gene position shifts by one nucleotide after using BCBio's GFF parser

Islam Amin eng.islamamin at gmail.com
Sat Jan 28 00:42:58 UTC 2017


Dear All,

I'm new to parsing GFF files. While parsing a GFF3 file, I found that the
start position of the gene is reported as *8774510* instead of *8774511* (the
value in the original file). Could anyone explain this to me?

> grep "gene:Bra000001" Brassica_rapa.IVFCAASv1.34.chr.gff3

>A03 brad gene 8774511 8777095 . + . ID=gene:Bra000001;biotype=protein_coding;description=AT2G37440 (E%3D7e-179) | endonuclease/exonuclease/phosphatase family protein;gene_id=Bra000001;logic_name=glean
A03 brad mRNA 8774511 8777095 . + . ID=transcript:Bra000001.1;Parent=gene:Bra000001;biotype=protein_coding;transcript_id=Bra000001.1


After running the following script:
=============================
from BCBio import GFF

in_file = "chr.gff3"
# Only load gene, mRNA and exon features
limits = dict(gff_type=["gene", "mRNA", "exon"])
with open(in_file) as gff_handle:
    for rec in GFF.parse(gff_handle, target_lines=1000, limit_info=limits):
        for gene_feature in rec.features:
            if gene_feature.id == "gene:Bra000001":
                print(gene_feature.id, gene_feature.location)
============================
The output below shows that the start position for the same gene is
*8774510* instead of *8774511*:
>('gene:Bra000001', FeatureLocation(ExactPosition(8774510), ExactPosition(8777095), strand=1)
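A likely explanation, offered as a hedged note: Biopython's FeatureLocation stores positions in Python's zero-based, half-open convention (the same as slicing), while GFF3 columns 4 and 5 are one-based and inclusive. Under that reading the parser has not moved the gene; the start is just written differently (8774511 - 1 = 8774510), and the end 8777095 stays the same, which matches the output above. The sketch below is plain Python and does not call BCBio or Biopython; the helper names `gff_to_zero_based` and `zero_based_to_gff` are hypothetical.

```python
# Sketch: converting between GFF3 coordinates (1-based, inclusive on both
# ends) and Python/Biopython-style coordinates (0-based start, exclusive end).


def gff_to_zero_based(start, end):
    """Convert a 1-based inclusive GFF3 interval to a 0-based half-open one."""
    # Only the start moves; a 1-based inclusive end equals a 0-based exclusive end.
    return start - 1, end


def zero_based_to_gff(start, end):
    """Convert a 0-based half-open interval back to 1-based inclusive GFF3."""
    return start + 1, end


# Values from the gene:Bra000001 record above.
gff_start, gff_end = 8774511, 8777095
py_start, py_end = gff_to_zero_based(gff_start, gff_end)
print(py_start, py_end)  # 8774510 8777095

# Both conventions describe the same bases; a toy check on a short sequence.
seq = "ACGTACGTAC"
# GFF3 bases 3..5 (1-based, inclusive) are seq[2:5] in Python.
s, e = gff_to_zero_based(3, 5)
assert seq[s:e] == "GTA"
# Lengths agree too: end - start + 1 in GFF3 equals end - start in Python.
assert (5 - 3 + 1) == (e - s)
```

If the goal is only to print the GFF-style start from the script above, `int(gene_feature.location.start) + 1` should give 8774511, assuming the zero-based convention described here.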

