[Biopython] Gene position is shifting one nucleotide after using BCBio's GFF parser

Bastian Greshake bgreshake at googlemail.com
Sat Jan 28 00:55:22 UTC 2017


Hey there,
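that’s because GFF files use 1-based, inclusive coordinates, while FeatureLocation objects in Biopython use zero-based, end-exclusive coordinates (the same scheme as Python slices). Only the start shifts by one; the end value stays the same. See http://biopython.org/DIST/docs/api/Bio.SeqFeature.FeatureLocation-class.html, specifically: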

"Note that the start and end location numbering follow Python's scheme, thus a GenBank entry of 123..150 (one based counting) becomes a location of [122:150] (zero based counting)."
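
To make the arithmetic concrete, here is a minimal sketch in plain Python using the coordinates from your gene. The two helper functions are hypothetical names made up for illustration; they are not part of Biopython or BCBio.

```python
# Hypothetical helpers for illustration only (not part of Biopython or BCBio).

def gff_to_python(start, end):
    """Convert a 1-based, inclusive GFF interval to a 0-based, end-exclusive one."""
    return start - 1, end


def python_to_gff(start, end):
    """Convert a 0-based, end-exclusive interval back to 1-based, inclusive."""
    return start + 1, end


# Coordinates of gene:Bra000001 as written in the GFF3 file.
gff_start, gff_end = 8774511, 8777095

py_start, py_end = gff_to_python(gff_start, gff_end)
print(py_start, py_end)  # 8774510 8777095

# The round trip recovers the GFF values, and the feature length is unchanged.
assert python_to_gff(py_start, py_end) == (gff_start, gff_end)
assert py_end - py_start == gff_end - gff_start + 1
```

So with the parsed feature, int(gene_feature.location.start) + 1 should give you back the start as written in the GFF file, while int(gene_feature.location.end) can be used unchanged.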

Hope that helps!

Cheers,
Bastian

—
www.ruleofthirds.de

While I may be sending this email outside my normal office hours, I have no expectation to receive a reply outside yours.

> On 28 Jan 2017, at 01:42, Islam Amin <eng.islamamin at gmail.com> wrote:
> 
> Dear All,
> 
> I'm new to parsing GFF files. While trying to parse one, I found that the start position of the gene is 8774510 instead of 8774511 (as in the original file). Could anyone explain that for me:
> 
> grep "gene:Bra000001" Brassica_rapa.IVFCAASv1.34.chr.gff3
> 
> A03 brad gene 8774511 8777095 . + . ID=gene:Bra000001;biotype=protein_coding;description=AT2G37440 (E%3D7e-179) | endonuclease/exonuclease/phosphatase family protein ;gene_id=Bra000001;logic_name=glean
> A03 brad mRNA 8774511 8777095 . + . ID=transcript:Bra000001.1;Parent=gene:Bra000001;biotype=protein_coding;transcript_id=Bra000001.1
> 
> 
> after using the following script 
> =============================
> from BCBio import GFF
> 
> in_file = "chr.gff3"
> # Only parse gene, mRNA and exon features
> limits = dict(gff_type=["gene", "mRNA", "exon"])
> with open(in_file) as gff_handle:
>     for rec in GFF.parse(gff_handle, target_lines=1000, limit_info=limits):
>         for gene_feature in rec.features:
>             if gene_feature.id == "gene:Bra000001":
>                 print(gene_feature.id, gene_feature.location)
> ============================
> The result below shows that the start position for the same gene is 8774510 instead of 8774511:
> ('gene:Bra000001', FeatureLocation(ExactPosition(8774510), ExactPosition(8777095), strand=1))



