[Biopython] GFF parsing: getting features of specific proteins in gff

Philipp Schiffer philipp.schiffer at gmail.com
Sun Jan 19 09:36:53 UTC 2014


Hi all and Brad Chapman in particular,

I just started exploring the GFF parser for some Augustus-derived gff3 files, but I am running into trouble when trying to collect information for a specific protein. Ultimately my goal is to get the introns and exons for a specific set of genes. Following the wiki, I can replicate everything with my data, and I have adapted the following piece to it:

from BCBio import GFF

in_file = "your_file.gff"
# Restrict parsing to one sequence ID and one source
limit_info = dict(gff_id=["chr1"],
                  gff_source=["Coding_transcript"])
with open(in_file) as in_handle:
    for rec in GFF.parse(in_handle, limit_info=limit_info):
        print(rec.features[0])

For testing on a subset I changed "chr1" to one of my contig IDs and that works. Then I limited to gff_type = ["intron"] and that also works for my data.

However, now I'd like to print not all of rec.features, but only those for a specific gene. I picked the first one, "g1.t1", which is on the contig and is displayed as an id in the printout of all features. It also appears in the "list" that rec.features seems to be, but apparently you can't do something like `if x in list:` with rec.features; at least I get an error when trying. I looked through the Biopython tutorial to see if there is an attribute of rec.features that I could query for the id, but somehow that didn’t make me any wiser.
I guess this is just me being thick and a newbie, but could anybody point me in the right direction?
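
To make the goal concrete, this is roughly the kind of lookup I have in mind. It is only a sketch and does not touch the parser: it assumes each feature carries its GFF ID in an `id` attribute and its children in a `sub_features` list, which may not match what GFF.parse actually returns for my files. The `Feature` class, the `find_features` helper and the toy IDs are made up so that the snippet runs on its own.

```python
class Feature(object):
    """Hypothetical stand-in for a parsed feature (not the real SeqFeature)."""

    def __init__(self, id, type, sub_features=None):
        self.id = id
        self.type = type
        self.sub_features = sub_features or []


def find_features(features, wanted_id):
    """Yield every feature whose id matches, searching nested children too."""
    for feature in features:
        if feature.id == wanted_id:
            yield feature
        # getattr keeps this working if a feature has no sub_features at all
        children = getattr(feature, "sub_features", [])
        for hit in find_features(children, wanted_id):
            yield hit


# Invented toy data: gene -> transcript -> intron/CDS
transcript = Feature("g1.t1", "transcript",
                     [Feature("g1.t1.intron1", "intron"),
                      Feature("g1.t1.cds1", "CDS")])
features = [Feature("g1", "gene", [transcript]), Feature("g2", "gene")]

# Compare on the id attribute instead of testing membership in the list
hits = list(find_features(features, "g1.t1"))
introns = [f for f in hits[0].sub_features if f.type == "intron"]
print([f.id for f in introns])
```

With real data, `features` would be rec.features from the loop above, provided the assumptions about `id` and `sub_features` hold.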

Thanks

Philipp


