[Biopython] gff3 problem

Brad Chapman chapmanb at 50mail.com
Tue Apr 5 13:22:47 UTC 2011


Michal;

> I have found http://www.biopython.org/wiki/GFF_Parsing for
> BioPython in order to read GFF3 files.

Thanks for trying out the GFF parser and for the feedback.

> How can I access exon and cds information from gff3 file?

These are stored as sub_features of the features on each record.
The GFF parser does the work of nesting exons and CDSs within their
parent features, using the parent/child relationships in GFF3.
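To illustrate the idea, here is a minimal sketch of how Parent
attributes can be turned into nesting. It is not the parser's actual
implementation: the Feature tuple and the nest_features function are
hypothetical names used only for this example, and the IDs are made up.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for one parsed GFF3 line
# (this is not a Biopython class).
Feature = namedtuple("Feature", ["id", "type", "parent", "children"])


def nest_features(flat_features):
    """Attach each feature to the feature named by its Parent attribute."""
    by_id = {f.id: f for f in flat_features if f.id is not None}
    top_level = []
    for f in flat_features:
        parent = by_id.get(f.parent)
        if parent is None:
            top_level.append(f)        # no known parent: top-level feature
        else:
            parent.children.append(f)  # nest under its parent
    return top_level


# Made-up records: one gene with two exons that point at it via Parent.
flat = [
    Feature("gene1", "gene", None, []),
    Feature("exon1", "exon", "gene1", []),
    Feature("exon2", "exon", "gene1", []),
]
nested = nest_features(flat)
for gene in nested:
    print(gene.type, [child.type for child in gene.children])
```

The result has the same shape as what you see from the parser: a list
of top-level features, each carrying its children.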

> Why is the start position always one less than in the gff3 file,
> but the end position is the same?

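As Peter mentioned, we convert to standard Python 0-based,
end-exclusive coordinates, the same convention used for slicing.
GFF3 itself is 1-based and inclusive, so the start is one less than
in the file while the end is numerically unchanged. This helps
maintain consistency throughout your code.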
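In code, the conversion amounts to something like the following
sketch. The function names are hypothetical and not part of Biopython
or the GFF parser, and the coordinates are made up.

```python
def gff3_to_python(start, end):
    """Convert a 1-based, inclusive interval to 0-based, end-exclusive."""
    return start - 1, end


def python_to_gff3(start, end):
    """Convert a 0-based, end-exclusive interval back to GFF3 style."""
    return start + 1, end


gff_start, gff_end = 11, 20          # made-up GFF3 feature, 10 bases long
py_start, py_end = gff3_to_python(gff_start, gff_end)
print(py_start, py_end)              # 10 20
print(py_end - py_start)             # 10, the feature length

# The converted pair can be used directly as a slice.
seq = "A" * 30
assert len(seq[py_start:py_end]) == gff_end - gff_start + 1
```

With this convention the length is simply end minus start, with no
"+ 1" correction needed.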

> Why don't I get Note=Elongation factor P (EF-P)...?

These are stored in the qualifiers attribute of each feature, a
dictionary mapping each attribute key to a list of values.
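As a rough sketch of how a GFF3 attribute column ends up in that
form (this is not the parser's own code; parse_gff3_attributes is a
hypothetical helper and the attribute string is a shortened,
illustrative one):

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Parse 'key=value;key=v1,v2' into a dict of key -> list of values."""
    qualifiers = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        # Commas separate multiple values; special characters are URL-escaped.
        qualifiers[unquote(key)] = [unquote(v) for v in value.split(",")]
    return qualifiers


# Illustrative attribute column, shortened for the example.
attrs = "ID=BC-x.1;Name=BC-x.1;Note=Elongation factor P (EF-P) family protein"
quals = parse_gff3_attributes(attrs)
print(quals["Note"][0])
```

Every value is a list, even when there is only one entry, so a single
note is read with an index such as quals["Note"][0].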

To demonstrate, if we modify your code slightly:

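from BCBio import GFF

in_file = "your_file.gff3"  # placeholder: path to your GFF3 file
with open(in_file) as in_handle:
    for rec in GFF.parse(in_handle):
        for feature in rec.features:
            # top-level feature: type, 0-based location, GFF3 attributes
            print(feature.type, feature.location)
            print(feature.qualifiers)
            # exons, CDSs and other children nested under this feature
            for sub_feature in feature.sub_features:
                print(" ", sub_feature.type, sub_feature.location)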

This will print out details of each feature. For instance, here is
a gene with exon sub_features:

gene [2234:3344]
{'Note': ['Elongation factor P (EF-P) family protein n:2 Tax:Arabidopsis RepID:D7L774_ARALY'],
 'source': ['x'], 'ID': ['BC-x.1'], 'Name': ['BC-x.1']}
  exon [2234:2279]
  exon [2422:2535]
  exon [2609:2691]
  exon [2762:2864]
  exon [2971:3049]
  exon [3125:3251]
  exon [3320:3344]
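
Because the locations are 0-based and end-exclusive, lengths fall out
as end minus start. A small sketch using plain tuples copied from the
exon lines above (no Biopython objects involved):

```python
# (start, end) pairs copied from the exon output: 0-based, end-exclusive.
exons = [(2234, 2279), (2422, 2535), (2609, 2691), (2762, 2864),
         (2971, 3049), (3125, 3251), (3320, 3344)]

exon_lengths = [end - start for start, end in exons]
print(exon_lengths)       # [45, 113, 82, 102, 78, 126, 24]
print(sum(exon_lengths))  # 570
```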

Hope this helps,
Brad


