[Biopython] GFF parsing

Peter biopython at maubp.freeserve.co.uk
Fri Feb 26 10:43:54 UTC 2010


On Fri, Feb 26, 2010 at 10:22 AM, John Reid <j.reid at mail.cryst.bbk.ac.uk> wrote:
> The GFF page on the BioPython wiki
> (http://www.biopython.org/wiki/GFF_Parsing) contains the following
> contradictory statements:
>
> Note: GFF parsing is not yet integrated into Biopython. This
> documentation is work towards making it ready for inclusion.
>
> Biopython provides a full featured GFF parser which will handle several
> versions of GFF: GFF3, GFF2, and GTF. It supports writing GFF3, the
> latest version.
>
> As far as I can work out if I have biopython 1.53 and I want to parse
> GFF, I should get the latest version of the parser from:
> http://github.com/chapmanb/bcbb/tree/master/gff
>
> I've tried using this to parse my 40Mb GFF file and it takes a long time.
> From inspecting my GFF file I thought it should be able to parse the records
> independently or does it need to parse the whole file before outputting the
> first record?
>
> Is there a roadmap for biopython anywhere?

Not explicitly, no. Code development depends very much on the time
available to volunteers. There is a partial list of active projects
here: http://biopython.org/wiki/Active_projects

Regarding the GFF code, Brad and I managed to chat about
this briefly earlier this month, and I think we have agreed in
principle on how to represent feature parent/child relationships
without "breaking" the existing code for GenBank/EMBL join
features. For now the only copy of the code is in Brad's
GitHub repository. Hopefully there will be a development/test
branch of Biopython with this included before too long.
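
On the question of whether records can be parsed independently: each
GFF3 feature line is self-contained (nine tab-separated columns), so a
plain generator can yield features one at a time without reading the
whole file first. The sketch below is only an illustration of that idea
in standard-library Python. It is not the API of the parser on Brad's
github, and the function name iter_gff3_records and the dict layout are
made up for this example. It does not resolve Parent attributes into a
parent/child hierarchy, which is the step that may need more than one
line of the file.

```python
import io
from urllib.parse import unquote


def iter_gff3_records(handle):
    """Yield one dict per GFF3 feature line, reading lazily.

    Hypothetical helper for illustration; not part of Biopython or bcbb.
    """
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence data follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        attributes = {}
        for pair in attrs.split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                # Values may be comma-separated lists and URL-encoded.
                attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
        yield {
            "seqid": seqid,
            "source": source,
            "type": ftype,
            "start": int(start),
            "end": int(end),
            "score": None if score == "." else float(score),
            "strand": strand,
            "phase": phase,
            "attributes": attributes,
        }


# Small in-memory example; a real file handle works the same way.
SAMPLE = (
    "##gff-version 3\n"
    "chr1\t.\tgene\t1000\t2000\t.\t+\t.\tID=gene1;Name=abc\n"
    "chr1\t.\tmRNA\t1000\t2000\t.\t+\t.\tID=mrna1;Parent=gene1\n"
)
records = list(iter_gff3_records(io.StringIO(SAMPLE)))
```

Because the function is a generator, looping over it with a file opened
via open() keeps only one line in memory at a time. Anything that needs
the full parent/child tree would have to collect features by ID first.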

Peter


