[Biopython] GFF parsing

Brad Chapman chapmanb at 50mail.com
Mon Mar 1 13:19:40 UTC 2010


John,

[GFF parser testing]

> For my purposes the python csv module is doing the job. I would prefer 
> to use a proper GFF parser but for the moment your parser is taking 100 
> seconds to parse a 40Mb file and the csv reader is doing it in about 10 
> seconds. Do you think this is reasonable or do you want to take a closer 
> look?

The plain csv module will always beat a full-featured parser on speed,
but we may be able to get that 10x difference down. I'm happy to take a
look if you want to send a pointer to your GFF file; if it's not publicly
available, feel free to send me a representative subset of it off list.

I'd be interested to hear your use case as well. Are there general
things you want to do for which you had to write code, and where a
supplemental GFF library would help? The trick with developing a GFF
parser is to provide useful high-level functionality, since it is
relatively easy to split strings and write a one-off solution.
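
To make the comparison concrete, a one-off reader of the kind described
above might look roughly like the sketch below. It is not the code from
either side of this thread: the function name read_gff_rows and the
GFF_COLUMNS list are made up for illustration, and it assumes the usual
nine tab-separated GFF columns with '#' comment lines. It leaves the
attributes column as a raw string, which is much of the work a full
parser does on top of this.

```python
import csv
import io

# Assumed column layout: the nine standard tab-separated GFF fields.
GFF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes"]


def read_gff_rows(handle):
    """Yield one dict per feature line (hypothetical helper).

    Comment/directive lines, blank lines, and lines that do not have
    exactly nine columns are skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue
        if len(row) != len(GFF_COLUMNS):
            continue
        record = dict(zip(GFF_COLUMNS, row))
        # Coordinates are the only fields converted; the rest stay strings.
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        yield record


# Small in-memory example standing in for a real file handle.
sample = (
    "##gff-version 3\n"
    "chr1\ttest\tgene\t100\t200\t.\t+\t.\tID=gene1\n"
    "chr1\ttest\texon\t100\t150\t.\t+\t.\tParent=gene1\n"
)
rows = list(read_gff_rows(io.StringIO(sample)))
```

Anything beyond this, such as parsing the attributes or linking child
features to their parents, is where a dedicated GFF library would start
to earn its extra run time.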

Thanks,
Brad


