[Biopython-dev] KEGG Gene Parser

Michael Maibaum mike at maibaum.org
Wed Nov 17 07:33:42 EST 2004

	From: 	  mike at maibaum.org
	Subject: 	KEGG Gene Parser
	Date: 	17 November 2004 11:23:08 GMT
	To: 	  biopython-dev at biopython.org


I've been working on a KEGG Gene parser (attached) and wondered whether it 
would be of use to the project as a whole. It is based on the existing 
Bio.KEGG.Compound and Bio.KEGG.Enzyme modules. I'm still testing the 
parser against more KEGG files, but I believe it successfully parses all 
the current KEGG gene files (except c.hominis).
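For readers unfamiliar with the format, here is a minimal sketch of how a KEGG gene flat file can be split into entries. It is not the attached parser and not part of the Bio.KEGG API. It assumes that the keyword sits in the first 12 columns, that continuation lines leave that column blank, and that `///` ends each entry. The name `iterate_records` and the sample entry are made up for illustration.

```python
import io
from typing import Dict, Iterator, List, Optional, TextIO

# Assumption: section keywords occupy the first 12 columns of each line.
KEYWORD_WIDTH = 12


def iterate_records(handle: TextIO) -> Iterator[Dict[str, List[str]]]:
    """Yield each entry as a dict mapping section keyword to its value lines."""
    record: Dict[str, List[str]] = {}
    keyword: Optional[str] = None
    for raw in handle:
        line = raw.rstrip("\n")
        if line.startswith("///"):
            # "///" terminates an entry.
            if record:
                yield record
            record, keyword = {}, None
            continue
        if not line.strip():
            continue  # ignore blank lines
        key = line[:KEYWORD_WIDTH].strip()
        value = line[KEYWORD_WIDTH:].strip()
        if key:
            # Start of a new section (or an indented sub-section keyword).
            keyword = key
            record.setdefault(keyword, []).append(value)
        elif keyword is not None:
            # Blank keyword column: continuation of the current section.
            record[keyword].append(value)
    if record:
        yield record  # last entry when the file lacks a trailing "///"


# Hypothetical, hand-written entry used only to exercise the sketch.
SAMPLE = (
    "ENTRY".ljust(12) + "gene0001  CDS\n"
    + "NAME".ljust(12) + "abcA\n"
    + "DEFINITION".ljust(12) + "hypothetical protein\n"
    + " " * 12 + "second definition line\n"
    + "///\n"
)

if __name__ == "__main__":
    for rec in iterate_records(io.StringIO(SAMPLE)):
        print(rec["NAME"], rec["DEFINITION"])
```

Keeping every section as raw lines defers section-specific parsing (POSITION, CODON_USAGE, AASEQ and so on) to later code, which makes it easy to leave sections you don't care about unparsed.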

Known bugs and missing features:
	A method for outputting a record as straight text (__str__)
	Properly parsing the CODON_USAGE section
	More refinement of expressions for the POSITION section
	Methods to handle CODON_USAGE and POSITION callbacks
	Parsing c.hominis aa_sequences (I'm not sure this is exactly a bug; see below)
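One possible way to refine the POSITION handling is a single anchored regular expression that accepts only the forms it understands and returns None for everything else, so odd entries can be kept as raw text instead of being mis-parsed. The accepted forms below are assumptions about typical KEGG GENES values, and `parse_position` and `Position` are hypothetical names, not part of the attached module.

```python
import re
from typing import NamedTuple, Optional

# Assumed POSITION forms: "190..255", "complement(337..2799)" and
# "2:complement(100..200)" (optional chromosome prefix). Anything else,
# e.g. join(...) or a cytogenetic band, is not handled and yields None.
_POSITION_RE = re.compile(
    r"^(?:(?P<chrom>[^:\s]+):)?"      # optional "chromosome:" prefix
    r"(?P<complement>complement\()?"  # optional "complement(" wrapper
    r"(?P<start>\d+)\.\.(?P<end>\d+)"
    r"(?(complement)\))$"             # ")" required only if "complement(" matched
)


class Position(NamedTuple):
    chromosome: Optional[str]
    start: int
    end: int
    strand: int  # +1 forward, -1 complement


def parse_position(text: str) -> Optional[Position]:
    """Parse a POSITION value, returning None for unrecognised forms."""
    match = _POSITION_RE.match(text.strip())
    if match is None:
        return None
    strand = -1 if match.group("complement") else 1
    return Position(match.group("chrom"), int(match.group("start")),
                    int(match.group("end")), strand)
```

Compiling the pattern once at module level also avoids rebuilding it for every entry.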

For my current needs I'm not really interested in codon usage, positions, 
or returning a record in string form, so I haven't spent time handling 
them. They should be fairly easy to add if someone cares enough; 
otherwise I may get around to it one day.
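As an example of how small the missing `__str__` could be, here is a sketch that writes a record back out in the flat-file layout. It assumes a hypothetical `GeneRecord` that stores each section keyword with its value lines in file order, and a 12-column keyword field; the actual record classes in the attached module or in Bio.KEGG may store fields differently. It is written for modern Python 3.

```python
from typing import Dict, List

KEYWORD_WIDTH = 12  # assumed keyword column width


class GeneRecord:
    """Hypothetical record mapping section keyword -> list of value lines."""

    def __init__(self, sections: Dict[str, List[str]]) -> None:
        # dicts keep insertion order (Python 3.7+), so sections print in file order
        self.sections = sections

    def __str__(self) -> str:
        lines = []
        for keyword, values in self.sections.items():
            for i, value in enumerate(values):
                # keyword on the first line of a section, blank column on continuations
                label = keyword if i == 0 else ""
                lines.append(label.ljust(KEYWORD_WIDTH) + value)
        lines.append("///")  # entry terminator
        return "\n".join(lines) + "\n"
```

Note that `str.ljust` pads but never truncates, so a keyword longer than 12 characters would push its value out of column.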

I know c.hominis parsing is broken because its files have very odd aa_seq 
entries, and I'm trying to figure out whether a broken program created 
the file, whether the entries actually mean something sensible, or 
whether it is just another dumb stretching of an inadequate flat-file 
format designed to test the patience of people writing parsers.

I think it could do with some polishing, and the parsing regexes could 
probably be optimised a fair bit, but it is useful as it is.

The files are available at

I'm continuing to work on the module to fix any parsing errors I come 
across with further testing/usage and will post updates soon. I'd be 
grateful for any comments or suggestions (be nice, I've only been using 
Python a little while ;) )



Dr Michael Maibaum
Department of Biochemistry and Molecular Biology, UCL
email: maibaum at biochemistry.ucl.ac.uk
