[Biopython-dev] KEGG Gene Parser
Michael Maibaum
mike at maibaum.org
Wed Nov 17 07:33:42 EST 2004
From: mike at maibaum.org
Subject: KEGG Gene Parser
Date: 17 November 2004 11:23:08 GMT
To: biopython-dev at biopython.org
Hi,
I've been working on a KEGG Gene parser (attached) and wondered if it
would be of use to the project as a whole. It is based on the existing
Bio.KEGG.Compound/Enzyme modules. I'm still in the process of testing
the parser against more KEGG files, but I believe it parses all the
current KEGG gene files successfully (except c.hominis).
Known bugs and missing features:
A method for outputting a record as straight text (__str__)
Properly parsing the CODON_USAGE section.
More refinement of expressions for POSITION section.
Methods to handle CODON_USAGE and POSITION callbacks
Parsing c.hominis aa_sequences (I'm not sure this is exactly a bug,
see below)
For my current needs I'm not really interested in codon usage, position,
or returning a record in string form, so I haven't spent the time
handling them. They should be fairly easy to add if someone cares
enough. Otherwise I may get around to it one day.
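For anyone picking up the missing __str__, a minimal sketch might look
like the following. The Record class and its attributes (entry, name,
definition) are hypothetical stand-ins, not the ones in the module, and
the 12-character keyword column is an assumption about the flat-file
layout.

```python
class Record:
    """Hypothetical, minimal stand-in for a KEGG gene record."""

    def __init__(self):
        self.entry = ""
        self.name = []
        self.definition = ""

    def __str__(self):
        # Assumed layout: keyword left-justified in a 12-character
        # column, followed by the value; "///" terminates the record.
        lines = [
            "%-12s%s" % ("ENTRY", self.entry),
            "%-12s%s" % ("NAME", ", ".join(self.name)),
            "%-12s%s" % ("DEFINITION", self.definition),
            "///",
        ]
        return "\n".join(lines)
```

A real implementation would also need to wrap long values and cover the
remaining sections.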
I know c.hominis parsing is broken because the file has very odd aa_seq
entries. I'm trying to figure out whether a broken program is creating
the file, whether the entries mean something sensible, or whether it is
just another dumb stretching of an inadequate flat-file format designed
to test the patience of people writing parsers.
I think it could do with some polishing, and the parsing regexes could
probably be optimised a fair bit, but it is useful as it is.
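As one illustration of the kind of regex tidying meant here, the sketch
below compiles a single anchored pattern once at module level and reuses
it. It is not taken from gene_format.py: the names POSITION_RE and
parse_position are made up, and the two value shapes it accepts
("190..255" and "complement(5683..6459)") are only an assumption about
what simple POSITION entries look like.

```python
import re

# Assumed shapes of a simple POSITION value (real entries may vary):
#   190..255
#   complement(5683..6459)
POSITION_RE = re.compile(
    r"^(?:complement\((?P<cstart>\d+)\.\.(?P<cend>\d+)\)"
    r"|(?P<start>\d+)\.\.(?P<end>\d+))$"
)


def parse_position(text):
    """Return (start, end, strand), or None if the value is not recognised."""
    match = POSITION_RE.match(text.strip())
    if match is None:
        return None
    if match.group("cstart") is not None:
        # complement(...) is reported as the reverse strand
        return int(match.group("cstart")), int(match.group("cend")), -1
    return int(match.group("start")), int(match.group("end")), 1
```

Returning None for anything unrecognised lets the caller log the odd
entries instead of failing on them.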
The files are available at
<http://www.gene-hacker.net/python/__init__.py>
<http://www.gene-hacker.net/python/gene_format.py>
I'm continuing to work on the module to fix any parsing errors I come
across with further testing/usage and will post updates soon. I'd be
grateful for any comments or suggestions (be nice, I've only been using
Python a little while ;) )
cheers
Michael
--
Dr Michael Maibaum
Department of Biochemistry and Molecular Biology, UCL
email: maibaum at biochemistry.ucl.ac.uk