[Biopython-dev] WIT and KEGG
Tarjei S Mikkelsen
tarjei at genome.wi.mit.edu
Sat Aug 11 00:35:04 EDT 2001
> I made these changes to a copy of KEGG/enzyme_format.py:
>
> html_tag = (Expression.Literal( '<' ) + Rep( AnyBut( '>\n\r' ) ) +
>             Expression.Literal( '>' ))
>
> entry = Group("entry",
> Str1("EC ") +
> Rep( Str( " " ) ) + Opt( html_tag ) +
> Rep(Rep1(Integer()) + point) +
> Rep1(Integer()) +
> Rep( Str( " " ) ) + Opt( html_tag ) )
I'm not too fond of adding this to the format file. HTML markup isn't
part of the KEGG format description, so this seems a bit ad hoc.
Instead, I suggest you do one of two things: either run the input
through File.SGMLHandle or File.SGMLStripper before you pass the WIT
record to KEGG.Enzyme.Parser, or write a separate Parser class in your
WIT module that wraps a ParserSupport.SGMLStrippingConsumer around
KEGG.Enzyme._Consumer.
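The first option amounts to stripping the markup before the parser ever
sees the record. A minimal sketch of that idea follows. It uses the
standard library's html.parser as a stand-in for File.SGMLStripper, so
the TagStripper class and strip_markup function are hypothetical names,
and the sample line and its href are made up for illustration.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Keep only the text content of the input and drop every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for the text between tags; tags themselves are ignored.
        self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def strip_markup(record):
    """Return the record text with any HTML markup removed."""
    stripper = TagStripper()
    stripper.feed(record)
    stripper.close()
    return stripper.text()


# Hypothetical WIT-style line: a KEGG ENTRY line with a link around the EC number.
wit_line = 'ENTRY       EC <a href="x">1.1.1.1</a>\n'
clean = strip_markup(wit_line)
```

The cleaned text can then be handed to the existing parser unchanged,
which keeps the HTML handling out of the format definition.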
> The format failed halfway through the file. I think the problem is the
> order of entries. The format specifies GENES before MOTIF, but this
> order is reversed in the test file. Maybe the format should be less
> sensitive to order where it doesn't convey information.
Yeah, the entries are supposed to come in a specified order, but even
the KEGG people don't follow that rule. I've committed a change to
KEGG.Enzyme.enzyme_format.py that assumes very little about entry
ordering. If that's the error, it should work for you now.
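To illustrate what "assuming very little about entry ordering" means,
here is a rough sketch of reading a record keyed by entry name instead
of by position. It is not the committed Martel format, only plain
Python under the assumption that keywords sit in the first 12 columns,
continuation lines leave those columns blank, and "///" ends the
record. The parse_record name and the sample values are hypothetical.

```python
def parse_record(lines):
    """Group the lines of one KEGG-style record by keyword, in any order."""
    fields = {}
    keyword = None
    for line in lines:
        if line.startswith("///"):  # record terminator
            break
        if not line.strip():
            continue  # skip blank lines
        head, data = line[:12].strip(), line[12:].strip()
        if head:
            keyword = head  # a new entry starts here
        if keyword is None:
            continue  # continuation line before any keyword: ignore it
        fields.setdefault(keyword, []).append(data)
    return fields


# Made-up record with MOTIF before GENES, the reverse of the specified order.
sample = [
    "ENTRY".ljust(12) + "EC 1.1.1.1",
    "MOTIF".ljust(12) + "PS: EXAMPLE_MOTIF",
    "GENES".ljust(12) + "ECO: gene1",
    " " * 12 + "STY: gene2",
    "///",
]
record = parse_record(sample)
```

Because each entry is looked up by its keyword, swapping MOTIF and
GENES in the input changes nothing about what ends up in each field.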
Tarjei