[Biopython-dev] WIT and KEGG

Tarjei S Mikkelsen tarjei at genome.wi.mit.edu
Sat Aug 11 00:35:04 EDT 2001


>   I made these changes to a copy of KEGG/enzyme_format.py,
> 
> html_tag = ( Expression.Literal( '<' ) + Rep( AnyBut( '>\n\r' ) ) +
>              Expression.Literal( '>' ) )
> 
> entry = Group("entry",
>               Str1("EC ") +
>               Rep( Str( " " ) ) + Opt( html_tag ) +
>               Rep(Rep1(Integer()) + point) +
>               Rep1(Integer()) +
>               Rep( Str( " " ) ) + Opt( html_tag ) )

 I'm not too fond of adding this to the format file. HTML markup isn't
part of the KEGG format description, so this seems a bit ad hoc.

 Instead, I suggest that you either run the input through
File.SGMLHandle or File.SGMLStripper before you pass the WIT record
to KEGG.Enzyme.Parser, or write a separate Parser class in your WIT
module that wraps a ParserSupport.SGMLStrippingConsumer around
KEGG.Enzyme._Consumer.
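
 To illustrate the first option (strip the markup before the record
reaches the parser), here is a minimal sketch. It deliberately does not
use the Bio.File classes named above; it relies only on Python's standard
html.parser module. The names _TagStripper and strip_markup, and the
sample line, are hypothetical and only for illustration.

```python
from html.parser import HTMLParser


class _TagStripper(HTMLParser):
    """Collect only the text content of the input, dropping every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are ignored.
        self.chunks.append(data)


def strip_markup(text):
    """Return text with HTML/SGML tags removed (hypothetical helper)."""
    stripper = _TagStripper()
    stripper.feed(text)
    stripper.close()  # flush any text still buffered by the parser
    return "".join(stripper.chunks)


# Hypothetical WIT-style line: a KEGG ENTRY line with a link around the EC number.
sample = 'ENTRY       EC <a href="x">1.1.1.1</a>\n'
clean = strip_markup(sample)
```

 The cleaned text could then be handed to the unmodified KEGG parser, so
the format definition itself never has to know about HTML.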
 
>   The format failed halfway through the file.  I think the problem is the
> order of entries.  The format specifies GENES before MOTIF, but this order
> is reversed in the test file.  Maybe the format should be less sensitive to
> order where it doesn't convey information.

 Yeah, the entries are supposed to come in a specified order, but even
the KEGG people don't follow that rule. I've committed a change to 
KEGG.Enzyme.enzyme_format.py that assumes very little about entry
ordering. If that's the error, it should work for you now.
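
 As a rough sketch of what "assuming very little about entry ordering"
can mean, the snippet below groups the lines of one record by field name
without caring which field comes first. It is not the committed Martel
format; parse_fields is a hypothetical helper, and it assumes the usual
KEGG flat-file layout (field name in the first 12 columns, continuation
lines indented, record terminated by "///"). The field values are
placeholders, not real data.

```python
def parse_fields(record_text):
    """Group the lines of one record by field name, in whatever order they appear."""
    fields = {}
    current = None
    for line in record_text.splitlines():
        if line.startswith("///"):
            break  # end-of-record marker
        if not line.strip():
            continue  # skip blank lines
        key, value = line[:12].strip(), line[12:].strip()
        if key:
            current = key  # a new field starts on this line
        if current is None:
            continue  # continuation line before any field name; ignore it
        fields.setdefault(current, []).append(value)
    return fields


def make_record(rows):
    """Build a record from (field name, value) pairs; an empty name means continuation."""
    body = "\n".join(name.ljust(12) + value for name, value in rows)
    return body + "\n///\n"


# MOTIF before GENES, i.e. the "wrong" order from the test file.
record = make_record([
    ("ENTRY", "EC 1.1.1.1"),
    ("MOTIF", "placeholder-motif"),
    ("GENES", "ORG1: gene1"),
    ("", "ORG2: gene2"),
])
fields = parse_fields(record)
```

 Swapping the MOTIF and GENES rows yields the same mapping, which is the
property the format change is after.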

 Tarjei


