[BioPython] parsing with Martel

12 May 2002 14:29:01 -0500

I've been playing around with Martel, and have written a couple of
parsers that work pretty well.  I'm running into a problem, however, in
that the format I'm parsing (from the FASTA alignment program) has a lot
of cruft that I don't want / need parsed.  I've written the parser so
that it just sticks tags around the crucial data, but when I actually
parse the file, the unwanted stuff (parentheses, etc.) is screwing up
the parse.  I'm not sure exactly why, but if I've got something like:

<tag> DATA </tag>;

where the semicolon at the end is unwanted, the semicolon ends up in a
TEXT node in the parsed XML.  I'm a bit confused about this, as I was
(naively) under the impression that things like xml.sax.ContentHandler
don't care about untagged stuff.

I guess what I would like to do is be able to post-filter the output,
removing everything that remains untagged after converting the file to
XML.  Is there a built-in mechanism for this?
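
Failing a built-in mechanism, the kind of filter I have in mind could
probably be done by hand in a SAX handler.  Below is a rough sketch using
only the standard xml.sax module, not Martel itself.  The class name
TaggedTextFilter, the <record> wrapper element and the "tag" element name
are all made up for illustration; a real Martel format would have its own
tag names.  The idea is to track which element is currently open and keep
character data only when it sits directly inside a wanted tag.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TaggedTextFilter(ContentHandler):
    """Collect text found directly inside wanted tags; drop everything else.

    Hypothetical helper for illustration, not part of Martel or Biopython.
    """

    def __init__(self, wanted):
        ContentHandler.__init__(self)
        self.wanted = set(wanted)
        self.stack = []    # one entry per open element: a record or None
        self.records = []  # [name, text] pairs for wanted elements

    def startElement(self, name, attrs):
        record = [name, ""] if name in self.wanted else None
        if record is not None:
            self.records.append(record)
        self.stack.append(record)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        # SAX reports *all* character data, tagged or not, so the
        # filtering has to happen here.
        if self.stack and self.stack[-1] is not None:
            self.stack[-1][1] += content


# The <record> wrapper stands in for whatever encloses the tagged data.
doc = b"<record><tag> DATA </tag>;</record>"
handler = TaggedTextFilter(["tag"])
xml.sax.parseString(doc, handler)
results = [(name, text.strip()) for name, text in handler.records]
```

With this input, the stray semicolon arrives in characters() while only
<record> is open, so it is discarded and only the text inside <tag> is kept.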

Thoughts / Ideas / Suggestions ???

Jay

-- 
______________________________________________________
Jay Hesselberth 

University of Texas    2500 Speedway
MBB 3.424 / A4800      Austin, TX  78712

phone: 512-471-6445    email: jhessel@ellingtonlab.org