[Biopython-dev] ToEol in Martel
Brad Chapman
chapmanb at arches.uga.edu
Fri Mar 30 20:40:02 EST 2001
Cayte and Andrew,
Thanks much for the feedback on EventGenerator. I've updated it
(changes in CVS) to, I think, handle your concerns. Generally, what I
did was change it so that, instead of trying to combine multiple lines
with spaces, it just collects all of these lines and returns them as a
list. So, if we have Martel output that looks like:
<sequence>EKLAD</sequence>
<sequence>WERNDA</sequence>
EventGenerator will now call the consumer with a list like:
["EKLAD", "WERNDA"]
This way, you can deal with multiple lines on a case by case basis in
the consumer, if necessary.
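Here is a minimal sketch of what "case by case" could look like on the consumer side. It assumes a hypothetical consumer class with one method per tag. The names `sequence` and `definition` are illustrative, not Biopython's actual consumer API. Sequence lines are glued together with no separator, while wrapped free text is joined with single spaces.

```python
class SequenceConsumer:
    """Hypothetical consumer: each method receives the list of lines
    collected for one tag and decides how to combine them."""

    def __init__(self):
        self.sequence_data = None
        self.definition_data = None

    def sequence(self, lines):
        # Residues run straight on from line to line: join with no separator.
        self.sequence_data = "".join(line.strip() for line in lines)

    def definition(self, lines):
        # Free text wraps at word boundaries: join the stripped pieces with a space.
        self.definition_data = " ".join(line.strip() for line in lines)


consumer = SequenceConsumer()
consumer.sequence(["EKLAD", "WERNDA"])
consumer.definition(["first part of a long", "description line."])
print(consumer.sequence_data)    # EKLADWERNDA
print(consumer.definition_data)  # first part of a long description line.
```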
Additionally, I added the ability to pass a "finalizer" function to
EventGenerator which, if present, will be called before a list of
information is returned. This way, if you always want to reformat
things (as I do in GenBank), then you can still do this. The finalizer
function gets passed the list of lines, and can do whatever it wants
with it.
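The collecting and finalizer behaviour can be sketched as a SAX `ContentHandler`, since Martel reports its parse as SAX events. This is a hypothetical illustration, not the EventGenerator code in CVS. The class name `ListEventGenerator` and its `interest_tags` and `finalizer` parameters are assumptions. Consecutive elements with the same tag are collected into a list, and their text is left untouched. When a different tag of interest starts, or the document ends, the list is passed through the finalizer (if any). It is then handed to the consumer method named after that tag.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ListEventGenerator(ContentHandler):
    """Hypothetical sketch: collect the text of consecutive elements sharing a
    tag name into a list, optionally run it through a finalizer, then call the
    consumer method with that tag's name."""

    def __init__(self, consumer, interest_tags, finalizer=None):
        super().__init__()
        self._consumer = consumer
        self._interest = set(interest_tags)
        self._finalizer = finalizer
        self._current_tag = None  # tag whose lines are being collected
        self._lines = []          # finished lines for _current_tag
        self._buffer = None       # text pieces of the open element, or None

    def startElement(self, name, attrs):
        if name in self._interest:
            # A different tag of interest ends the current run of lines.
            if name != self._current_tag:
                self._flush()
                self._current_tag = name
            self._buffer = []

    def characters(self, content):
        # SAX may deliver one element's text in several chunks.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name in self._interest and self._buffer is not None:
            # Keep the text exactly as parsed: no stripping, no joining.
            self._lines.append("".join(self._buffer))
            self._buffer = None

    def endDocument(self):
        self._flush()

    def _flush(self):
        if self._current_tag is None:
            return
        info = self._lines
        if self._finalizer is not None:
            info = self._finalizer(info)
        getattr(self._consumer, self._current_tag)(info)
        self._current_tag = None
        self._lines = []


class Recorder:
    """Hypothetical consumer that just records what it was called with."""

    def __init__(self):
        self.calls = []

    def sequence(self, info):
        self.calls.append(("sequence", info))


recorder = Recorder()
handler = ListEventGenerator(recorder, ["sequence"])
xml.sax.parseString(b"<record><sequence>EKLAD</sequence>"
                    b"<sequence>WERNDA</sequence></record>", handler)
print(recorder.calls)  # [('sequence', ['EKLAD', 'WERNDA'])]
```

If you pass `finalizer="".join` instead, the consumer receives the single string `"EKLADWERNDA"` rather than the list. That matches the "always reformat" case.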
If this doesn't seem like a good solution, let me know and we can work
on it more. I added tests to test_ParserGenerator, if you want to see
specific cases (besides GenBank) of how it works.
Andrew:
> However, I have some problems with how it does it.
> Specifically, how it handles whitespace. By default
> it strips whitespace from the ends of the element's
> contents and it merges multiple elements with the same
> tag name into a single string separated by whitespace.
I agree with you (and Jeff, and probably everyone else in the world
:-). I shouldn't muck with whitespace by default. Bad Brad, bad!
Hopefully the new EventGenerator code preserves it more faithfully.
> There are a couple problems with the automatic joining
> of successive tag code. It is used for cases like:
[...lots of examples of how auto joining can mess up...]
Yup, I'm in full agreement with you here as well. My GenBank solution
is definitely a quick-n-dirty one. I think it handles most cases
fairly well. Hopefully with this new format, specific heuristics for
specific cases can be introduced into the Biopython consumer classes,
if necessary. But, yeah, it is an ugly problem all around.
I'm trying to push this problem back into the Consumer classes and not
deal with it in EventGenerator. EventGenerator was just meant to do two
things:
1. Be a general way to turn Martel events into Biopython-type events.
2. Handle stuff that runs over multiple lines, so that this kind of
code wouldn't have to go into Biopython-consumers.
I think it does these things a little better now :-)
[style changes]
> Please change this to use
> callback_function = getattr(self._consumer, name)
> This is faster and safer. Evals can do nasty things,
> like if name is
> abc + __import__("shutil").rmtree("/")
>
> Also, you might want to change
> if name in self.flags.keys():
> to
> if self.flags.has_key(name):
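Here is a small sketch of both suggestions, using a hypothetical `Consumer` class and `dispatch` helper. `getattr` only looks up an attribute with the given name, so the malicious string from the quoted example simply fails the lookup instead of being run. For current Python, `name in flags` is also the form to use: `dict.has_key()` was removed in Python 3, and in Python 2 `flags.keys()` built a whole new list just to test membership.

```python
class Consumer:
    """Hypothetical consumer with one callback per tag name."""

    def sequence(self, info):
        return "".join(info)


def dispatch(consumer, name, info):
    # getattr looks up an attribute literally called `name`; unlike eval,
    # it never runs `name` as Python code.
    callback_function = getattr(consumer, name)
    return callback_function(info)


consumer = Consumer()
print(dispatch(consumer, "sequence", ["EKLAD", "WERNDA"]))  # EKLADWERNDA

malicious = 'abc + __import__("shutil").rmtree("/")'
try:
    dispatch(consumer, malicious, [])
except AttributeError:
    print("no such callback; nothing was executed")

# Membership test on the dict itself (hypothetical contents).
flags = {"sequence": True}
print("sequence" in flags)  # True
```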
Thanks for the pointers. Style changes are always very welcome. I had
been fighting in the past to get getattr to work right (I'm pretty
sure you mentioned this to me previously). Thanks!
Thanks again for the feedback on this. Hope this solution is workable!
Brad