[Biopython-dev] [Bug 1747] GenBank parser is very slow and memory
hungry for large input files
bugzilla-daemon at portal.open-bio.org
Wed Mar 9 21:39:47 EST 2005
http://bugzilla.open-bio.org/show_bug.cgi?id=1747
------- Additional Comments From dalke at dalkescientific.com 2005-03-09 21:39 -------
I think history has shown that the idea behind Martel, while interesting, has had problems in its
implementation, and that these could only be fixed with a lot of effort. Hand-written code to do the
same parsing lacks that purity but is easier to maintain, and easier for a wider range of people to
understand.
I also think that the Martel grammars I developed were too nit-picky, and there are places where
they perhaps should have been a bit looser.
So I have no qualms with getting rid of Martel as the patcher suggests.
From an email I wrote recently on the topic, included here for the record:
Martel hasn't panned out as well as I had hoped. I think
I know the reasons:

  - regexps are hard to write and debug
      Could be improved with some sort of development/
      testing environment

  - Martel's grammars are hard to edit
      When a grammar changes it's not possible to say "the
      new format is the old format but change this one
      bottom level node". I'm actually considering
      switching over to a DOM-style description of the
      tree so I can use XSLT as the editing language.
      Except that I think XSLT's grammar is clumsy and ugly.

  - Martel needs everything in memory
      I implemented a hack to parse a record at a time but
      it's a hack and fails (except on large memory machines)
      for people who want to read a chromosome at a time.
      I would also like it to be feed based instead of
      pull based.
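
To illustrate the last point, here is a minimal Python sketch of what a feed-based (push) interface could look like: the caller hands over chunks of text as they arrive, and callbacks fire once per line, so neither the whole file nor a whole record has to be held in memory. The names LineFeeder, feed, close and count_lines_per_record are hypothetical and are not part of Martel or Biopython. The only format assumption is that a GenBank record ends with a line containing just "//".

```python
class LineFeeder:
    """Hypothetical feed-based line splitter (not a Martel or Biopython API)."""

    def __init__(self, on_line, on_record_end):
        self._on_line = on_line              # called with each complete line
        self._on_record_end = on_record_end  # called when a "//" line is seen
        self._pending = ""                   # partial line left over from the last chunk

    def feed(self, chunk):
        """Accept the next chunk of text; dispatch every complete line in it."""
        lines = (self._pending + chunk).split("\n")
        self._pending = lines.pop()  # last piece is incomplete (or empty)
        for line in lines:
            self._dispatch(line)

    def close(self):
        """Flush a final line that had no trailing newline."""
        if self._pending:
            self._dispatch(self._pending)
            self._pending = ""

    def _dispatch(self, line):
        if line.rstrip() == "//":
            self._on_record_end()
        else:
            self._on_line(line)


def count_lines_per_record(chunks):
    """Example consumer: count lines per record without storing any of them."""
    counts = []
    current = [0]

    def on_line(line):
        current[0] += 1

    def on_record_end():
        counts.append(current[0])
        current[0] = 0

    feeder = LineFeeder(on_line, on_record_end)
    for chunk in chunks:
        feeder.feed(chunk)
    feeder.close()
    return counts


# Chunk boundaries may fall anywhere, even in the middle of a line.
print(count_lines_per_record(["LOCUS A\nORIG", "IN\n//\nLOCUS B\n//", "\n"]))
```

Because the consumer sees one line at a time, memory use is bounded by the longest line plus whatever state the callbacks choose to keep. A pull-based reader that returns whole records would instead have to buffer everything up to the "//" line, which is the case that hurts for a chromosome-sized record.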
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.