[Biopython-dev] [Bug 1747] GenBank parser is very slow and memory hungry for large input files

bugzilla-daemon at portal.open-bio.org
Mon Nov 7 08:15:02 EST 2005


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2005-11-07 08:15 -------
I was aware there was some problem with the NCBIDictionary support (which had
been noted on the mailing list).

The problem shown by Michael's example (comment 7 on the bug report) is
due to ReseekFile.py only supporting the 'read' method, and not the 'readline'
method.  According to the comments in this file, this is all the Martel parsers
need.
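To illustrate the failure mode, here is a minimal sketch using a hypothetical stand-in class (`ReadOnlyWrapper` is not Biopython code, and the GenBank-like text is made up for illustration). A wrapper that only exposes `read` works for callers that use `read`, but any line-oriented caller that calls `readline` fails with an `AttributeError`.

```python
import io


class ReadOnlyWrapper:
    """Hypothetical stand-in for a file wrapper that only implements read()."""

    def __init__(self, handle):
        self.file = handle

    def read(self, size=-1):
        # Delegate straight to the wrapped handle.
        return self.file.read(size)


def first_line(handle):
    # A line-oriented consumer, like code that scans a record header.
    return handle.readline()


wrapped = ReadOnlyWrapper(io.StringIO("LOCUS       AB000001\nDEFINITION  test.\n"))
print(repr(wrapped.read(5)))  # 'LOCUS'
try:
    first_line(wrapped)
except AttributeError as err:
    # The wrapper has no readline attribute at all.
    print("readline unsupported:", err)
```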

I tried adding the following to the class ReseekFile in ReseekFile.py, and this
seems to fix Michael's example.

    def _readline(self, size):
        """The readline support is just a quick guess..."""
        if size < 0:
            y = self.file.readline()
            z = self.buffer_file.readline() + y
            return z
        if size == 0:
            return ""
        x = self.buffer_file.readline(size)
        if len(x) < size:
            y = self.file.readline(size - len(x))
            return x + y
        return x

    def readline(self, size = -1):
        x = self._readline(size)
        if self.at_beginning and x:
            self.at_beginning = 0
        return x

I want to stress that I'm not entirely sure that my 'readline' code is valid;
it was just my best guess based on how the 'read' method was done.  The
test functions at the end of the ReseekFile.py file could also be extended...

It would help if I had used the NCBIDictionary before ;)

P.S. I have just tried this change and the GenBank/__init__.py patch against
Biopython 1.41 on Linux, and test_GenBank.py passed fine.

P.P.S. Would it be easy to add an offline version of Michael's example using
NCBIDictionary to the GenBank unit test?
