[Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets

Wed Dec 12 05:04:28 EST 2012

Issue #3395 has been updated by Michał Nowotka.

File minimal_data.pkl added

Minimal test case with the Django dependencies stripped out; the loading code is below:

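        import pickle

        from Bio import trie

        # Load the pickled list of items; open the file in binary mode.
        f = open('minimal_data.pkl', 'rb')
        items = pickle.load(f)
        f.close()

        index = trie.trie()

        # Index every path component after the first that is longer than
        # two characters, collecting the values that share a component.
        for item in items:
            for chunk in item[0].split('/')[1:]:
                if len(chunk) > 2:
                    key = str(chunk)
                    if index.get(key):
                        index[key].append(item[1])
                    else:
                        index[key] = [item[1]]

        # Save the trie in binary mode.
        f = open('trie.dat', 'wb')
        trie.save(f, index)
        f.close()

        # Reload it; this is the step the bug report is about.
        f = open('trie.dat', 'rb')
        new_trie = trie.load(f)
        f.close()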
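
To check the input data independently of Bio.trie, the same indexing can be run against a plain dict and round-tripped through pickle. This is only a sketch: build_index is a hypothetical helper name, and the sample items are made up to stand in for the contents of minimal_data.pkl, whose real structure is assumed to be a list of (path, value) sequences.

```python
import io
import pickle


def build_index(items):
    """Map each path chunk longer than two characters to a list of values."""
    index = {}
    for item in items:
        # Skip the first '/'-separated component, as the loop above does.
        for chunk in item[0].split('/')[1:]:
            if len(chunk) > 2:
                index.setdefault(str(chunk), []).append(item[1])
    return index


# Hypothetical sample standing in for the contents of minimal_data.pkl.
sample = [("root/alpha/beta", 1), ("root/alpha/xy", 2)]
index = build_index(sample)

# Round-trip the dict through pickle in memory, using a binary buffer.
buf = io.BytesIO()
pickle.dump(index, buf)
buf.seek(0)
restored = pickle.load(buf)
```

If the dict version survives the round trip on the real data while the trie version does not, the problem is more likely in the trie save/load code than in the data itself.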
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 

Suppose I have a Biopython trie:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'wb')
tr = trie.trie()
# fill in the trie
trie.save(f, tr)  # pass the trie object, not the module
f.close()

Now /tmp/trie.dat.gz is about 50MB. Let's try to read it:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'r')
tr = trie.load(f)

Unfortunately, all I get is an uninformative error:
"loading failed for some reason"

Any hints?
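
One way to narrow this down, offered as a sketch rather than a confirmed diagnosis, is to decompress the file into memory first and hand the loader an ordinary in-memory file object. This assumes, without verification here, that the loader might behave differently on a gzip file object than on a plain one, and that it accepts any object with a read method. open_decompressed is a hypothetical helper name, not part of Biopython; the demo file below is made up.

```python
import gzip
import io
import os
import tempfile


def open_decompressed(path):
    """Return an in-memory binary file with the decompressed contents of path."""
    with gzip.open(path, 'rb') as compressed:
        return io.BytesIO(compressed.read())


# Self-check on a throwaway gzip file (not the real trie data).
payload = b"x" * 100000
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.dat.gz")
with gzip.open(demo_path, 'wb') as out:
    out.write(payload)

plain = open_decompressed(demo_path)
recovered = plain.read()
```

Under those assumptions the load would become tr = trie.load(open_decompressed('/tmp/trie.dat.gz')). If that succeeds, the gzip wrapper is the likely culprit; if it fails the same way, the problem lies elsewhere.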
