[Biopython-dev] Trie with_prefix doesn't work as expected

Peter Cock p.j.a.cock at googlemail.com
Wed Jan 30 04:42:23 EST 2013


On Wed, Jan 30, 2013 at 2:09 AM, Kevin Wu <kjwu at ucsd.edu> wrote:
> Hi All,
>
> I'm attempting to use the trie implementation in biopython to develop a
> suffix trie. I'm using the with_prefix function to find all keys which
> start with a sequence, however, the function doesn't return values that I
> expect. I tested it with the canonical example "banana" and am a bit
> confused.
>
> from Bio.trie import trie
> t = trie()
> s = "BANANA"
> for i in range(len(s)):  # insert all suffixes into trie
>     t[s[i:]] = i
>
> t.with_prefix("NA")  # this works as expected
>>> ['NA', 'NANA']
>
> t.with_prefix("AN")
>>> ['AN', 'ANNA']  # this doesn't work as expected
>                            # expected output: ["ANANA", "ANA"]
>
> Can anyone clarify my confusion or confirm this bug? I'm on Biopython 1.60,
> Linux Mint 64-bit.

There is certainly something odd happening. I'm testing with the
current code in git (pre-Biopython 1.61) under Mac OS X.

>>> from Bio.trie import trie
>>> t = trie()
>>> s = "BANANA"
>>> for i in range(len(s)):  # insert all suffixes into trie
...     t[s[i:]] = i
...     print "%s -> %i" % (s[i:], i)
...     assert t[s[i:]] == i
...
BANANA -> 0
ANANA -> 1
NANA -> 2
ANA -> 3
NA -> 4
A -> 5
>>> t.values()
[5, 3, 1, 0, 4, 2]
>>> t.keys()
['A', 'ANA', 'ANANA', 'BANANA', 'NA', 'NANA']

These look fine:

>>> t.with_prefix("NA")
['NA', 'NANA']
>>> t.with_prefix("A")
['A', 'ANA', 'ANANA']
>>> t.with_prefix("ANA")
['ANA', 'ANANA']

As you point out, this example seems wrong:

>>> t.with_prefix("AN")
['AN', 'ANNA']

The value 'ANNA' shouldn't be in the trie.

Peter


More information about the Biopython-dev mailing list