[Biopython-dev] Trie with_prefix doesn't work as expected
Peter Cock
p.j.a.cock at googlemail.com
Thu Jan 31 11:38:43 UTC 2013
On Wed, Jan 30, 2013 at 9:42 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Jan 30, 2013 at 2:09 AM, Kevin Wu <kjwu at ucsd.edu> wrote:
>> Hi All,
>>
>> I'm attempting to use the trie implementation in biopython to develop a
>> suffix trie. I'm using the with_prefix function to find all keys which
>> start with a sequence, however, the function doesn't return values that I
>> expect. I tested it with the canonical example "banana" and am a bit
>> confused.
>>
>> from Bio.trie import trie
>> t = trie()
>> s = "BANANA"
>> for i in range(len(s)): # insert all suffixes into trie
>> t[s[i:]] = i
>>
>> t.with_prefix("NA") # this works as expected
>>>> ['NA', 'NANA']
>>
>> t.with_prefix("AN")
>>>> ['AN', 'ANNA'] # this doesn't work as expected
>> # expected output: ["ANANA", "ANA"]
>>
>> Can anyone clarify my confusion or confirm this bug? I'm on Biopython 1.60,
>> Linux Mint 64-bit.
>
> There is certainly something odd happening. I'm testing with the
> current code in git (pre-Biopython 1.61) under Mac OS X.
>
>>>> from Bio.trie import trie
>>>> t = trie()
>>>> s = "BANANA"
>>>> for i in range(len(s)): # insert all suffixes into trie
> ... t[s[i:]] = i
> ... print "%s -> %i" % (s[i:], i)
> ... assert t[s[i:]] == i
> ...
> BANANA -> 0
> ANANA -> 1
> NANA -> 2
> ANA -> 3
> NA -> 4
> A -> 5
>>>> t.values()
> [5, 3, 1, 0, 4, 2]
>>>> t.keys()
> ['A', 'ANA', 'ANANA', 'BANANA', 'NA', 'NANA']
>
> These look fine:
>
>>>> t.with_prefix("NA")
> ['NA', 'NANA']
>>>> t.with_prefix("A")
> ['A', 'ANA', 'ANANA']
>>>> t.with_prefix("ANA")
> ['ANA', 'ANANA']
>
> As you point out, this example seems wrong:
>
>>>> t.with_prefix("AN")
> ['AN', 'ANNA']
>
> The value 'ANNA' shouldn't be in the trie.
>
> Peter
Thanks to Jeff Chang for a very speedy fix (sent as an attachment off list),
which I have applied to the repository:
https://github.com/biopython/biopython/commit/cd7cc7174fd4b0607381e9c58f6ae0d17cca8f74
I've also added a unit test based on Kevin's example:
https://github.com/biopython/biopython/commit/efc289c8fe2e78ad12481973e42554fa40f2ea0a
Thank you for reporting this Kevin.
Peter
P.S. Nice to hear from you again Jeff :)
I think your last commit was before we moved from CVS to git, please
let us know if you want commit access on github.
More information about the Biopython-dev
mailing list