This is Words and Buttons Online — a collection of interactive #tutorials, #demos, and #quizzes about #mathematics, #algorithms and #programming.

Tries (radix trees, not “try” plural) as the evolution of nothing

Let's say we have an array of decimal digits. There are 10 digits, and every one occurs only once.

8 4 3 5 2 0 1 9 7 6

What should we do to have it sorted?

(Image by Bungert55 [CC BY-SA 3.0], from Wikimedia Commons)

Nothing! Nothing at all. The thing is, regardless of how the digits are placed in the unsorted array, the sorted array of unique digits is always like this:

0 1 2 3 4 5 6 7 8 9

Since we already know the answer, we don't have to do anything to compute it.

Of course, if you don't have a sorted digits array lying around, you can recreate it on top of the initial array.

8 4 3 5 2 0 1 9 7 6

This is simple and super-effective since the complexity of the operation is only O(n) where n is the number of elements.

Now let's say some of the digits are missing.

8 3 5 2 1 9 7 6

What do we do now?

(Image by StoffelLombard [CC BY-SA 4.0], from Wikimedia Commons)

One answer could be: we make an array of flags and traverse the initial array, and every digit raises its flag when it occurs. Then we traverse the flags and recreate only the digits that did occur.

8 3 5 2 1 9 7 6

[0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
 F   F   F   F   F   F   F   F   F   F

This takes two traversals with respective complexities of O(n) and O(k), where k is the number of all possible digits.

What if we have repeating elements?

8 3 5 8 2 1 1 9 7 6

(Image by Campbelle16 [CC BY-SA 4.0], from Wikimedia Commons)

Well, we can just make our flags into counters.

8 3 5 8 2 1 1 9 7 6

[0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
 0   0   0   0   0   0   0   0   0   0

It's still rather effective: it's still O(n) and O(k), and it is still a sorting algorithm with linear complexity. The problem is, normally you want to sort something other than single-digit numbers, so your k grows quite a bit.

If you want to sort 16-bit integers, you would need to allocate 65 536 counters. If it's 32-bit, then it's 4 294 967 296. And it's outright impossible to count the counters if you want to sort something of dynamic size, like strings or arbitrary-length numbers.

And that's when we need the trees. Radix trees, also known as tries (as in retrieval).

The idea here is simple. Of course, we can't just allocate trillions of counters, but chances are, we wouldn't even need most of them. So instead of laying them out linearly, why don't we organize them as a tree? Just split a number into digits, and let every digit, when addressed, allocate another set of digits. The number 417 will then be denoted like this:

allocated for 41:    [0][1][2][3][4][5][6][7][8][9]
                      0  0  0  0  0  0  0  1  0  0
                      :
allocated for 4:  [0][1][2][3][4][5][6][7][8][9]
                   0  0  0  0  0  0  0  0  0  0
                   :
root: [0][1][2][3][4][5][6][7][8][9]
       0  0  0  0  0  0  0  0  0  0

We stored a 3-digit number using 30 counters. And they are shared: it would cost nothing to store another 417 or a 418, and 427 would only require 10 more counters. We're not really constrained by the size of a number either; we can do as many digits as fit in our memory by attaching tries as subtries to a larger trie.

To put an array in a trie, we only need n*l operations, where l is the number of digits in a number, including the leading zeroes.

Sorting in this structure is fairly trivial: you traverse the trie from the lowest digit to the highest, entering the subtries where there is something of interest.

Let's sort this:

418 7 1 30 7 34 30 7 417

[0][1][2][3][4][5][6][7][8][9]
 0  1  0  0  0  0  0  3  0  0
 :
 :       [0][1][2][3][4][5][6][7][8][9]
 :        2  0  0  0  1  0  0  0  0  0
 :        :
[0][1][2][3][4][5][6][7][8][9]
 0  0  0  0  0  0  0  0  0  0
 :
 :             [0][1][2][3][4][5][6][7][8][9]
 :              0  0  0  0  0  0  0  1  1  0
 :              :
 :          [0][1][2][3][4][5][6][7][8][9]
 :           0  0  0  0  0  0  0  0  0  0
 :           :
[0][1][2][3][4][5][6][7][8][9]
 0  0  0  0  0  0  0  0  0  0


This is a digital tree, also known as a radix tree, and this is how radix sort works. Of course, in practice, the digits are usually binary, not decimal.

It's not as effective as no sorting at all, or as single-digit radix sort. It has O(n*m) complexity, where m is the length of the sorted numbers in digits multiplied by the size of a radix. For binary digits, it's basically the size of a number in bits.

Standard sorting algorithms have O(n log(n)) complexity, so the competition is really between m and log(n). As we established earlier, log(4 294 967 296) is 32, so if you want to sort some 32-bit numbers, your array should be larger than 4 billion elements to make radix sort justified. Unless your data is huge, or the numbers are really small, the standard sort will do the job better.

The same goes for the data structure itself. It also has O(m) retrieval complexity, and it's also justified only on big data or with shorter keys.

Radix trees are used in the most unexpected applications, like IP routing, or as an index-to-pointer mapping in the Linux kernel. But the most common application for them is text processing, although, to enjoy all the benefits, we have to modify our structure a bit.

So far we were talking about digital trees, where every digit is a node. The problem with this scheme is that the digits are positional, and we want our tree to be ordered. With numbers, 2 < 12, so we have to store 2 as 02 to keep the order when traversing. But this is not so with strings in alphabetical order. The letter 'a' comes before the letter 'b', and the word 'add' still precedes 'be' while being longer.

This means that instead of a digital tree, we might use a prefix tree that conceptually looks like this:

cat cats cow cower

        [a...s...z]
         :   1
         :
    [a...t...z]
     :   1
     :          [a...r...z]
     :           :   1
     :           :
     :      [a...e...z]
     :       :
     :       :
     :  [a...w...z]
     :   :   1
     :   :
    [a...o...z]
     :
     :
[a...c...z]

This tree can have elements in every node, not only in the leaves.

The numbers

To see how the tries perform in reality, I've made a simple benchmark: https://github.com/akalenuk/wordsandbuttons/tree/master/exp/sort/radix.

My tries are prefix tries of standard strings. I try them with different radix sizes, from 1 to 8 bits. The larger the radix, the shorter the lookup, but also the more memory it takes to build a trie. These things should be kept in balance. As the experiments show, extra data allocations may be more harmful to performance than extra lookups.

On my machine it goes like this. Sorting a million quasi-words 3 to 4 letters long:

   std::sort - 240
   radix 1 sort - 221
   radix 2 sort - 180
   radix 4 sort - 146
   radix 8 sort - 410

These are the timings measured in milliseconds. As you can see, on this data, radix 8 is way less effective than radix 4, and even worse than the standard sort.

Writing and reading 100 000 quasi-words of length 2 to 8 with a trie:

Trie::Map with 1-bits radix
   Writing: 602
   Reading: 636
   Size in bytes: 35 434 704

Trie::Map with 2-bits radix
   Writing: 411
   Reading: 404
   Size in bytes: 30 357 520

Trie::Map with 4-bits radix
   Writing: 714
   Reading: 727
   Size in bytes: 54 409 656

Trie::Map with 8-bits radix
   Writing: 1156
   Reading: 599
   Size in bytes: 466 866 200

std::map
   Writing: 1404
   Reading: 1408

std::unordered_map
   Writing: 479
   Reading: 466

Please note, my implementation is rather crude. Normally, it is not a good idea to allocate small chunks of memory with the standard new and delete operators.

My experiments aim to show that even with an ineffective implementation, tries can outperform the standard containers and algorithms on specific data.

I encourage you to conduct your own experiments on your own data to see if you're using the best structures possible.

Conclusion

The idea behind tries evolves from doing nothing with unordered digits into elaborate data structures that excel at big-data storage. It is still the same idea, the same principle. Knowing this principle should help you understand tries and add them to your arsenal of data structures and algorithms.