Tries (radix trees, not “try” plural) as the evolution of nothing
Let's say we have an array of decimal digits. There are 10 digits, and each of them occurs exactly once.
8 4 3 5 2 0 1 9 7 6
What should we do to have it sorted?
Nothing! Nothing at all. The thing is, regardless of how the digits are placed in the unsorted array, the sorted array of unique digits is always like this:
0 1 2 3 4 5 6 7 8 9
Since we already know the answer, we don't have to do anything to compute it.
Of course, if you don't have a sorted digits array lying around, you can recreate it on top of the initial array.
This is simple and super-effective since the complexity of the operation is only O(n) where n is the number of elements.
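In code, the whole "algorithm" is just writing the digits back in place. A minimal sketch in C++ (the function name is mine, not the article's):

#include <array>

// "Sorts" an array that is known to hold every decimal digit exactly once.
// We never even look at the input values; we just write down the only
// possible answer on top of them.
void nothing_sort(std::array<int, 10>& digits) {
    for (int i = 0; i < 10; ++i)
        digits[i] = i;
}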
Now let's say some of the digits are missing.
8 3 5 2 1 9 7 6
What do we do now?
One answer could be: we make an array of flags, traverse the initial array, and raise a flag for every digit that occurs. Then we traverse the flags and recreate only the digits that did occur.
8 3 5 2 1 9 7 6                          (the initial array)

[0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
 F   T   T   T   F   T   T   T   T   T   (the flags after the first traversal)

1 2 3 5 6 7 8 9                          (the array recreated from the flags)
This takes two traversals with the respective complexities of O(n) and O(k), where k is the number of all possible digits.
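In C++ it might look roughly like this; a sketch of the idea rather than the article's actual code, with names of my own choosing:

#include <vector>

// Sort a list of unique decimal digits: one pass to raise the flags,
// one pass over the flags to recreate the digits in order.
std::vector<int> flag_sort(const std::vector<int>& digits) {
    bool flags[10] = {false};
    for (int d : digits)            // O(n): raise a flag for every digit seen
        flags[d] = true;
    std::vector<int> sorted;
    for (int d = 0; d < 10; ++d)    // O(k): rebuild from the flags
        if (flags[d])
            sorted.push_back(d);
    return sorted;
}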
What if we have repeating elements?
8 3 5 8 2 1 1 9 7 6
It's still rather effective: the flags just become counters, we count how many times each digit occurs and then emit every digit that many times. It's still O(n) and O(k); it is still a sorting algorithm with linear complexity. The problem is, normally you want to sort something other than single-digit numbers, so your k grows quite a bit.
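For the record, the single-digit version with counters is commonly known as counting sort. A sketch with the same caveats as before:

#include <vector>

// Same idea, but a counter per digit instead of a flag, so repeating
// digits come out as many times as they occurred.
std::vector<int> counter_sort(const std::vector<int>& digits) {
    int counters[10] = {0};
    for (int d : digits)            // O(n): count the occurrences
        ++counters[d];
    std::vector<int> sorted;
    for (int d = 0; d < 10; ++d)    // O(k): emit each digit counters[d] times
        sorted.insert(sorted.end(), counters[d], d);
    return sorted;
}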
If you want to sort 16-bit integer numbers, you would need to allocate 65 536 counters. If it's 32-bit, then it's 4 294 967 296. And you can't even count the counters in advance if you want to sort something of dynamic size, like strings or arbitrary-length numbers.
And that's when we need the trees. Radix trees, also known as tries, as in retrieval.
The idea here is simple. Of course, we can't just allocate trillions of counters, but the chances are, we wouldn't even need most of them. So instead of laying them out in a linear way, why don't we organize them as a tree? Just split a number into digits, and let every digit, when it's addressed for the first time, allocate another set of digits under itself. The number 417 will then be denoted like this:
We stored a 3-digit number using 30 counters. And they are shared. It would cost nothing to store another 417 or 418, and 427 would only require 10 more counters. We're not really constrained by the size of a number either; we can do as many digits as fit in our memory by attaching the trie as a subtrie of a larger trie.
To put an array in a trie, we only need n*l operations, where l is the number of digits in a number, including the padding zeroes.
Sorting in this structure is fairly trivial: you just traverse the trie from the lowest digit to the highest, entering the subtries whenever there is something of interest in them.
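Here is a rough sketch of such a decimal digit trie in C++, with insertion and an in-order traversal that prints the numbers sorted. The fixed three-digit width, the names, and the leaked nodes are my own shortcuts, not the article's:

#include <cstdio>

const int DIGITS = 3;              // fixed-width keys: 417, 018, 002, ...

// A node holds a counter and a child pointer for each decimal digit.
struct Node {
    int counters[10] = {0};
    Node* children[10] = {nullptr};
};

// Insert a number digit by digit, most significant digit first.
void insert(Node* root, int number) {
    int divisor = 100;             // 10^(DIGITS - 1)
    Node* node = root;
    for (int i = 0; i < DIGITS; ++i) {
        int digit = (number / divisor) % 10;
        ++node->counters[digit];
        if (i + 1 < DIGITS) {
            if (!node->children[digit])
                node->children[digit] = new Node();  // never freed, for brevity
            node = node->children[digit];
        }
        divisor /= 10;
    }
}

// Traverse the digits from the lowest to the highest, descending into
// subtries, and print every stored number as many times as it was inserted.
void print_sorted(const Node* node, int prefix, int depth) {
    for (int digit = 0; digit < 10; ++digit) {
        if (node->counters[digit] == 0)
            continue;
        int value = prefix * 10 + digit;
        if (depth + 1 == DIGITS)
            for (int i = 0; i < node->counters[digit]; ++i)
                std::printf("%d\n", value);
        else
            print_sorted(node->children[digit], value, depth + 1);
    }
}

Storing 417 this way allocates exactly the 30 counters described above: 10 in the root, 10 in the node under 4, and 10 in the node under 41.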
This is a digital tree, also known as a radix tree, and this is how radix sort works. Of course, in practice, the digits are usually binary, not decimal.
It's not as effective as no sorting at all, or as the single-digit sort from before. It has O(n*m) complexity, where m is the length of the sorted numbers in digits multiplied by the size of a radix. For binary digits, it's basically the size of a number in bits.
Standard sorting algorithms have O(n log(n)) complexity, so the competition is really between m and log(n). As we saw earlier, there are 4 294 967 296 possible 32-bit numbers, and log2 of that is 32, so if you want to sort 32-bit numbers, your array should have more than 4 billion elements to make radix sort justified. Unless your data is huge, or the numbers are really small, the standard sort will do the job better.
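Spelling that comparison out (my own arithmetic, with m = 32 binary digits for 32-bit numbers):

n*m < n*log2(n)   <=>   m < log2(n)   <=>   32 < log2(n)   <=>   n > 2^32 = 4 294 967 296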
The same goes for the data structure itself. It also has O(m) retrieval complexity, and it's also justified on big data or shorter keys.
Radix trees are used in the most unexpected applications, like IP routing, or as an index-to-pointer map in the Linux kernel. But the most common application for them is text processing, although, to enjoy all the benefits, we have to modify our structure a bit.
So far we were talking about digital trees where every digit is a node. The problem with this system is that the digits are positional, and we want our tree to be ordered. With numbers, 2 < 12, so we have to store 2 as 02 to keep the order when traversing. But this is not so with strings in alphabetical order. The letter 'a' comes before the letter 'b', and the word 'add' still precedes 'be' while being longer.
This means that instead of a digital tree, we might use a prefix tree that conceptually looks like this:
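In lieu of the picture, here is a rough C++ sketch of a character-level prefix trie; the node layout and the names are my own guesses at the concept, not the article's Trie::Map:

#include <map>
#include <string>

// Every node maps a character to a child node and remembers
// whether some stored word ends exactly here.
struct PrefixNode {
    std::map<char, PrefixNode*> children;
    bool is_word = false;
};

void insert(PrefixNode& root, const std::string& word) {
    PrefixNode* node = &root;
    for (char c : word) {
        PrefixNode*& child = node->children[c];
        if (!child)
            child = new PrefixNode();   // never freed, for brevity
        node = child;
    }
    node->is_word = true;
}

bool contains(const PrefixNode& root, const std::string& word) {
    const PrefixNode* node = &root;
    for (char c : word) {
        auto it = node->children.find(c);
        if (it == node->children.end())
            return false;
        node = it->second;
    }
    return node->is_word;
}

A depth-first traversal that visits a node's own word before its children, and the children in character order, yields the words alphabetically: 'add' comes out before 'be' even though it is longer.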
My tries are prefix tries over standard strings. I try them with different radix sizes, from 1 to 8 bits. The larger the radix, the shorter the lookup, but also the more memory it takes to build the trie. These things should be kept in balance. As the experiments show, extra memory allocations may be more harmful to performance than extra lookups.
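To give a feel for the trade-off: with an r-bit radix every node has 2^r slots, and a key of b bytes splits into 8*b/r digits, so lookups get shorter exactly as nodes get fatter. A tiny sketch of the chunking, my own illustration rather than Trie::Map's internals:

#include <cstdint>
#include <string>
#include <vector>

// Split a string key into r-bit digits, higher bits of every byte first.
// Assumes r divides 8 evenly, i.e. r is 1, 2, 4 or 8.
std::vector<std::uint8_t> to_digits(const std::string& key, int r) {
    std::vector<std::uint8_t> digits;
    for (unsigned char byte : key)
        for (int shift = 8 - r; shift >= 0; shift -= r)
            digits.push_back((byte >> shift) & ((1u << r) - 1));
    return digits;
}

With r = 1, a 4-letter word becomes 32 digits over tiny 2-slot nodes; with r = 8, it is just 4 digits, but every node has to carry 256 slots.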
On my machine it goes like this. Sorting a million quasi-words 3 to 4 letters long:
These are the timings measured in milliseconds. As you can see, on these data radix 8 is way less effective than radix 4, and even worse than the standard sort.
Writing and reading 100 000 quasi-words of length 2 to 8 with a trie:
Trie::Map with 1-bits radix
Writing: 602
Reading: 636
Size in bytes: 35 434 704
Trie::Map with 2-bits radix
Writing: 411
Reading: 404
Size in bytes: 30 357 520
Trie::Map with 4-bits radix
Writing: 714
Reading: 727
Size in bytes: 54 409 656
Trie::Map with 8-bits radix
Writing: 1156
Reading: 599
Size in bytes: 466 866 200
std::map
Writing: 1404
Reading: 1408
std::unordered_map
Writing: 479
Reading: 466
Please note, my implementation is rather crude. Normally, it is not a good idea to allocate small bits of memory with the standard new and delete operators.
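A common remedy is to carve the nodes out of preallocated blocks instead of calling the general-purpose allocator once per node; a minimal pool sketch of my own, not the code behind the numbers above:

#include <cstddef>
#include <vector>

// Hands out default-constructed nodes from large preallocated blocks
// and releases everything at once, instead of one new/delete per node.
template <typename Node>
class NodePool {
public:
    Node* allocate() {
        if (blocks_.empty() || used_ == kBlockSize) {
            blocks_.push_back(new Node[kBlockSize]);
            used_ = 0;
        }
        return &blocks_.back()[used_++];
    }
    ~NodePool() {
        for (Node* block : blocks_)
            delete[] block;
    }
private:
    static constexpr std::size_t kBlockSize = 4096;
    std::vector<Node*> blocks_;
    std::size_t used_ = 0;
};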
My experiments aim to show that even with an inefficient implementation, tries can outperform the standard containers and algorithms on specific data.
I encourage you to conduct your own experiments on your own data to see if you're using the best structures possible.
Conclusion
The idea behind tries evolves from doing nothing at all with unordered digits into elaborate data structures that excel at storing big data. It is still the same idea, the same principle. Knowing this principle should help you understand tries and add them to your arsenal of data structures and algorithms.