We are already familiar with several string hashing functions, including linear and polynomial hash functions. This topic describes not a distinct function, but rather a property of hash functions that is applicable to both linear and polynomial hashing. Clever usage of this property is a cornerstone to building efficient string algorithms: substring search, text comparison, data compression, and many more.
The logic behind rolling hash
Let's consider the substring search task for a moment: we need to tell if a string is a substring of another string . Somehow, we need to check if any of the substrings of matches . We have previously learned an ingenious trick: we can compare not substrings, but their hash values. However, calculating hash values can be expensive in terms of time efficiency. This is when the rolling hash property comes in handy and allows us to efficiently solve some substring processing problems.
But what is the idea behind rolling hash? In three words, eliminating repetitive calculations. Imagine you are searching for in . Say, you have calculated the hash value of , compared it with the hash value of , and concluded that it is a mismatch. Then, it is logical to compute the hash value of .
However, we see that is common for and , so we perform calculations twice on this substring instead of remembering the value from the previous calculations. This is exactly what rolling hash does: it stores the result for , computes the value for the new in the end of , and subtracts the already calculated value of the in the beginning of . Now, the name rolling hash should make more sense: it is just like rolling:
Linear rolling hash
The idea described above corresponds to the linear rolling hash property. Let's see an example: consider a string . Let's use the linear hash function to calculate a hash for a prefix of of length :
Now, assume we need to calculate a hash value for the next substring of of length . This can be done as follows:
Note that the second sum can also be obtained if we subtract from the first sum and then add :
So, instead of additions, we perform only 2 elementary operations: one addition and one subtraction. This property of a hash function is called a rolling hash.
Polynomial rolling hash
The same property holds to the polynomial hash function: the only difference is that we need to start the calculations from the last substring (suffix) of for the property to work. We will explain that in a bit, but first, let's get back to the example from the section above.
Consider a string . Let's use the polynomial hash function to calculate a hash for a suffix of of length :
The next substring of from the end has the following hash value:
Using the hash value for the first substring, the second one can be obtained as follows:
It's finally time to explain this scary formula, if you are wondering, why it looks like that.
- Why, unlike linear rolling hash, here we have to start calculations from suffixes, and not prefixes? — This right-to-left approach is required for the implementation of the Rabin-Karp algorithm, as we will see in the following topic. The left-to-right approach is also possible.
- Where does come from? — Notice that is the length of the string , so this is all about shifting. Indeed:
- When is 1 (meaning there's only one character between and ), equals , which is 1. So, the character at has its original value.
-
As you move further to the left, the weight of each character decreases exponentially. The character at has a weight of , the one before that has a weight of , and so on.
- Why do we multiply by ? — Similarly, this multiplication simulates shifting the rolling hash one position to the right in the string.
Conclusion
So, here is a short summary of what we discussed in this topic:
-
Rolling hash is a property applicable to both linear and polynomial hash functions in string processing.
-
The key idea is to eliminate redundant hash calculations by updating the hash value incrementally as we move through the string.
-
In linear rolling hash, calculations start from the prefix of the string and efficiently update the hash as the window moves from left to right.
-
Polynomial rolling hash, on the other hand, uses a right-to-left approach, starting from the suffix of the string.
-
Rolling hash is a fundamental concept in string processing, optimizing hash value computation for various substring-related algorithms and applications. It enables efficient solutions to substring search, text comparison, data compression, and more.