Let's imagine the following scenario: you have a lot of friends and a big shelf full of books. Some of your friends want to borrow your books, some want to return them, and some want to know if you have a certain book. So, you want to write a program that lets you add books, remove books, and check if a book is available. For this scenario, what would be the best data structure to use?
Hash tables are structures that allow us to insert and remove values, and check if a value is present in time for each of those operations. Hash tables alone cannot guarantee this time. However, paired with a good hash function, everything works well, making it one of the best data structures for this purpose.
Hash tables
Hash tables are arrays where each entry is a bucket. Buckets can hold 0 or more values of a type. They are identified by their index in the array. If we want to insert a value in the hash table, we compute its hash value and insert it in the bucket at an index equal to the hash value (modulo the size of the array). You can do the same to remove an object or search for it.
If the hash value is greater than the maximum index of buckets, we use modulo division! So, if the hash table has buckets (the max index shall be since counting starts from ), and the hash value of our element is, say, , we take the remainder of this value with the size of the array. In this case, , hence the element with a hash value equal to will be stored in the bucket with index .
Example of a hash table
Let's look at an example now. Say we have a hash table with 5 buckets, and we are inserting integers {1, 3, 5, 6} with the identity hash function. The table looks like this:
|
Index |
0 |
1 |
2 |
3 |
4 |
|---|---|---|---|---|---|
|
Values |
{5} |
{1, 6} |
{} |
{3} |
{} |
Now let's figure out how it works. Since we're using the identity function, the hash value is equal to the value itself. Because of this, you can see that 1 and 3 are in buckets at indexes 1 and 3 respectively. Now, 5 and 6 have hash values 5 and 6, but there are too few buckets! To find a bucket for them, we take the modulo of the hash value by the total number of buckets, in this case, 5. So, 5 modulo 5 is 0, and 6 modulo 5 is 1, and you can see in the table that the values are placed in the correct buckets.
Hash set vs hash map
There are two types of hash tables. The one above is called a hash set. We can use it in scenarios where we only need to check if a value is present: for example, the book scenario in the introduction. We can still add or remove elements to a hash set, but in most cases we are not interested in getting a value out of a hash set. This happens because we already have an equal value that we need in order to search a hash table. Implementations of hash sets are unordered_set in C++ and HashSet in Java.
Another type of hash table is a hash map. Imagine you want to keep a phone book of names and numbers of friends, and want to search for phone numbers using a friend's name. If you kept a hash set of the name-number pairs, you would need to know the number to be able to search. Here, hash maps can help you. They are very similar to hash sets, but they store pairs. The first entry in the pair is called the key, and the second is the value. Only the hash of the key is used, and, when searching, we look for the value associated with a key. In the example above, the keys would be the names, and the values would be the phone numbers. Implementations of hash maps are unordered_map in C++ and HashMap in Java.
Summary
-
Hash tables are data structures that support fast inserting and removing values, and checking if a value is present.
-
Hash tables consist of buckets holding zero, one, or several values.
-
Hash functions are used to determine a bucket for a value.
-
Hash sets store objects based on their hash values; hash maps store key-value pairs based on the keys' hash values.
So far so good: however, have you asked yourself how to avoid having more than one element in one bucket? What if the number of elements is greater than the number of buckets? This and more are going to be discussed in the following topic. For now, make sure you have fully mastered the theory by solving some easy-peasy tasks.