Load factor in hash table

Managing large amounts of data can be challenging. Hash tables can add, remove, and look up items very quickly. What makes them exceptionally efficient is something called the load factor. In this topic, you'll explore the load factor, learn how it boosts hash table performance, and understand why hash tables operate at such high speed. Excited to start? Let's dive in!

Load factor

Having huge buckets is always bad news. Imagine a hash table with one bucket and a huge number of elements. Then every time we want to do something, we have to search through the whole bucket to find the elements we need. That's the same thing as putting all the elements into an array without any order! If we allowed this to happen, what would be the point of using a hash table?

(Figure: a full bucket)

Now, think about a more common example: a hash table with 100 buckets and 200 values in it. An ideal hash function would spread the values uniformly, and we would have 2 values in each bucket. Then, whenever we check for a value, we have to check for equality with both values in the corresponding bucket. This is not bad, but, ideally, buckets should have 1 or 0 elements. We can't always guarantee it, but we can improve the average number if, for example, we have more buckets than elements. For this, we have to introduce the load factor.

The load factor of a hash table is a real number $l$ that tells us how full the hash table is. We can calculate it at any time with this formula:

$$l = \frac{\#\text{elements}}{\#\text{buckets}}.$$

If we aim to have no more than one element in one bucket, the load factor $l$ is expected to be between $0$ and $1$ (why?).
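
To make the formula concrete, here is a quick Python sketch using the numbers from the example above (the variable names are our own, chosen for illustration):

```python
# Load factor of the example table: 200 values spread over 100 buckets.
num_elements = 200
num_buckets = 100

load_factor = num_elements / num_buckets  # l = #elements / #buckets
print(load_factor)  # 2.0, i.e. two values per bucket on average
```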

Max load factor

Most hash tables have a max load factor $\alpha$ (alpha), a constant number between 0 and 1 that is an upper limit for the load factor. After we insert a new value into the hash table, we calculate the new load factor $l$. If $l > \alpha$, we increase the number of buckets in the hash table, usually by creating a new array of twice as many buckets and inserting into it all the values from the old one. Note that we have to recompute the indices of all elements, since their indices are based on both the hash values and the total number of buckets. A common value for the max load factor is $0.75$, which helps us make sure there are enough buckets while not wasting a lot of memory on empty ones.
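
Here is a minimal sketch of this resizing logic in Python, using separate chaining. All the names (HashTable, MAX_LOAD_FACTOR, and so on) are made up for this example; real implementations differ in the details:

```python
class HashTable:
    # A toy hash table with separate chaining, for illustration only.
    MAX_LOAD_FACTOR = 0.75  # the constant alpha from the text

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]
        self.size = 0  # number of stored elements

    def insert(self, key):
        # Place the key in the bucket its hash value points to.
        self.buckets[hash(key) % len(self.buckets)].append(key)
        self.size += 1
        # After every insertion, check the load factor l = #elements / #buckets.
        if self.size / len(self.buckets) > self.MAX_LOAD_FACTOR:
            self._grow()

    def _grow(self):
        # Double the number of buckets and reinsert every value: an element's
        # index depends on the total number of buckets, so it must be recomputed.
        old_buckets = self.buckets
        self.buckets = [[] for _ in range(2 * len(old_buckets))]
        for bucket in old_buckets:
            for key in bucket:
                self.buckets[hash(key) % len(self.buckets)].append(key)


table = HashTable(num_buckets=4)
for word in ["apple", "banana", "cherry", "date"]:
    table.insert(word)  # the fourth insert pushes l to 1.0 > 0.75, triggering a resize
print(len(table.buckets))  # 8
```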

Best load factor

As a rule of thumb, the emptier the buckets, the better the performance of the hash table. The load factor helps us keep the buckets as close to empty as reasonably possible. Too low a load factor, however, may also be a bad sign, as it means we are using a lot of unnecessary memory to store empty buckets.

Why are hash tables so fast?

Earlier we mentioned that hash tables take $O(1)$ to insert, remove, or search for a value when using a good hash function. Let's see why this is true!

For all those operations, the hash table has to do exactly two things: compute the hash value and search in only one bucket, namely the bucket whose index is derived from that hash value (typically the hash value modulo the number of buckets). A good hash function is efficient, so computing the hash value takes $O(1)$, and it is deterministic, so equal values always get the same hash value, meaning that we will search the correct bucket. Next, a good hash function is uniform, so there will not be some very large buckets while others are empty or almost empty. This, together with the load factor we explained above, ensures that, on average, there is no more than one element in a bucket, so searching the bucket takes $O(1)$. Finally, we can implement buckets so that inserting into and deleting from them also take $O(1)$, for example, using linked lists. So, if we are using a good hash function, all operations on a hash table take $O(1)$ on average.
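
As a rough sketch of the lookup path, reusing the toy HashTable from the previous example (so again just an illustration):

```python
def contains(table, key):
    # Step 1: compute the hash value and map it to exactly one bucket index.
    bucket = table.buckets[hash(key) % len(table.buckets)]
    # Step 2: scan only that bucket. With a good hash function and a bounded
    # load factor, it holds about one element, so the scan is O(1) on average.
    return key in bucket
```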

Summary

In short, the load factor plays a key role in the performance of a hash table. It indicates how full the hash table is and has an impact on the speed of operations:

  • The load factor is often used to decide when to resize the hash table, helping to keep it running swiftly.

  • You can calculate it using the formula $l = \frac{\#\text{elements}}{\#\text{buckets}}$. Ideally, its values should fall between 0 and 1.

  • Most hash tables have a maximum load factor, labeled $\alpha$ (alpha). If this limit is surpassed, the number of buckets increases.

  • Hash tables are notably quick because of the efficient calculation of the hash value and the equal spread of elements across buckets. Both are ensured by a good hash function.

Now that you've grasped the theory, it's time to apply it in some tasks.
