Computer scienceAlgorithms and Data StructuresAlgorithmsArray algorithmsSorting algorithms

Counting sort

16 minutes read

When we are given a list of items and are told to sort them, our brain immediately starts comparing objects with one another. However, there is another approach to sorting the items without comparing them. This non-comparison-based sorting algorithm is the counting sort that works like magic. As the name suggests, this algorithm sorts a list by counting its elements. In this topic, we will learn how this method works and where we can apply the magic through some examples.

A treasure box

Jack, the curious explorer, stumbled upon a mysterious box brimming with ancient coins! Yo-ho-ho! The treasure chest be holdin' three types of coins: gold, silver, and bronze. Jack is feeling lucky and wants to sort his coins by their value. What's the next move for him?

You've nailed it! Hey Jack, grab three empty buckets and label them gold, silver, and bronze! Pick up each shiny treasure from the box and put them into their properly labeled boxes. Let's get organized!

That's the exact mechanism through which our counting sort works. It sorts an unsorted list by counting each element. Some important point to be noted from Jack's story: he knows in advance that the box has only three types of coins. It would have been impossible for him to gather the necessary number of empty buckets if he didn't know the information.

Stability of sorting

Before we begin studying counting sort, let's discuss the two forms of sorting: stable sort and unstable sort. While sorting, stable sort keeps the order of repeating components, but unstable sort does not. Let us illustrate these notions with an example:

Let's go back to our wealthy Jack. He is an incredibly generous guy who wishes to aid five deserving poor kids by offering financial support. So he solicits applications and requests SAT scores. However, 7 pupils applied for help. Their results are as follows:

Because Jack will only grant five scholarships, he must first sort the array in descending order and then select the best five students. However, because there are some kids with similar grades, Jack wants to follow the "first come, first served" concept. As a result, his sorted list will look like this:

In this case, the sorted list follows the order in which similar scores occur in our original list. It is evident that James will receive the money as he appeared before Luna in the initial unsorted list and remains ahead of her in the sorted list. This is the fundamental principle of stable sorting. On the other hand, if our sorted list does not follow the order, we call this the unstable sort.

But the topic is on counting sort: why do we talk about the stability of sorting here? As a matter of fact, there are two versions of our counting sort algorithm: unstable counting sort and stable counting sort. Let's start learning them now.

Unstable counting sort

Let's roll a dice 10 times and the outcomes are:

Now we wish to sort those numbers using counting sort. Let's dive into the steps we have to follow:

Step 1: Range first

We need to determine the range of values in our input array. Since we have rolled a dice, we are certain that we will obtain integer numbers from 11 to 66.

Step 2: Create an empty frequency array

Now we will create a frequency array of size 66 , which will work like a tally sheet for counting occurrences. Initially, we set the value of each element to zero.

Please note that our data range can vary, for example, from 11 to 66. To accommodate this, we create an array with a size equal to the difference between the maximum and minimum values, plus one. This ensures that the first cell of the array represents the minimum value, 11 , and the last cell represents the maximum value, 66.

Step 3: Start counting

It's time to count the occurrences of each number in our array. As we go through each element of the number array, we mark a tally in the frequency array, i.e. increase the value by one for the corresponding element in the frequency array. For example, if we encounter the number 44, we increase the fourth element(at index 44) of the frequency array by 11. Our frequency array will look like this after counting all elements:

Step 4: Create and fill up the output array

Create an empty output array with the same size as the input array. This array will hold the sorted elements. Iterate through the frequency array. For each number, retrieve the count of occurrences and place the corresponding value in the output array that many times. The output array is filled with values from the minimum to the maximum, based on the counts in the frequency array. Here is the final output:

Here is the pseudocode of the procedure above:

UnstableCountingSort(inputArray):
    // Step 1: Determine the range
    minVal = minimum value in inputArray
    maxVal = maximum value in inputArray
    range = maxVal - minVal + 1

    // Step 2: Create the frequency array
    frequencyArray = new Array[range]

    // Step 3: Count the occurrences
    for i in (1, length(inputArray)):
        value = inputArray[i]
        frequencyArray[value - minVal + 1] += 1

    // Step 4: Create and fill up the output array
    outputArray = new Array[length(inputArray)]
    currentIndex = 1
    for i in (1, length(frequencyArray)):
        count = frequencyArray[i]
        while count > 0:
            outputArray[currentIndex] = i + minVal - 1
            currentIndex += 1
            count -= 1

    // Step 5: Return the sorted array
    return outputArray

Stable counting sort

The procedures above are very simple and intuitive. However, the sorted array doesn't preserve the order of the similar elements as in the input array. To preserve the order, we follow the following steps. Here, the steps from 11 to 33 are exactly the same as the unstable counting sort.

Step 1: Range first

Step 2: Create an empty frequency array

Step 3: Start counting

Step 4: Modify the frequency array

Here comes the twist! Instead of keeping the actual counts, we'll transform the frequency array into a map of starting positions. We traverse the frequency array and update each element with the cumulative sum of the previous frequencies. Here, cumulative sum refers to the successive addition of numbers in a sequence, where each sum is the result of adding the current frequency to the sum of all preceding frequencies. For example, cumulative frequency of 44 is the sum of frequency of 11, 22, 33, and 44, namely 1+2+2+1=61 + 2 + 2 + 1 = 6(see the frequency table above).

Here is our cumulative frequency table:

Step 5: Create and fill up the output array

To store the sorted array, we prepare an empty array of the same size as the input array. Next, we take each number from the input array, and based on the cumulative frequency array, we place it in the corresponding position of the output array. The cumulative frequency array keeps track of the position of the next similar number.

We start from the end of the input array containing the number 55. To determine the index of a particular number in the cumulative frequency array, we subtract the minimum value of the range from the number and then add one: (numberminVal)+1(number - \text{minVal}) + 1. Hence, the position of then number 5 in the cumulative frequency array will be: (51)+1=5(5 - 1) + 1 = 5. The cumulative frequency array contains the number 88 at index 55. This number 88 tells us that the 88th element (at index 88) in the output array should be 55. So, we update the output array accordingly and subtract 11 from the cumulative frequency for the number 55. Here is the illustration:

Next, we select the second last element in the input array, i.e. 33 and follow the same procedures as above until we reach the beginning of the input array. This process will sort the numbers while preserving the order in which similar elements appear in the input array.

Now, let's look at the pseudocode of the stable counting sort:

StableCountingSort(inputArray):
    // Step 1: Determine the range
    minVal = minimum value in inputArray
    maxVal = maximum value in inputArray
    range = maxVal - minVal + 1

    // Step 2: Create the frequency array
    frequencyArray = new Array[range]
    
    // Step 3: Count the occurrences
    for i in (1, length(inputArray)):
        value = inputArray[i]
        frequencyArray[value - minVal + 1] += 1

    // Step 4: Modify the frequency array
    for i in (2, length(frequencyArray)):
        frequencyArray[i] += frequencyArray[i - 1]

    // Step 5: Create and fill up the output array
    outputArray = new Array[length(inputArray)]
    for i in (length(inputArray), 1):
        value = inputArray[i]
        position = frequencyArray[value - minVal + 1] - 1
        outputArray[position] = value
        frequencyArray[value - minVal + 1] -= 1

    // Step 6: Return the sorted array
    return outputArray

Complexities and constraints

From the above discussion, it is clear that the counting sort algorithm is simple to understand and implement. The space complexity of this algorithm is O(n+k)O(n+k), since this algorithm needs a frequency array of size nn representing the range along with the input array of size kk. The time complexity of this algorithm is also O(n+k)O(n+k) because it traverses through each element of the input array and the frequency array.

So, we can say that the algorithm is a pretty good choice if nn, i.e. the range of our data, is small compared to the size of the input array. For example, suppose a total of 500 babies have been born in a hospital during a year. If we want to sort these little ones' names according to their birth month, counting sort is a good option. However, if we want to sort them according to the timestamps of their delivery, counting sort doesn't seem to be a good option since the range of time can be huge.

Conclusion

In this topic, you've become familiar with one of the sorting algorithms, namely, the counting sort. You've learned about its properties and had a look at the examples of the stable and non-stable versions of the algorithm. Finally, let's see when it's best to use the counting sort.

Counting sort is the right choice of an algorithm, if:

  • the input array consists of integers or objects mappable to integers (characters, key-value pairs with small keys, and so on);

  • we know the range of elements;

  • most of the range elements are present in the input array;

  • additional memory usage is not an issue.

Stable counting sort is used to implement other non-comparison-based sorting algorithms, such as radix sort.

14 learners liked this piece of theory. 1 didn't like it. What about you?
Report a typo