Sorting algorithms

Sorting is a fundamental concept in computer science and is often a key component of many algorithms. Knowing how to choose and implement the right sorting algorithm for a particular situation is a crucial skill for any software engineer. This guide will cover seven common sorting algorithms: Bubble Sort, Insertion Sort, Selection Sort, Quick Sort, Merge Sort, Heap Sort, and Counting Sort. We will highlight instances where it is better to use a particular sort and how some of them can be improved. Let's start with a quick reminder of how they work.

Brief descriptions and code

Bubble Sort

This algorithm works by repeatedly swapping adjacent elements that are in the wrong order. It continues passing through the list until no more swaps are needed, indicating that the list is sorted. The largest elements "bubble" to the end of the list with each pass.

for i from 0 to n-2
    for j from 0 to n-i-2
        if array[j] > array[j+1]
            swap array[j] and array[j+1]
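
For concreteness, here is a runnable Python version of the same pseudocode (the function name and the sample data are ours):

def bubble_sort(array):
    n = len(array)
    for i in range(n - 1):
        # After i passes, the last i elements are already in place.
        for j in range(n - i - 1):
            if array[j] > array[j + 1]:
                array[j], array[j + 1] = array[j + 1], array[j]

numbers = [5, 1, 4, 2, 8]
bubble_sort(numbers)
print(numbers)  # [1, 2, 4, 5, 8]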

Insertion Sort

This algorithm works by dividing the array into a sorted and an unsorted region. The sorted region starts with the first element. Then, it iteratively removes one element at a time from the unsorted region and inserts it into the correct position in the sorted region.

for i from 1 to n-1
    key = array[i]
    j = i - 1
    while j >= 0 and array[j] > key
        array[j+1] = array[j]
        j = j - 1
    array[j+1] = key
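
A runnable Python sketch of the same idea:

def insertion_sort(array):
    # Grow the sorted prefix one element at a time.
    for i in range(1, len(array)):
        key = array[i]
        j = i - 1
        # Shift larger elements of the sorted prefix one step right.
        while j >= 0 and array[j] > key:
            array[j + 1] = array[j]
            j -= 1
        array[j + 1] = key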

Selection Sort

This algorithm divides the array into a sorted and an unsorted region. The sorted region starts empty while the unsorted region contains all the elements. It repeatedly selects the smallest (or largest, depending on sorting order) element from the unsorted region, and swaps it with the leftmost unsorted element, moving the boundary between the two regions one element to the right.

for i from 0 to n-2
    min_index = i
    for j from i+1 to n-1
        if array[j] < array[min_index]
            min_index = j
    swap array[i] and array[min_index]
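
And the same algorithm in Python:

def selection_sort(array):
    n = len(array)
    for i in range(n - 1):
        # Find the smallest element in the unsorted suffix.
        min_index = i
        for j in range(i + 1, n):
            if array[j] < array[min_index]:
                min_index = j
        array[i], array[min_index] = array[min_index], array[i]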

Quick Sort

This algorithm works by selecting a 'pivot' element from the array and partitioning the other elements into two sub-arrays according to whether they are less than or greater than the pivot. The sub-arrays are then sorted recursively. The partition operation is the heart of the algorithm. Some implementations move elements equal to the pivot into a separate sub-array; here we keep them in the part to the left of the pivot.

partition(array, low, high)
    pivot = array[high]
    i = low - 1
    for j from low to high-1
        if array[j] <= pivot
            i = i + 1
            swap array[i] and array[j]
    swap array[i + 1] and array[high]
    return i + 1

quicksort(array, low, high)
    if low < high
        pi = partition(array, low, high)
        quicksort(array, low, pi - 1)
        quicksort(array, pi + 1, high)
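
In Python, with the same last-element pivot scheme (the names mirror the pseudocode):

def partition(array, low, high):
    pivot = array[high]        # last element as the pivot
    i = low - 1
    for j in range(low, high):
        if array[j] <= pivot:  # elements equal to the pivot stay on the left
            i += 1
            array[i], array[j] = array[j], array[i]
    array[i + 1], array[high] = array[high], array[i + 1]
    return i + 1

def quicksort(array, low, high):
    if low < high:
        pi = partition(array, low, high)
        quicksort(array, low, pi - 1)
        quicksort(array, pi + 1, high)

data = [10, 7, 8, 9, 1, 5]
quicksort(data, 0, len(data) - 1)
print(data)  # [1, 5, 7, 8, 9, 10]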

Merge Sort

This algorithm works by dividing the unsorted list into n sublists, each containing one element (a list of one element is considered sorted), and then repeatedly merging sublists to produce new sorted sublists until only one sublist remains. The merge operation is the key to the algorithm.

merge(left, right)
    result = empty array
    while left and right are not empty
        if first element of left <= first element of right
            remove first element from left and append to result
        else
            remove first element from right and append to result
    append remaining elements of left and right (if any) to result
    return result

mergesort(array)
    if length(array) <= 1
        return array
    middle = floor(length(array) / 2)
    left = mergesort(array[0, 1, ..., middle-1])
    right = mergesort(array[middle, ..., n-1])
    return merge(left, right)
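
The same algorithm in Python, returning a new sorted list:

def merge(left, right):
    result = []
    i = j = 0
    # Take the smaller head element first; <= keeps the sort stable.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])   # at most one of these is non-empty
    result.extend(right[j:])
    return result

def mergesort(array):
    if len(array) <= 1:
        return array
    middle = len(array) // 2
    return merge(mergesort(array[:middle]), mergesort(array[middle:]))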

Heap Sort

This algorithm works by visualizing the elements of the array as a special kind of complete binary tree called a heap. The heap is constructed with a process called 'heapify'. Once the heap is built, the largest item is at the root of the heap. This item is moved to the end of the array, the heap size is reduced by one, and max heapify is called on the root. This process is repeated until all elements are sorted.

build_max_heap(array)
    for i from floor(n/2) - 1 down to 0
        heapify(array, i, n)

heapify(array, i, n)
    largest = i
    left = 2*i + 1
    right = 2*i + 2

    if left < n and array[left] > array[largest]
        largest = left
    if right < n and array[right] > array[largest]
        largest = right
    if largest != i
        swap array[i] and array[largest]
        heapify(array, largest, n)

heapsort(array)
    build_max_heap(array)
    for i from n-1 down to 1
        swap array[0] and array[i]
        heapify(array, 0, i)
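
Here is a Python version with the same zero-based indexing:

def heapify(array, i, n):
    # Sift array[i] down within the first n elements of the heap.
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and array[left] > array[largest]:
        largest = left
    if right < n and array[right] > array[largest]:
        largest = right
    if largest != i:
        array[i], array[largest] = array[largest], array[i]
        heapify(array, largest, n)

def heapsort(array):
    n = len(array)
    for i in range(n // 2 - 1, -1, -1):  # build the max-heap
        heapify(array, i, n)
    for i in range(n - 1, 0, -1):        # move the root to the end
        array[0], array[i] = array[i], array[0]
        heapify(array, 0, i)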

Counting Sort

This algorithm works by counting the number of objects that have distinct key values and using arithmetic on those counts to determine the position of each key value in the output sequence. It is only suitable for direct use in situations where the range of keys is not significantly greater than the number of items.

countingSort(array)
    max_val = maximum value in array
    count = array of zeros of size max_val + 1
    for each number in array
        increment count[number]
    for i from 1 to max_val
        add count[i-1] to count[i]
    output = array of size length(array)
    for each number in array, from last to first
        output[count[number] - 1] = number
        decrement count[number]
    return output
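
A Python sketch, assuming non-negative integer keys:

def counting_sort(array):
    max_val = max(array)
    count = [0] * (max_val + 1)
    for number in array:
        count[number] += 1
    for i in range(1, max_val + 1):
        count[i] += count[i - 1]    # prefix sums: end of each key's range
    output = [0] * len(array)
    for number in reversed(array):  # reverse pass keeps the sort stable
        count[number] -= 1
        output[count[number]] = number
    return output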

Comparison of sorting algorithms

Now let's compare these sorts in terms of average-case running time and additional memory used. Another important aspect of a sorting algorithm is its stability, i.e., whether it preserves the relative order of equal elements.

Algorithm      | Average Time Complexity | Space Complexity | Stability | Comments
Bubble Sort    | O(n^2)                  | O(1)             | Stable    | Useful when the array is almost sorted
Insertion Sort | O(n^2)                  | O(1)             | Stable    | Efficient for small data sets
Selection Sort | O(n^2)                  | O(1)             | Unstable  | Not suitable for large data sets
Quick Sort     | O(n log n)              | O(log n)         | Unstable  | Fast for large data sets, but performance depends on pivot selection
Merge Sort     | O(n log n)              | O(n)             | Stable    | Fast for large data sets, but uses more memory
Heap Sort      | O(n log n)              | O(1)             | Unstable  | Good for memory-limited systems
Counting Sort  | O(n + k)                | O(k)             | Stable    | Only useful when the range of potential items (k) is known and not too large; works with numbers only


When to use each algorithm

  • Bubble Sort: Despite being one of the simplest sorting algorithms to understand and implement, Bubble Sort is generally not used in practice due to its poor performance on larger data sets. However, it can be practical in certain situations. For instance, if you know that the input list is almost sorted (i.e., the elements are only slightly out of order), Bubble Sort can perform quite well, achieving nearly linear time complexity in the best-case scenario. It is also an in-place sorting algorithm, meaning it doesn't require any extra space beyond what is used to store the input. This makes it a good choice for memory-constrained systems. In addition, there are early-exit conditions for the outer and inner loops, known as the Iverson conditions, which reduce the number of comparisons. The first says that if no swaps occurred during a pass through the array, the next pass is unnecessary and sorting can finish. The second says that a pass does not need to traverse the whole array, only up to the position of the last swap. Together these conditions improve the best case to O(n), but the average case is still O(n^2) (see the Python sketch after this list).

  • Insertion Sort: Like Bubble Sort, Insertion Sort performs well on small or nearly sorted lists, with a time complexity of O(n) in the best-case scenario. It is also an in-place sorting algorithm, making it memory efficient. Insertion Sort can be a good choice when the data arrives in a streaming manner, as it can sort the list as the data is received. To improve the efficiency of this algorithm, we can find the insertion position of each element in the sorted part by binary search instead of linear search, reducing the search time to O(log n). This cuts down the comparisons, but shifting elements still takes linear time, so the algorithm remains quadratic in the worst case (see the sketch after this list).

  • Selection Sort: This algorithm is unsuitable for larger datasets as its time complexity is O(n^2) in all cases. However, it does have the property of minimizing the number of swaps. In situations where writing to memory is significantly more expensive than reading, such as with flash memory or EEPROM, Selection Sort may perform better than other algorithms.

  • Quick Sort: Quick Sort is generally a good choice for large data sets where the additional memory usage is not a concern and the data doesn't contain many duplicates. Quick Sort has a worst-case time complexity of O(n^2), but this is rare, and it typically performs at O(n log n). However, the performance of Quick Sort greatly depends on the choice of the pivot. If you have prior knowledge about the distribution of the data, Quick Sort can be optimized by choosing a good pivot. The worst case for pivot selection is choosing an extremum (the maximum or minimum) of the data: all or almost all of the array then ends up on one side of the pivot, which degrades the complexity to quadratic. So if your dataset is close to being sorted, it is better not to choose the first or last element of the array as the pivot. The best choice is the median of all values, but finding the true median takes too long, so practical pivot-selection strategies come down to finding an element close to the median as quickly as possible (see the median-of-three sketch after this list).

  • Merge Sort: Merge Sort is a great choice for large data sets as it always performs at O(n log n), regardless of the distribution of the input data. It is also a stable sort, which means it maintains the relative order of equal elements. This can be important in certain situations, such as if you're sorting a list of records by a certain field. However, Merge Sort is not an in-place sorting algorithm, and its use of auxiliary space can make it less suitable for memory-constrained systems.

  • Heap Sort: Heap Sort is a good choice when memory is a concern. It's an in-place sorting algorithm with a time complexity of O(n log n) in all cases. However, Heap Sort is not a stable sort: it does not guarantee that the relative order of equal elements is preserved. If stability matters, Heap Sort may not be the best choice.

  • Counting Sort: Counting Sort is an integer sorting algorithm and is not a comparison-based sort, which allows it to run in O(n) time given certain assumptions. It's especially efficient when the range of potential items in the input, k, is relatively small compared to the number of items, n. However, since it uses an auxiliary array of length k, it can use a significant amount of memory if k is large. It is also a stable sort, which can be important in certain situations.
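
The improvements mentioned above are easy to express in code. Below are small Python sketches of each; the function names are ours, and these are illustrative sketches rather than tuned implementations.

First, Bubble Sort with the Iverson conditions: each pass stops at the position of the last swap, and if no swap happened at all, the sort finishes early.

def bubble_sort_improved(array):
    # bound marks the last position that may still be out of order.
    bound = len(array) - 1
    while bound > 0:
        last_swap = 0
        for j in range(bound):
            if array[j] > array[j + 1]:
                array[j], array[j + 1] = array[j + 1], array[j]
                last_swap = j
        # Everything past the last swap is already sorted; if no swap
        # happened at all, last_swap stays 0 and the loop terminates.
        bound = last_swap

Second, Insertion Sort with binary search for the insertion point. The standard library's bisect_right finds the position in O(log n) and keeps the sort stable, but the slice assignment that shifts elements is still linear, so the worst case stays O(n^2).

from bisect import bisect_right

def binary_insertion_sort(array):
    for i in range(1, len(array)):
        key = array[i]
        # Binary search for the insertion point in the sorted prefix.
        pos = bisect_right(array, key, 0, i)
        # Shift the tail of the prefix right by one and insert the key.
        array[pos + 1:i + 1] = array[pos:i]
        array[pos] = key

Third, median-of-three pivot selection for Quick Sort, reusing the partition function from the Quick Sort sketch above. Taking the median of the first, middle, and last elements avoids the quadratic worst case on already-sorted input.

def median_of_three_partition(array, low, high):
    mid = (low + high) // 2
    # Order the three candidates so that array[mid] holds their median.
    if array[low] > array[mid]:
        array[low], array[mid] = array[mid], array[low]
    if array[mid] > array[high]:
        array[mid], array[high] = array[high], array[mid]
    if array[low] > array[mid]:
        array[low], array[mid] = array[mid], array[low]
    # Move the median to the end so partition() can use it as the pivot.
    array[mid], array[high] = array[high], array[mid]
    return partition(array, low, high)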

Conclusion

Sorting algorithms are a foundational part of computer science and understanding them is crucial for any aspiring software engineer. By understanding the strengths and weaknesses of each algorithm, you can make informed decisions about which to use in any given situation. This guide provides a starting point, but the best way to truly understand these algorithms is to implement them yourself. Happy coding!
