Computer scienceAlgorithms and Data StructuresAlgorithmsGraph algorithmsMinimum spanning tree algorithms

Kruskal's algorithm

6 minutes read

What do the fields of image segmentation, bioinformatics, facility location, and network design have in common? The answer is simple: they all use the minimum spanning trees (MST) to solve different problems! You've already become familiar with the concept of it, but how can we find the MST? Well, as with any other problem in programming, it has different approaches to it. One of them is Kruskal's algorithm, which is exactly the one that will be covered in this topic. So, let's not waste our time and dive into exploring it right away!

Steps of the algorithm

But before that, let's remember the definition of the spanning tree to make sure that the algorithm is absolutely clear. So, a spanning tree is such a subgraph of the initial graph, that is a tree and connects all of its vertices. Consequently, a minimum spanning tree is a spanning tree with the minimum possible sum of the weights of its edges.

Now, how does Kruskal's algorithm find such a tree? In summary, it constructs a minimum spanning tree in a connected undirected weighted graph using a greedy approach. What it does exactly is that it starts with an empty spanning tree and repeatedly adds an edge with the smallest weight that does not produce a cycle in the current tree. In other words, the steps of the algorithm are the following:

Sort the edges of a graph by weight in a nondecreasing order. Create an empty minimum spanning tree.
Iterate over the edges in sorted order.
If the edge does not create a cycle, add it to the minimum spanning tree.
Repeat the previous step until all the edges have been processed.
Return the minimum spanning tree.

An example

Let's see how the algorithm works for the following graph:

Finding the minimum spanning tree in a connected undirected weighted graph

The weights of the edges range from $1$ to $5$ , so we will start considering the ones whose weight is $1$ . There are 5 such edges: $\{0, 1\}, \{1, 3\}, \{2, 3\}, \{4, 7\}$ and $\{8, 11\}$ . If we start to repeatedly include these edges to the current spanning tree, none will produce a cycle. So, all of them have to be added to the spanning tree:

Kruskal's algorithm, picture 1

Next, we need to consider all the edges whose weight is $2$ . These are $\{0, 2\}, \{8, 9\}$ and $\{9, 12\}$ . If we include the edge $\{0, 2\}$ to the current spanning tree, it will produce a cycle, therefore it cannot be used. The remaining edges won't give a cycle, so we will add them to the current spanning tree:

Kruskal's algorithm, picture 2

Then, we will try to add the edges with the weight of $3$ in the following order: $\{3, 4\}, \{6, 7\}, \{3, 6\}, \{6, 10\}, \{12, 13\}$ and $\{11, 12\}$ . The first two edges can be added to the current spanning tree. But after that, the edge $\{3, 6\}$ cannot be used, since it produces a cycle. Then the edges $\{6, 10\}$ and $\{12, 13\}$ can be successfully added too, but the edge $\{11, 12\}$ cannot, because it gives a cycle. After all these edges are processed, we will get the following:

Kruskal's algorithm, picture 1

The edges with the weight of $4$ are $\{2, 5\}, \{5, 6\}$ and $\{9, 10\}$ . The first edge can be added to the current spanning tree since it does not produce a cycle. The second one cannot be used because it gives a cycle $2-5-6-7-4-3-2$ . The last one does not give a cycle, so we will add it to the tree. Finally, we will get the following:

Kruskal's algorithm, picture 4

Hurray, since there are no edges that can be added to the current spanning tree, we are done! The minimum spanning tree consists of $13$ edges and has the weight $1 + 1 + 1 + 1 + 1 + 2 + 2 + 3 + 3 + 3+ 3 + 4 + 4 = 29$ .

The complexity analysis

It is important not only to know the complexity of the algorithm but also to understand how it can be assessed.

Assume that a graph $G$ has $V$ nodes and $E$ edges. Then, to find a minimum spanning tree in $G$ , we first need to sort all the edges of $G$ , which requires $O(E\ log E)$ operations. Next, starting with an empty spanning tree, we need to consider the sorted edges and repeatedly add one that does not produce a cycle in the current tree. For the example above, we checked whether an edge produces a cycle by simply looking at the graph. However when implementing the algorithm using a programming language, one needs to use some data structure for it. Now we need to include more implementation details to complete the analysis.

To check whether adding an edge produces a cycle, you may use the following approach: store an array $A$ of size $V$ where $A[i]$ is the number of a component the node $i$ belongs to. Initially, all the nodes belong to different components (for example, $A[i] = i$ ). When we want to add an edge $\{i, j\}$ to the current spanning tree, we may check whether the nodes are in different components by comparing their numbers $A[i]$ and $A[j]$ . After adding a new edge, we need to update the array $A$ . For example, if $A[i] = a$ and $A[j] = b$ we may iterate through the array and substitute all $b$ by $a$ thereby uniting components. For instance, let's consider the following small graph. Initially, every vertex lies in its own component:

A graoh with 4 vertices and a table of components

Now, just like in the example, let's look through the edges, ordered by their weights. We have 2 edges with the weight 1: $\{1, 2\}$ and $\{3, 4\}$ . Both pairs of vertices $1$ and $2$ , $3$ and $4$ lie in different components, so both edges can be added to our tree without creating a cycle. The components will change in the following way:

Graph, after adding edge 1-2 to the spanning tree

Graph, after adding edge 3-4 to the spanning tree

The last vertex – 4 – can be connected with the rest of the nodes either by adding the edge $\{1, 4\}$ or $\{2, 4\}$ . Let's choose the edge $\{2, 4\}$ . After adding it to our spanning tree, the first and third components are united:

Graph, after adding edge 2-4 to the spanning tree

Since any spanning tree in $G$ contains $V-1$ edges, the total number of updates is no more than $V \cdot (V-1) = O(V^2)$ and therefore the total running time is $O(E\ log E + V^2)$ . However, to store and unite components, one may use a more sophisticated data structure called a disjoint set, which can decrease the running time to $O(E\ log E)$ .

Pseudocode

Even though the algorithm may not seem too complicated, it is still more convenient to use code instead of performing it manually. So, the pseudocode for the Kruskal's algorithm may look like this:

class Kruskal:  
  function find_tree(graph):  
    edges = sorted(graph.edges) 
    components = [] 
    tree_edges = [] 

    // initializing the array of the components' numbers 
    for i in [1, length(graph)]:  
      components.add(i) 

    for edge in edges:  
      // skip if the vertices, connected by the edge, lie in the same component 
      if (components[edge.start] == components[edge.end]) then 
        continue 

      // otherwise, unite the components and add the edge to the tree 
      update_components(components, edge.start, edge.end) 
      tree_edges.add(edge) 

    return tree_edges 

  function update_components(components, start, end): 
    for i in [1, length(components)]: 
      // uniting the components 
      if (components[i] == components[start]) then 
        components[i] = components[end]

Conclusion

To sum up, let's go through the main points about the Kruskal's algorithm:

It is used to find the minimum spanning tree of the graph.
It is a greedy algorithm.
It starts with an empty spanning tree and repeatedly adds an edge with the smallest weight that does not produce a cycle in the current tree.
The complexity of the algorithm is $O(E\ log E + V^2)$ , that can be reduced to $O(E\ log E)$ with the use of a more sophisticated data structure – a disjoint set.

5 learners liked this piece of theory. 0 didn't like it. What about you?

Report a typo