What do the fields of image segmentation, bioinformatics, facility location, and network design have in common? The answer is simple: they all use the minimum spanning trees (MST) to solve different problems! You've already become familiar with the concept of it, but how can we find the MST? Well, as with any other problem in programming, it has different approaches to it. One of them is Kruskal's algorithm, which is exactly the one that will be covered in this topic. So, let's not waste our time and dive into exploring it right away!
Steps of the algorithm
But before that, let's remember the definition of the spanning tree to make sure that the algorithm is absolutely clear. So, a spanning tree is such a subgraph of the initial graph, that is a tree and connects all of its vertices. Consequently, a minimum spanning tree is a spanning tree with the minimum possible sum of the weights of its edges.
Now, how does Kruskal's algorithm find such a tree? In summary, it constructs a minimum spanning tree in a connected undirected weighted graph using a greedy approach. What it does exactly is that it starts with an empty spanning tree and repeatedly adds an edge with the smallest weight that does not produce a cycle in the current tree. In other words, the steps of the algorithm are the following:
- Sort the edges of a graph by weight in a nondecreasing order. Create an empty minimum spanning tree.
- Iterate over the edges in sorted order.
- If the edge does not create a cycle, add it to the minimum spanning tree.
- Repeat the previous step until all the edges have been processed.
- Return the minimum spanning tree.
An example
Let's see how the algorithm works for the following graph:
The weights of the edges range from to , so we will start considering the ones whose weight is . There are 5 such edges: and . If we start to repeatedly include these edges to the current spanning tree, none will produce a cycle. So, all of them have to be added to the spanning tree:
Next, we need to consider all the edges whose weight is . These are and . If we include the edge to the current spanning tree, it will produce a cycle, therefore it cannot be used. The remaining edges won't give a cycle, so we will add them to the current spanning tree:
Then, we will try to add the edges with the weight of in the following order: and . The first two edges can be added to the current spanning tree. But after that, the edge cannot be used, since it produces a cycle. Then the edges and can be successfully added too, but the edge cannot, because it gives a cycle. After all these edges are processed, we will get the following:
The edges with the weight of are and . The first edge can be added to the current spanning tree since it does not produce a cycle. The second one cannot be used because it gives a cycle . The last one does not give a cycle, so we will add it to the tree. Finally, we will get the following:
Hurray, since there are no edges that can be added to the current spanning tree, we are done! The minimum spanning tree consists of edges and has the weight .
The complexity analysis
It is important not only to know the complexity of the algorithm but also to understand how it can be assessed.
Assume that a graph has nodes and edges. Then, to find a minimum spanning tree in , we first need to sort all the edges of , which requires operations. Next, starting with an empty spanning tree, we need to consider the sorted edges and repeatedly add one that does not produce a cycle in the current tree. For the example above, we checked whether an edge produces a cycle by simply looking at the graph. However when implementing the algorithm using a programming language, one needs to use some data structure for it. Now we need to include more implementation details to complete the analysis.
To check whether adding an edge produces a cycle, you may use the following approach: store an array of size where is the number of a component the node belongs to. Initially, all the nodes belong to different components (for example, ). When we want to add an edge to the current spanning tree, we may check whether the nodes are in different components by comparing their numbers and . After adding a new edge, we need to update the array . For example, if and we may iterate through the array and substitute all by thereby uniting components. For instance, let's consider the following small graph. Initially, every vertex lies in its own component:
Now, just like in the example, let's look through the edges, ordered by their weights. We have 2 edges with the weight 1: and . Both pairs of vertices and , and lie in different components, so both edges can be added to our tree without creating a cycle. The components will change in the following way:
The last vertex – 4 – can be connected with the rest of the nodes either by adding the edge or . Let's choose the edge . After adding it to our spanning tree, the first and third components are united:
Since any spanning tree in contains edges, the total number of updates is no more than and therefore the total running time is . However, to store and unite components, one may use a more sophisticated data structure called a disjoint set, which can decrease the running time to .
Pseudocode
Even though the algorithm may not seem too complicated, it is still more convenient to use code instead of performing it manually. So, the pseudocode for the Kruskal's algorithm may look like this:
class Kruskal:
function find_tree(graph):
edges = sorted(graph.edges)
components = []
tree_edges = []
// initializing the array of the components' numbers
for i in [1, length(graph)]:
components.add(i)
for edge in edges:
// skip if the vertices, connected by the edge, lie in the same component
if (components[edge.start] == components[edge.end]) then
continue
// otherwise, unite the components and add the edge to the tree
update_components(components, edge.start, edge.end)
tree_edges.add(edge)
return tree_edges
function update_components(components, start, end):
for i in [1, length(components)]:
// uniting the components
if (components[i] == components[start]) then
components[i] = components[end] Conclusion
To sum up, let's go through the main points about the Kruskal's algorithm:
- It is used to find the minimum spanning tree of the graph.
- It is a greedy algorithm.
- It starts with an empty spanning tree and repeatedly adds an edge with the smallest weight that does not produce a cycle in the current tree.
- The complexity of the algorithm is , that can be reduced to with the use of a more sophisticated data structure – a disjoint set.