Computer scienceAlgorithms and Data StructuresAlgorithmsGraph algorithmsMinimum spanning tree algorithms

Minimum spanning tree

13 minutes read

Welcome to the exciting world of graph theory, where connections and efficiency often work together. As you start this journey, imagine designing a network; it could be for roads, telecommunications, or distributing electricity. Your goal is to link all points using the least amount of resources possible. Here, the idea of a minimum spanning tree (MST) guides you to make the best choices. Together, we'll explore MSTs, learning how they are crucial for creating cost-effective networks. Get ready for an algorithmic journey that will not only boost your problem-solving skills but also show the beauty of connecting points most affordably.

Conceptualizing the Minimum Spanning Tree

A minimum spanning tree (MST) is a simple yet powerful concept in network optimization. It is a subset of the edges in a weighted graph that connects all the vertices without any cycles, and the total edge weight is as low as possible. It forms a skeleton of the original graph that keeps its connectivity while cutting any extra weight. This ensures that resources are used wisely, whether physical materials, like cables in a telecommunications network, or abstract things, such as time or cost.

Minimum spanning tree

Here are the properties that define an MST:

Uniqueness: If a weighted graph has unique edge weights, it will have only one MST. This shows that the best solution is clear and definite. In the above figure, the spanning tree is unique because the weighted graph has unique edge weights.
Cut Property: Of the edges that cross any cut of the graph, the one with the least weight is always part of the MST. This helps build the MST by adding the smallest edges that link two different components. You can try to make any cut in the initial graph above and find that the edge with least weight is always included in the MST.
Cycle Property: For any cycle in the graph, you cannot include the edge with the greatest weight in the MST. This stops cycles from forming and keeps the tree without cycles. Please check the cycles $V1-V2-V4-V1$ or $V1-V3-V4-V1$ above and you will find that the greatest weight $11$ is always excluded from the MST.

Recognizing a Minimum Spanning Tree Problem

To identify a minimum spanning tree (MST) problem, look for specific attributes that define these challenges. Here's how to tell if a situation needs an MST solution:

Graph Representation: The problem shows a network that can be modeled as a graph. This could be a physical network, like a roadmap connecting cities, or an abstract one, such as a network of connected devices.
Edge Weights: The graph's edges, which represent connections, have weights. These might be distances, costs, or some other measurable item that you need to minimize. For instance, roads on a map have distances you can use as edge weights.
Optimization Objective: The main goal is to find a tree—a connected acyclic subgraph—that covers all vertices and has the smallest sum of edge weights. For a cities' map, this means finding the shortest road network connecting all cities without making any loops.
Graph Density: The graph could be dense with many connections between the nodes, or sparse with fewer edges compared to the number of nodes. Graph density affects which MST algorithm is better to use.

Understanding these attributes helps you spot MST problems.

Algorithms for Finding a Minimum Spanning Tree

In the search for the minimum spanning tree, two algorithms are well known for their effectiveness and common use: Prim's and Kruskal's. Let's look at these algorithms with examples and learn how to implement them step by step, including pseudocode.

Prim's Algorithm

Imagine you're at a buffet with lots of different dishes. You start with the dish closest to you and, with each new plate, you take the dish that's nearest to what's already on your plate, making sure you don't go back to dishes you've already had. This method is like Prim's Algorithm, which gradually creates a minimum spanning tree (MST) by choosing the closest connected vertex that hasn't been picked yet.

Steps to follow in Prim's Algorithm:

Begin with a vertex, chosen by chance, and mark it as part of the MST.
At every step, add the smallest connecting edge from the tree to a new vertex not yet in the tree.
Keep going until all vertices are part of the tree.

Here is the illustration of Prim's algorithm applied to the previous graph starting from the vertex $V2$ . The solid lines are included in the MST, the dotted lines are not.

Prim's algorithm applied

Pseudocode for Prim's Algorithm using heap data structure:

function Prim_find_tree(graph):
    tree_edges = [] // List to store the edges of the resulting MST
    min_heap = Heap() // Priority queue (min-heap) to select the smallest edge

    // Array to store the minimum weight edge to connect each vertex to the MST
    min_edges = [Int.maxValue()] * length(graph)
    
    // Start the MST with node 1, so set its weight to a value that ensures it is picked first
    min_edges[1] = -1
    
    // Array to store the parent node for each vertex in the MST
    min_edge_parents = [-1] * length(graph)

    // Insert all vertices with their associated weights into the min_heap
    for i in [1, length(graph)]:
        insert(min_heap, [i, min_edges[i]])

    while size(min_heap) > 0:
        // Extract the vertex with the smallest edge weight connecting it to the MST
        cur_node = extract_min(min_heap)
        
        // If cur_node is not the starting node, add the edge to the MST
        if min_edge_parents[cur_node] != -1:
            tree_edges.add(Edge(min_edge_parents[cur_node], cur_node, min_edges[cur_node]))

        // For each edge adjacent to cur_node, try to update the path to each child
        for edge in graph[cur_node]:
            child = edge.destination
            
            // If child is in the heap and the new edge weight is smaller, update it
            if (contains(min_heap, child) and min_edges[child] > edge.weight):
                // Update the minimum weight to reach the child vertex
                min_edges[child] = edge.weight
                // Update the parent of the child vertex
                min_edge_parents[child] = cur_node
                // Re-insert the child vertex with the new weight into the heap
                insert(min_heap, [child, edge.weight])

    // Return the list of edges that form the MST
    return tree_edges

Some points about the pseudocode

The graph is assumed to be a list of adjacencies where each vertex's edges are listed as Edge objects, with a destination and weight.
The min_heap is used for efficiently selecting the next vertex for the MST. It holds pairs [vertex, weight], where weight is the minimum edge weight to connect the vertex to the MST.
The min_edges array tracks the smallest edge weight that connects each vertex to those already in the MST.
The min_edge_parents array contains the parent vertex for each one in the MST, which helps to rebuild the MST edges.
The insert function puts a new vertex with its weight into the min_heap, and extract_min takes out the vertex with the smallest weight from the heap.
The contains function checks if a vertex is still in the heap, which means it's not yet in the MST.
The size function gives the number of items in the heap.

Kruskal's Algorithm

Imagine you need to connect several islands with bridges. You can pick from different bridge lengths and aim to connect all islands using the least amount of material. You would start by building the shortest bridge, then the next shortest that links two islands that aren't yet connected, and so on. This method is the core of Kruskal's Algorithm, which creates a minimum spanning tree (MST) by choosing the shortest edges and avoiding cycles.

Steps to implement Kruskal's Algorithm:

Sort all the edges of the graph from shortest to longest based on their weights. Start with an empty set for the MST.
For each edge in the sorted list, do the following:
a. Check if adding the current edge to the MST would cause a cycle.
b. If no cycle would happen, add the edge to the MST.
Keep doing this until you have looked at all edges.
The edges you have added will make up the graph's MST when you finish the algorithm.

Here is an illustration of the Kruskal algorithm using a slightly different graph:

Kruskal algorithm applied

Pseudocode for Kruskal's Algorithm:

function find_tree(graph):
    // Sort all edges in the graph based on their weight
    edges = sorted(graph.edges by weight)
    // Start an array to keep track of the component (or set) each vertex is in
    components = []
    // List to store the edges in the MST
    tree_edges = []

    // Put each vertex in its own component
    for i in [1, length(graph)]:
        components.add(i)

    // Go through each edge by weight order
    for edge in edges:
        // If the current edge connects vertices in the same component
        if components[edge.start] == components[edge.end]:
            // Skip this edge to prevent cycles
            continue

        // If the edge links two different components, add it to the MST
        tree_edges.add(edge)
        // Join the two components, as they are now linked by the MST
        update_components(components, edge.start, edge.end)

    // Give back the list of edges that make up the MST
    return tree_edges

function update_components(components, start, end):
    // Find the component numbers for the start and end
    start_component = components[start]
    end_component = components[end]

    // Go through the components list
    for i in [1, length(components)]:
        // Change the component number for all connected to the start
        // to the end's number, joining them
        if components[i] == start_component:
            components[i] = end_component

Some points about the pseudocode

The graph.edges is thought to be a list of edge items where each edge has a start, end, and weight.
The sorted function puts the edges in order from shortest to longest by their weights.
The components array keeps track of which group (or disjoint set) each vertex is in. At first, every vertex is in its own group.
The update_components function combines two groups by changing all the start group vertices to the end group.
The tree_edges list will have the edges that make the MST at the end of the algorithm.

The pseudocode implements Kruskal's algorithm using a components array. But for bigger graphs, you would use a more efficient disjoint-set data structure, called union-find, to merge components. This is because it's faster for joining and finding operations.

(Optional) Implementation of Kruskal's algorithm using union-find data structure

Here is the pseudocode:

function find_tree(graph):
    edges = sorted(graph.edges by weight) // Sort the edges by weight in non-decreasing order
    forest = UnionFind(length(graph)) // Initialize the union-find structure for the graph vertices
    tree_edges = [] // List to store the edges of the resulting MST

    // Iterate through each edge, in order of increasing weight
    for edge in edges:
        // Use union-find to check if the current edge's vertices belong to different components
        if forest.find(edge.start) != forest.find(edge.end):
            // If they are in different components, the edge does not form a cycle
            tree_edges.add(edge) // Add the current edge to the MST
            forest.union(edge.start, edge.end) // Unite the components of the two vertices

    // Return the list of edges that form the MST
    return tree_edges

// Union-find structure with path compression and union by rank optimizations
class UnionFind:
    function __init__(n):
        // Initialize parent pointers for each vertex
        self.parent = [i for i in [1, n]]
        // Initialize rank (tree height) for each vertex
        self.rank = [0] * n

    function find(vertex):
        // Find the root of the component containing 'vertex' with path compression
        if vertex != self.parent[vertex]:
            self.parent[vertex] = self.find(self.parent[vertex])
        return self.parent[vertex]

    function union(vertex1, vertex2):
        // Find the roots of the components for each vertex
        root1 = self.find(vertex1)
        root2 = self.find(vertex2)

        // If the roots are the same, the vertices are already connected
        if root1 == root2:
            return

        // Union by rank: attach the shorter tree to the root of the taller tree
        if self.rank[root1] < self.rank[root2]:
            self.parent[root1] = root2
        else if self.rank[root1] > self.rank[root2]:
            self.parent[root2] = root1
        else:
            // If ranks are the same, make one the root and increment its rank
            self.parent[root2] = root1
            self.rank[root1] += 1

Some points about the pseudocode

The graph.edges is a list of edges where each one has a start, end, and weight.
The sorted function sorts the edges by weight.
The UnionFind class is the union-find data structure. It has methods for find and union.
The forest variable is an instance of UnionFind, started with the graph's vertex count.
The tree_edges list holds the edges of the MST.
The find method finds the component root for a vertex and compresses paths for quicker future finds.
The union method joins two sets if they're separate, using rank to balance the trees.

Comparing Prim's and Kruskal's algorithms

Both Prim's and Kruskal's algorithms are greedy algorithms that find the minimum spanning tree (MST) of a connected, weighted graph. You might choose between the two based on the graph's specific features and how quickly the algorithm needs to run. Prim's algorithm is more efficient for dense graphs with many edges as it adds the smallest edge to the growing tree. Kruskal's algorithm works well for sparse graphs because it sorts edges by weight and joins trees in the forest using the union-find structure.

Below is a comparison of several key aspects of both algorithms:

Aspect	Prim's Algorithm	Kruskal's Algorithm
Starting Point	Begins with a single node, expanding the MST one edge at a time.	Starts with the edges, adding them one by one to grow the forest into an MST.
Data Structure	Typically uses a priority queue (min-heap) to track the edges.	Uses a disjoint-set data structure (union-find) or component array to manage the trees.
Edge Selection	Adds the smallest edge that connects a new vertex to the growing MST.	Adds the next smallest edge from the list that doesn't create a cycle.
Cycle Detection	Not required, as the MST always stays a tree.	Essential, since adding edges from the whole graph might make cycles.
Graph Type Preference	Works better for dense graphs with many edges.	Works better for sparse graphs with fewer edges compared to nodes.
Time Complexity	$O(E\cdot log V)$ using a binary heap, where E is edges and V is vertices.	$O(E\cdot log E)$ or $O(E\cdot log V)$ , usually similar but varies with the way it's done. However, it is $O(E\cdot log V)$ if implemented using the component array.
Space Complexity	$O(V)$ , since each vertex must be in the priority queue.	$O(E + V)$ , for storing edges and the union-find structure.
Implementation	A bit more complicated due to updating the priority queue.	Simpler, with straightforward sorting of edges and union-find steps.
Algorithm Type	Greedy algorithm.	Greedy algorithm.

Applications in Network Design

Minimum spanning trees (MSTs) are very helpful in network design in several areas, including computer networks, telecommunications, transportation, and utilities. Imagine you need to set up a fiber-optic network across a company's campus. MSTs can help you plan the best layout that uses the least amount of cable and costs the least, while also making sure the network is fast.

In the same way, in telecommunications, MSTs help operators build cell tower networks that save money and work well, with simple connections that make maintenance easier. In the field of transportation, MSTs help to design efficient transit systems and highways, cutting down on both building and maintenance costs. For utilities like water and power, MSTs guide the setup of supply grids that cause the least harm to the environment and keep operational costs low. These real-life examples show how MSTs are important in making infrastructure designs that are both technically solid and cost-effective.

Conclusion

In conclusion, the concept of a minimum spanning tree (MST) is key in network design and graph algorithms. It offers a way to achieve optimal connectivity with the least cost. By looking at Prim's and Kruskal's algorithms, you've learned about practical methods for building MSTs. Each method has its own benefits, depending on the graph. These algorithms show the power of greedy strategies and the role of data structures in making algorithms efficient.

Looking at how MSTs are used, it's clear their impact goes far beyond theory. They affect decisions in real-life projects, like infrastructure planning, creating networks, and cutting costs. Knowing about MSTs and how to use their algorithms gives you strong tools to solve complex problems. It's an important topic for anyone diving into advanced algorithms and optimization.

4 learners liked this piece of theory. 1 didn't like it. What about you?

Report a typo

Minimum spanning tree

Conceptualizing the Minimum Spanning Tree

Recognizing a Minimum Spanning Tree Problem

Algorithms for Finding a Minimum Spanning Tree

Comparing Prim's and Kruskal's algorithms

Applications in Network Design

Conclusion

Related topics