Computer scienceAlgorithms and Data StructuresAlgorithmsGraph algorithmsConnectivity algorithms

Kosaraju's algorithm

12 minutes read

We already learned about connectivity in directed graphs, and in particular, discussed what a strongly connected component is. We also considered some simple examples of graphs and manually found strongly connected components for them. However, in real applications, graphs are usually huge and represented in the format of an adjacency matrix or an adjacency list. To find strongly connected components in such cases, we need to use more sophisticated approaches. In this topic, we will discuss Kosaraju's algorithm — one possible algorithm for finding strongly connected components in directed graphs.

Kosaraju's algorithm

Kosaraju's algorithm consists of two series of DFS and one simple graph transformation between them. We will see how it works step by step using the following directed graph as an example:

Directed graph and Kosaraju's algorithm

The first step is to run DFS and save output time for each node. After we do that starting with the node 00, we will get the following:

The first step is to run DFS

Here, the blue edges correspond to the DFS spanning tree, and the numbers in the rectangles show the output time for each node. For example, node 22 was the first node DFS ended processing, node 66 was the second, and node 00 was the last.

Note: In DFS algorithms, "output time" is synonymous with "finish time"— marking the completion of a node's DFS traversal. In Kosaraju's algorithm, it denotes the order in which nodes finish, crucial for identifying strongly connected components in directed graphs.

While not universal across all DFS algorithms, the practice of tracking the order of node visits is common. You might encounter related concepts like "discovery time" (when a node is first visited) and "finish time" (when DFS completes processing a node). Understanding output time is key to navigating the intricacies of DFS, especially in the context of Kosaraju's algorithm.

Why output times?

Before we move on, let's discuss one important property of output times.

Assume that C1C_1 and C2C_2 are two strongly connected components in a directed graph. Suppose also that there is an edge from some node of C1C_1 to some node of C2C_2. We will denote this edge as (C1,C2)(C_1, C_2). Assume that we apply DFS for the graph and calculate output times for its nodes. Let Tout1Tout_1 be the maximum output time among all the nodes of C1C_1 and Tout2Tout_2 be the same for C2C_2. It turns out that Tout1>Tout2Tout_1 > Tout_2.

To understand why the property always holds, let's consider two possible ways we can visit the nodes of these components when applying DFS.

1. We start processing some node of C2C_2 first. Since there is no edge from C2C_2 to C1C_1, we first consider all the nodes of C2C_2 and only then all the nodes of C1C_1. Thus, the last node of C1C_1 will be processed after the last node of C2C_2 and therefore Tout1Tout_1 will be greater than Tout2Tout_2.

2. We start processing some node of C1C_1 first. Then, since there is an edge from C1C_1 to C2C_2, we will start visiting the nodes of C2C_2 at some moment. After we visit all the nodes of C2C_2, we will return to C1C_1 and continue processing its remaining nodes. Thus, in this case, the last node of C1C_1 will be processed after the last node of C2C_2 as well.

Summing up, if there is an edge (C1,C2)(C_1, C_2) between two strongly connected components, then after applying DFS, we will always get Tout1>Tout2Tout_1 > Tout_2.

This property gives us a clue about how strongly connected components can be found. Namely, let's consider the node with the maximum output time. According to the described property, this node belongs to a root component: one having no ingoing edges from other components. What we would like to do next is start DFS from this node and visit only the nodes of the root component. After that, we can apply DFS for the next unvisited node with the maximum output time and find the next component and so on until all components are found.

The structure that will help us perform such a traversal is a transposed graph: a graph with all edges' directions reversed. In such a graph, a root component does not have outgoing edges. Thus, if we apply DFS for the node with the maximum output time, we will visit only the nodes of this component. Then, we can continue applying DFS from the next unvisited node with the maximum output time and find the next component. Continuing the same procedure, we will finally find all strongly connected components of the graph.

Applying DFS for a transposed graph

Now, let's see how the described ideas work for our graph. Here is its transposed version:

the transposed version of the graph

First, we need to apply DFS starting with the node 00 since it has the largest output time. Below is a graph after the first DFS iteration is completed:

graph after the first DFS iteration is completed

As you can see, the only node visited after this iteration is 00. Thus, the first strongly connected component includes only one node. The next node to process according to the output times is 11. After we apply DFS for this node, we get the following:

after the previous iteration of DFS:

The found connected component consists of 33 nodes, 11, 33, and 22, shown in green. At the next step, we apply DFS for the node 44:

At the next step, we apply DFS for the node 4

The nodes shown in red correspond to the found strongly connected component. After this step, the algorithm is finished since all the nodes are processed. The transposed graph consists of three strongly connected components, as well as the initial one.

Pseudocode

Having grasped the fundamental principles of Kosaraju's algorithm, let's now delve into its practical implementation. The following pseudocode outlines the step-by-step procedure for finding strongly connected components in a directed graph. Before we explore the algorithm's complexity, let's walk through the pseudocode and understand how each section contributes to the overall process.

// Function to perform Kosaraju's algorithm for finding strongly connected components
function kosaraju(graph):
    // Step 1: Run DFS on the original graph and save output times
    outputTimes = dfs(graph)

    // Step 2: Transpose the graph
    transposedGraph = transpose(graph)

    // Step 3: Initialize variables for tracking visited nodes
    visited = set()

    // Step 4: Iterate through nodes in reverse order of output times
    for node in reverseOrder(outputTimes):
        if node not in visited:
            // Step 5: Run DFS on the transposed graph for the current node
            component = dfs(transposedGraph, startNode=node)

            // Step 6: Process the strongly connected component (customize as needed)
            // Here, we simply print it.
            processComponent(component)

// Function to perform DFS on a graph and return output times
function dfs(graph):
    outputTimes = {}
    visited = set()

    for each node in graph.nodes:
        if node not in visited:
            explore(graph, node, visited, outputTimes)

    return outputTimes

// Function to perform DFS on a transposed graph starting from a specific node
function dfs(graph, startNode):
    component = set()
    visited = set()

    explore(graph, startNode, visited, component)

    return component

// Helper function for DFS exploration
function explore(graph, node, visited, outputTimes):
    visited.add(node)

    for neighbor in graph.neighbors(node):
        if neighbor not in visited:
            explore(graph, neighbor, visited, outputTimes)

    // Assign output time to the current node
    outputTimes[node] = currentTime++

// Function to transpose a graph
function transpose(graph):
    transposedGraph = createEmptyGraph()

    for edge in graph.edges:
        transposedGraph.addEdge(edge.destination, edge.source)

    return transposedGraph

// Function to process a strongly connected component (customize as needed)
function processComponent(component):
    // Print or perform any desired operation on the strongly connected component
    print("Found strongly connected component:", component)

Complexity analysis

For a graph GG consisting of nn nodes and mm edges, the algorithm performs DFS twice, thus having O(n+m)O(n + m) time complexity.

Conclusion

To sum up, the steps of Kosaraju's algorithm for a graph GG are:

  1. Run DFS for GG and save output time for each node.

  2. Transpose GG.

  3. Choose an unvisited node with the maximum output time and run DFS for the node of the transposed GG.

  4. When DFS finishes the current iteration, save the nodes visited during the iteration as a strongly connected component.

  5. Repeat steps 3-4 until the graph contains no unvisited nodes.

If you want to see another example of how the algorithm works, take a look at a visualization (choose SCC Algorithm -> Kosaraju's algorithm on the pane in the left lower corner.)

32 learners liked this piece of theory. 4 didn't like it. What about you?
Report a typo