Computer scienceAlgorithms and Data StructuresAlgorithmsGraph algorithmsConnectivity algorithms

Kosaraju's algorithm

13 minutes read

We already learned about connectivity in directed graphs, and in particular, discussed what a strongly connected component is. We also considered some simple examples of graphs and manually found strongly connected components for them. However, in real applications, graphs are usually huge and represented in the format of an adjacency matrix or an adjacency list. To find strongly connected components in such cases, we need to use more sophisticated approaches. In this topic, we will discuss Kosaraju's algorithm — one possible algorithm for finding strongly connected components in directed graphs.

Kosaraju's algorithm

Kosaraju's algorithm consists of two series of DFS and one simple graph transformation between them. We will see how it works step by step using the following directed graph as an example:

Directed graph and Kosaraju's algorithm

The first step is to run DFS and save output time for each node. After we do that starting with the node $0$ , we will get the following:

The first step is to run DFS

Here, the blue edges correspond to the DFS spanning tree, and the numbers in the rectangles show the output time for each node. For example, node $2$ was the first node DFS ended processing, node $6$ was the second, and node $0$ was the last.

Note: In DFS algorithms, "output time" is synonymous with "finish time"— marking the completion of a node's DFS traversal. In Kosaraju's algorithm, it denotes the order in which nodes finish, crucial for identifying strongly connected components in directed graphs.

While not universal across all DFS algorithms, the practice of tracking the order of node visits is common. You might encounter related concepts like "discovery time" (when a node is first visited) and "finish time" (when DFS completes processing a node). Understanding output time is key to navigating the intricacies of DFS, especially in the context of Kosaraju's algorithm.

Why output times?

Before we move on, let's discuss one important property of output times.

Assume that $C_1$ and $C_2$ are two strongly connected components in a directed graph. Suppose also that there is an edge from some node of $C_1$ to some node of $C_2$ . We will denote this edge as $(C_1, C_2)$ . Assume that we apply DFS for the graph and calculate output times for its nodes. Let $Tout_1$ be the maximum output time among all the nodes of $C_1$ and $Tout_2$ be the same for $C_2$ . It turns out that $Tout_1 > Tout_2$ .

To understand why the property always holds, let's consider two possible ways we can visit the nodes of these components when applying DFS.

1. We start processing some node of $C_2$ first. Since there is no edge from $C_2$ to $C_1$ , we first consider all the nodes of $C_2$ and only then all the nodes of $C_1$ . Thus, the last node of $C_1$ will be processed after the last node of $C_2$ and therefore $Tout_1$ will be greater than $Tout_2$ .

2. We start processing some node of $C_1$ first. Then, since there is an edge from $C_1$ to $C_2$ , we will start visiting the nodes of $C_2$ at some moment. After we visit all the nodes of $C_2$ , we will return to $C_1$ and continue processing its remaining nodes. Thus, in this case, the last node of $C_1$ will be processed after the last node of $C_2$ as well.

Summing up, if there is an edge $(C_1, C_2)$ between two strongly connected components, then after applying DFS, we will always get $Tout_1 > Tout_2$ .

This property gives us a clue about how strongly connected components can be found. Namely, let's consider the node with the maximum output time. According to the described property, this node belongs to a root component: one having no ingoing edges from other components. What we would like to do next is start DFS from this node and visit only the nodes of the root component. After that, we can apply DFS for the next unvisited node with the maximum output time and find the next component and so on until all components are found.

The structure that will help us perform such a traversal is a transposed graph: a graph with all edges' directions reversed. In such a graph, a root component does not have outgoing edges. Thus, if we apply DFS for the node with the maximum output time, we will visit only the nodes of this component. Then, we can continue applying DFS from the next unvisited node with the maximum output time and find the next component. Continuing the same procedure, we will finally find all strongly connected components of the graph.

Applying DFS for a transposed graph

Now, let's see how the described ideas work for our graph. Here is its transposed version:

the transposed version of the graph

First, we need to apply DFS starting with the node $0$ since it has the largest output time. Below is a graph after the first DFS iteration is completed:

graph after the first DFS iteration is completed

As you can see, the only node visited after this iteration is $0$ . Thus, the first strongly connected component includes only one node. The next node to process according to the output times is $1$ . After we apply DFS for this node, we get the following:

after the previous iteration of DFS:

The found connected component consists of $3$ nodes, $1$ , $3$ , and $2$ , shown in green. At the next step, we apply DFS for the node $4$ :

At the next step, we apply DFS for the node 4

The nodes shown in red correspond to the found strongly connected component. After this step, the algorithm is finished since all the nodes are processed. The transposed graph consists of three strongly connected components, as well as the initial one.

Pseudocode

Having grasped the fundamental principles of Kosaraju's algorithm, let's now delve into its practical implementation. The following pseudocode outlines the step-by-step procedure for finding strongly connected components in a directed graph. Before we explore the algorithm's complexity, let's walk through the pseudocode and understand how each section contributes to the overall process.

// Function to perform Kosaraju's algorithm for finding strongly connected components
function kosaraju(graph):
    // Step 1: Run DFS on the original graph and save output times
    outputTimes = dfs(graph)

    // Step 2: Transpose the graph
    transposedGraph = transpose(graph)

    // Step 3: Initialize variables for tracking visited nodes
    visited = set()

    // Step 4: Iterate through nodes in reverse order of output times
    for node in reverseOrder(outputTimes):
        if node not in visited:
            // Step 5: Run DFS on the transposed graph for the current node
            component = dfs(transposedGraph, startNode=node)

            // Step 6: Process the strongly connected component (customize as needed)
            // Here, we simply print it.
            processComponent(component)

// Function to perform DFS on a graph and return output times
function dfs(graph):
    outputTimes = {}
    visited = set()

    for each node in graph.nodes:
        if node not in visited:
            explore(graph, node, visited, outputTimes)

    return outputTimes

// Function to perform DFS on a transposed graph starting from a specific node
function dfs(graph, startNode):
    component = set()
    visited = set()

    explore(graph, startNode, visited, component)

    return component

// Helper function for DFS exploration
function explore(graph, node, visited, outputTimes):
    visited.add(node)

    for neighbor in graph.neighbors(node):
        if neighbor not in visited:
            explore(graph, neighbor, visited, outputTimes)

    // Assign output time to the current node
    outputTimes[node] = currentTime++

// Function to transpose a graph
function transpose(graph):
    transposedGraph = createEmptyGraph()

    for edge in graph.edges:
        transposedGraph.addEdge(edge.destination, edge.source)

    return transposedGraph

// Function to process a strongly connected component (customize as needed)
function processComponent(component):
    // Print or perform any desired operation on the strongly connected component
    print("Found strongly connected component:", component)

Complexity analysis

For a graph $G$ consisting of $n$ nodes and $m$ edges, the algorithm performs DFS twice, thus having $O(n + m)$ time complexity.

Conclusion

To sum up, the steps of Kosaraju's algorithm for a graph $G$ are:

Run DFS for $G$ and save output time for each node.
Transpose $G$ .
Choose an unvisited node with the maximum output time and run DFS for the node of the transposed $G$ .
When DFS finishes the current iteration, save the nodes visited during the iteration as a strongly connected component.
Repeat steps 3-4 until the graph contains no unvisited nodes.

If you want to see another example of how the algorithm works, take a look at a visualization (choose SCC Algorithm -> Kosaraju's algorithm on the pane in the left lower corner.)

32 learners liked this piece of theory. 4 didn't like it. What about you?

Report a typo

Kosaraju's algorithm

Kosaraju's algorithm

Why output times?

Applying DFS for a transposed graph

Pseudocode

Complexity analysis

Conclusion

Related topics