We already learned about connectivity in directed graphs, and in particular, discussed what a strongly connected component is. We also considered some simple examples of graphs and manually found strongly connected components for them. However, in real applications, graphs are usually huge and represented in the format of an adjacency matrix or an adjacency list. To find strongly connected components in such cases, we need to use more sophisticated approaches. In this topic, we will discuss Kosaraju's algorithm — one possible algorithm for finding strongly connected components in directed graphs.
Kosaraju's algorithm
Kosaraju's algorithm consists of two series of DFS and one simple graph transformation between them. We will see how it works step by step using the following directed graph as an example:
The first step is to run DFS and save output time for each node. After we do that starting with the node , we will get the following:
Here, the blue edges correspond to the DFS spanning tree, and the numbers in the rectangles show the output time for each node. For example, node was the first node DFS ended processing, node was the second, and node was the last.
Note: In DFS algorithms, "output time" is synonymous with "finish time"— marking the completion of a node's DFS traversal. In Kosaraju's algorithm, it denotes the order in which nodes finish, crucial for identifying strongly connected components in directed graphs.
While not universal across all DFS algorithms, the practice of tracking the order of node visits is common. You might encounter related concepts like "discovery time" (when a node is first visited) and "finish time" (when DFS completes processing a node). Understanding output time is key to navigating the intricacies of DFS, especially in the context of Kosaraju's algorithm.
Why output times?
Before we move on, let's discuss one important property of output times.
Assume that and are two strongly connected components in a directed graph. Suppose also that there is an edge from some node of to some node of . We will denote this edge as . Assume that we apply DFS for the graph and calculate output times for its nodes. Let be the maximum output time among all the nodes of and be the same for . It turns out that .
To understand why the property always holds, let's consider two possible ways we can visit the nodes of these components when applying DFS.
1. We start processing some node of first. Since there is no edge from to , we first consider all the nodes of and only then all the nodes of . Thus, the last node of will be processed after the last node of and therefore will be greater than .
2. We start processing some node of first. Then, since there is an edge from to , we will start visiting the nodes of at some moment. After we visit all the nodes of , we will return to and continue processing its remaining nodes. Thus, in this case, the last node of will be processed after the last node of as well.
Summing up, if there is an edge between two strongly connected components, then after applying DFS, we will always get .
This property gives us a clue about how strongly connected components can be found. Namely, let's consider the node with the maximum output time. According to the described property, this node belongs to a root component: one having no ingoing edges from other components. What we would like to do next is start DFS from this node and visit only the nodes of the root component. After that, we can apply DFS for the next unvisited node with the maximum output time and find the next component and so on until all components are found.
The structure that will help us perform such a traversal is a transposed graph: a graph with all edges' directions reversed. In such a graph, a root component does not have outgoing edges. Thus, if we apply DFS for the node with the maximum output time, we will visit only the nodes of this component. Then, we can continue applying DFS from the next unvisited node with the maximum output time and find the next component. Continuing the same procedure, we will finally find all strongly connected components of the graph.
Applying DFS for a transposed graph
Now, let's see how the described ideas work for our graph. Here is its transposed version:
First, we need to apply DFS starting with the node since it has the largest output time. Below is a graph after the first DFS iteration is completed:
As you can see, the only node visited after this iteration is . Thus, the first strongly connected component includes only one node. The next node to process according to the output times is . After we apply DFS for this node, we get the following:
The found connected component consists of nodes, , , and , shown in green. At the next step, we apply DFS for the node :
The nodes shown in red correspond to the found strongly connected component. After this step, the algorithm is finished since all the nodes are processed. The transposed graph consists of three strongly connected components, as well as the initial one.
Pseudocode
Having grasped the fundamental principles of Kosaraju's algorithm, let's now delve into its practical implementation. The following pseudocode outlines the step-by-step procedure for finding strongly connected components in a directed graph. Before we explore the algorithm's complexity, let's walk through the pseudocode and understand how each section contributes to the overall process.
// Function to perform Kosaraju's algorithm for finding strongly connected components
function kosaraju(graph):
// Step 1: Run DFS on the original graph and save output times
outputTimes = dfs(graph)
// Step 2: Transpose the graph
transposedGraph = transpose(graph)
// Step 3: Initialize variables for tracking visited nodes
visited = set()
// Step 4: Iterate through nodes in reverse order of output times
for node in reverseOrder(outputTimes):
if node not in visited:
// Step 5: Run DFS on the transposed graph for the current node
component = dfs(transposedGraph, startNode=node)
// Step 6: Process the strongly connected component (customize as needed)
// Here, we simply print it.
processComponent(component)
// Function to perform DFS on a graph and return output times
function dfs(graph):
outputTimes = {}
visited = set()
for each node in graph.nodes:
if node not in visited:
explore(graph, node, visited, outputTimes)
return outputTimes
// Function to perform DFS on a transposed graph starting from a specific node
function dfs(graph, startNode):
component = set()
visited = set()
explore(graph, startNode, visited, component)
return component
// Helper function for DFS exploration
function explore(graph, node, visited, outputTimes):
visited.add(node)
for neighbor in graph.neighbors(node):
if neighbor not in visited:
explore(graph, neighbor, visited, outputTimes)
// Assign output time to the current node
outputTimes[node] = currentTime++
// Function to transpose a graph
function transpose(graph):
transposedGraph = createEmptyGraph()
for edge in graph.edges:
transposedGraph.addEdge(edge.destination, edge.source)
return transposedGraph
// Function to process a strongly connected component (customize as needed)
function processComponent(component):
// Print or perform any desired operation on the strongly connected component
print("Found strongly connected component:", component)Complexity analysis
For a graph consisting of nodes and edges, the algorithm performs DFS twice, thus having time complexity.
Conclusion
To sum up, the steps of Kosaraju's algorithm for a graph are:
Run DFS for and save output time for each node.
Transpose .
Choose an unvisited node with the maximum output time and run DFS for the node of the transposed .
When DFS finishes the current iteration, save the nodes visited during the iteration as a strongly connected component.
Repeat steps 3-4 until the graph contains no unvisited nodes.
If you want to see another example of how the algorithm works, take a look at a visualization (choose SCC Algorithm -> Kosaraju's algorithm on the pane in the left lower corner.)