While creating various programs from social networks to route planners, you are likely to come across the problem of finding the strongly connected components (SCC) of a graph. Therefore, it is necessary to be familiar with possible solutions. Even though there are different approaches to this problem, Tarjan's algorithm sometimes may be more effective than the alternatives. The reason is that it may work faster than some other known SCC-finding algorithms, which is crucial when working with large amounts of data. So, why don't we dive into exploring the algorithm already?
Algorithm description
First of all, let's define some terms that you will come across in the description:
- Entering time of the node is the time that passes from the moment the algorithm starts until a certain node is visited. You can count it the following way: in the beginning of the algorithm, you initialize the variable storing the time that passes, with 0. Then, after entering each new unvisited node, you increase it by one.
- DFS subgraph is a graph consisting of the nodes that are visited during DFS that started from a given node, and edges to them.
- Root of the SCC is the node from the SCC that was entered the earliest.
The algorithm itself is based on DFS. But before discussing the algorithm, you need to go through additional values that have to be stored during the algorithm's execution.
- First is the entering time of the graph's nodes. Let's name the corresponding array .
- Then goes the array . It consists of the smallest entering times of the nodes that are reachable through the DFS subgraph of the current node. Confusing? Let's take a look at an example.
Say, you enter node and start or continue your DFS. This node has unvisited neighbors: nodes and . These nodes also have unvisited neighbors: nodes and . The nodes , , , and are reachable through the DFS subgraph of the node . Therefore, .
For example, you have this graph:
Initially, for each node equals its entering time. Entering times for are, for example, , respectively. After all the neighbors of a certain node are visited, you start to update this node's .
and don't have neighbors, so their will remain the same: and , respectively. You can find for the rest of the nodes in the following way:
- The final value you will need is the stack containing the nodes that have been explored but have not been assigned to any SCC yet.
After establishing all the values that need to be stored, you can finally move to examining the algorithm itself.
It consists of several steps that are repeated recursively. First of all, you need to start the DFS. The node, from which the DFS starts, is chosen randomly. Then you need to perform the following steps on each of the newly visited nodes. Let's name the current node , its neighbors would be named :
- Set the time when the node was entered ().
- Set the smallest entering time of the node (). For now, it will equal the time when it was entered.
- Push the current node onto the stack.
- Recursively call the function for the neighbors of the current node that have not been visited yet. Update if node is currently located on the stack or has not been visited yet: .
- Finally, check if equals . If it does, then the current node is the root of the SCC, and you can extract the current SCC from the stack. To do so, you need to pop the values from the stack, as long as the last popped value is not equal to the root of the SCC. In this case, the root of the SCC equals the current node. If the current node is not the root, you do nothing and just exit this recursive call of our function.
Just as you exit the last remaining recursive call, the algorithm is finished after finding all the strongly connected components.
Example
To make sure that the idea behind the algorithm is clear enough, let's go through the steps using an actual example. Don't forget about the stored values: they will be updated after every step in a table, according to the names stated in the previous section.
Suppose we need to find the SCCs of the graph pictured below.
Some comments on the following illustrations: if a node has not been visited yet, it is white. If a node is colored blue, it was visited and is currently located on the stack. If it is colored in any other way, it is a part of some SCC. The nodes from the same SCCs are of the same color. Also, you will see red edges: it means that you have just gone through it from one node to another. The numbers above each node are and . For now, all the nodes are marked as unvisited, and the stack is empty. Current entering time is 0, because you haven't entered any nodes yet and haven't had a chance to increase the respective variable. Let's define it as follows: .
First of all, you need to start the DFS. As you enter node 1, you assign the value stored in to and , push the current node onto the stack, mark it as visited, and increase : .
Now you enter the neighbors of this node. Let's look at the node 2 first. Just like during the previous step, you assign the value to and , push the current node onto the stack and mark it as visited. .
Just like before, you enter node 4 (since it is a neighbor of the second node) and 3 (since it is a neighbor of node 4). The following steps are identical to the already taken ones, so perform them on your own and check the received values:
Now, the interesting part begins. Node 3 has only one neighbor – node 2, which has already been visited. Therefore, there is no need to enter it, and you can just check whether node 2 is located on the stack. It actually is, and, additionally, is less than , so you should update the value for node 3: .
You can see that does not equal , which means that node 3 is not the root of the current SCC. Therefore, you can exit this recursive call and go back to node 4.
Once again, you update , as .
doesn't equal , which makes it obvious that node 4 is not the root of the current SCC.
Just like that, you go back to the node with the number 2. You don't need to update , as it is not smaller than . However, another twist happens here: it turns out that equals . Hooray, you have finally found the root of the current SCC! Now you can pop the values from the stack, until you reach the root, and find your first SCC: {}. Let's color these nodes purple.
You're about to find the second SCC, as well: node 1 doesn't have any more unvisited neighbors, so you just update and check if the equation is true. Indeed, it is, which means that node 1 is the root of the next SCC = {}. Let's color it green.
Identically, you will find the last SCC = {}.
Finally! You have found all the SCCs and, just like that, the algorithm is finished.
You may also find this visualization useful. In order to see the visualization, click on the 'SCC Algorithms' in the bottom-left part of your screen, and then on the 'Tarjan's algorithm'.
Time complexity
As during Tarjan's algorithm you use DFS, you need to go through all the nodes and edges. Therefore, its complexity is , where and are the numbers of nodes and edges, respectively. Even though the complexity of this algorithm and some alternative ones are the same, Tarjan's algorithm may perform better, as it needs only one series of DFS, which decreases the constant.
Pseudocode
class FindScc is
// initializing the values
cur_time = 0
time = [-1] * n
small_time = [-1] * n
visited = [false] * n
graph = initGraph()
stack = initStack()
method tarjan(v) is
// assigning the values
visited[v] = true
time[v] = cur_time
small_time[v] = cur_time
++cur_time
stack.push(v)
for (w in graph[v]):
if (not visited[w]) then:
tarjan(w) // recursively calling the method for an unvisited neighbor
small_time[v] = min(small_time[v], small_time[w])
else:
if (w is in stack) then:
// if a visited neighbor is still located on the stack, it belongs to the same SCC, so we update the value
small_time[v] = min(small_time[v], small_time[w])
if (time[v] == small_time[v]) then:
while (stack.top() != v): // extracting the current SCC from the stack
print(stack.pop())
print(stack.pop()) // extracting the root of the SCC
method dfs() is
for (int i = 0; i < n; ++i):
if (not visited[i]) then: // starting DFS on an unvisited node
tarjan(i)
method initGraph() is ...
method initStack() is ...Conclusion
To sum up all the information, let's point out the main facts concerning Tarjan's algorithm:
- You can use it to find strongly connected components.
- It requires some additional stored values.
- It uses DFS in order to visit the nodes of the graph.
- It has a complexity of .
- Tarjan's algorithm may work faster than some other SCC-finding algorithms due to a smaller constant.
- The algorithm works as follows: during DFS, newly visited nodes are pushed onto the stack, and entering and smallest entering times are assigned. After visiting the neighbors of the node, the smallest entering time is updated as a minimum between the current value and the smallest entering time of a neighbor. Then the entering time and the smallest entering time are compared: if the values are equal, you have found the root of the current SCC, and can pop it from the stack. By repeating these steps, you find all the SCCs.