Computer scienceAlgorithms and Data StructuresAlgorithmsGraph algorithmsConnectivity algorithms

Tarjan's algorithm

17 minutes read

While creating various programs from social networks to route planners, you are likely to come across the problem of finding the strongly connected components (SCC) of a graph. Therefore, it is necessary to be familiar with possible solutions. Even though there are different approaches to this problem, Tarjan's algorithm sometimes may be more effective than the alternatives. The reason is that it may work faster than some other known SCC-finding algorithms, which is crucial when working with large amounts of data. So, why don't we dive into exploring the algorithm already?

Algorithm description

First of all, let's define some terms that you will come across in the description:

Entering time of the node is the time that passes from the moment the algorithm starts until a certain node is visited. You can count it the following way: in the beginning of the algorithm, you initialize the variable storing the time that passes, with 0. Then, after entering each new unvisited node, you increase it by one.
DFS subgraph is a graph consisting of the nodes that are visited during DFS that started from a given node, and edges to them.
Root of the SCC is the node from the SCC that was entered the earliest.

The algorithm itself is based on DFS. But before discussing the algorithm, you need to go through additional values that have to be stored during the algorithm's execution.

First is the entering time of the graph's nodes. Let's name the corresponding array ${time}$ .
Then goes the array ${small\_time}$ . It consists of the smallest entering times of the nodes that are reachable through the DFS subgraph of the current node. Confusing? Let's take a look at an example.

Say, you enter node ${A}$ and start or continue your DFS. This node has unvisited neighbors: nodes ${B}$ and ${C}$ . These nodes also have unvisited neighbors: nodes ${D}$ and ${E}$ . The nodes ${B}$ , ${C}$ , ${D}$ , and ${E}$ are reachable through the DFS subgraph of the node ${A}$ . Therefore, ${small\_time(A) = min(small\_time(A), small\_time(B), small\_time(C), small\_time(D), small\_time(E))}$ .

For example, you have this graph:

You have this graph

Initially, ${small\_time}$ for each node equals its entering time. Entering times for ${A, B, C, D, E}$ are, for example, ${0, 1, 3, 2, 4}$ , respectively. After all the neighbors of a certain node are visited, you start to update this node's ${small\_time}$ .

${D}$ and ${E}$ don't have neighbors, so their ${small\_time}$ will remain the same: ${2}$ and ${4}$ , respectively. You can find ${small\_time}$ for the rest of the nodes in the following way:

${small\_time(C) = min(small\_time(C), small\_time(E)) = min(3, 4) = 3}$

${small\_time(B) = min(small\_time(B), small\_time(D)) = min(1, 2) = 1}$

${small\_time(A) = min(small\_time(A), small\_time(B), small\_time(C)) = min(0, 1, 3) = 0}$

The final value you will need is the stack containing the nodes that have been explored but have not been assigned to any SCC yet.

After establishing all the values that need to be stored, you can finally move to examining the algorithm itself.

It consists of several steps that are repeated recursively. First of all, you need to start the DFS. The node, from which the DFS starts, is chosen randomly. Then you need to perform the following steps on each of the newly visited nodes. Let's name the current node ${v}$ , its neighbors would be named ${w}$ :

Set the time when the node was entered ( ${time[v]}$ ).
Set the smallest entering time of the node ( ${small\_time[v]}$ ). For now, it will equal the time when it was entered.
Push the current node onto the stack.
Recursively call the function for the neighbors of the current node that have not been visited yet. Update ${small\_time[v]}$ if node $w$ is currently located on the stack or has not been visited yet: ${small\_time[v] = min(small\_time[v], small\_time[w])}$ .
Finally, check if ${time[v]}$ equals ${small\_time[v]}$ . If it does, then the current node is the root of the SCC, and you can extract the current SCC from the stack. To do so, you need to pop the values from the stack, as long as the last popped value is not equal to the root of the SCC. In this case, the root of the SCC equals the current node. If the current node is not the root, you do nothing and just exit this recursive call of our function.

Just as you exit the last remaining recursive call, the algorithm is finished after finding all the strongly connected components.

Example

To make sure that the idea behind the algorithm is clear enough, let's go through the steps using an actual example. Don't forget about the stored values: they will be updated after every step in a table, according to the names stated in the previous section.

Suppose we need to find the SCCs of the graph pictured below.

Example of Tarjan's algorithm

Some comments on the following illustrations: if a node has not been visited yet, it is white. If a node is colored blue, it was visited and is currently located on the stack. If it is colored in any other way, it is a part of some SCC. The nodes from the same SCCs are of the same color. Also, you will see red edges: it means that you have just gone through it from one node to another. The numbers above each node are ${time}$ and ${small\_time}$ . For now, all the nodes are marked as unvisited, and the stack is empty. Current entering time is 0, because you haven't entered any nodes yet and haven't had a chance to increase the respective variable. Let's define it as follows: ${cur\_time = 0}$ .

First of all, you need to start the DFS. As you enter node 1, you assign the value stored in ${cur\_time = 0}$ to ${time[1]}$ and ${small\_time[1]}$ , push the current node onto the stack, mark it as visited, and increase ${cur\_time}$ : ${cur\_time = 1}$ .

Tarjan's algorithm, first step

Now you enter the neighbors of this node. Let's look at the node 2 first. Just like during the previous step, you assign the ${cur\_time = 1}$ value to ${time[2]}$ and ${small\_time[2]}$ , push the current node onto the stack and mark it as visited. ${cur\_time = 2}$ .

Tarjan's algorithm. Second step

Just like before, you enter node 4 (since it is a neighbor of the second node) and 3 (since it is a neighbor of node 4). The following steps are identical to the already taken ones, so perform them on your own and check the received values:

Tarjan's algorithm. Third step

Tarjan's algorithm. Next step

Now, the interesting part begins. Node 3 has only one neighbor – node 2, which has already been visited. Therefore, there is no need to enter it, and you can just check whether node 2 is located on the stack. It actually is, and, additionally, ${small\_time[2] = 1}$ is less than ${small\_time[3] = 3}$ , so you should update the value for node 3: ${small\_time[3] = 1}$ .

Tarjan's algorithm. Fourth step

You can see that ${time[3] = 3}$ does not equal ${small\_time[3] = 1}$ , which means that node 3 is not the root of the current SCC. Therefore, you can exit this recursive call and go back to node 4.

Once again, you update ${small\_time[4]}$ , as ${small\_time[3] = 1 < small\_time[4] = 2}$ .

Tarjan's algorithm. Fifth step

$time[4] = 2$ doesn't equal ${small\_time[4]}$ , which makes it obvious that node 4 is not the root of the current SCC.

Just like that, you go back to the node with the number 2. You don't need to update $small\_time[2] = 1$ , as it is not smaller than $small\_time[4] = 1$ . However, another twist happens here: it turns out that $small\_time[2] = 1$ equals $time[2] = 1$ . Hooray, you have finally found the root of the current SCC! Now you can pop the values from the stack, until you reach the root, and find your first SCC: { ${3, 4, 2}$ }. Let's color these nodes purple.

You go back to the node with the number 2

You're about to find the second SCC, as well: node 1 doesn't have any more unvisited neighbors, so you just update ${small\_time[1] (small\_time[1] = min(small\_time[1], small\_time[2]) = 0)}$ and check if the equation ${small\_time[1] = time[1]}$ is true. Indeed, it is, which means that node 1 is the root of the next SCC = { $1$ }. Let's color it green.

You're about to find the second SCC

Identically, you will find the last SCC = { $5$ }.

Finding the last SCC

The algorithm is finished

Finally! You have found all the SCCs and, just like that, the algorithm is finished.

You may also find this visualization useful. In order to see the visualization, click on the 'SCC Algorithms' in the bottom-left part of your screen, and then on the 'Tarjan's algorithm'.

Time complexity

As during Tarjan's algorithm you use DFS, you need to go through all the nodes and edges. Therefore, its complexity is $O(|E| + |V|)$ , where $|V|$ and $|E|$ are the numbers of nodes and edges, respectively. Even though the complexity of this algorithm and some alternative ones are the same, Tarjan's algorithm may perform better, as it needs only one series of DFS, which decreases the constant.

Pseudocode

class FindScc is 
  // initializing the values
  cur_time = 0
  time = [-1] * n
  small_time = [-1] * n
  visited = [false] * n
  graph = initGraph()
  stack = initStack()

  method tarjan(v) is
    // assigning the values
    visited[v] = true
    time[v] = cur_time
    small_time[v] = cur_time
    ++cur_time
    stack.push(v)

    for (w in graph[v]):
      if (not visited[w]) then:
        tarjan(w) // recursively calling the method for an unvisited neighbor
        small_time[v] = min(small_time[v], small_time[w])
      else:
        if (w is in stack) then: 
        // if a visited neighbor is still located on the stack, it belongs to the same SCC, so we update the value
        small_time[v] = min(small_time[v], small_time[w])

    if (time[v] == small_time[v]) then:
      while (stack.top() != v): // extracting the current SCC from the stack
        print(stack.pop())
      print(stack.pop()) // extracting the root of the SCC

  method dfs() is
    for (int i = 0; i < n; ++i):
      if (not visited[i]) then: // starting DFS on an unvisited node
        tarjan(i)

  method initGraph() is ...

  method initStack() is ...

Conclusion

To sum up all the information, let's point out the main facts concerning Tarjan's algorithm:

You can use it to find strongly connected components.
It requires some additional stored values.
It uses DFS in order to visit the nodes of the graph.
It has a complexity of $O(|E| + |V|)$ .
Tarjan's algorithm may work faster than some other SCC-finding algorithms due to a smaller constant.
The algorithm works as follows: during DFS, newly visited nodes are pushed onto the stack, and entering and smallest entering times are assigned. After visiting the neighbors of the node, the smallest entering time is updated as a minimum between the current value and the smallest entering time of a neighbor. Then the entering time and the smallest entering time are compared: if the values are equal, you have found the root of the current SCC, and can pop it from the stack. By repeating these steps, you find all the SCCs.

13 learners liked this piece of theory. 3 didn't like it. What about you?

Report a typo