Computer scienceAlgorithms and Data StructuresAlgorithmsGraph algorithmsTopological sorting algorithms

DFS-based topological sort

10 minutes read

We are already familiar with the concept of topological sort and the numerous applications it has. We know it is widely used in project management, scheduling, and even on this platform. Indeed, the order in which users pass their topics is determined with the help of topological sort.

All this has been featured previously, however, we are missing one tiny detail: how to find the topological sort of large graphs algorithmically? Obviously, for such graphs, we cannot keep traversing vertices until we find a topological sort. This is exactly what we are going to learn in this topic: how to effectively find a topological sort of graph.

Topological sort in real life

Let's recall the definition: topological sorting of a graph is an order of nodes such that for each edge $(u, v) \in E$ the node $u$ comes before the node $v$ in the ordering. Throughout this topic, we will illustrate the boring theory with the example in this section.

The project manager's task is to distribute tasks and responsibilities among team members. Such a duty is not tricky if the project structure is tiny. However, when a huge company participates in a complex and long-lasting challenge, it is impossible not to define the rules for dividing missions between collaborators. Here, the team leader can confront a number of obstacles: how to define the order of employees doing their duties, how to set deadlines, is it possible for some workers to perform their tasks in parallel. There is no way to hold all these tasks, timings, and people in the mind. A directed acyclic graph can be a good helper for a confused project manager to present the topological sorting of workers & their tasks. For instance, let us imagine a graph of people involved in an innovative and big project of a well-known IT company.

Graph with workflow Seems impressive, isn't it? However, it is not even tough to figure out the project's pipeline, knowing that arrows direct from the person who assigns the task to the person who is responsible for completing it: First, junior developers and designers should start the process of production. Then they send their work and results to senior specialists, whose duty is to check the developed product and send it to the team leader. Only then, the project manager can delegate the process of advertising to the SMM team of the project. So here it is: we have considered the depth-first search for the graph to find the order of collaborators making their part of the project.

Description of the algorithm

Now it's time to suggest an algorithm for finding a topological order of a DAG. We will create an empty list L to store the topological sort and visit every node using the main procedure visit(n). The procedure is recursive and looks as below:

If the node is already visited, the procedure is done. Indeed, that means that its children have also been visited and added to the list (see below).
Otherwise, we repeat the procedure recursively for every child of our node.
If there are no recursive calls to be made, we mark our node as visited and add it at the beginning of the list. Note that this happens if the current node has no children, i.e., it is a leaf.

We repeat the procedure above until all the nodes have been visited and added to the list.

These steps can be easier to understand if we take a look at the pseudocode:

Initialize empty list L               // it will contain the sorted nodes in the end

while exist unvisited nodes do:
    select an unvisited node n
    visit(n)                          // our main procedure

function visit(node n):
    if n is visited then:
        return                        // it means we reached an already visited node

    for each node m, such that exists an edge (n, m):
        visit(m)                      // recursively repeat procedure for children of node n

    mark n as visited                 // if we reach a leaf, or all children have been visited
    add n to the beginning of L       // we are done with that node, add it to the sort

Do you recognize the algorithm? It is the Depth-first search, or shortly DFS, which is already familiar to you. DFS is one of the methods of graph traversal. The strategy of searching in-depth, as the name implies, is to go deep into the graph as far as possible.

Consequently, the time complexity for finding the topological order of the DAG with DFS depends only on the complexity of the DFS algorithms. Hence, it is equal to $O(\text{|V|}+ \text{|E|})$ , which means it is computed in a linear time depending on the number of vertices and edges.

Example: trees

Let's picture the order of team members to process their tasks

Topologically sorted graph

Red arrows illustrate the flow of the tasks which are delegated from chefs to their employees, and blue ones picture already processed tasks and results that are directed to the managers of the specific person.

It's time to illustrate the steps of the algorithm in this graph:

We start with Ben and call visit(Ben).
Ben is not visited and has one child node: Pam. We recursively apply visit(Pam).
Pam is also not visited and has two children nodes: Tom and Tim. We have to recursively apply visit(Tom) and visit(Tim). For the sake of understanding, let's proceed in detail only with Tim.
Tim is unvisited and has two adjacent nodes: Carlos and Adam. Apply visit(Carlos) and visit(Adam).
Carlos has no children, hence we can't keep calling recursively. It's time to mark Carlos as visited and add it to the list.
Similarly, with Adam. So far we have two visited nodes: Adam and Carlos, and the list with the topological sort is [Adam, Carlos] so far. Let's move on.
We are back to Tim since the recursive calls for its child nodes are done. We mark him as visited and add it to the list: [Tim, Adam, Carlos].
In the same way, we proceed with the left branch and get [Tom, Sam, John, Jim, Stella, Jack, Tim, Adam, Carlos]. Try it!
Finally, we are done with all the descendants of Pam: mark Pam as visited and add it to the list. Finally, the same thing is done with Ben. The final result, which is the topological sort of the graph, is as shown below: $[\text{Ben, Pam, Tom, Sam, John, Jim, Stella, Jack, Tim, Adam, Carlos}]$ .

As you see, we have applied graph traversal to the graph. We can even straighten our graph as it is shown below for a better view:

Straight graph

If you were an attentive reader, you might have noticed an ambiguous phrase: this is the topological sort of our graph. In fact, this is a topological sort of our graph. As we mentioned in our previous topic on topological sort, a graph can have more than one topological sort. Indeed, here is another topological sort of the graph above: $[\text{Ben, Pam, Tim, Adam, Carlos, Tom, Sam, John, Jim, Stella, Jack}]$ .

There are many more possible sorts, find them!

Another example: general DAG

The structure of the project above is pretty clear to understand. However, what are we going to do if nodes are more dependent on each other? In other words, a DAG doesn't necessarily have to be a tree. In this case, the first step of the procedure visit() will come into the game. Let's take a look at a simple example: the order of topics.

Directed graph of topics in Hyperskill

The graph above is interesting, as it has many orders of choosing nodes during the algorithm, however, the result will be the same. Let's convince ourselves that the order doesn't matter and that the algorithm works correctly. As always, we apply the procedure to each node. Let's start with node S:

This node has two children: D and J. Apply visit(D) and visit(J). We first proceed with the first one:
D is unvisited and has two children: J and C. Apply visit(J) and visit(C). Let's go with J:
J is unvisited and has one child: C. Apply visit(C):
C is unvisited and has no children: mark it as visited and add it to the list: [C].
We are back at J (from visit(J) in 2). Mark J as visited, add to the list: [J, C].
We have C again (from visit(C) in 2). But C is visited, hence we do nothing. Same with visit(J) in 1.
Back to D: mark as visited, add to the list: [D, J, C].
Similarly, we do the same thing with S and get [S, D, J, C].

Let's say that we go with visit(J) during the first step, and then with visit(D):

J is unvisited and has one child: C. Apply visit(C):
C is unvisited and has no children: mark it as visited and add it to the list: [C].
Back to J, mark it as visited, and add it to the list: [J, C].
Now we finally apply visit(D), but its children are visited, hence we mark D as visited, and add it to the list: [D, J, C].
Finally, we mark S as visited and add to the list: [S, D, J, C].

So, we followed two different paths and ended up with the same topological sort. There are other ways to traverse this graph. However, in this specific case, there is only one topological sort of graph, hence, the result will be always the same.

Conclusion

In this topic, you looked at the DFS-based topological sorting algorithm, which is one of the most used and certain methods for searching the topological order. Here is a summary of what we have looked at:

DFS is a recursive algorithm for traversing the graph, and it is especially applicable for DAGs to find their topological sortings.
Depth-first search is performed with linear time, which depends on the number of vertices and edges of a directed acyclic graph.
Such a way of discovering topological sorting applies to lots of scheduling tasks, such as project management.

Now it is time to consolidate gathered knowledge with the help of the following tasks and ensuing topics!

9 learners liked this piece of theory. 0 didn't like it. What about you?

Report a typo