We are already familiar with the concept of topological sort and the numerous applications it has. We know it is widely used in project management, scheduling, and even on this platform. Indeed, the order in which users pass their topics is determined with the help of topological sort.
All this has been featured previously, however, we are missing one tiny detail: how to find the topological sort of large graphs algorithmically? Obviously, for such graphs, we cannot keep traversing vertices until we find a topological sort. This is exactly what we are going to learn in this topic: how to effectively find a topological sort of graph.
Topological sort in real life
Let's recall the definition: topological sorting of a graph is an order of nodes such that for each edge the node comes before the node in the ordering. Throughout this topic, we will illustrate the boring theory with the example in this section.
The project manager's task is to distribute tasks and responsibilities among team members. Such a duty is not tricky if the project structure is tiny. However, when a huge company participates in a complex and long-lasting challenge, it is impossible not to define the rules for dividing missions between collaborators. Here, the team leader can confront a number of obstacles: how to define the order of employees doing their duties, how to set deadlines, is it possible for some workers to perform their tasks in parallel. There is no way to hold all these tasks, timings, and people in the mind. A directed acyclic graph can be a good helper for a confused project manager to present the topological sorting of workers & their tasks. For instance, let us imagine a graph of people involved in an innovative and big project of a well-known IT company.
Description of the algorithm
Now it's time to suggest an algorithm for finding a topological order of a DAG. We will create an empty list L to store the topological sort and visit every node using the main procedure visit(n). The procedure is recursive and looks as below:
If the node is already visited, the procedure is done. Indeed, that means that its children have also been visited and added to the list (see below).
Otherwise, we repeat the procedure recursively for every child of our node.
If there are no recursive calls to be made, we mark our node as visited and add it at the beginning of the list. Note that this happens if the current node has no children, i.e., it is a leaf.
We repeat the procedure above until all the nodes have been visited and added to the list.
These steps can be easier to understand if we take a look at the pseudocode:
Initialize empty list L // it will contain the sorted nodes in the end
while exist unvisited nodes do:
select an unvisited node n
visit(n) // our main procedure
function visit(node n):
if n is visited then:
return // it means we reached an already visited node
for each node m, such that exists an edge (n, m):
visit(m) // recursively repeat procedure for children of node n
mark n as visited // if we reach a leaf, or all children have been visited
add n to the beginning of L // we are done with that node, add it to the sortDo you recognize the algorithm? It is the Depth-first search, or shortly DFS, which is already familiar to you. DFS is one of the methods of graph traversal. The strategy of searching in-depth, as the name implies, is to go deep into the graph as far as possible.
Consequently, the time complexity for finding the topological order of the DAG with DFS depends only on the complexity of the DFS algorithms. Hence, it is equal to , which means it is computed in a linear time depending on the number of vertices and edges.
Example: trees
Let's picture the order of team members to process their tasks
Red arrows illustrate the flow of the tasks which are delegated from chefs to their employees, and blue ones picture already processed tasks and results that are directed to the managers of the specific person.
It's time to illustrate the steps of the algorithm in this graph:
We start with Ben and call
visit(Ben).Ben is not visited and has one child node: Pam. We recursively apply
visit(Pam).Pam is also not visited and has two children nodes: Tom and Tim. We have to recursively apply
visit(Tom)andvisit(Tim). For the sake of understanding, let's proceed in detail only with Tim.Tim is unvisited and has two adjacent nodes: Carlos and Adam. Apply
visit(Carlos)andvisit(Adam).Carlos has no children, hence we can't keep calling recursively. It's time to mark Carlos as visited and add it to the list.
Similarly, with Adam. So far we have two visited nodes: Adam and Carlos, and the list with the topological sort is
[Adam, Carlos]so far. Let's move on.We are back to Tim since the recursive calls for its child nodes are done. We mark him as visited and add it to the list:
[Tim, Adam, Carlos].In the same way, we proceed with the left branch and get
[Tom, Sam, John, Jim, Stella, Jack, Tim, Adam, Carlos]. Try it!Finally, we are done with all the descendants of Pam: mark Pam as visited and add it to the list. Finally, the same thing is done with Ben. The final result, which is the topological sort of the graph, is as shown below: .
As you see, we have applied graph traversal to the graph. We can even straighten our graph as it is shown below for a better view:
If you were an attentive reader, you might have noticed an ambiguous phrase: this is the topological sort of our graph. In fact, this is a topological sort of our graph. As we mentioned in our previous topic on topological sort, a graph can have more than one topological sort. Indeed, here is another topological sort of the graph above: .
There are many more possible sorts, find them!
Another example: general DAG
The structure of the project above is pretty clear to understand. However, what are we going to do if nodes are more dependent on each other? In other words, a DAG doesn't necessarily have to be a tree. In this case, the first step of the procedure visit() will come into the game. Let's take a look at a simple example: the order of topics.
The graph above is interesting, as it has many orders of choosing nodes during the algorithm, however, the result will be the same. Let's convince ourselves that the order doesn't matter and that the algorithm works correctly. As always, we apply the procedure to each node. Let's start with node S:
This node has two children: D and J. Apply
visit(D)andvisit(J). We first proceed with the first one:D is unvisited and has two children: J and C. Apply
visit(J)andvisit(C). Let's go with J:J is unvisited and has one child: C. Apply
visit(C):C is unvisited and has no children: mark it as visited and add it to the list:
[C].We are back at J (from
visit(J)in 2). Mark J as visited, add to the list:[J, C].We have C again (from
visit(C)in 2). But C is visited, hence we do nothing. Same withvisit(J)in 1.Back to D: mark as visited, add to the list:
[D, J, C].Similarly, we do the same thing with S and get
[S, D, J, C].
Let's say that we go with visit(J) during the first step, and then with visit(D):
J is unvisited and has one child: C. Apply
visit(C):C is unvisited and has no children: mark it as visited and add it to the list:
[C].Back to J, mark it as visited, and add it to the list:
[J, C].Now we finally apply
visit(D), but its children are visited, hence we mark D as visited, and add it to the list:[D, J, C].Finally, we mark S as visited and add to the list:
[S, D, J, C].
So, we followed two different paths and ended up with the same topological sort. There are other ways to traverse this graph. However, in this specific case, there is only one topological sort of graph, hence, the result will be always the same.
Conclusion
In this topic, you looked at the DFS-based topological sorting algorithm, which is one of the most used and certain methods for searching the topological order. Here is a summary of what we have looked at:
DFS is a recursive algorithm for traversing the graph, and it is especially applicable for DAGs to find their topological sortings.
Depth-first search is performed with linear time, which depends on the number of vertices and edges of a directed acyclic graph.
Such a way of discovering topological sorting applies to lots of scheduling tasks, such as project management.
Now it is time to consolidate gathered knowledge with the help of the following tasks and ensuing topics!