From the previous topics, you are already familiar with the concept of topological sort. You also know that it has applications in many fields: package management, cycle detection, project management, education, etc. As with any problem, it has many approaches to it: for example, one of them is the DFS-based algorithm. In this topic, we will cover another approach, that solves this task no less efficiently – the Kahn's algorithm. So, let's not waste any more time and start discussing the algorithm in detail right away!
Algorithm
For starters, let's go through some definitions, that are important to remember for a better understanding of this topic. Obviously, the crucial thing is the concept of a topological sort. So, it is a way of ordering the vertices of the graph, so that the child node appears after the parent one.
Another one is the in-degree of the node – the number of the edges, that come into it. For example, in the following graph, the in-degree of the vertex is 3, as it is the destination of the 3 edges: , and .
Last but not least is the definition of the directed acyclic graph (DAG). The name says it all – it is a directed graph, that has no cycles in it.
It's high time we move on to the description of the algorithm. Let's discuss the general idea first. It is logical that the vertex, which does not include any edges, will be the first in the sorted graph. Which one will be next? Well, since we have already accounted for this first vertex, let's remove it and all its edges from the graph, and again find the one that does not include any edges. This process can be repeated, while there are still vertices left in the graph. Now, let's go over the steps of the algorithm one by one:
Initialize the counter of the visited vertices with 0;
Count the in-degree for all vertices of the graph;
Put all of the vertices with the in-degree equal to 0 in a queue;
Get one node out of the queue and add it to the list of the sorted vertices;
Increase the counter of the visited nodes by one;
Reduce the in-degree of the children of the node by one;
If the in-degree of the node equals 0 after the reduction, add it to the queue;
Repeat steps 4-7 until the queue is empty.
That's it! If after clearing the queue the counter still doesn't equal the total number of the vertices of the graph, there is no topological sort for this graph – it has a cycle.
Example
In order to make sure that everything is clear, let's sort the following graph with the use of the algorithm:
You can try to find a topological sort yourself first and then compare your answer with the one, given in this paragraph. However, mind that there might be several possible answers, so don't get scared if something doesn't coincide.
First of all, you perform the steps 1-3. You can see no edges coming into the vertex , so it has the in-degree 0. Therefore, you put it into the queue. The list of the sorted vertices is empty for now.
Then you extract the node from the queue, add it to the list of the sorted vertices, increase the counter, and go through its children. There are two of them: and . After you decrease their in-degree, the degree of the node equals 0, which means, that this vertex should be put into the queue. To simplify the process, you may mentally remove and the edges, going from it:
Now you perform all of the above-mentioned operations for the nodes and , and get the following result:
This is where the variation might happen: depending on the order, in which you put the nodes and into the queue, their order in the list of the sorted vertices might also differ. In other words, after node in the list, it is possible that either of them will be next in the list. Let's start with the vertex , for example:
Now there is only one node in the queue – . Its only child is , so, once again, you decrease its in-degree, put into the list of the sorted vertices and increase the counter. The in-degree of now equals 0, so you also need to put it into the queue. It is the last unvisited vertex, therefore after you perform steps 4-7 for it, the work of the algorithm is finished. The final topological sort that we got is . You might have also got the sort , depending on the order, in which you put the nodes and into the queue.
Pseudocode and complexity
Of course, it is crucial to understand the work of the algorithm, but it is no less important to be able to implement it. For example, the pseudocode for Kahn's algorithm may look like this:
class TopologicalSort is
counter = 0
sorted_vertices = []
queue = initQueue()
graph = initGraph()
in_degrees = countInDegree(graph)
method kahn() is
for (i = 0; i < in_degree.size(); ++i):
if (in_degree[i] == 0) then:
queue.enqueue(i)
while (!queue.isEmpty()):
node = queue.dequeue()
for (child in graph[node]):
in_degrees[child] -= 1
if (in_degrees[child] == 0) then:
queue.enqueue(child)
counter += 1
sorted_vertices.append(node)
method initQueue() is ...
method initGraph() is ...
method countInDegree(graph) is ... As for complexity, during the work of the algorithm, you add the vertices to the queue and extract them from it, and also move along the edges, when visiting the children of the nodes. During these operations, you go through all of the vertices and edges respectively, therefore the complexity equals , where is the number of vertices and – the number of the edges.
Conclusion
Phew, that was it! But before finishing the topic, let's refresh the most important points about Kahn's algorithm:
It is used to find the topological sort of the graph.
Topological sort is a way of ordering the vertices of the graph so that the child node appears after the parent one.
Several topological sorts may be found in a single graph.
In-degree of the node is the number of the edges, that come into it.
First of all, you find the in-degree for all vertices and put the nodes with the in-degree 0 into the queue. Then you extract a node from the queue, reduce the in-degree of its children and add the vertex to a list of the sorted ones. The process is repeated until the queue is empty.
If the counter of the visited nodes doesn't equal the total number of the nodes, the topological sort for a graph cannot be found – it has a cycle.
Its complexity is .