Computer scienceAlgorithms and Data StructuresAlgorithmsGraph algorithmsTopological sorting algorithms

Kahn's algorithm

8 minutes read

From the previous topics, you are already familiar with the concept of topological sort. You also know that it has applications in many fields: package management, cycle detection, project management, education, etc. As with any problem, it has many approaches to it: for example, one of them is the DFS-based algorithm. In this topic, we will cover another approach, that solves this task no less efficiently – the Kahn's algorithm. So, let's not waste any more time and start discussing the algorithm in detail right away!

Algorithm

For starters, let's go through some definitions, that are important to remember for a better understanding of this topic. Obviously, the crucial thing is the concept of a topological sort. So, it is a way of ordering the vertices of the graph, so that the child node appears after the parent one.

Another one is the in-degree of the node – the number of the edges, that come into it. For example, in the following graph, the in-degree of the vertex BB is 3, as it is the destination of the 3 edges: ABAB, CBCB and DBDB.

A graph with 5 edges and node B with the in-degree 3

Last but not least is the definition of the directed acyclic graph (DAG). The name says it all – it is a directed graph, that has no cycles in it.

It's high time we move on to the description of the algorithm. Let's discuss the general idea first. It is logical that the vertex, which does not include any edges, will be the first in the sorted graph. Which one will be next? Well, since we have already accounted for this first vertex, let's remove it and all its edges from the graph, and again find the one that does not include any edges. This process can be repeated, while there are still vertices left in the graph. Now, let's go over the steps of the algorithm one by one:

  1. Initialize the counter of the visited vertices with 0;

  2. Count the in-degree for all vertices of the graph;

  3. Put all of the vertices with the in-degree equal to 0 in a queue;

  4. Get one node out of the queue and add it to the list of the sorted vertices;

  5. Increase the counter of the visited nodes by one;

  6. Reduce the in-degree of the children of the node by one;

  7. If the in-degree of the node equals 0 after the reduction, add it to the queue;

  8. Repeat steps 4-7 until the queue is empty.

That's it! If after clearing the queue the counter still doesn't equal the total number of the vertices of the graph, there is no topological sort for this graph – it has a cycle.

Example

In order to make sure that everything is clear, let's sort the following graph with the use of the algorithm:

A graph with 6 vertices

You can try to find a topological sort yourself first and then compare your answer with the one, given in this paragraph. However, mind that there might be several possible answers, so don't get scared if something doesn't coincide.

First of all, you perform the steps 1-3. You can see no edges coming into the vertex AA, so it has the in-degree 0. Therefore, you put it into the queue. The list of the sorted vertices is empty for now.

A table with the in-degrees of the vertices, queue of the vertices and the sorted list in the beginning

Then you extract the node AA from the queue, add it to the list of the sorted vertices, increase the counter, and go through its children. There are two of them: BB and CC. After you decrease their in-degree, the degree of the node BB equals 0, which means, that this vertex should be put into the queue. To simplify the process, you may mentally remove AA and the edges, going from it:

A table with the in-degrees of the vertices, queue of the vertices and the sorted list after the first step A graph ater removing the vertex A

Now you perform all of the above-mentioned operations for the nodes BB and CC, and get the following result:

A table with the in-degrees of the vertices, queue of the vertices and the sorted list after the third step

A graph ater removing the vertices A, B and C

This is where the variation might happen: depending on the order, in which you put the nodes DD and EE into the queue, their order in the list of the sorted vertices might also differ. In other words, after node CC in the list, it is possible that either of them will be next in the list. Let's start with the vertex DD, for example:

A table with the in-degrees of the vertices, queue of the vertices and the sorted list after the fourth step

A graph after removing the vertices A, B, C, D and E

Now there is only one node in the queue – EE. Its only child is FF, so, once again, you decrease its in-degree, put EE into the list of the sorted vertices and increase the counter. The in-degree of FF now equals 0, so you also need to put it into the queue. It is the last unvisited vertex, therefore after you perform steps 4-7 for it, the work of the algorithm is finished. The final topological sort that we got is {A,B,C,D,E,F}\{A, B, C, D, E, F\}. You might have also got the sort {A,B,C,E,D,F}\{A, B, C, E, D, F\}, depending on the order, in which you put the nodes DD and EE into the queue.

Pseudocode and complexity

Of course, it is crucial to understand the work of the algorithm, but it is no less important to be able to implement it. For example, the pseudocode for Kahn's algorithm may look like this:

class TopologicalSort is 
  counter = 0 
  sorted_vertices = [] 
  queue = initQueue() 
  graph = initGraph() 
  in_degrees = countInDegree(graph) 

  method kahn() is 
    for (i = 0; i < in_degree.size(); ++i): 
      if (in_degree[i] == 0) then: 
        queue.enqueue(i) 

    while (!queue.isEmpty()): 
      node = queue.dequeue() 
      
      for (child in graph[node]): 
        in_degrees[child] -= 1 
        
        if (in_degrees[child] == 0) then: 
          queue.enqueue(child) 

      counter += 1 
      sorted_vertices.append(node) 

  method initQueue() is ... 

  method initGraph() is ... 

  method countInDegree(graph) is ... 

As for complexity, during the work of the algorithm, you add the vertices to the queue and extract them from it, and also move along the edges, when visiting the children of the nodes. During these operations, you go through all of the vertices and edges respectively, therefore the complexity equals O(V+E)O(V + E), where VV is the number of vertices and EE – the number of the edges.

Conclusion

Phew, that was it! But before finishing the topic, let's refresh the most important points about Kahn's algorithm:

  • It is used to find the topological sort of the graph.

  • Topological sort is a way of ordering the vertices of the graph so that the child node appears after the parent one.

  • Several topological sorts may be found in a single graph.

  • In-degree of the node is the number of the edges, that come into it.

  • First of all, you find the in-degree for all vertices and put the nodes with the in-degree 0 into the queue. Then you extract a node from the queue, reduce the in-degree of its children and add the vertex to a list of the sorted ones. The process is repeated until the queue is empty.

  • If the counter of the visited nodes doesn't equal the total number of the nodes, the topological sort for a graph cannot be found – it has a cycle.

  • Its complexity is O(V+E)O(V + E).

4 learners liked this piece of theory. 0 didn't like it. What about you?
Report a typo