Computer scienceAlgorithms and Data StructuresData structuresGraphsGraph types

Directed acyclic graph

8 minutes read

Have you ever thought about the work process in large IT companies? There are a lot of different teams that deal with diverse parts of the same application, and in the end all parts are combined into one system. What happens if one of the programmers makes a mistake? Does this mean that the entire application will stop working? If so, then none of the systems in the whole IT world would work properly. To avoid this, version control systems are organized in the form of a directed acyclic graph. A version control system can be represented as a graph, in which there are no directed cycles, but there may be parallel paths leaving one node and coming to the final node in different ways. There are some branches of development, for instance, bug fixing or adding new features. At any time, it is possible to create a new branch or to merge the current one with others.

Directed acyclic graph

Formal definition

A Directed Acyclic Graph (DAG) has a name that speaks for itself: it is a directed graph that contains no cycles. Despite this acyclic property, a DAG may contain parallel paths. Parallel paths are paths starting and finishing at the same nodes without forming a cycle. Thus, in directed acyclic graphs there are no directed edges that begin and end at the same vertex, but parallel paths can exist. Do you see the difference?

The images below show a few examples of DAGs, some of which contain parallel paths for you to check out:

A few examples of DAGs

Here, blue and light blue arrows represent pairs of parallel paths. For example, in the first picture, paths 242 - 4 and 23542 - 3 - 5 - 4 start in the node 22 and finish in the node 44, which means they are parallel paths. The same holds for 5785 - 7 - 8 and 585 - 8.

In the second picture, there aren't any parallel paths at all. However, there are parallel paths in the last picture as well. Try to find them!

Basic concepts

So far, it all seems so easy, doesn't it? Let's throw in some more definitions, essential for understanding this topic:

  • We call a node reachable if there is a directed edge that leads to it.
  • We also can order nodes by their reachability. It is called partial order. If there is a directed edge from node uu to node vv, then you can compare these nodes: uvu \leq v.
  • Topological sorting (or order) is an ordering of graph nodes, where for each directed edge from node uu to node vv, uu stands before vv. Topological sorting makes a lot of sense in DAG and graph theory in general, so a few following topics get back to it. Topological sorting can be processed in linear time with O(n)O(n) complexity.

Tree or graph?

In order not to get confused, you need to understand how a DAG differs from an ordinary graph and an ordinary tree. Let's look at the illustrations of this family:

Tree, graph and DAG

As you can see on family trees, a tree always has only one entrance to each node. In general, it makes sense to say so only about directed graphs, which have a direction indicated on each edge. Undirected graphs work differently.

In a directed graph, every node has one or more predecessor nodes and successor nodes. A graph is traversed by using the depth-first search and breadth-first search algorithms. A tree is a special kind of graph that constitutes a hierarchical model. It can be undirected or directed, but always acyclic, i.e. similarly to DAGs, it doesn't contain cycles.

It is essential to be able to judge when it is better to use a tree, a graph, or a DAG. For instance, when you need each vertex to depend strictly on one input to the node, then you can use the simplest tree.

Applications

Strangely, a wide variety of fields make use of directed acyclic graphs. Let's see how these graphs are used in some areas:

  • Scheduling and management. DAGs are used in project management to plan, design, and implement complex projects or tasks.
  • Data compression. In this sphere, you can use directed acyclic graphs as a compact representation of a collection of sequences. When many of the sequences share the same subsequences, you can represent these shared subsequences by a shared part of the DAG. This allows the representation to use less space than it would take to list all of them separately.
  • DAGs are also useful in genealogy and version history, data processing networks, machine learning, and other fields.

Conclusion

This section sums up the main points introduced in this topic.

  • Directed acyclic graphs are a special type of directed graphs that contain no cycles.
  • The use of directed acyclic graphs is indispensable in modern information systems, because they are convenient for all applications where data structures are non-linearly dependent on each other.
  • DAGs are known for reachability and topological sorting.
  • It is important to know when to use a DAG or a tree. However, an oriented tree itself is a DAG, so you can use DAGs for a wider class of problems that require connections without cycles but with directions and parallel paths.
34 learners liked this piece of theory. 1 didn't like it. What about you?
Report a typo