Have you ever thought about the work process in large IT companies? There are a lot of different teams that deal with diverse parts of the same application, and in the end all parts are combined into one system. What happens if one of the programmers makes a mistake? Does this mean that the entire application will stop working? If so, then none of the systems in the whole IT world would work properly. To avoid this, version control systems are organized in the form of a directed acyclic graph. A version control system can be represented as a graph, in which there are no directed cycles, but there may be parallel paths leaving one node and coming to the final node in different ways. There are some branches of development, for instance, bug fixing or adding new features. At any time, it is possible to create a new branch or to merge the current one with others.
Formal definition
A Directed Acyclic Graph (DAG) has a name that speaks for itself: it is a directed graph that contains no cycles. Despite this acyclic property, a DAG may contain parallel paths. Parallel paths are paths starting and finishing at the same nodes without forming a cycle. Thus, in directed acyclic graphs there are no directed edges that begin and end at the same vertex, but parallel paths can exist. Do you see the difference?
The images below show a few examples of DAGs, some of which contain parallel paths for you to check out:
Here, blue and light blue arrows represent pairs of parallel paths. For example, in the first picture, paths and start in the node and finish in the node , which means they are parallel paths. The same holds for and .
In the second picture, there aren't any parallel paths at all. However, there are parallel paths in the last picture as well. Try to find them!
Basic concepts
So far, it all seems so easy, doesn't it? Let's throw in some more definitions, essential for understanding this topic:
- We call a node reachable if there is a directed edge that leads to it.
- We also can order nodes by their reachability. It is called partial order. If there is a directed edge from node to node , then you can compare these nodes: .
- Topological sorting (or order) is an ordering of graph nodes, where for each directed edge from node to node , stands before . Topological sorting makes a lot of sense in DAG and graph theory in general, so a few following topics get back to it. Topological sorting can be processed in linear time with complexity.
Tree or graph?
In order not to get confused, you need to understand how a DAG differs from an ordinary graph and an ordinary tree. Let's look at the illustrations of this family:
As you can see on family trees, a tree always has only one entrance to each node. In general, it makes sense to say so only about directed graphs, which have a direction indicated on each edge. Undirected graphs work differently.
In a directed graph, every node has one or more predecessor nodes and successor nodes. A graph is traversed by using the depth-first search and breadth-first search algorithms. A tree is a special kind of graph that constitutes a hierarchical model. It can be undirected or directed, but always acyclic, i.e. similarly to DAGs, it doesn't contain cycles.
It is essential to be able to judge when it is better to use a tree, a graph, or a DAG. For instance, when you need each vertex to depend strictly on one input to the node, then you can use the simplest tree.
Applications
Strangely, a wide variety of fields make use of directed acyclic graphs. Let's see how these graphs are used in some areas:
- Scheduling and management. DAGs are used in project management to plan, design, and implement complex projects or tasks.
- Data compression. In this sphere, you can use directed acyclic graphs as a compact representation of a collection of sequences. When many of the sequences share the same subsequences, you can represent these shared subsequences by a shared part of the DAG. This allows the representation to use less space than it would take to list all of them separately.
- DAGs are also useful in genealogy and version history, data processing networks, machine learning, and other fields.
Conclusion
This section sums up the main points introduced in this topic.
- Directed acyclic graphs are a special type of directed graphs that contain no cycles.
- The use of directed acyclic graphs is indispensable in modern information systems, because they are convenient for all applications where data structures are non-linearly dependent on each other.
- DAGs are known for reachability and topological sorting.
- It is important to know when to use a DAG or a tree. However, an oriented tree itself is a DAG, so you can use DAGs for a wider class of problems that require connections without cycles but with directions and parallel paths.