Computer scienceAlgorithms and Data StructuresData structuresGraphsGraphs: basics

Representation of graphs

10 minutes read

We demonstrated previously that graphs provide a convenient visual approach to represent and solve a variety of problems. Yet some problems, particularly large ones, can be very difficult to solve manually. Fortunately, your computer can help if you represent your problem as a graph in the computer memory and implement a specific algorithm to solve it. But how can graphs be presented in computer memory?

Assume we need to write a program that solves a certain graph theory problem, and we have a graph like this:

Obviously, we can't use this beautiful picture of letters, circles, and links as input for our program. Hence, we need to think of more practical methods to do so.

Basic operations on graphs

Before considering specific techniques to represent graphs, it is essential to understand the operations we plan to perform on them. Here are some of the most common ones:

adjacent(G, u, v) — Check whether there is an edge between two nodes $u$ and $v$ ;
neighbors(G, v) — List all the nodes adjacent to the given node $v$ ;
add_node(G, v) — Add a new node labeled as $v$ ;
remove_node(G, v) — Remove the existing node $v$ from the graph;
add_edge(G, u, v) — Add a new edge $(u,v)$ between the existing nodes $u$ and $v$ ;
remove_edge(G, u, v) — Remove the existing edge $(u,v)$ from the graph.

Adjacency matrix

The first way of representation is known as the adjacency matrix. This is a matrix where each entry indicates whether a particular edge exists in the graph. The value $1$ in a cell shows that the edge exists, while $0$ means there's no such edge. Here is the adjacency matrix for the graph above:

Adjacency matrix

For undirected graphs, the adjacency matrix is always symmetric, meaning if node $i$ is connected to node $j$ , then node $j$ is also connected to node $i$ ( $A[i,j]=A[j,i]$ ). In the standard convention, self-loops are not allowed, and as a result, the diagonal elements of the adjacency matrix ( $A[i,i]$ ) are all zeros. However, it's worth noting that the general case allows for variations, and it is possible to have undirected graphs with self-loops, leading to non-zero diagonal elements in the adjacency matrix.

What are the advantages of using this way of representation? First of all, we can easily implement it using a two-dimensional array. Moreover, when we have an adjacency matrix, we can establish whether two nodes are connected by an edge just by looking at the appropriate slot in the matrix. In other words, this matrix allows us to check whether two nodes are adjacent in constant time $O(1)$ . Just as little time is needed to add or remove an edge. We can also list all neighbors of a node by scanning the corresponding row or column in $O(n)$ , where $n$ is the number of nodes in the graph. Therefore, such a representation is especially well-suited for graphs where the number of edges is close to the maximum, also known as dense graphs.

Adjacency lists

What if we are out of memory? In practice, we sometimes need to represent huge graphs with thousands or millions of nodes, and $O(n^2)$ memory is quite a lot. Another common way to represent a graph in the computer memory is with adjacency lists. This is an array of $n$ lists, where each list stores the neighbors of the corresponding node. Here are the adjacency lists for the graph above:

Adjacency lists

For undirected graphs, every edge is stored twice (like 'a'→'b' and 'b→'a' in the top example). For directed graphs, edges are stored only once; if the direction in our initial graph was from 'b' to 'a', then we would not have 'a'→'b' in the first list. Instead, the first list would look like this: 'a'→'c'.

For a graph with $n$ nodes and $m$ edges, its representation in the form of an adjacency list needs $O(n+m)$ computer memory. It is more effective than the adjacency matrix for sparse graphs that have a lot fewer edges than the number of nodes squared. This is not the only advantage of the adjacency lists: you can add nodes or edges in $O(1)$ , since you only need to add an additional list for a new node, or a new element on the corresponding list for a new edge. In addition, removing a node is possible in $O(m)$ , as it is enough to scan the edges and remove those which are incident to the given node.

It is also worth noting that this representation has a drawback that can be serious for some types of problems. We cannot check whether one node is connected to another without passing through the elements of the adjacency list. This is a bit more difficult than simply using the adjacency matrix, namely it requires

O(n)

time. Removing an edge is also slower, compared to the adjacency matrix. It takes

O(n)

, because we need to scan all the nodes.

Other representations

Nevertheless, adjacency matrices and lists are not the only existing ways to represent graphs. The simplest one would be a list of all edges in a graph: 'a'→'b', 'a'→'c', 'b'→'c', and so on. It is the easiest method to implement; however, the time taken by every operation is considerably large.

Similarly to an adjacency matrix, you can define an incidence matrix as shown in the picture below.

Incidence matrix

Additionally, a degree matrix is a matrix of dimension $n\times n$ . Its diagonal entries are equal to the degree of the corresponding node, while the other entries are zeros. It is worth noting that these representations are useful only in some very particular situations; generally, they are remarkably ineffective. For example, when using an incidence matrix, almost all the previously mentioned operations can be executed in $O(mn)$ .

Some problems might also need a more specific graph representation for efficiency, depending on the situation.

Summary

In this topic, we've discussed the most common ways of representing graphs in the computer memory:

Adjacency matrix allows checking whether two nodes are adjacent in constant time $O(1)$ . This form of representation needs $O(n^2)$ memory, making it best-suited for working with dense graphs.
Adjacency lists are more efficient in the case of sparse graphs, as they need $O(n+m)$ computer memory. However, this representation has a drawback: we cannot check whether two nodes are adjacent without passing through the elements of an adjacency list.
An array of all edges is one of the easiest, yet the most inefficient ways.
Incidence matrix and degree matrix are analogous to the adjacency matrix, but less common in practice.

Complexities of adjacency matrix and lists

With the help of the table above, it should be easy as pie to choose the optimal representation of a graph based on the operations you plan to perform on it.

147 learners liked this piece of theory. 5 didn't like it. What about you?

Report a typo

Representation of graphs

Basic operations on graphs

Adjacency matrix

Adjacency lists

Other representations

Summary

Related topics