12 minutes read

As a database grows, the time it takes to execute queries increases. The majority of modern databases provide special structures called indexes to deal with this performance issue. Indexes organize data in a special way that significantly increases the speed of READ queries.

In this topic, you will learn about the most popular types of indexes and the situations when they should be implemented.

A real-life analogy

Before proceeding with database indexes, let's consider a real-life analogy — a library with physical books. Usually, there are thousands of them in one library. They can be stored in different rooms and sometimes even in different buildings. It would be inconvenient for the librarian to go through all the books manually upon every book request.

That's why for every book in the library there is a card that stores the main information about this book: the name, the author, the year of publication, and so on. These cards take less space to store and can be easily arranged and ordered. One book can even have several identical cards if we need to search books, for example, by name and by the year of publication that may vary.

Old data store

Now it is time to get back to database indexes, which are quite similar to the structure above.

Database indexes

By default, databases either store entries of a dataset in an unordered way or order them according to some internal rule depending on an internal attribute. It would take too long to find one or more entries that match the given criteria if the amount of data is really huge. A database index represents a data structure that is formed from the values of one or more attributes and points to the corresponding entries of the data. It boosts the data-finding process but requires some extra space to store this additional structure.

It is important to note that indexes exist for both relational and NoSQL databases and work in the same way. When we mention a database entry, we imply a row for relational databases, and documents or something else for NoSQL databases.

The classification of indexes

Both relational and NoSQL databases provide several types of indexes with different properties and purposes. First of all, let's take a look at the classification of indexes based on their properties.

  • Simple and Compound

The simple index is an index created for a single attribute of a data entry to speed up the search of entries that match a condition based on this attribute (e.g. birth_year = 1985), whereas a composite index (which is also known as compound) is built on top of two or more attributes of entries. It is also possible to use several simple and/or composite indexes together.

  • Clustered and non-clustered

The clustered index is an index that physically reorganizes the order in which the data entries are stored and gives the maximum speedup. Whereas a non-clustered index doesn't change the physical structure of a dataset, but only organizes references to the corresponding entries. Usually, one dataset can have only one clustered index that defines the order of data in the set, but the number of non-clustered indexes is not limited because the order is stored in a separate data structure that refers to the initial dataset.

  • The unique index is used to ensure data integrity by prohibiting duplicate values in one or more attributes of data entries.

  • The partial index is created on a specific subset of data based on a given criterion.

Over time, you will get to know all these properties of indexes in practice with different databases.

Apart from the classification presented above, indexes can also be grouped by type and backed data structures. According to this classification, there is a wide variety of indexes, but the most commonly used are the B-Tree, Hash, Bitmap, Inverted, and Spatial indexes.

All the types of database indexes are designed to request data faster, but they require some extra space on disk and make data insertions and updates slower because the information about new data changes should be saved in the index as well.

B-Tree Indexes

B-Tree is the most common type of index. When developers say "index", they often mean this type by default because it is supported by all popular databases and can be used in many different scenarios. As the name suggests, this index is based on a balanced tree that maintains data sorted according to a specified condition.

The B-Tree index is a family of indexes based on balanced trees. Depending on the particular database use, there can be a B+ Tree (which is an advanced form of B-Tree) or a Binary Tree (which is the simplest form of B-Tree), or something else. This choice primarily affects the performance of the database, but the way how we use the index always remains the same.

Diving deeper in how B-Tree works is usually not a part of courses on databases, but we will give you at least the basic idea. Below is an example of a B-Tree that stores years of different historical events as numbers.

B-tree diagram

The top node is the root, and those below it are either child nodes or leaf nodes. We always start searching for our row from the root node. We compare if the value we’re searching for is less than or greater than the value in the node at hand. The result of the comparison tells us which way to go, left or right, depending on the result of our comparison. In the example above, all values lower than 1800 take us to the left, while values greater than 1800 take us to the right, and so on.

This structure allows databases to speed up queries with equality (=), comparison (<, >, <=, >=) and even sorting operators.

But what happens if a new record is added? It has to be inserted somewhere in the tree, and the structure of this binary tree might change significantly to preserve the balance. In this case creation of a new record in the database will take more time compared to the time required for this operation without indexes.

Moreover, B-Tree indexes are not limited to working with numbers only. They can be used with dates, times, strings, and other ordered data types. However, there is an important limitation when working with strings and B-Tree indexes: the equality operator works only from the leftmost part of a string (e.g. "c...", "ca..", "cat...") in alphabetical order. So, if you search for all strings that match the pattern like "..cat..", your index won't work. In this case, you should use an inverted Index.

Hash Index

As you might have guessed, the hash index is based on a hash function. This function converts any database value to a numeric value called hash code. Once you have a query that uses comparison by this value, the database doesn't have to scan the entire dataset to find the match. Instead, it will calculate the hash of the value used in the query condition and directly access that place where data with the same hash is stored.

In the example below, the search key "Julia" is taken through the hash function and is associated with a bucket that refers to the original data entries.

Hash index structure

Hash indexes can potentially outperform B-Tree indexes when you need to find data by a specific key. However, they are way less popular in practice because they can be used only with equality operators and don't work with comparisons (<, >, <=, >=) and sortings. Moreover, not all databases support hash indexes.

When compared with B-Tree indexes, the hash index has the following important points to note:

  • It is usually smaller than a corresponding B-Tree index;

  • Its select and insert performance can be better than a B-Tree index;

  • However, it has many restrictions that limit its use to a few very specific use cases.

Other types of indexes

There are also a few types of indexes such as the bitmap, inverted, and spatial that are used in particular scenarios. We won't consider them in detail, just mention the typical situations where they can be used.

The bitmap index is a special type of index that is used when an attribute has a small number of distinct values compared to the total number of entries with this attribute. This situation is called low cardinality. There are just two possible boolean values false and true. Bitmap indexes are fast to create and don't require a lot of space if we compare them with B-Tree indexes. Some databases may create a bitmap index automatically for every query when it is needed.

The inverted index is another type of index that is primarily used to find data in textual attributes or arrays. The typical scenario when you need to use this index is a full-text search. The key idea of an inverted index is to keep a special data structure that connects terms (words or values), which are used in search, to entries when the terms occur. The example below gives you an example of the index:

"green" -> [1, 2, 3]
"red" -> [2, 3, 5]
"summer" -> [1, 2, 6]

So, if your query needs to find all entries with the term "green", the database just returns [1, 2, 3]. It can also easily return a result when you need all entries that contain both "green" and "red" by intersecting the two lists and getting the result [2, 3].

The spatial index is an index used to speed up work with geometrical and geographical data (geospatial). Spatial indexes are usually used in navigation and geographic information databases. These indexes can quickly respond to queries such as "Find all museums within 5 km of a specific location". In practice, there are different data structures that can be used to organize spatial indexes; one of them is R-Tree.

We've reviewed five types of indexes. Of course, it is not a complete list, but it is enough knowledge to work with databases. If you would like to know more, read about the specific indexes supported in the database management system of your choice.

Conclusion

Indexes are structures for speeding up READ queries. They are various types, such as B-Tree, Hash, Bitmap, Inverted, and Spatial, as well as different properties (Simple, Composite, Unique, Partial, Clustered, Not-Clustered) that define the performance and purpose of the indexes. Do not forget that indexes require additional space and increase the time of queries when modifying data: for instance, insertions and updates.

Although we have overviewed the most important properties and types of indexes, it is impossible to cover all of them in one topic. Mind that the indexes of the same type work slightly differently depending on the particular database and may have different best practices to work with them. You will have a chance to continue learning indexes in the topics related to particular databases.

77 learners liked this piece of theory. 3 didn't like it. What about you?
Report a typo