Computer scienceFundamentalsSQL and DatabasesBasics SQLData Definition Language

Composite indexes

7 minutes read

You are already familiar with indexes. You know that they can speed up searching data. Till now we were using only indexes that cover only one column. In this topic, we will cover indexes that contain more than one column, so-called multicolumn indexes.

By mastering them you will be able to optimize more complex queries, containing more conditions or conditions on many columns. In other words, you will be better prepared for real-life problems.

Why (even many) single-column indexes are not always enough

For the purpose of explanation, let's assume that we have a table people with a list of people, their names, and their birth dates.

We have the following columns there:

  • id

  • first_name

  • last_name

  • year

  • month

  • day

You probably already guessed two common searches that we will discuss: searching by full name and by date. Let's consider the first one:

SELECT * 
FROM people 
WHERE 
    first_name = 'JOHN' 
    AND last_name = 'SMITH';

What index could we create to speed up the search? One can say that we can create an index on first_name and an index on last_name. Does it solve the problem? Well, not really. Suppose we are searching for John Smith. What database engine could do with those indexes? It can find all the Johns and all the Smiths really fast, but there are a lot of Johns not-Smith and a lot of not-John Smiths. As you see, even this quite easy task poses a bigger problem.

How can we solve it then? Wait till the next section...

Multi-column index

Multi column index schema

It is not a mystery that the solution is a multicolumn index. It is an index that contains two (or more) columns. In our example, it will be an index covering first_name and last_name. How we can create such an index? Same way as a single-column index, we need to put a list of columns separated by a comma in parenthesis.

In our example, it will be:

CREATE INDEX full_name 
ON people(last_name, first_name);

We have another example of the possible multicolumn index in our table searching by full date

SELECT * 
FROM people 
WHERE 
    year = 2000 
    AND month = 2 
    AND day = 6;

In this case, it would be great to have an index:

CREATE INDEX full_date 
ON people(year, month, day);

Does the order of columns change anything (Left-Prefix Rule)

You may ask if there is a difference between the (first_name, last_name) and (last_name, first_name) index. A classic printed phonebook is an excellent model to explain this aspect. Items in the phonebook are sorted by the last name, then the first name – it is like having an index on (last_name, first_name). Even without any database knowledge it is obvious that you can easily search all Smiths in the phone book but finding all Johns is hard and order doesn't help much.

The general rule (often called Left-Prefix Rule) is that a multicolumn index is useful when searching with conditions on columns from the beginning (left-side) of the index column list. It means that you cannot skip any column from the index.

We can break down this rule into simpler rules:

  1. an index can be used only from left to right

  2. no columns can be skipped

  3. an index can be used partially, not all columns have to be used

  4. the order of WHERE conditions is not important and does not need to match the order of columns on the index (there is no difference if you write WHERE A = 1 AND B = 2 or WHERE B = 2 AND A = 1

  5. the first condition that is different than equality (A = 1), like inequality ( A < 1 ) or range, which is effectively two inequalities combined) is usually making the next columns in the index unusable for that query

Going back to our example with searching by date, the full_date index is useful when searching by:

  • year only

  • year and month

  • year, month, and day

But it wouldn't help as much when searching by day only. It is because the day column is not a prefix of our multicolumn index. All three examples above are prefixes of our index.

Technically, a database engine can do a full scan of the index (reading only index content) which is faster than scanning the table for results. This will be covered more in optimization topics.

Conclusion

In this topic, we've covered multicolumn indexes – indexes that cover more than one column. They are extremely useful for queries with multicolumn conditions. The order of columns is very important. They work best when used for conditions covering columns from the beginning of the index column list (left prefix), therefore there is no point to create an index on the column A when there's already an index on columns (A, B).

55 learners liked this piece of theory. 2 didn't like it. What about you?
Report a typo