Computer scienceFundamentalsSQL and DatabasesBasics SQLFunctions and operations

Window functions

11 minutes read

Working with a database, you may need to analyze some data. Your employer might ask you to create a list of those who overperform in your company or to make a sales report with three-month rolling averages. You can use some external tools, such as Excel or Python. Or you can use the built-in SQL functionality. In this topic, we will learn, how to do these tasks with MySQL window functions.

Window function

To understand the window function concept, imagine that MySQL has completed all of the query steps, including joining, grouping, and sorting. So, the result set is ready to be returned. What if we tried to stop the query execution at this point, aggregate some data, and analyze it? Of course, you can do this using the GROUP BY construction. However, it means a lot of restrictions: you will get only the unique values of the required columns, so the result will be fewer rows than the original table. Window functions record the result in separate columns and don't change the number of rows. Also, it gives you some useful tools, like an opportunity to set the order of rows bypass.

Let's take a closer look at some simple examples. In the query below, you "open the window" through the OVER() statement and summarize the number of sold tickets.

SELECT
    film_name,
    film_genre,
    ticket_number,
    SUM(ticket_number) OVER() AS sum
FROM 
    film_sales;

film_name

film_genre

ticket_number

sum

The Lion King

cartoon

3

18

Shrek

cartoon

4

18

Back to the Future

fiction

2

18

Inception

fiction

6

18

Intouchables

drama

3

18

As you can see, it is the total amount in the sum column for the whole table. If you want to calculate tickets for each genre separately, you may use the PARTITION BY construction inside the OVER clause, like this:

SELECT
    film_name,
    film_genre,
    ticket_number,
    SUM(ticket_number) OVER(PARTITION BY film_genre) AS sum
FROM 
    film_sales;

It looks like MySQL server divided sampled data into separate "windows" and summarized each one.

film_name

film_genre

ticket_number

sum

The Lion King

cartoon

3

7

Shrek

cartoon

4

7

Back to the Future

fiction

2

8

Inception

fiction

6

8

Intouchables

drama

3

3

Apparently, the window functions can be compared with aggregate functions. The difference is that, after dividing into sets, window functions still process and return each row separately. In contrast, aggregate functions do not return all rows but only the results of groups.

From the inside

What kind of parameters can we clarify in the OVER() clause?

As MySQL documentation tells us:

OVER([partition_clause] [order_clause] [frame_clause])
  • partition_clause points to the column, by which values of the query rows are divided into groups. if this parameter is not specified, rows are considered as one big group.

  • order_clause indicates in what order the rows inside the partition should be placed while executing a window function. While it is unfortunate that exactly the same syntax is used for different purposes, be careful: if you want to have a certain order in the final result, you still need to use ORDER BY clause at the end of the query.

  • frame_clause restricts the window borders relative to the position of the current row. For instance, you can calculate the rolling sum, taking the current, previous, and next rows. You can read more here.

What can you do with window functions?

For a start, you can generate different rankings. By using row_number(), you can simply number the rows in each division. rank() assigns the same numbers to equal values with gaps, dense_rank() does that without gaps. This is easier to understand if you look at the example below, where we select the students with the best exam results and their numbers in the overall rating:

SELECT 
    name,
    exam_score,
    row_number() over w AS `row_number`,
    rank() over w AS `rank`,
    dense_rank() over w AS `dense_rank`
FROM
    report_journal
window
    w AS (ORDER BY exam_score DESC);

name

exam_score

row_number

rank

dense_rank

Mark

100

1

1

1

Emma

99

2

2

2

Harry

99

3

2

2

Maria

98

4

4

3

Paul

97

5

5

4

Also, you can aggregate data, using min, max, avg, and other functions, but instead of placing it in the GROUP BY clause, you can put it in the OVER() clause. A simple example was at the top of this topic. According to it, here is how you can calculate the rolling sum, using FRAME clause (pay attention, the semantic part goes right away, without any keyword):

SELECT
    film_name,
    film_genre,
    ticket_number,
    SUM(ticket_number) OVER(PARTITION BY film_genre ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS sum 
FROM
    film_sales;

So, we got bounded by the windows amounts of the row and the one following it. There is explanation in the last column.

film_name

film_genre

ticket_number

sum

The Lion King

cartoon

3

7

= 3+4

Shrek

cartoon

4

4

= 4

Back to the Future

fiction

2

8

= 2+6

Inception

fiction

6

15

= 6+9

Harry Potter

fiction

9

9

= 9

Moreover, by using window functions, you can make your queries shorter and easier to understand. For instance, you can replace many JOINs with window functions, or make a beautiful sorting in just a few lines of code.

Conclusion

Summing up, the window function concept is not the easiest tool in MySQL. Nevertheless, it lets you analyze the data without connecting external instruments. You can divide and arrange your data by some columns and then rank, count, sum, take the average, and so on. Unlike aggregate functions, window functions return each table row in the result of the query.

Thanks to working inside the MySQL server, the processing becomes faster. In addition, it allows you to save the number of code lines and write easy to read queries.

39 learners liked this piece of theory. 10 didn't like it. What about you?
Report a typo