Computer scienceFundamentalsSQL and DatabasesBasics SQLRetrieving Data

The WITH statement

9 minutes read

It often happens that in SQL we come across complex, long queries, especially those with multiple subqueries. To reduce the size of the code, make it more readable, and make it easier to write queries and support them later, the WITH statement exists. It is also known as CTE, or Common Table Expressions, which is one of the types of queries in database management systems. The results of common table expressions can be temporarily stored in memory and accessed again.

CTE essence and syntax

A Common Table Expression (CTE) is a query you can define within another SQL query. CTE works similarly to the subquery, as it generates results containing data in a table. However, unlike in the subquery case, we can give this table a name and can refer to it multiple times within a query. CTE also can be self-referencing (recursive).

You can use a CTE inside the following types of queries:

  • SELECT

  • INSERT

  • UPDATE

  • DELETE

As it was mentioned, CTE has WITH statement in its syntax. Its structure is the following: it starts with the query name, then the name of the columns it returns in parenthesis (optional).

The keyword AS contains the query itself, also in parentheses.

WITH query_name (column_1, column_2, ...) AS (
    query
) 
SELECT * FROM query_name;

You should note that the number of columns after the query_name (in the case mentioned) must coincide with the number of columns in the query.

If you do not define column_names before the AS keyword, CTE will return the column list of the query that forms it.

CTE Example

CTE construction can be a bit confusing at first glance, but when we consider examples, it becomes much easier to comprehend.

Let's say you have a table sales_march with the sales representative's results and you want to analyze the volume of sales for each person, compared to the average sales volume within a department. The following table is the starting point:

id

first_name

last_name

region

department

sales_volume

1

Emily

Wilson

South

X25

128000

2

Chris

Parker

West

B23

157000

3

Kate

Preston

West

X25

203500

4

John

Spacy

North

M9

134000

5

Kevin

Douglas

North

B23

237800

6

Martin

Smith

East

M9

109000

7

Julia

Gutenberg

South

M9

218300

8

Ingrid

Karp

South

X25

117400

Our CTE will be the following:

WITH avg_department AS (
    SELECT 
        department, 
        AVG(sales_volume) AS average_sales_for_department
    FROM sales_march
    GROUP BY department
)
SELECT 
    s.id, 
    s.first_name, 
    s.last_name,
    s.department, 
    s.sales_volume, 
    ROUND(ad.average_sales_for_department) AS average_sales_for_department
FROM sales_march s
JOIN avg_department ad
    ON s.department = ad.department;

First, we use a CTE to create a table with average sales for each department, and afterward, we join it with the sales_march table to visualize the average_sales for the department with sales_volumes of each representative.

We will get the following table as a result:

id

first_name

last_name

department

sales_volume

average_sales_for_department

1

Emily

Wilson

X25

128000

150000

2

Chris

Parker

B23

157000

197400

3

Kate

Preston

X25

203500

150000

4

John

Spacy

M9

134000

154000

5

Kevin

Douglas

B23

237800

197400

6

Martin

Smith

M9

109000

154000

7

Julia

Gutenberg

M9

218300

154000

8

Ingrid

Karp

X25

117400

150000

Multiple CTE

We can have several CTEs in our query. Let's get back to our previous example:

What if we want to have a table where the difference between sales_volume and average_sales_for_department is calculated for every sales representative?

We can make the following CTE:

  • first, we form a CTE avg_department, where we calculate the average sales volume for each department

  • then we form CTE sales_diff_from_avg_by_dep, where we join CTE avg_department with sales_march table and calculate the difference between sales_volume and average_sales_for_department

WITH avg_department AS (
    SELECT department, AVG(sales_volume) AS average_sales_for_department
    FROM sales_march
    GROUP BY department
),
sales_diff_from_avg_by_dep AS (
    SELECT 
        s.id, s.first_name, s.last_name, s.department, 
        (s.sales_volume - ROUND(ad.average_sales_for_department)) AS diff_from_avg_by_dep
    FROM sales_march s
    JOIN avg_department ad
        ON s.department = ad.department
)
SELECT *
FROM sales_diff_from_avg_by_dep;

This example also demonstrates a nested CTE: the CTE sales_diff_from_avg_by_dep uses the results of the avg_department CTE. The resulting column is named diff_from_avg_by_dep, because that is the alias given in the second CTE.

The resulting table looks as the following:

id

first_name

last_name

department

diff_from_avg_by_dep

1

Emily

Wilson

X25

-22000

2

Chris

Parker

B23

-40400

3

Kate

Preston

X25

53500

4

John

Spacy

M9

-20000

5

Kevin

Douglas

B23

40400

6

Martin

Smith

M9

-45000

7

Julia

Gutenberg

M9

64300

8

Ingrid

Karp

X25

-32600

Now it is clear to the manager who might deserve a monthly bonus and who may need to reconsider their sales efforts.

As for us, we now understand how CTEs can be used to answer different types of queries.

Conclusion

Now that we've studied how CTE works, let's sum up its advantages:

  • queries work faster since you don't have to store unnecessary entities in memory for a long time

  • it's easier to understand, as a complex query is divided into several 'blocks': data acquisition becomes consistent and understandable

  • it facilitates support, as it is easier to maintain, fix and modify

  • CTEs are used in many popular systems, they are just called by different names. If you become familiar with table expressions, for example in PostgreSQL, then in the future it will be easier for you to become familiar with similar structures in Oracle or other DBMS.

CTE appears to be quite a useful tool in writing SQL queries, so it is high time to start practicing them!

37 learners liked this piece of theory. 1 didn't like it. What about you?
Report a typo