Computer scienceSoftware development foundationsSQL and DatabasesBasics SQLRetrieving Data

The WITH statement

8 minutes read

It often happens that in SQL we come across complex, long queries, especially those with multiple subqueries. To reduce the size of the code, make it more readable, and make it easier to write queries and support them later, the WITH statement exists. It is also known as CTE, or Common Table Expressions, which is one of the types of queries in database management systems. The results of common table expressions can be temporarily stored in memory and accessed again.

CTE essence and syntax

A Common Table Expression (CTE) is a query you can define within another SQL query. CTE works similarly to the subquery, as it generates results containing data in a table. However, unlike in the subquery case, we can give this table a name and can refer to it multiple times within a query. CTE also can be self-referencing (recursive).

You can use a CTE inside the following types of queries:

SELECT
INSERT
UPDATE
DELETE

As it was mentioned, CTE has WITH statement in its syntax. Its structure is the following: it starts with the query name, then the name of the columns it returns in parenthesis (optional).

The keyword AS contains the query itself, also in parentheses.

WITH query_name (column_1, column_2, ...) AS (
    query
) 
SELECT * FROM query_name;

You should note that the number of columns after the query_name (in the case mentioned) must coincide with the number of columns in the query.

If you do not define column_names before the AS keyword, CTE will return the column list of the query that forms it.

CTE Example

CTE construction can be a bit confusing at first glance, but when we consider examples, it becomes much easier to comprehend.

Let's say you have a table sales_march with the sales representative's results and you want to analyze the volume of sales for each person, compared to the average sales volume within a department. The following table is the starting point:

id	first_name	last_name	region	department	sales_volume
1	Emily	Wilson	South	X25	128000
2	Chris	Parker	West	B23	157000
3	Kate	Preston	West	X25	203500
4	John	Spacy	North	M9	134000
5	Kevin	Douglas	North	B23	237800
6	Martin	Smith	East	M9	109000
7	Julia	Gutenberg	South	M9	218300
8	Ingrid	Karp	South	X25	117400

Our CTE will be the following:

WITH avg_department AS (
    SELECT 
        department, 
        AVG(sales_volume) AS average_sales_for_department
    FROM sales_march
    GROUP BY department
)
SELECT 
    s.id, 
    s.first_name, 
    s.last_name,
    s.department, 
    s.sales_volume, 
    ROUND(ad.average_sales_for_department) AS average_sales_for_department
FROM sales_march s
JOIN avg_department ad
    ON s.department = ad.department;

First, we use a CTE to create a table with average sales for each department, and afterward, we join it with the sales_march table to visualize the average_sales for the department with sales_volumes of each representative.

We will get the following table as a result:

id	first_name	last_name	department	sales_volume	average_sales_for_department
1	Emily	Wilson	X25	128000	150000
2	Chris	Parker	B23	157000	197400
3	Kate	Preston	X25	203500	150000
4	John	Spacy	M9	134000	154000
5	Kevin	Douglas	B23	237800	197400
6	Martin	Smith	M9	109000	154000
7	Julia	Gutenberg	M9	218300	154000
8	Ingrid	Karp	X25	117400	150000

Multiple CTE

We can have several CTEs in our query. Let's get back to our previous example:

What if we want to have a table where the difference between sales_volume and average_sales_for_department is calculated for every sales representative?

We can make the following CTE:

first, we form a CTE avg_department, where we calculate the average sales volume for each department
then we form CTE sales_diff_from_avg_by_dep, where we join CTE avg_department with sales_march table and calculate the difference between sales_volume and average_sales_for_department

WITH avg_department AS (
    SELECT department, AVG(sales_volume) AS average_sales_for_department
    FROM sales_march
    GROUP BY department
),
sales_diff_from_avg_by_dep AS (
    SELECT 
        s.id, s.first_name, s.last_name, s.department, 
        (s.sales_volume - ROUND(ad.average_sales_for_department)) AS diff_from_avg_by_dep
    FROM sales_march s
    JOIN avg_department ad
        ON s.department = ad.department
)
SELECT *
FROM sales_diff_from_avg_by_dep;

This example also demonstrates a nested CTE: the CTE sales_diff_from_avg_by_dep uses the results of the avg_department CTE. The resulting column is named diff_from_avg_by_dep, because that is the alias given in the second CTE.

The resulting table looks as the following:

id	first_name	last_name	department	diff_from_avg_by_dep
1	Emily	Wilson	X25	-22000
2	Chris	Parker	B23	-40400
3	Kate	Preston	X25	53500
4	John	Spacy	M9	-20000
5	Kevin	Douglas	B23	40400
6	Martin	Smith	M9	-45000
7	Julia	Gutenberg	M9	64300
8	Ingrid	Karp	X25	-32600

Now it is clear to the manager who might deserve a monthly bonus and who may need to reconsider their sales efforts.

As for us, we now understand how CTEs can be used to answer different types of queries.

Conclusion

Now that we've studied how CTE works, let's sum up its advantages:

queries work faster since you don't have to store unnecessary entities in memory for a long time
it's easier to understand, as a complex query is divided into several 'blocks': data acquisition becomes consistent and understandable
it facilitates support, as it is easier to maintain, fix and modify
CTEs are used in many popular systems, they are just called by different names. If you become familiar with table expressions, for example in PostgreSQL, then in the future it will be easier for you to become familiar with similar structures in Oracle or other DBMS.

CTE appears to be quite a useful tool in writing SQL queries, so it is high time to start practicing them!

37 learners liked this piece of theory. 1 didn't like it. What about you?

Report a typo

The WITH statement

CTE essence and syntax

CTE Example

Multiple CTE

Conclusion

Related topics