It often happens that in SQL we come across complex, long queries, especially those with multiple subqueries. To reduce the size of the code, make it more readable, and make it easier to write queries and support them later, the WITH statement exists. It is also known as CTE, or Common Table Expressions, which is one of the types of queries in database management systems. The results of common table expressions can be temporarily stored in memory and accessed again.
CTE essence and syntax
A Common Table Expression (CTE) is a query you can define within another SQL query. CTE works similarly to the subquery, as it generates results containing data in a table. However, unlike in the subquery case, we can give this table a name and can refer to it multiple times within a query. CTE also can be self-referencing (recursive).
You can use a CTE inside the following types of queries:
SELECTINSERTUPDATEDELETE
As it was mentioned, CTE has WITH statement in its syntax. Its structure is the following: it starts with the query name, then the name of the columns it returns in parenthesis (optional).
The keyword AS contains the query itself, also in parentheses.
WITH query_name (column_1, column_2, ...) AS (
query
)
SELECT * FROM query_name;You should note that the number of columns after the query_name (in the case mentioned) must coincide with the number of columns in the query.
If you do not define column_names before the AS keyword, CTE will return the column list of the query that forms it.
CTE Example
CTE construction can be a bit confusing at first glance, but when we consider examples, it becomes much easier to comprehend.
Let's say you have a table sales_march with the sales representative's results and you want to analyze the volume of sales for each person, compared to the average sales volume within a department. The following table is the starting point:
id | first_name | last_name | region | department | sales_volume |
1 | Emily | Wilson | South | X25 | 128000 |
2 | Chris | Parker | West | B23 | 157000 |
3 | Kate | Preston | West | X25 | 203500 |
4 | John | Spacy | North | M9 | 134000 |
5 | Kevin | Douglas | North | B23 | 237800 |
6 | Martin | Smith | East | M9 | 109000 |
7 | Julia | Gutenberg | South | M9 | 218300 |
8 | Ingrid | Karp | South | X25 | 117400 |
Our CTE will be the following:
WITH avg_department AS (
SELECT
department,
AVG(sales_volume) AS average_sales_for_department
FROM sales_march
GROUP BY department
)
SELECT
s.id,
s.first_name,
s.last_name,
s.department,
s.sales_volume,
ROUND(ad.average_sales_for_department) AS average_sales_for_department
FROM sales_march s
JOIN avg_department ad
ON s.department = ad.department;First, we use a CTE to create a table with average sales for each department, and afterward, we join it with the sales_march table to visualize the average_sales for the department with sales_volumes of each representative.
We will get the following table as a result:
id | first_name | last_name | department | sales_volume | average_sales_for_department |
1 | Emily | Wilson | X25 | 128000 | 150000 |
2 | Chris | Parker | B23 | 157000 | 197400 |
3 | Kate | Preston | X25 | 203500 | 150000 |
4 | John | Spacy | M9 | 134000 | 154000 |
5 | Kevin | Douglas | B23 | 237800 | 197400 |
6 | Martin | Smith | M9 | 109000 | 154000 |
7 | Julia | Gutenberg | M9 | 218300 | 154000 |
8 | Ingrid | Karp | X25 | 117400 | 150000 |
Multiple CTE
We can have several CTEs in our query. Let's get back to our previous example:
What if we want to have a table where the difference between sales_volume and average_sales_for_department is calculated for every sales representative?
We can make the following CTE:
first, we form a CTE
avg_department, where we calculate the average sales volume for each departmentthen we form CTE
sales_diff_from_avg_by_dep, where we join CTEavg_departmentwithsales_marchtable and calculate the difference betweensales_volumeandaverage_sales_for_department
WITH avg_department AS (
SELECT department, AVG(sales_volume) AS average_sales_for_department
FROM sales_march
GROUP BY department
),
sales_diff_from_avg_by_dep AS (
SELECT
s.id, s.first_name, s.last_name, s.department,
(s.sales_volume - ROUND(ad.average_sales_for_department)) AS diff_from_avg_by_dep
FROM sales_march s
JOIN avg_department ad
ON s.department = ad.department
)
SELECT *
FROM sales_diff_from_avg_by_dep;This example also demonstrates a nested CTE: the CTE sales_diff_from_avg_by_dep uses the results of the avg_department CTE. The resulting column is named diff_from_avg_by_dep, because that is the alias given in the second CTE.
The resulting table looks as the following:
id | first_name | last_name | department | diff_from_avg_by_dep |
1 | Emily | Wilson | X25 | -22000 |
2 | Chris | Parker | B23 | -40400 |
3 | Kate | Preston | X25 | 53500 |
4 | John | Spacy | M9 | -20000 |
5 | Kevin | Douglas | B23 | 40400 |
6 | Martin | Smith | M9 | -45000 |
7 | Julia | Gutenberg | M9 | 64300 |
8 | Ingrid | Karp | X25 | -32600 |
Now it is clear to the manager who might deserve a monthly bonus and who may need to reconsider their sales efforts.
As for us, we now understand how CTEs can be used to answer different types of queries.
Conclusion
Now that we've studied how CTE works, let's sum up its advantages:
queries work faster since you don't have to store unnecessary entities in memory for a long time
it's easier to understand, as a complex query is divided into several 'blocks': data acquisition becomes consistent and understandable
it facilitates support, as it is easier to maintain, fix and modify
CTEs are used in many popular systems, they are just called by different names. If you become familiar with table expressions, for example in PostgreSQL, then in the future it will be easier for you to become familiar with similar structures in Oracle or other DBMS.
CTE appears to be quite a useful tool in writing SQL queries, so it is high time to start practicing them!