Introduction
First of all, we should mention that this topic is intended to be used by an advanced SQL user. Frankly speaking, without basic SQL knowledge, a lot of things that are discussed here can be confusing and complicated.
Apart from that, it would be fair to say that if you read the title of the topic and did not get scared that means that you are more than an experienced SQL user. So let's find out how to make your work more efficient.
EXPLAIN
When you work with large datasets, it is vital to understand how your queries are being executed and identify any drawbacks in your database performance. One tool that can help you do this is the EXPLAIN command.
EXPLAIN – is a tool that shows the query execution plan, which includes information on how the query will be executed and which indexes will be used. It does not actually execute the query and does not provide information on the actual execution time or resource usage. Briefly speaking, this command alone won't give you information about how long each step of the query takes. And that is the part where EXPLAIN ANALYZE comes in handy.
EXPLAIN ANALYZE
EXPLAIN ANALYZE– is a function that exists in many SQL databases (PostgreSQL, Oracle, MySQL, etc.). It allows a user to get information on how the database engine executes a given query.
And it not only shows the query execution plan but also executes the query and provides information on the actual execution time, resource usage, and other performance metrics. This feature is useful for identifying performance bottlenecks in a query and optimizing it for better performance.
When you use a query with EXPLAIN ANALYZE, your database shows its plan, which is an instruction on how it will execute your query. The plan underlines which tables will be used in the query, what indexes (if they exist) will be used, and how the data will be combined and filtered from different tables.
It provides detailed information about the query execution plan and the cost of each operation in the query, allowing you to spot inefficiencies. By analyzing the query plan and the performance statistics, you can identify potential performance drawbacks and search for ways to optimize your query.
Let's move on to how we can use it. Use this data to create tables.
Here's the template:
Let's analyze this query that gives us average marks for the Ravenclaw faculty.
EXPLAIN ANALYZE
SELECT AVG(s.mark) as mark,
s.faculty_name
FROM (SELECT f.name as faculty_name,
m.mark
FROM mark m
INNER JOIN students s on s.id = m.student_id
INNER JOIN (SELECT id, name FROM faculty_name) f ON f.id = s.faculty_id) as s
WHERE s.faculty_name = 'Ravenclaw'
GROUP BY s.faculty_name;But what if we optimize it a little bit, get rid of the subquery, and put the where condition to the join condition?
EXPLAIN ANALYZE
SELECT f.name as faculty_name,
AVG(m.mark) as mark
FROM mark m
INNER JOIN students s on s.id = m.student_id
INNER JOIN faculty_name f ON f.id = s.faculty_id and f.name = 'Ravenclaw'
GROUP BY faculty_name;Let's look at the results.
As you can see, the cost indicator, rows, and loops stayed the same except for the actual time which decreased significantly!
Understanding EXPLAIN ANALYZE
Additionally, you can examine other factors such as the number of rows returned, the number of rows scanned, the number of join operations, and the presence of any sequential or index scans.
In the output of EXPLAIN ANALYZE, you can focus on the following sections to identify optimization opportunities:
Planning Time. This section shows the time taken by the optimizer to generate the query plan. A high planning time may indicate a complex query or the need for additional indexes.
Actual Time. This section shows the total execution time of the query. Lower execution time indicates a more optimized query.
Total Cost. This section shows the total estimated cost of executing the query. Lower total cost indicates a more optimized query.
Rows Returned. This section shows the number of rows returned by the query. Fewer rows returned indicate a more efficient query.
Join Operations. This section shows the number of join operations performed by the query. Fewer join operations typically indicate a more optimized query.
Sequential or Index Scans. This section shows the number of sequential or index scans performed by the query. Fewer scans indicate a more optimized query, as these operations are typically more expensive than other types of scans.
When comparing two queries using EXPLAIN ANALYZE, you should primarily look at the execution time and the total cost of the query. As a rule, the lower the execution time and the total cost are, the more optimized the query is.
Conclusion
One of the key features that help users to optimize queries is the ability to analyze queries using the EXPLAIN and EXPLAIN ANALYZE commands.
In summary, EXPLAIN helps with query optimization by showing the execution plan, while EXPLAIN ANALYZE takes it a step further by actually executing the query and providing detailed performance metrics. By using these commands to examine query plans and execution times, you can identify potential performance issues and make changes to improve the performance of your database.