Data is crucial in most Django projects. Imagine having a database where you store your T-shirt sales information.
If you want to know how many T-shirts are in stock, there are multiple ways to accomplish that. One method could be adding all the T-shirts in inventory to a Python list and then determining the length of the list.
However, if you have a large number of T-shirts in your database, then a Python list may not be the most efficient way to count them.
The good news is, your database can count items for you, via operations. The counting becomes much faster if done through database operations.
Learning these database-specific operations is beneficial as they tend to be faster and easier to comprehend than their Python equivalent.
Database operations help you retrieve data from your database, a process referred to as querying. The SQL programming language is utilized to query your database.
Rather than using a Python list to count your T-shirts, you can write an SQL query to let the database return the total number of T-shirts in stock.
However, dealing with raw SQL queries in Django can be challenging. Firstly, you need to write the SQL query, which might get complex. Using the raw() manager method requires iterating through its result, which is a model instance.
Writing raw SQL might expose your application to SQL injections, and you must remember to safeguard your code against such attacks.
You can count all the T-shirts in stock in a more straightforward way using the Django Object Relational Mapping (ORM). It offers options like aggregation and annotation.
Aggregation and annotation exploit Django's ORM layer to return values from a QuerySet. A QuerySet pertains to data in a database. In our T-shirt example, a QuerySet could represent T-shirt data in a table or model named "Tshirts".
Aggregation helps us calculate summarized values for the entire QuerySet. For instance, the total number of T-shirts we have in stock, or the total number of small-sized T-shirts sold.
On the other hand, annotation returns details of every T-shirt in our database, or every T-shirt of a particular size.
This topic primarily focuses on aggregations. We perform aggregations via the GROUP BY clause in SQL.
We will delve into Django's built-in aggregation functions, Min, Max, Sum, and Count.
To understand how these aggregate functions work, we'll create a Django model called "Tshirt". It will include the fields: tshirt_id, size, color, fabric, price, and defective.
from django.db import models
class Tshirt(models.Model):
tshirt_id = models.IntegerField()
size = models.CharField(max_length=100)
color = models.CharField(max_length=100)
fabric = models.CharField(max_length=100)
price = models.DecimalField(max_digits=10, decimal_places=2)
defective = models.BooleanField(default=False)
def __str__(self):
return f"{self.size} {self.color} {self.fabric}"
Now let's look at the structure of the project
Aggregations(root folder)/
├── Aggregations/
│ ├── asgi.py
│ ├── __init__.py
│ ├── settings.py
│ ├── urls.py
│ └── wsgi.py
├── AggregationsApp/
│ ├── admin.py
│ ├── apps.py
│ ├── __init__.py
│ ├── migrations/
│ │ ├── 0001_initial.py
│ │ ├── __init__.py
│ ├── models.py
│ ├── tests.py
│ └── views.py
├── manage.py
├── structure.txt
└── templates/
└── data.html
Let's dive deeper into aggregations. We'll start with a simple aggregation that counts the QuerySet.
Count()
To find how many T-shirts you have in stock, you could create a Python list, and then calculate the length of the list with Python's len() function.
If your database contains a large number of T-shirts, a Python list might not be the most efficient method of counting them.
A better alternative is the Count () aggregate function. It allows you to tally a certain number of rows.
We can import the Count () aggregate function from the Django ORM and use it to compute the total number of T-shirts we currently have in our database.
from . models import Tshirt
from django.db.models import Count
total = Tshirt.objects.aggregate(all_tshirts_=Count("tshirt_id"))
print(f"The total number of tshirts is {total}")
# The total number of tshirts is {'all_tshirts_': 15}
The Count method returns a single value, which in this case is the total number of T-shirts in our database table, 15.
We apply the Count aggregate method on the Tshirt model's manager (objects) to count all rows in the tshirt_id field in the "Tshirt" model.
The result of this aggregation operation, a dictionary, is stored in the "total" variable, where the key is "all_tshirts," and the value is the total number of T-shirts.
If we perform the same aggregation in SQL, we would use the Count() function.
SELECT COUNT(tshirt_id) FROM tshirts;
# count
# 15
# (1 row)
We pass the "tshirt_id" column to the Count() function. The function then counts all rows in the specified column and returns a single value, 15, which is the total number of T-shirts in our table.
The functions Count(), Min(), and Max() are not the only useful aggregates. Let's examine some others.
Min()
The Min() aggregate function returns the smallest value in a dataset. We might want to find the cheapest T-shirt we currently have in stock, for example.
Here is the query we should execute.
from . models import Tshirt
from django.db.models import Min
min_price = Tshirt.objects.aggregate(least_priced=Min("price"))
print(min_price)
# {"least_priced": Decimal ("1.00")}
We've retrieved the minimum price of T-shirts from our database.
We use the Min aggregate method on the "Tshirt" model's manager (objects) to find the minimum value in the price field in the "Tshirt" model.
We store the result of this aggregation operation in the min_price variable. It contains a dictionary where the key is "least_priced," and the value is the minimum price.
From this, we can conclude that the cheapest T-shirt in our database costs 1.00. We can see that our result includes the data type (Decimal ("1.00").
This information is crucial when working with databases because it helps prevent unintentional type conversions and data loss.
We can convert our result into a float that doesn't include the data type by adjusting our query.
min_price_decimal = min_price["least_priced"]
min_price_float = float(min_price_decimal)
print(f"The cheapest tshirt in our database is: {min_price_float}")
# The cheapest tshirt in our database is: 1.0
If we need to execute the same aggregation in SQL, we would use the Min() function.
SELECT MIN(price) FROM tshirts;
# min
# 1.00
# (1 row)
We pass the "price" column to the Min() function. The function then finds the row with the smallest value in the specified column and returns a single value, 1.00, which is the lowest price of a T-shirt in our table.
We can further refine our Min() function to find the cheapest cotton T-shirt in our stock, for example.
from . models import Tshirt
from django.db.models import Min
min_price = Tshirt.objects.filter(fabric="cotton").aggregate(least_priced=Min("price"))
min_price_decimal = min_price["least_priced"]
min_price_float = float(min_price_decimal)
print(f"The cheapest cotton tshirt in our database is: {min_price_float}")
# The cheapest cotton tshirt in our database is: 1.2
Let's consider more Django aggregations for various operations. Next on our list is Max.
Max()
The Max() function returns the largest value in a dataset. For instance, we could find the most expensive T-shirt in stock.
from . models import Tshirt
from django.db.models import Max
max_price = Tshirt.objects.aggregate(most_priced=Max("price"))
print(max_price)
# {"most_priced":Decimal("4.50")}
Here, we calculate the maximum (highest) value of the price field among all T-shirt objects in the database and assign it to the variable max_price.
The result is stored as a dictionary with the alias "most_priced".
We can convert our result into a float that doesn't include the data type. Here is the way to adjust our query.
max_price_decimal = max_price["most_priced"]
max_price_float = float(max_price_decimal)
print(f"The most expensive tshirt in our database is: {max_price_float}")
# The most expensive tshirt in our database is: 4.5
We now have our result as a float.
If we need to execute the same aggregation in SQL, we would use the Max() function.
SELECT MAX(price) FROM tshirts;
# max
# 4.50
# (1 row)
We pass the "price" column to the Max() function. The function then identifies the row with the highest value in the specified column and returns a single value, 4.50, which is the price of the most expensive T-shirt in our stock.
We can further refine our Max() function to find the most expensive black T-shirt in our stock, for example.
from . models import Tshirt
from django.db.models import Max
max_price = Tshirt.objects.filter(color="black").aggregate(most_priced=Max("price"))
max_price_decimal = max_price["most_priced"]
max_price_float = float(max_price_decimal)
print(f"The most expensive black tshirt in our database is: {max_price_float}")
# The most expensive black tshirt in our database is: 4.0
Interestingly, we can combine operations.
We can find the difference between the most expensive and cheapest T-shirts in stock.
from . models import Tshirt
from django.db.models import Min, Max
max_price = Tshirt.objects.aggregate(most_priced=Max("price"))
max_price_decimal = max_price["most_priced"]
max_price_float = float(max_price_decimal)
min_price = Tshirt.objects.aggregate(least_priced=Min("price"))
min_price_decimal = min_price["least_priced"]
min_price_float = float(min_price_decimal)
difference_price = max_price_float - min_price_float
print(f"This is the difference in price: {difference_price}")
# This is the difference in price: 3.5
We see that the price difference between our most expensive and cheapest T-shirt is 3.5
Another frequently used operation is Sum (). Let's explore how to use it.
Sum()
If we wanted to compute the total value of our stock, we would need to add the prices of all the T-shirts in our database. Python's sum() function could be helpful here.
We would need to store our T-shirts in a Python iterable, such as a list because the sum() function only takes an iterable as its first parameter.
If our database has many T-shirts, or if we need to work with multiple tables, the sum() function may not be the right choice.
The Sum() aggregate function enables us to find the total value of a complete QuerySet.
We can find the total value of all the T-shirts in stock.
Let's write the query.
from . models import Tshirt
from django.db.models import Sum
total_value = Tshirt.objects.aggregate(total_val=Sum("price"))
print(total_value)
# {"total_val": Decimal("43.70")}
We calculate the total value of the price field in the "Tshirt" model.
We then print the result of the aggregation operation, a dictionary with the alias "total_val" as the key and the calculated sum of prices as the value.
We can convert our result into a float that doesn't include the data type. To do this, we'll need to adjust our query.
total_value = Tshirt.objects.aggregate(total_val=Sum("price")) ["total_val"]
print(f"The total value of all tshirts in our database is: {total_value}")
# the total value of all tshirts in our database is: 43.70
We now have our result as a float.
Let's write the raw SQL query to perform the same operation.
SELECT SUM(price) FROM tshirts;
# sum
# 43.70
# (1 row)
We would pass the "price" column to the Sum() function. The function would then return the sum of all the values in the specified column returning a single value, 43.70.
We can further refine our Sum() function, to get the total value of all nylon T-shirts in our database.
from . models import Tshirt
from django.db.models import Sum
total_value = Tshirt.objects.filter(fabric="nylon").aggregate(total_value=Sum("price"))["total_value"]
print(f"The total value of nylon tshirts in our database is: {total_value}")
# The total value of nylon tshirts in our database is: 17.40
We have looked at commonly used aggregations: Count, Max, Min, and Sum.
However, sometimes, you want to further refine your conditions within the SELECT statement, in addition to the filtering that the WHERE clause does.
Both raw SQL and the Django ORM support this functionality. Let's delve into it.
Case/When
Sometimes, we may want to perform aggregations that meet certain conditions, just like we would with if else statements in Python.
For example, we may want to exclude defective T-shirts when calculating the total value of T-shirts in our database.
We would need to then include this condition when querying the database. Let's adjust our Sum () query.
from . models import Tshirt
from django.db.models import Sum, Case, When, F, Value, IntegerField
total_value = Tshirt.objects.aggregate(
tshirts_value=Sum(
Case(
When(defective=False, then=F("price")),
default=Value(0),
output_field=IntegerField(),
)
)
)
print(f"The total value of non-defective tshirts is {total_value['tshirts_value']}")
# the total value of non-defective tshirts is 40
We have used Django's Case and When expressions to calculate the total value of T-shirts when defective is False. This should give us the total value of all non-defective T-shirts.
The F object references the price field, and default=Value(0) ensures that if there are no T-shirts where defective is False, the total value will be 0.
We specify output_field=IntegerField() to ensure the result is treated as an integer.
As we can see, the result is the total value of non-defective T-shirts in our QuerySet.
Let's look into the raw SQL version.
We can exclude defective T-shirts when calculating the total value of our stock.
SELECT SUM(CASE WHEN NOT defective THEN price ELSE 0 END) AS total_price
FROM tshirts;
# total_price
# 40.50
We'll use CASE/WHEN to pass the condition to exclude defective T-shirts into our Sum() aggregation.
The "When" part of the clause is equivalent to an IF statement. The "then" part of the clause is equivalent to the code that comes after the IF statement.
In our example, our condition is "WHEN T-Shirts are NOT defective THEN calculate the value of the price column". This is the same as the pseudocode IF statement, "IF T-Shirts are not defective THEN calculate the value of the price column".
We then have the ELSE statement, which is what happens if the condition is not met. In our example, if for some reason all our T-shirts are defective, then our Sum will be zero.
Eventually, we have an alias for our calculated value, "total_price".
Conclusion
Aggregate functions come in handy to help you perform operations in your database.
The above examples introduce you to aggregation and querying your database in Django. These are, however, only the basics, and you can create complex queries as needed.
Let's summarize some things to keep in mind.
-
Typically, the closer you are to your data, the faster the computation should work. It is beneficial to delegate operations and number crunching to the DB queries instead of using pure Python.
-
For simple counting use the Count() aggregation. You can use it to compute the number of objects that conform to certain restrictions.
-
When you need to find some limits on a float field, it is convenient to use Min() and Max() aggregations.
-
Use Sum() to compute the total amount in a QuerySet.
-
When you need some simple logic or error handling, use Case/When.
Aggregations can help you save your sanity because Django keeps the implementation hidden and you don't need to worry if you change the database. Aggregations are also more Pythonic.