Aggregations and Ordering

12 minutes read

Filtering data is the same as asking what objects with certain features we have. Database queries can answer other questions. For example, how many objects do we have? How much money do you have in the bank? What is the height of all towers in Spain? We can also gather our data in groups and aggregate each value. Another helpful operation is sorting data by attributes. Let's see how it works in Django ORM.

Ordering

Everybody wants to know what will happen in the future. Even the most sophisticated level of programming can't give you that. But come to think about it; weather forecasts are a form of fortune-telling. The methods are inaccurate, but we may rely on historical data and statistics. The only model in our application is DayWeather:

from django.db import models


class DayWeather(models.Model):
    date = models.DateField()
    precipitation = models.FloatField()
    temperature = models.FloatField()
    was_raining = models.BooleanField()

Are you curious to find the three coldest days in our database? Let's look for an answer:

top_three_coldest_days = DayWeather.objects.order_by('temperature')[:3]

Remember that ordering works like sorting: the first value is the smallest, and the last is the biggest. We call the order_by method of the object manager with temperature as a parameter, and it returns data sorted by temperature field.

This method may be applied for the object manager or a QuerySet; you may filter data, order it, carry out aggregation operations, and sort the total result.

How about getting the top three hottest days from the database?

top_three_hottest_days = DayWeather.objects.order_by('-temperature')[:3]

Everything changes when we add a minus to our parameter; it reverses the order. In our case, we get the most significant values of temperature first.

Aggregations

The simple rule of weather forecasts is that tomorrow's weather will be similar to today's weather. We also know that the weather is seasonal. So if we want to know how many days will be rainy and gloomy next month, we should look at how many days it rained in the same month last year. Assume that we have already defined variables in our code for the year and month: last_year and next_month:

raining_days = DayWeather.objects.filter(
    date__year=last_year, date__month=next_month, was_raining=True
)

raining_days_forecast = raining_days.count()

We make a QuerySet and call the count method. We think it is an excellent approximation to count the rainy days in the same month last year and base our forecast on this number.

Why do we use count() instead of len()? If you only need to determine the number of records in the set (and don't need the actual objects), it's much more efficient to handle a count at the database level using SQL SELECT COUNT(*). Django provides the count() method for this.

We may also look at the average temperature for the next week. For this forecast, analyze the temperature for the past week:

from datetime import date, timedelta
from django.db.models import Avg

query = DayWeather.objects.filter(date__gt=date.today() - timedelta(days=7))

average_temperature = query.aggregate(average=Avg('temperature'))['average']

We call the aggregate method and pass any custom name as a parameter. The value of this parameter is a special function Avg, and the parameter of the Avg function is the name of the field we want to process.

The result of this function is Python's dictionary {'average': ...}. We get the value of the average temperature by the custom name we chose early in the aggregate method. Then, refer to the calculated value (the value in the dictionary) using the key name: ['aggregate'].

It almost looks as if programmers can do everything, from Hello, World! to predicting this world's future, at least to an extent.

In the examples, we have constructed filters that compare the value of a model field with a constant. But what if we want to compare the value of a model field with another field on the same model? Django provides F() expressions for such comparisons. Instances of F() act as a reference to a model field within a query. These references can then be used in query filters to compare the values of two fields on the same model instance.

For example, to find a list of all days that go after the first sunny day, we construct an F() object to reference the first_sunny_day date and use that F() object in the query:

from django.db.models import F

query = DayWeather.objects.filter(date__gt=F('first_sunny_day'))

Group by aggregations

We look through Django aggregation functions and find Avg, Count, Max, Min, StdDev, Sum, and Variance. We can apply any of these functions to the numerical field values of a QuerySet.

Another task is to predict the total precipitation for each month for the year ahead. Should we create twelve QuerySets and process them one by one? Well, this is one way to do it; the other is to group values by month:

from django.db.models import Sum

precipitation = DayWeather.objects.filter(date__year=last_year) \
                          .values('date__month') \
                          .annotate(sum=Sum('precipitation'))

# precipitation is <Queryset [{'date__month': 1, 'sum': ...}, ...]>

We make a query and group our values by the month calling values method, and pass a field or a field lookup to it. Then we call the annotate method; its syntax rules are the same as for aggregate in the previous example.

The result is a QuerySet consisting of customized objects in the form of dictionaries. We can access each object by index or convert the QuerySet to a Python collection and work with it as we would with any other collection.

Another difference between annotate and aggregate is that annotate runs the aggregation function for each row in a QuerySet, while aggregate does it on the whole QuerySet.

Count function vs. count method

Another prediction is the number of warm days with the outdoor temperature greater than or equal to 20 degrees Celsius per week. Look at the values of the last year for each week:

from django.db.models import Count

warm_days = DayWeather.objects.filter(date__year=last_year) \
                      .filter(temperature__gte=20) \
                      .values('date__week') \
                      .annotate(count=Count('date'))

# warm_days is <Queryset [{'date__week': 1, 'count': ...}, ...]>

The call is similar to the previous one, but this time we pass the function Count to the annotate method.

There is a difference between the count function and the count method of a QuerySet. The Count function returns statistics for each annotated value; the method returns the number of elements in QuerySet.

Select_related and prefetch_related

Ok, our app already has almost all the necessary functionality. We need an understanding of what location the weather description is talking about. Let's add a new model Location and add the ForeignKey field to our first model:

class Location(models.Model):
    name = models.CharField(max_length=100)

class DayWeather(models.Model):
    date = models.DateField()
    precipitation = models.FloatField()
    temperature = models.FloatField()
    was_raining = models.BooleanField()
    location = models.ForeignKey(Location, on_delete=models.CASCADE)

Now if we want to retrieve temperature and associated locations, you can use the following code:

weathers = DayWeather.objects.all()

for weather in weathers:
    location = weather.location
    print(f"{location.name} has temperature {weather.temperature}")

But it leads to one SQL query to retrieve all the weather information and then one additional query for each weather to retrieve the associated location.

However, there is select_related method that tells Django to include the related objects in the executed query instead of retrieving them with separate queries.

weathers = DayWeather.objects.select_related('location').all()

for weather in weathers:
    location = weather.location
    print(f"{location.name} has temperature {weather.temperature}")

This will result in a single SQL query retrieving all the weather and associated locations.

Note that select_related method works with only foreign key relationships and not many-to-many relationships. For the many-to-many relationships, there is prefetch_related method.

For example, we want to add a measurement method to our app. We create a new model MeasurementMethod and add a new field to the DayWeather model:

class DayWeather(models.Model):
    date = models.DateField()
    precipitation = models.FloatField()
    temperature = models.FloatField()
    was_raining = models.BooleanField()
    measurement_method = models.ManyToManyField('MeasurementMethod', related_name='day_weathers')

class MeasurementMethod(models.Model):
    name = models.CharField(max_length=100)

Now if we want to retrieve the temperature and the measurement method, we can use the following code:

weathers = DayWeather.objects.prefetch_related('measurement_method').all()

for weather in weathers:
    for measurement_method in day_weather.measurement_method.all():
          print(f"{measurement_method.name} shows temperature {weather.temperature}")

Using prefetch_related is very useful when you need to fetch multiple related objects for each record, as it can significantly reduce the number of database queries required to retrieve all the data.

Conclusion

In this topic, we have learned some essential methods of QuerySet and working with various queries. If you feel that you need some more detail related to this, you can check out the QuerySet documentation and Aggregation documentation. Now, let's go to practice!

2 learners liked this piece of theory. 3 didn't like it. What about you?

Report a typo