The Django ORM is a powerful tool but certain aspects of it are counterintuitive, such as the SQL order of execution.
Let’s look at an example of this trap and how we can fix it using subqueries:
class Book(models.Model):
class Meta:
constraints = [
models.UniqueConstraint(
fields=["name", "edition"],
name="%(app_label)s_%(class)s_unique_name_edition",
)
]
name = models.CharField(max_length=255)
edition = models.CharField(max_length=255)
release_year = models.PositiveIntegerField(null=True)
I want to write a query that reads:
Out of the latest books, give me the ones with a non-null release year.
My first attempt might be:
Book.objects.order_by("name", "-edition")
.distinct("name")
.filter(release_year__isnull=False)
This seems like a sound approach. Note we’re PostgreSQL because it’s the only database that supports distinct(*fields)
(docs).
Let’s test this out using the following sample data:
Name | Edition | Release Year |
---|---|---|
Django tips | 1 | 2020 |
Django tips | 2 | 2022 |
Django tips | 3 | null (coming soon) |
Golf swings | 1 | 2018 |
Golf swings | 2 | 2021 |
We’re writing our unit test in pytest
and pytest-django
. Our query should return just the last record, book5
.
import pytest
from pytest_django.asserts import assertQuerysetEqual
def test_queryset():
book1 = Book.objects.create(name="Django tips", edition=1, release_year=2020)
book2 = Book.objects.create(name="Django tips", edition=2, release_year=2022)
book3 = Book.objects.create(name="Django tips", edition=3, release_year=None)
book4 = Book.objects.create(name="Golf swings", edition=1, release_year=2018)
book5 = Book.objects.create(name="Golf swings", edition=2, release_year=2021)
qs = (
Book.objects.order_by("name", "-edition")
.distinct("name")
.filter(release_year__isnull=False)
)
assertQuerysetEqual(qs, [book5])
This test fails because the queryset returns [book2, book5]
. We can see the raw SQL using qs.query
, which looks something like this:
SELECT DISTINCT ON (name) * FROM core_book
WHERE release_year IS NOT NULL
ORDER BY name ASC, edition DESC;
The WHERE
clause appears before the ORDER BY
clause, which produces a completely different query that reads:
Out of the books with a non-null release year, give me the latest ones.
Unfortunately, we can’t just simply swap the clauses. The SQL order of execution guarentees WHERE
clauses are executed before ORDER BY
clauses.
So how can we fix this? The answer is using SQL subqueries.
Book.objects.filter(
id__in=Book.objects.order_by("name", "-edition").distinct("name").values_list("id", flat=True),
release_year__isnull=False,
)
Here, we’re using two different queries to fetch our result:
- Get the latest books.
- Out of the latest books, give me the ones with a non-null release year.
Note, you don’t actually need .values_list("id", flat=True)
. Django will automatically return just the id
field when a queryset is used in a subquery.
Here’s our updated test:
def test_queryset():
...
qs = Book.objects.filter(
id__in=Book.objects.order_by("name", "-edition").distinct("name"),
release_year__isnull=False,
)
assertQuerysetEqual(qs, [book5])
The test passes!
In conclusion, subqueries are a great mechanism for writing fast and complex queries. Check out Django’s documentation here for more examples.