-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try to tune project listing query again #11620
base: main
Are you sure you want to change the base?
Conversation
agjohnson
commented
Sep 25, 2024
- Don't subquery for builds in project listing prefetch
This seems like it's unneccessary, but the prefetch is not accurate using Build.objects first.
.values_list("id", flat=True)[:1] | ||
# Get most recent and recent successful builds | ||
builds_latest = ( | ||
Build.internal.filter(project__in=self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear why Build.internal
is needed in both the inner query and the prefetch query. It seems like one of them could be Build.objects
at very least? The performance is a little better with Build.objects
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand is to avoid getting builds from PRs (external versions) and I'd say that .internal
should perform better than .objects
since it removes a lot of builds to consider and they should be removed using an index Build.type
. That's the theory, tho 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh same, that was my thought initially. There is some complexity added by this method though, which I think ultimately annoys the query planner.
.annotate(latest=Max("pk")) | ||
.values_list("latest", flat=True) | ||
) | ||
builds_success = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like it could be combined into the query above, saving ~400ms.
This reduced the time needed for prefetch from 12s to 3s, but this is still not usable. It's really not clear why, but the planner was previously falling apart on this and triggered a sequence scan on This is most easily testable against one of our accounts: In [1]: %time list(Project.objects.dashboard(User.objects.filter(is_staff=True).first()))
CPU times: user 28.8 ms, sys: 45 μs, total: 28.9 ms
Wall time: 2.44 s The current explain looks different than it did (specifically it doesn't sequence scan builds_version), but it is overly complex for :
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to note here is that we are using .prefetch_related
here, which does the joining on the Python side. That could explain why some of there queries are fast when testing them in the DB, but slow when using Django to access these views:
prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python
(from https://docs.djangoproject.com/en/5.0/ref/models/querysets/#prefetch-related)
Since we are using Max()
to get the latest build and the latest successful build for each project, we could probably use select_related
instead here which will make everything at the DB.
.values_list("id", flat=True)[:1] | ||
# Get most recent and recent successful builds | ||
builds_latest = ( | ||
Build.internal.filter(project__in=self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand is to avoid getting builds from PRs (external versions) and I'd say that .internal
should perform better than .objects
since it removes a lot of builds to consider and they should be removed using an index Build.type
. That's the theory, tho 😄
.values_list("id", flat=True)[:1] | ||
# Get most recent and recent successful builds | ||
builds_latest = ( | ||
Build.internal.filter(project__in=self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of __in
can't we just use project__pk=self.pk
here as I did in another PR? That worked pretty good there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self
is a Project.objects
queryset, not an individual model instance.
# Get most recent and recent successful builds | ||
builds_latest = ( | ||
Build.internal.filter(project__in=self) | ||
.values("project") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this line here? I understand the project
value is not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not, but this is for grouping by project. The latest build day per project is what is needed here. If there is a different way to group this, we don't need the second query at all.
I quickly tested this and it's slower 😄 |