Optimized nested pagination is broken for m2m fields #650
Comments
For reference, our understanding is that (at a high level) the current nested pagination looks like this:

tag_queryset.prefetch_related(
    Prefetch(
        "issues",
        queryset=issue_queryset.annotate(
            _strawberry_row_number=Window(
                RowNumber(),
                partition_by=["tags"],
                order_by=[...],
            ),
            _strawberry_total_count=Window(
                Count(1),
                partition_by=["tags"],
            ),
        ).filter(
            _strawberry_row_number__gt=5,
            _strawberry_row_number__lte=10,
        ),
    )
)

But to work correctly, it could be updated to something like this:

tag_queryset.prefetch_related(
    Prefetch(
        "issues",
        queryset=issue_queryset[5:10],
        to_attr="_strawberry_optimized_issues",
    )
).annotate(
    _strawberry_total_count_issues=Subquery(
        issue_queryset.filter(tags=OuterRef("pk"))
        .values("tags")
        .annotate(total_count=Count("pk"))
        .values("total_count"),
    )
)

The main complexity here is how to get Strawberry / Strawberry-Django to:
The other complexity is how to calculate things like
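To make the window-based pagination above concrete, here is a minimal pure-Python sketch of what the row-number and total-count annotations compute per tag. It is only an emulation under stated assumptions: `window_paginate` and the sample rows are hypothetical, and this is not Strawberry-Django's actual code. It also shows how duplicated join rows would distort both the page and the count.

```python
from collections import defaultdict


def window_paginate(rows, offset, limit):
    """Emulate ROW_NUMBER() and COUNT() partitioned by tag, then keep one page.

    `rows` is an ordered list of (tag_id, issue_id) pairs, one per join row.
    (Hypothetical helper, for illustration only.)
    """
    partitions = defaultdict(list)
    for tag_id, issue_id in rows:
        partitions[tag_id].append(issue_id)

    result = {}
    for tag_id, issue_ids in partitions.items():
        total_count = len(issue_ids)  # COUNT(1) OVER (PARTITION BY tag)
        page = [
            issue_id
            for row_number, issue_id in enumerate(issue_ids, start=1)
            # Mirrors row_number__gt=offset and row_number__lte=offset + limit.
            if offset < row_number <= offset + limit
        ]
        result[tag_id] = {"total_count": total_count, "page": page}
    return result


# One join row per (tag, issue) pair: the intended input.
clean_rows = [(3, 1), (3, 2), (3, 5), (4, 1)]
# Hypothetical input where every join row for tag 3 appears twice.
duplicated_rows = [(3, 1), (3, 1), (3, 2), (3, 2), (3, 5), (3, 5), (4, 1)]

clean = window_paginate(clean_rows, offset=0, limit=2)
broken = window_paginate(duplicated_rows, offset=0, limit=2)
```

With clean rows, tag 3 gets a total of 3 and a first page of issues 1 and 2. With the duplicated rows, the total doubles and the page contains issue 1 twice.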
Hey @SupImDos, I was doing some tests on this and discovered that the issue is with Django. If I do this in your

Tag.objects.prefetch_related(
    Prefetch("issues", queryset=Issue.objects.all().annotate(foo=F("tags")))
)

I get duplicated issues as well. This is the SQL generated:

SELECT ("projects_issue_tags"."tag_id") AS "_prefetch_related_val_tag_id",
       "projects_issue"."name",
       "projects_issue"."id",
       "projects_issue"."kind",
       "projects_issue"."priority",
       "projects_issue"."milestone_id",
       "projects_issue_tags"."tag_id" AS "foo"
FROM "projects_issue"
LEFT OUTER JOIN "projects_issue_tags" ON ("projects_issue"."id" = "projects_issue_tags"."issue_id")
INNER JOIN "projects_issue_tags" T4 ON ("projects_issue"."id" = T4."issue_id")
WHERE T4."tag_id" IN (3, 4)
ORDER BY "projects_issue"."id" ASC

The only thing I did here was to annotate
We can try to find a related issue for this or report it to Django. In the meantime, I would like to find a way to work around this without having to rewrite this, because as you already pointed out, the challenge will probably be even greater. I was trying to play with the
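The effect of the two joins in that SQL can be reproduced outside Django. The sketch below uses Python's standard `sqlite3` module with a simplified, hypothetical schema (`issue`, `issue_tags`) and made-up rows; it is not the project's real schema, only an illustration of why joining the through table twice multiplies rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE issue (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE issue_tags (issue_id INTEGER, tag_id INTEGER);
    INSERT INTO issue VALUES (1, 'first'), (2, 'second');
    -- Issue 1 has tags 3 and 4; issue 2 has only tag 3.
    INSERT INTO issue_tags VALUES (1, 3), (1, 4), (2, 3);
    """
)

# Same shape as the generated SQL: the through table is joined twice,
# once for the annotation and once (as T4) for the prefetch filter.
double_join = conn.execute(
    """
    SELECT T4.tag_id, issue.id, issue_tags.tag_id AS foo
    FROM issue
    LEFT OUTER JOIN issue_tags ON issue.id = issue_tags.issue_id
    INNER JOIN issue_tags T4 ON issue.id = T4.issue_id
    WHERE T4.tag_id IN (3, 4)
    ORDER BY issue.id, T4.tag_id, foo
    """
).fetchall()

# What reusing the existing join would look like: one row per (tag, issue) pair.
single_join = conn.execute(
    """
    SELECT issue_tags.tag_id, issue.id, issue_tags.tag_id AS foo
    FROM issue
    INNER JOIN issue_tags ON issue.id = issue_tags.issue_id
    WHERE issue_tags.tag_id IN (3, 4)
    ORDER BY issue.id, issue_tags.tag_id
    """
).fetchall()

print(len(double_join), len(single_join))
```

Issue 1 has two tags, so the double join yields 2 x 2 rows for it instead of 2, which is the kind of duplication that then leaks into the prefetched results.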
Opened this issue on the Django bug tracker: https://code.djangoproject.com/ticket/36035
Thank you for putting in the effort and figuring out what was going on there. 🙏
Btw, it seems that we might get a bug fix soon: https://code.djangoproject.com/ticket/36035#comment:1 I tested the patch they mentioned and it indeed fixes the issue. I'll try to work out a workaround for strawberry-django in the meantime
When annotating something from the relation into the prefetch queryset for an m2m relation, Django mistakenly does not reuse the existing join, which ends up generating spurious results. There's an ongoing fix for this in this ticket: https://code.djangoproject.com/ticket/35677 This monkey patches older versions of Django that don't contain the fix, and most likely won't (Django usually only backports security fixes), to work around the issue. Thanks @SupImDos for providing an MRE in the form of a test for this! Fix #650
The only workaround I found was to monkey patch Django =P Although I prefer to avoid doing that, this seems safe, as the monkey-patched functions share the same code in all supported versions (4.2, 5.0 and 5.1), and we are not going to apply this for 5.2, which is going to contain the fix. The PR is open here: #681
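The version gating described in that comment could look roughly like the sketch below. The function name is hypothetical and this is not the code from the PR; it only assumes a version tuple shaped like `django.VERSION` (for example `(5, 1, 4, "final", 0)`) and the supported range stated above.

```python
def should_apply_m2m_prefetch_patch(django_version):
    """Return True for versions the thread lists as supported but unfixed.

    `django_version` is assumed to be a tuple shaped like `django.VERSION`.
    (Hypothetical helper, for illustration only.)
    """
    major_minor = tuple(django_version[:2])
    # 4.2, 5.0 and 5.1 get the patch; 5.2 is expected to contain the fix.
    return (4, 2) <= major_minor < (5, 2)


print(should_apply_m2m_prefetch_patch((5, 1, 4, "final", 0)))
print(should_apply_m2m_prefetch_patch((5, 2, 0, "alpha", 0)))
```

Comparing only the major and minor components keeps the check independent of patch releases and pre-release tags.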
Describe the Bug
Nested pagination through m2m fields is broken with the optimizer enabled. This appears to be because using .prefetch_related() with a QuerySet annotated with a Window function where partition_by is an m2m field causes an extra join and duplicate results.
Reproducible Example
See a minimal reproducible example here: mre/nested-pagination-m2m
In the unit test above, the expected result of the query is:
However, the test fails, and the actual result of the query is:
System Information
Additional Context
It appears that this is caused by the implementation of apply_window_pagination().
As suggested above, when you annotate a .prefetch_related() QuerySet with a Window function and refer back to the other side of the m2m, you add another join, causing duplicate results. When Django introduced the ability to prefetch with sliced querysets (which Strawberry-Django isn't directly using), they had to do a bunch of hacks to combine calls to filter together (see QuerySet._next_is_sticky() and _filter_prefetch_queryset()).
To demonstrate the difference directly with the Django ORM see: a6f4e31.
In the unit test above, Strawberry-Django produces the following SQL query for the paginated prefetch:
Whereas Django produces this SQL query for the paginated prefetch:
Note the extra LEFT OUTER JOIN which causes the issue.
Potential Solution
As noted in the apply_window_pagination() docstring, Django 4.2+ actually supports sliced QuerySets in .prefetch_related() now.
The solution here may be to actually use Django's inbuilt support for .prefetch_related() QuerySet slicing. However, this means we can't annotate _strawberry_total_count onto the nodes anymore. It is likely that we would need to refactor that functionality onto the parent records, maybe using a subquery count and OuterRef?
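As a rough illustration of moving the total count onto the parent records, the sketch below runs a correlated subquery count in plain SQL through Python's `sqlite3` module. The schema and data are hypothetical, and the SQL is hand-written rather than what the Django ORM would generate for a Subquery with OuterRef, so treat it as a sketch of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tag (id INTEGER PRIMARY KEY);
    CREATE TABLE issue_tags (issue_id INTEGER, tag_id INTEGER);
    INSERT INTO tag VALUES (3), (4), (5);
    -- Tag 3 has two issues, tag 4 has one, tag 5 has none.
    INSERT INTO issue_tags VALUES (1, 3), (2, 3), (1, 4);
    """
)

# The count is computed once per parent row, correlated on the parent's id,
# so it does not depend on how the child rows are later sliced or joined.
totals = conn.execute(
    """
    SELECT tag.id,
           (SELECT COUNT(*)
            FROM issue_tags
            WHERE issue_tags.tag_id = tag.id) AS total_count
    FROM tag
    ORDER BY tag.id
    """
).fetchall()

print(totals)
```

In this hand-written form a tag with no issues gets a count of 0; an ORM subquery that groups before counting may instead yield no row for such a tag, which would need separate handling.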