-
Notifications
You must be signed in to change notification settings - Fork 268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
additional changes to LR2 branch #1989
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ources2 This implementation should be much faster for intersections, exclusions and caveats due to early tree shearing and check hints
…dispatched resources when checking those already received from another dispatch Adds some parallelism back into LR2
Also adds additional testing to ensure check hints are used in LR2
uses string.Builder and strings.Repeat (which also uses string.Builder) to reduce CPU and allocations showed up in generating 10% of allocations and 5% of CPU on a LookupResources-based workload. This reduces 66.25% allocations and 92% CPU
github-actions
bot
added
area/CLI
Affects the command line
area/api v1
Affects the v1 API
area/datastore
Affects the storage system
area/tooling
Affects the dev or user toolchain (e.g. tests, ci, build tools)
area/dispatch
Affects dispatching of requests
labels
Jul 19, 2024
CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅ |
I have read the CLA Document and I hereby sign the CLA |
recheck |
authzedbot
added a commit
to authzed/cla
that referenced
this pull request
Jul 19, 2024
we've observed improvements on LR workloads when increasing the dispatch chunk size
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Labels
area/api v1
Affects the v1 API
area/CLI
Affects the command line
area/datastore
Affects the storage system
area/dispatch
Affects dispatching of requests
area/tooling
Affects the dev or user toolchain (e.g. tests, ci, build tools)
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Based on #1905 with extra optimizations I found that have improved performance
There are several optimizations here, mostly around reducing allocations and GC pressure.
changes to
default_query_exec_mode
This is a long-standing issue with SpiceDB and prepared statements, and the changes here should address support for PG proxies. As part of the profiling and performance testing I did locally, I observed compelling reasons to move to
exec
mode. We still need to observe the impact in a real cluster since the protocol is now likely chattier (it's "text" based mode instead of "binary") but remain confident the overhead in the protocol is amortized by the removal of a full roundtrip and the computation of a custom query plan.Moving
default_query_exec_mode=exec
showed better LR2 throughput and also reduced allocations. I took the changes from 45c300d (hi @bradengroom 👋) and adjusted them so the default mode is nowexec
which disables prepared plans, which have shown to be counterproductive with SpiceDB.SpiceDB has been ignoring generic plans for a long time now. The hypothesis is that since SpiceDB queries have so much argument variability the generic plans become suboptimal leading to slow queries. Consequently, we switched to
force_custom_plan
, meaning a new query plan is created for each query. This wasn't the most efficient choice but it was the available workaround at the time as it could be triggered via URI parameters, and then it was made the default behavior in the code.From the PG docs:
exec
query exec mode does not use the binary protocol, so I had to add support fortext
support for some of the arguments, specificallyxid8
and themap[string]any
used for the context column which is mapped to JSONB. Without query plans and with the text mode, there is heavy reliance by pgx over reflection and custom interfaces. I madexid8
implement a pgx interface that bypasses reflection, andmap[string]any
didn't need a decoder since a JSONB decoder for maps already exists, I just had to register the mapping.I'm debating if this should be the new default or if we should be more conservative about it. The current configuration is very inefficient and I can't think of reasons this wouldn't be universally beneficial.
Reducing allocations when generating queries
An important reduction in allocations (10% reduction!) which in turn also has a dramatic impact in throughput happens in FilterToResourceIDs and FilterWithSubjectsSelectors. Both concatenated strings to create schema filterers used by LR code. This was extremely inefficient, and by switching to a
strings.Builder
an important reduction in allocations on a LR-based workload is observedI also reduced the allocations in other parts of the code that showed up in the profiles and weren't as impactful but seemed like low-hanging fruit (~ 1% reduction).
changes to chunk size
We also observed improvements for certain workloads with wide relations when the dispatch chunk size was changed. Right now I just hardcoded an increase to experiment with it, but the idea is to turn it into a flag. This is still TODO.
The changes to chunk size helps a ton, so I'll turn it into a configurable flag instead of the hardcoded value of
100