Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LookupResources2 follow-ups #1994

Merged
merged 5 commits into from
Jul 24, 2024
Merged

LookupResources2 follow-ups #1994

merged 5 commits into from
Jul 24, 2024

Conversation

vroldanbet
Copy link
Contributor

@vroldanbet vroldanbet commented Jul 24, 2024

Follow up to #1905
Closes #1217

There are several optimizations here, mostly around reducing allocations and GC pressure, and addresses the situation with pgx exec mode, making LR2 faster and learner

changes to default_query_exec_mode

This is a long-standing issue with SpiceDB and prepared statements, and the changes here should address support for PG proxies. As part of the profiling and performance testing I did locally, I observed compelling reasons to move to exec mode. We still need to observe the impact in a real cluster since the protocol is now likely chattier (it's "text" based mode instead of "binary") but remain confident the overhead in the protocol is amortized by the removal of a full roundtrip and the computation of a custom query plan.

Moving default_query_exec_mode=exec showed better LR2 throughput and also reduced allocations. I took the changes from 45c300d and adjusted them so the default mode is now exec which disables prepared plans, which have shown to be counterproductive with SpiceDB.

SpiceDB has been ignoring generic plans for a long time now. The hypothesis is that since SpiceDB queries have so much argument variability the generic plans become suboptimal leading to slow queries. Consequently, we switched to force_custom_plan, meaning a new query plan is created for each query. This wasn't the most efficient choice but it was the available workaround at the time as it could be triggered via URI parameters, and then it was made the default behavior in the code.

From the PG docs:

Prepared statements (either explicitly prepared or implicitly generated, for example by PL/pgSQL) can be executed using custom or generic plans. Custom plans are made afresh for each execution using its specific set of parameter values, while generic plans do not rely on the parameter values and can be re-used across executions. Thus, use of a generic plan saves planning time, but if the ideal plan depends strongly on the parameter values then a generic plan may be inefficient. The choice between these options is normally made automatically, but it can be overridden with plan_cache_mode. The allowed values are auto (the default), force_custom_plan and force_generic_plan. This setting is considered when a cached plan is to be executed, not when it is prepared. For more information see PREPARE.

exec query exec mode does not use the binary protocol, so I had to add support for text support for some of the arguments, specifically xid8 and the map[string]any used for the context column which is mapped to JSONB. Without query plans and with the text mode, there is heavy reliance by pgx over reflection and custom interfaces. I made xid8 implement a pgx interface that bypasses reflection, and map[string]any didn't need a decoder since a JSONB decoder for maps already exists, I just had to register the mapping.

I'm debating if this should be the new default or if we should be more conservative about it. The current configuration is very inefficient and I can't think of reasons this wouldn't be universally beneficial.

Reducing allocations when generating queries

An important reduction in allocations (10% reduction!) which in turn also has a dramatic impact in throughput happens in FilterToResourceIDs and FilterWithSubjectsSelectors. Both concatenated strings to create schema filterers used by LR code. This was extremely inefficient, and by switching to a strings.Builder an important reduction in allocations on a LR-based workload is observed

I also reduced the allocations in other parts of the code that showed up in the profiles and weren't as impactful but seemed like low-hanging fruit (~ 1% reduction).

changes to chunk size

We also observed improvements for certain workloads with wide relations when the dispatch chunk size was changed. This introduces a new --dispatch-chunk-size flag that is also used for datastore query filtering. The default value continues to be 100.

vroldanbet and others added 4 commits July 24, 2024 17:48
uses string.Builder and strings.Repeat (which
also uses string.Builder) to reduce CPU and
allocations

showed up in generating 10% of allocations
and 5% of CPU on a LookupResources-based workload.

This reduces 66.25% allocations and 92% CPU
@github-actions github-actions bot added area/CLI Affects the command line area/api v1 Affects the v1 API area/datastore Affects the storage system area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) area/dispatch Affects dispatching of requests labels Jul 24, 2024
@vroldanbet vroldanbet force-pushed the lr2-follow-ups branch 3 times, most recently from 9955b4c to b10b802 Compare July 24, 2024 17:01
@vroldanbet vroldanbet self-assigned this Jul 24, 2024
@vroldanbet vroldanbet force-pushed the lr2-follow-ups branch 2 times, most recently from 57f5e98 to a12c965 Compare July 24, 2024 18:02
we've observed improvements on LR workloads when
increasing the dispatch chunk size
@vroldanbet vroldanbet marked this pull request as ready for review July 24, 2024 18:04
@vroldanbet vroldanbet requested a review from a team as a code owner July 24, 2024 18:04
internal/datastore/postgres/snapshot.go Show resolved Hide resolved
return nil, common.RedactAndLogSensitiveConnString(ctx, errUnableToInstantiate, err, pgURL)
}
// Setup the default query execution mode setting, if applicable.
pgConfig := DefaultQueryExecMode(parsedConfig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the plan_cache_mode (a server setting that tells pg how to plan) is orthogonal to the QueryExecMode (a client setting that tells pgx how to send queries to pg).

Are we sure that it's okay to remove force_custom_plan? I would think we'd need both, but maybe I am misunderstanding the full implications of exec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it is orthogonal. If we disable prepared statements, there is no longer a reason to change the plan cache mode. We also have incentives to get rid of it, as it will make SpiceDB work out of the box with pgbouncer (requires ignoring the flag) and RDS Proxy (downright does not work because it does not have an option to ignore flags).

See #1217 for more context.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we disable prepared statements, there is no longer a reason to change the plan cache mode.

I was under the impression we were already not using statement caching, and were using the mode that just uses prepared statements for type information. But it looks like that isn't correct, and I agree with you that if we just don't use prepared statements at all, there's no reason to send force_custom_plan

@vroldanbet vroldanbet added this pull request to the merge queue Jul 24, 2024
Merged via the queue into main with commit 5d16bb1 Jul 24, 2024
22 checks passed
@vroldanbet vroldanbet deleted the lr2-follow-ups branch July 24, 2024 19:09
@github-actions github-actions bot locked and limited conversation to collaborators Jul 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/api v1 Affects the v1 API area/CLI Affects the command line area/datastore Affects the storage system area/dispatch Affects dispatching of requests area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support pgbouncer with SpiceDB Postgres datastore w/o session pooler mode
4 participants