[SPARK-50522][SQL] Support for indeterminate collation #49103

Open. Wants to merge 18 commits into master.

@stefankandic (Contributor) commented Dec 7, 2024

What changes were proposed in this pull request?

This pull request changes how collation mismatches are handled in the concat, concat_ws, and collation expressions. Currently, Spark throws an error for any collation mismatch. With this change, these functions proceed and return an indeterminate collation instead of failing.

However, data with an indeterminate collation must never be serialized. It can still be shown back to the user, used to create views on top of it, and so on.

If more string functions are identified that don't depend on collation, they can easily be added to the canContainIndeterminateCollation method, and they should then work without further changes.
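The behavior described above can be sketched as a toy Python model. This is a hypothetical illustration, not Spark's implementation: the names result_collation, check_can_serialize, INDETERMINATE, CollationMismatchError, and COLLATION_AGNOSTIC_FUNCTIONS are invented here, and the model ignores details such as explicit versus implicit collation precedence. The allowlist contains only concat and concat_ws and stands in for the role the description gives to canContainIndeterminateCollation.

```python
"""Toy model of indeterminate collation (hypothetical; not Spark's API)."""


class CollationMismatchError(Exception):
    """Raised where the toy model cannot pick a single result collation."""


# Hypothetical sentinel standing in for an indeterminate collation.
INDETERMINATE = "INDETERMINATE"

# Hypothetical allowlist playing the role of canContainIndeterminateCollation:
# functions whose result does not depend on how strings compare or sort.
COLLATION_AGNOSTIC_FUNCTIONS = {"concat", "concat_ws"}


def result_collation(function_name: str, arg_collations: list[str]) -> str:
    """Return the collation of function_name's result in this toy model."""
    if not arg_collations:
        raise ValueError("expected at least one string argument")
    distinct = set(arg_collations)
    agnostic = function_name in COLLATION_AGNOSTIC_FUNCTIONS
    # Collation-sensitive functions (e.g. equality) need a single, known collation.
    if not agnostic and INDETERMINATE in distinct:
        raise CollationMismatchError(
            f"{function_name} requires a determinate collation")
    if len(distinct) == 1:
        return next(iter(distinct))
    if agnostic:
        # Mismatch tolerated: the result simply has no well-defined collation.
        return INDETERMINATE
    raise CollationMismatchError(
        f"{function_name}: mismatched collations {sorted(distinct)}")


def check_can_serialize(collation: str) -> None:
    """Refuse to persist data whose collation is indeterminate."""
    if collation == INDETERMINATE:
        raise CollationMismatchError(
            "data with indeterminate collation cannot be serialized")


if __name__ == "__main__":
    mixed = result_collation("concat", ["UTF8_BINARY", "UNICODE_CI"])
    print(mixed)  # INDETERMINATE
    print(result_collation("concat_ws", ["UNICODE_CI", "UNICODE_CI"]))  # UNICODE_CI
    # Equality is collation-sensitive, so both calls below raise in this model.
    for name, args in [("=", ["UTF8_BINARY", "UNICODE_CI"]), ("=", [mixed, mixed])]:
        try:
            result_collation(name, args)
        except CollationMismatchError as err:
            print("error:", err)
```

In this model, supporting another collation-agnostic function is a one-line change to the allowlist, which mirrors the extension point the description names. The serialization check is kept separate from result resolution so that an indeterminate result can still be displayed or used in a view and is only rejected when persisted.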

Why are the changes needed?

Throwing errors for every collation mismatch can break queries unnecessarily, especially for functions that don't rely on collation (like concat). These functions combine strings without needing comparison or ordering rules, so enforcing a single collation on their inputs is unnecessary.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Dec 7, 2024
@stefankandic stefankandic changed the title [DRAFT][SQL] Support for indeterminate collation [SPARK-SPARK-50522][SQL] Support for indeterminate collation Dec 9, 2024
@stefankandic stefankandic changed the title [SPARK-SPARK-50522][SQL] Support for indeterminate collation [SPARK-50522][SQL] Support for indeterminate collation Dec 9, 2024
@stefankandic (Contributor, Author) commented:

@dejankrak-db @stevomitric please take a look, thanks!

@stefankandic stefankandic marked this pull request as ready for review December 25, 2024 23:23