add a float atomics sample #125

Merged
merged 10 commits into from
Oct 3, 2024

bashbaug commented Sep 28, 2024

Adds a sample demonstrating how to use floating-point atomics in an OpenCL kernel.

bashbaug marked this pull request as draft September 28, 2024 00:08

bashbaug commented Sep 28, 2024

This is currently a "draft" because the emulated floating-point atomic add that returns a value is not producing correct intermediate results. The final results of all floating-point atomic adds are correct, though, and the intermediate results using cl_ext_float_atomics and other device-specific solutions are correct as well.

To see the issue with intermediate results, pass the -c option to enable "intermediate results checking", and optionally the -e option to force the emulated atomic add codepath. Example:

$ ./floatatomics -p3 -c -e
Running on platform: Intel(R) OpenCL Graphics
Running on device: Intel(R) Arc(TM) A750 Graphics
Forcing emulation.
Finished in 0.007057 seconds
Basic Validation: Success.
Error at index 0: expected 0.000000, got 1.000000!
Error at index 1: expected 1.000000 > 1.000000!
<snip>
Intermediate Results Validation: Found 64797 mismatches / 65536 values!!!
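
For readers unfamiliar with the idea, here is a minimal host-side sketch of what an "intermediate results" check could look like. It assumes every work-item adds exactly 1.0f to a counter that starts at 0.0f and records the value the atomic add returned, so the sorted returned values should be 0, 1, 2, and so on. The function names are hypothetical and this is not the sample's actual validation code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort: orders floats ascending. */
static int compare_floats(const void *a, const void *b)
{
    float x = *(const float *)a;
    float y = *(const float *)b;
    return (x > y) - (x < y);
}

/* Hypothetical checker (an assumption, not the sample's code).
 * Sorts the values returned by the atomic adds in place, then counts the
 * positions where the sorted value differs from its index.
 * Assumes each of the n adds contributed exactly 1.0f, starting from 0.0f. */
static size_t count_intermediate_mismatches(float *returned, size_t n)
{
    size_t mismatches = 0;
    qsort(returned, n, sizeof(float), compare_floats);
    for (size_t i = 0; i < n; ++i) {
        if (returned[i] != (float)i) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A duplicated or missing previous value, as in the output above, shows up as a mismatch at the affected positions even when the final sum is correct.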

bashbaug marked this pull request as ready for review October 3, 2024 04:48

bashbaug commented Oct 3, 2024

After some discussion, I confirmed that the emulated float atomic implementation that uses atomic_xchg does not reliably return the previous value in memory (the "intermediate results"). So, I've added a slower (but safer, and correct) emulated version that does reliably return the previous value in memory. Passing -e still selects the faster version, and passing -s now selects the slower version.

The non-emulated versions are still preferred and are chosen by default on devices that support them.
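
To illustrate the difference between the two emulated variants described above, here is a rough sketch written with C11 atomics rather than OpenCL C. It is an analogue, not the code from this PR: the function names are hypothetical, the memcpy helpers stand in for OpenCL's as_uint() and as_float(), and it assumes the slower variant is a compare-and-swap loop, which the comment above does not state.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Bit-cast helpers, standing in for OpenCL's as_uint() / as_float(). */
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Atomically stores val (as raw bits) and returns the float it replaced. */
static float exchange_float(_Atomic uint32_t *addr, float val)
{
    return bits_to_float(atomic_exchange(addr, float_to_bits(val)));
}

/* Faster emulation (exchange-based, hypothetical name).
 * Swap the accumulator out, leaving 0.0f, add to it locally, then swap the
 * sum back in. A non-zero value coming back means another thread deposited
 * something in the meantime, so carry it and retry. The total is preserved,
 * but no thread sees a well-defined previous value, so this sketch
 * deliberately returns nothing. */
static void emulated_add_xchg(_Atomic uint32_t *addr, float val)
{
    float carry = val;
    do {
        float taken = exchange_float(addr, 0.0f);
        carry = exchange_float(addr, taken + carry);
    } while (carry != 0.0f);
}

/* Slower emulation (compare-and-swap loop, assumed design).
 * The update only succeeds if memory still holds the value the sum was
 * computed from, so the returned value is the true previous contents. */
static float emulated_add_cmpxchg(_Atomic uint32_t *addr, float val)
{
    uint32_t expected = atomic_load(addr);
    uint32_t desired;
    do {
        desired = float_to_bits(bits_to_float(expected) + val);
        /* On failure, expected is refreshed with the current contents. */
    } while (!atomic_compare_exchange_weak(addr, &expected, desired));
    return bits_to_float(expected);
}
```

The exchange-based loop never has to re-read and compare, which is presumably why it is faster, but between its two exchanges the accumulator temporarily holds 0.0f, so whatever a thread swaps out is not a meaningful running total.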

bashbaug merged commit 3f487d8 into main Oct 3, 2024
10 of 12 checks passed
bashbaug deleted the float-atomics branch October 3, 2024 05:01