-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add testing for CL_KERNEL_LOCAL_MEM_SIZE #1235 #2089
base: main
Are you sure you want to change the base?
Add testing for CL_KERNEL_LOCAL_MEM_SIZE #1235 #2089
Conversation
4e50673
to
301a77f
Compare
|
||
size_t elements = 100; | ||
size_t sizeToAllocate = elements * sizeof(cl_int); | ||
int* localData = (cl_int*)malloc(sizeToAllocate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a smart pointer here rather than malloc (e.g. make_unique
) so that if the function returns early before the free(localData)
we don't leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
localData
is unused. We may as well remove it entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: it seems odd to put this test into test_api_consistency.cpp
, since it has nothing to do with API consistency checks. I'd recommend putting it into its own new file test_kernel_local_memory_size.cpp
, similar to the existing test_kernel_private_memory_size.cpp
.
@@ -25,6 +25,34 @@ __kernel void test(__global int* dst) { | |||
} | |||
)CLC"; | |||
|
|||
const char* empty_kernel[] = { "__kernel void empty_kernel()\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Can we use C++ raw string literals here? It will make these kernels much easier to read and modify. Example: see test_kernel
, above.
"__local float array[10000];\n" | ||
"for(int i = 0; i<10000; i++)\n" | ||
" array[i]*=2;\n" | ||
"}\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several observations about this kernel:
- It's declaring an array of 10000 floats, so 40000 bytes. This exceeds the minimum required value for CL_DEVICE_LOCAL_MEM_SIZE, even for the full profile. Unless there's really a reason to need this large of a local array, we should choose something smaller, ideally something that will work with the embedded profile also (so, <1KB).
- The array is never properly initialized, and the values that are written to it are never used to modify global memory or anything else that is externally observable. A smart compiler may be able to remove it entirely, defeating the purpose of the test. Can we do something with the local array contents so the local array cannot be removed? One possibility: every work-item writes something to one part of the array, then one work-item sums the contents of the array and writes the sum to global memory.
- There is a data race, since multiple work-items are writing to the same memory location. This probably won't matter in practice since the kernel is never executed, but it'd be nice to avoid it if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bashbaug My NVIDIA driver gave me wrong results than eg. POCL which which added optimizations (?) ant result is 0 memory as you said. I will add correction.
if (create_single_kernel_helper(context, &program, &kernel, 1, empty_kernel, | ||
"empty_kernel") | ||
!= 0) | ||
{ | ||
return -1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really ought to return TEST_FAIL
here rather than the magic number -1
. Or, even better, something like:
if (create_single_kernel_helper(context, &program, &kernel, 1, empty_kernel, | |
"empty_kernel") | |
!= 0) | |
{ | |
return -1; | |
} | |
error = create_single_kernel_helper(context, &program, &kernel, 1, empty_kernel, | |
"empty_kernel"); | |
test_error(error, "Unable to create empty_kernel"); |
test_error(error, | ||
"clGetKernelWorkGroupInfo for CL_KERNEL_LOCAL_MEM_SIZE failed"); | ||
|
||
test_assert_error(kernelLocalUsage > 0, "kernel local mem size failed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what we could be checking here, but requiring kernelLocalUsage > 0
is almost certainly incorrect. The empty kernel does not have any local memory, so kernelLocalUsage == 0
is a valid result. I don't think we can require this, though, because some devices (in theory, I'm not aware of any devices that do this in practice) could require some amount of local memory unconditionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bashbaug On my NVIDIA card kernelLocalUsage is equal to 1. Checked now using POCL and it is 0. So as this is empty kernel with no __local args then it should be 0. If I change this condition to kernelLocalUsage == 0 then test will be failing. What do you prefer: remove this test or move it to separated test which will be failing on my device, on other maybe not.
@@ -161,6 +161,7 @@ test_definition test_list[] = { | |||
ADD_TEST_VERSION(negative_create_command_queue_with_properties, | |||
Version(2, 0)), | |||
ADD_TEST(negative_create_command_queue_with_properties_khr), | |||
ADD_TEST(kernel_local_mem_size), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: could we call this test "kernel_local_memory_size" instead, to be consistent with the existing "kernel_private_memory_size" test?
#1235