Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add testing for CL_KERNEL_LOCAL_MEM_SIZE #1235 #2089

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kamil-goras-mobica
Copy link
Contributor


size_t elements = 100;
size_t sizeToAllocate = elements * sizeof(cl_int);
int* localData = (cl_int*)malloc(sizeToAllocate);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a smart pointer here rather than malloc (e.g. make_unique) so that if the function returns early before the free(localData) we don't leak.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

localData is unused. We may as well remove it entirely.

Copy link
Contributor

@bashbaug bashbaug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: it seems odd to put this test into test_api_consistency.cpp, since it has nothing to do with API consistency checks. I'd recommend putting it into its own new file test_kernel_local_memory_size.cpp, similar to the existing test_kernel_private_memory_size.cpp.

@@ -25,6 +25,34 @@ __kernel void test(__global int* dst) {
}
)CLC";

const char* empty_kernel[] = { "__kernel void empty_kernel()\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Can we use C++ raw string literals here? It will make these kernels much easier to read and modify. Example: see test_kernel, above.

Comment on lines +35 to +38
"__local float array[10000];\n"
"for(int i = 0; i<10000; i++)\n"
" array[i]*=2;\n"
"}\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several observations about this kernel:

  1. It's declaring an array of 10000 floats, so 40000 bytes. This exceeds the minimum required value for CL_DEVICE_LOCAL_MEM_SIZE, even for the full profile. Unless there's really a reason to need this large of a local array, we should choose something smaller, ideally something that will work with the embedded profile also (so, <1KB).
  2. The array is never properly initialized, and the values that are written to it are never used to modify global memory or anything else that is externally observable. A smart compiler may be able to remove it entirely, defeating the purpose of the test. Can we do something with the local array contents so the local array cannot be removed? One possibility: every work-item writes something to one part of the array, then one work-item sums the contents of the array and writes the sum to global memory.
  3. There is a data race, since multiple work-items are writing to the same memory location. This probably won't matter in practice since the kernel is never executed, but it'd be nice to avoid it if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bashbaug My NVIDIA driver gave me wrong results than eg. POCL which which added optimizations (?) ant result is 0 memory as you said. I will add correction.

Comment on lines +1191 to +1196
if (create_single_kernel_helper(context, &program, &kernel, 1, empty_kernel,
"empty_kernel")
!= 0)
{
return -1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really ought to return TEST_FAIL here rather than the magic number -1. Or, even better, something like:

Suggested change
if (create_single_kernel_helper(context, &program, &kernel, 1, empty_kernel,
"empty_kernel")
!= 0)
{
return -1;
}
error = create_single_kernel_helper(context, &program, &kernel, 1, empty_kernel,
"empty_kernel");
test_error(error, "Unable to create empty_kernel");

test_error(error,
"clGetKernelWorkGroupInfo for CL_KERNEL_LOCAL_MEM_SIZE failed");

test_assert_error(kernelLocalUsage > 0, "kernel local mem size failed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what we could be checking here, but requiring kernelLocalUsage > 0 is almost certainly incorrect. The empty kernel does not have any local memory, so kernelLocalUsage == 0 is a valid result. I don't think we can require this, though, because some devices (in theory, I'm not aware of any devices that do this in practice) could require some amount of local memory unconditionally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bashbaug On my NVIDIA card kernelLocalUsage is equal to 1. Checked now using POCL and it is 0. So as this is empty kernel with no __local args then it should be 0. If I change this condition to kernelLocalUsage == 0 then test will be failing. What do you prefer: remove this test or move it to separated test which will be failing on my device, on other maybe not.

@@ -161,6 +161,7 @@ test_definition test_list[] = {
ADD_TEST_VERSION(negative_create_command_queue_with_properties,
Version(2, 0)),
ADD_TEST(negative_create_command_queue_with_properties_khr),
ADD_TEST(kernel_local_mem_size),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: could we call this test "kernel_local_memory_size" instead, to be consistent with the existing "kernel_private_memory_size" test?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants