
[SYCL][CUDA][HIP] Remove CUDA and HIP PI unit tests #12459

Merged: 1 commit merged into intel:sycl on Feb 5, 2024

Conversation

@npmiller (Contributor) commented on Jan 22, 2024

These tests are not currently running and are covered in other test suites:

* `test_primary_context.cpp`
  * Deprecated feature, covered in `test-e2e/Basic/context.cpp`
* `test_commands.cpp`
  * Covered by UR CTS
* `test_sampler_properties.cpp`
  * Covered by UR CTS: https://github.com/oneapi-src/unified-runtime/tree/main/test/conformance/sampler
* `PlatformTest.cpp`
  * Covered by UR CTS: https://github.com/oneapi-src/unified-runtime/blob/main/test/conformance/platform/urPlatformGetInfo.cpp
* `test_device.cpp`
  * Covered by UR CTS: https://github.com/oneapi-src/unified-runtime/blob/main/test/conformance/device/urDeviceGetInfo.cpp
* `EnqueueMemTest.cpp`
  * Covered by UR CTS: https://github.com/oneapi-src/unified-runtime/blob/main/test/conformance/enqueue/urEnqueueMemBufferFill.cpp
* `test_mem_obj.cpp`
  * Moved to UR CTS
* `test_contexts.cpp`
  * Covered by https://github.com/oneapi-src/unified-runtime/blob/main/test/adapters/cuda/context_tests.cpp
* `test_kernels.cpp`
  * Covered by https://github.com/oneapi-src/unified-runtime/blob/main/test/adapters/cuda/kernel_tests.cpp
* `test_base_objects.cpp`
  * Basic tests, mostly covered in UR
* `test_interop_get_native.cpp`
  * Mostly covered in UR tests and E2E tests

After this, both the CUDA and HIP directories could be removed. There are two PI tests remaining: one regarding xpti handling of PI call arguments, and one regarding OpenCL interop ownership.
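
For reference, the linked UR CTS checks for queries like `urDeviceGetInfo` follow roughly this pattern. This is a sketch based on the documented Unified Runtime API, not the actual CTS source:

```cpp
// Sketch of a UR device-info conformance check; illustrative only,
// not the actual CTS code linked above.
#include <ur_api.h>
#include <cassert>
#include <cstdio>
#include <string>

void checkDeviceName(ur_device_handle_t device) {
  // First call: ask how large the property value is.
  size_t size = 0;
  ur_result_t res =
      urDeviceGetInfo(device, UR_DEVICE_INFO_NAME, 0, nullptr, &size);
  assert(res == UR_RESULT_SUCCESS && size > 0);

  // Second call: retrieve the property itself and sanity-check it.
  std::string name(size, '\0');
  res = urDeviceGetInfo(device, UR_DEVICE_INFO_NAME, size, name.data(),
                        nullptr);
  assert(res == UR_RESULT_SUCCESS);
  std::printf("device name: %s\n", name.c_str());
}
```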

@npmiller changed the title from [SYCL] Remove PI unit tests to [SYCL][CUDA][HIP] Remove CUDA and HIP PI unit tests on Jan 24, 2024
@npmiller marked this pull request as ready for review on January 24, 2024 18:10
@npmiller requested a review from a team as a code owner on January 24, 2024 18:10
@npmiller (Contributor, Author) commented on Feb 5, 2024

ping @intel/llvm-reviewers-runtime

@steffenlarsen (Contributor) left a comment

Sorry for the delay! Seems reasonable, given the UR replacements.

@steffenlarsen merged commit b781e6c into intel:sycl on Feb 5, 2024
12 checks passed
@aelovikov-intel (Contributor) commented on Feb 5, 2024

Post-commit failures on Arc GPU:

Failed Tests (2):
  SYCL :: OneapiDeviceSelector/level_zero_top.cpp
  SYCL :: Plugin/level_zero_ext_intel_cslice.cpp
FAIL: SYCL :: OneapiDeviceSelector/level_zero_top.cpp (1421 of 1868)
******************** TEST 'SYCL :: OneapiDeviceSelector/level_zero_top.cpp' FAILED ********************
Exit Code: -6

Command Output (stdout):
--
# RUN: at line 2
/__w/llvm/llvm/toolchain/bin//clang++   -fsycl -fsycl-targets=spir64 /__w/llvm/llvm/llvm/sycl/test-e2e/OneapiDeviceSelector/level_zero_top.cpp -o /__w/llvm/llvm/build-e2e/OneapiDeviceSelector/Output/level_zero_top.cpp.tmp.out
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -fsycl -fsycl-targets=spir64 /__w/llvm/llvm/llvm/sycl/test-e2e/OneapiDeviceSelector/level_zero_top.cpp -o /__w/llvm/llvm/build-e2e/OneapiDeviceSelector/Output/level_zero_top.cpp.tmp.out
# note: command had no output on stdout or stderr
# RUN: at line 3
env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/OneapiDeviceSelector/Output/level_zero_top.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/OneapiDeviceSelector/Output/level_zero_top.cpp.tmp.out
# .---command stdout------------
# | Level-Zero GPU Device is found: true
# | Intel(R) Level-Zero is found: true
# | Expectedly, cpu device is not found.
# | Expectedly, ACC device is not found.
# | Abort was called at 253 line in file:
# | ../../neo/level_zero/core/source/builtin/builtin_functions_lib_impl.cpp
# `-----------------------------
# error: command failed with exit status: -6
FAIL: SYCL :: Plugin/level_zero_ext_intel_cslice.cpp (1457 of 1868)
******************** TEST 'SYCL :: Plugin/level_zero_ext_intel_cslice.cpp' FAILED ********************
Exit Code: -6

Command Output (stdout):
--
# RUN: at line 4
/__w/llvm/llvm/toolchain/bin//clang++   -fsycl -fsycl-targets=spir64 /__w/llvm/llvm/llvm/sycl/test-e2e/Plugin/level_zero_ext_intel_cslice.cpp -o /__w/llvm/llvm/build-e2e/Plugin/Output/level_zero_ext_intel_cslice.cpp.tmp.out
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -fsycl -fsycl-targets=spir64 /__w/llvm/llvm/llvm/sycl/test-e2e/Plugin/level_zero_ext_intel_cslice.cpp -o /__w/llvm/llvm/build-e2e/Plugin/Output/level_zero_ext_intel_cslice.cpp.tmp.out
# note: command had no output on stdout or stderr
# RUN: at line 6
env ZEX_NUMBER_OF_CCS=0:4 UR_L0_DEBUG=1 env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/Plugin/Output/level_zero_ext_intel_cslice.cpp.tmp.out 2>&1 | /__w/llvm/llvm/toolchain/bin/FileCheck /__w/llvm/llvm/llvm/sycl/test-e2e/Plugin/level_zero_ext_intel_cslice.cpp --check-prefixes=CHECK-PVC
# executed command: env ZEX_NUMBER_OF_CCS=0:4 UR_L0_DEBUG=1 env ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Plugin/Output/level_zero_ext_intel_cslice.cpp.tmp.out
# note: command had no output on stdout or stderr
# error: command failed with exit status: -6
# executed command: /__w/llvm/llvm/toolchain/bin/FileCheck /__w/llvm/llvm/llvm/sycl/test-e2e/Plugin/level_zero_ext_intel_cslice.cpp --check-prefixes=CHECK-PVC
# note: command had no output on stdout or stderr
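
For readers unfamiliar with the failing test, here is a rough sketch of the kind of check `level_zero_top.cpp` performs, reconstructed from the output above rather than taken from the actual test source: with `ONEAPI_DEVICE_SELECTOR=level_zero:gpu`, only Level-Zero GPU devices should be visible.

```cpp
// Hypothetical reconstruction of the checks behind the output above;
// NOT the actual sycl/test-e2e/OneapiDeviceSelector/level_zero_top.cpp.
#include <sycl/sycl.hpp>
#include <iostream>
#include <string>

int main() {
  bool foundGpu = false, foundL0Platform = false;
  bool foundCpu = false, foundAcc = false;
  // With ONEAPI_DEVICE_SELECTOR=level_zero:gpu, get_platforms() should
  // report only the Level-Zero platform, and only its GPU devices.
  for (const auto &P : sycl::platform::get_platforms()) {
    if (P.get_info<sycl::info::platform::name>().find("Level-Zero") !=
        std::string::npos)
      foundL0Platform = true;
    for (const auto &D : P.get_devices()) {
      foundGpu |= D.is_gpu();
      foundCpu |= D.is_cpu();
      foundAcc |= D.is_accelerator();
    }
  }
  std::cout << std::boolalpha;
  std::cout << "Level-Zero GPU Device is found: " << foundGpu << '\n';
  std::cout << "Intel(R) Level-Zero is found: " << foundL0Platform << '\n';
  if (!foundCpu)
    std::cout << "Expectedly, cpu device is not found.\n";
  if (!foundAcc)
    std::cout << "Expectedly, ACC device is not found.\n";
}
```

Note that in the log above the expected lines did print; the abort came afterwards, from inside the Level-Zero driver (`builtin_functions_lib_impl.cpp`).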

@npmiller (Contributor, Author) commented on Feb 5, 2024

I don't see how this patch could have caused these failures; could it be something on the machine, or from a previous patch?

This PR just removed old tests from a separate test suite that were already disabled and not running.

@aelovikov-intel (Contributor) commented on Feb 5, 2024

That wasn't a call to action. I'm just posting the failures (and encourage others in @intel/llvm-gatekeepers to do the same) so that failures are searchable through the GitHub interface (it can't look into the logs). That way, we can get some statistics on how flaky a test is, what configurations it can fail on, etc.

Ultimately, once a search for a failing test turns up several instances, we should create an issue/internal bug report and disable the test until it is resolved.

@bader (Contributor) commented on Feb 5, 2024

It sounds like something that should be automated rather than requested from gatekeepers.
@stdale-intel, FYI.

@aelovikov-intel (Contributor) commented on Feb 5, 2024

> It sounds like something that should be automated rather than requested from gatekeepers. @stdale-intel, FYI.

While I agree that some automation would be helpful, it's still the gatekeepers' responsibility to explain every failure in post-commit or to ask the PR author to do so. We cannot simply ignore all the flaky failures in our CI.

@ldrumm (Contributor) commented on Feb 6, 2024

> I'm just posting the failures (and encourage others in @intel/llvm-gatekeepers to do the same) so that failures are searchable through the GitHub interface

How does this make failures searchable? Are you using some consistent language in your comment?

> It's still the gatekeepers' responsibility to explain every failure in post-commit or to ask the PR author to do so.

Do we really want to divert the attention of all users in the failure blame list even when it's obvious which commits are not responsible? That seems like pointless busywork for the author and the gatekeeper. If we have a buildbot commenting on every PR in the blamelist, then fine, since the machine can't make that distinction; but if a human is in the loop, then surely they should use their judgment. And if the human is not allowed to use their judgment, they should be replaced by a machine, in order not to waste human time.

@aelovikov-intel (Contributor) commented on Feb 6, 2024

> How does this make failures searchable? Are you using some consistent language in your comment?

Copy-paste the failing test name and then search for it using GitHub's repo search:

https://github.com/search?q=repo%3Aintel%2Fllvm+Plugin%2Flevel_zero_ext_intel_cslice.cpp&type=pullrequests

[Screenshot: GitHub search results listing PRs that mention the failing test]

> Do we really want to divert the attention of all users

I usually tag people if I expect some action from them, so I don't think we divert attention that much.

We do have lots of flaky tests, though, and we need to do something about that. The first step is to gather some statistics, and that's the best we can do for now. If somebody is willing to write scripts to parse the logs, update some spreadsheet/database with those flaky failures, and maintain that process, I'd be more than happy to switch to it, but nobody has volunteered so far.

Yes, it's the gatekeepers' responsibility, because we can't place that burden on occasional contributors, but we have to have somebody looking into the issues.
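
A log-parsing script of the kind mentioned above could start as small as this sketch. It assumes lit output in the format shown earlier in this thread; the invocation and CSV layout are illustrative only, not an existing tool.

```cpp
// Hypothetical flaky-test tally script; not an existing tool.
// Usage (illustrative): ./flaky-stats post-commit-*.log > flaky.csv
#include <fstream>
#include <iostream>
#include <map>
#include <string>

int main(int argc, char **argv) {
  std::map<std::string, unsigned> failCounts;
  const std::string marker = "FAIL: SYCL :: ";
  for (int i = 1; i < argc; ++i) {
    std::ifstream log(argv[i]);
    std::string line;
    while (std::getline(log, line)) {
      auto pos = line.find(marker);
      if (pos == std::string::npos)
        continue;
      // The test name runs up to the " (N of M)" progress counter.
      auto end = line.find(" (", pos);
      auto start = pos + marker.size();
      ++failCounts[line.substr(start, end == std::string::npos
                                          ? std::string::npos
                                          : end - start)];
    }
  }
  // Emit CSV rows suitable for a spreadsheet of flaky-test statistics.
  std::cout << "test,failures\n";
  for (const auto &[test, count] : failCounts)
    std::cout << test << ',' << count << '\n';
}
```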

@sommerlukas (Contributor) commented on Feb 6, 2024

An alternative would be to create an issue for each flaky test (or for a group of related flaky tests) and, in that issue, post the PRs where the test failed in post-commit without being related to the changes in the PR.

That way, the information for a flaky test would be collected in a single location (the issue) and the PR author's attention would not be diverted.

@aelovikov-intel (Contributor) commented on Feb 6, 2024

That would only work if you already know the issue number. In my experience, we couldn't even get people to post a comment without searching; expecting them to find an issue first is unrealistic.
