To avoid issues like these: pytorch/pytorch#130684
I would like to add support for testing with the minimum supported NVIDIA drivers to release validations.
Adding more driver tests sounds generally like a valid idea.
However, I don't think pytorch/pytorch#130684 is the best motivation for it, as it's unclear to me if we even support PyTorch + CUDA 11.8 + Triton, see: pytorch/pytorch#106144 (comment)
From the linked issue:
Yes, triton always uses cuda-12
It's somewhat hard to test something like that in CI, as runners are provisioned with the latest kernel driver in order to be usable with both CUDA-12 and CUDA-11.8. Also, older drivers are less stable, so we ran into multiple hangs/segfaults that were mitigated by installing a newer driver.
I would not want to add the risk of using older drivers (unless these are additional tests) just to test a potentially invalid or unsupported PyTorch + Triton combination. Or is Triton now fully supported in our CUDA 11.8 builds?
Add nvidia-driver parameter to linux_job.yml:
https://github.com/pytorch/test-infra/blob/main/.github/workflows/linux_job.yml
Make sure we pass this parameter to:
https://github.com/pytorch/test-infra/blob/main/.github/actions/setup-nvidia/action.yml
Add an option for validating release binaries against a given driver to https://github.com/pytorch/builder/actions/workflows/validate-binaries.yml
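The first two steps could look roughly like the sketch below. This is not the actual content of linux_job.yml. The input name `nvidia-driver` comes from the proposal above, but the `driver-version` input on setup-nvidia, the runner label, and the overall job layout are assumptions that would need to be checked against the real files.

```yaml
# Hypothetical excerpt of a reusable workflow such as linux_job.yml.
name: linux-job-sketch

on:
  workflow_call:
    inputs:
      nvidia-driver:
        description: "NVIDIA driver version to install on the runner (proposed input)"
        required: false
        type: string

jobs:
  job:
    # Runner label is a placeholder, not taken from the real workflow.
    runs-on: linux.g5.4xlarge.nvidia.gpu
    steps:
      - name: Install NVIDIA driver
        uses: pytorch/test-infra/.github/actions/setup-nvidia@main
        with:
          # Assumes setup-nvidia accepts a driver-version input; verify in action.yml.
          # How an unset value is handled is left to the action.
          driver-version: ${{ inputs.nvidia-driver }}
```

A caller such as validate-binaries.yml would then pass the minimum supported version through `with: nvidia-driver: "<minimum supported version>"`, ideally as an additional test job so that the existing runs keep using the latest driver.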