
Add support for testing with minimum supported Nvidia Drivers to release validations #5434

Open
atalman opened this issue Jul 16, 2024 · 3 comments


atalman commented Jul 16, 2024

To avoid issues like pytorch/pytorch#130684, I would like to add support for testing with the minimum supported Nvidia drivers to release validations.

  1. Add an nvidia-driver parameter to linux_job.yml:
    https://github.com/pytorch/test-infra/blob/main/.github/workflows/linux_job.yml

  2. Make sure we pass this parameter to:
    https://github.com/pytorch/test-infra/blob/main/.github/actions/setup-nvidia/action.yml

  3. Add an option to validate release binaries with the minimum supported driver to https://github.com/pytorch/builder/actions/workflows/validate-binaries.yml
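
Steps 1 and 2 above could look roughly like the following sketch of linux_job.yml. The `driver-version` input of setup-nvidia is documented in the action linked above; the `nvidia-driver` input name, default value, and runner label here are assumptions for illustration, not the actual workflow contents:

```yaml
# Hypothetical sketch of the proposed change to linux_job.yml.
# Only setup-nvidia's driver-version input is confirmed to exist;
# the nvidia-driver input name and runner label are assumptions.
on:
  workflow_call:
    inputs:
      nvidia-driver:
        description: "Nvidia driver version to install (e.g. the minimum supported version)"
        required: false
        type: string
        default: ""

jobs:
  test:
    runs-on: linux.4xlarge.nvidia.gpu  # assumed GPU runner label
    steps:
      - name: Setup Nvidia driver
        uses: pytorch/test-infra/.github/actions/setup-nvidia@main
        with:
          driver-version: ${{ inputs.nvidia-driver }}
```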


malfet commented Jul 16, 2024

An easier way to accomplish that would probably be to use a different AMI with the driver we want.


atalman commented Jul 16, 2024

@malfet I'm not sure. Building an AMI and managing it is quite a big headache, while this should be straightforward.

The setup-nvidia action already supports driver-version as a parameter:
https://github.com/pytorch/test-infra/blob/main/.github/actions/setup-nvidia/action.yml#L6

Hence, all we have to do is pass it through.
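
Concretely, a release-validation job could then pin the minimum driver when calling the reusable workflow. This is a hypothetical caller sketch: the `nvidia-driver` input, the `runner` input name, and the driver version shown are illustrative assumptions:

```yaml
# Hypothetical caller sketch pinning the minimum supported driver
# via the proposed nvidia-driver input to linux_job.yml.
jobs:
  validate-min-driver:
    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
    with:
      runner: linux.4xlarge.nvidia.gpu   # assumed input name and runner label
      nvidia-driver: "525.105.17"        # example version, not authoritative
```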


ptrblck commented Jul 16, 2024

Adding more driver tests sounds generally like a valid idea.
However, I don't think pytorch/pytorch#130684 is the best motivation for it, as it's unclear to me whether we even support PyTorch + CUDA 11.8 + Triton; see: pytorch/pytorch#106144 (comment)

From the linked issue:

"Yes, triton always uses cuda-12"

It's somewhat hard to test something like that in CI, as runners are provisioned with the latest kernel driver in order to be usable with both CUDA 12 and CUDA 11.8. Also, older drivers are less stable, so we ran into multiple hangs/segfaults that were mitigated by installing a newer driver.

I would not want to add the risk of using older drivers (unless these are additional tests) to test a potentially invalid or unsupported PyTorch + Triton combination. Or is Triton now fully supported in our CUDA 11.8 builds?
