bandaid for run-readme-pr-macos.yml incorrectly loading to MPS #1417

mikekgfb · 2024-12-11T17:32:35Z

As per #1416, torchchat on hosts without MPS (which includes all GitHub-hosted runners, since they use KVM to virtualize macOS but do not expose MPS) should choose CPU as the "fast" device. The logic is present (see the discussion in #1416), but it is either not fully functional (the easier case to fix: just print the result of get_device_str and fix the code!) or specifically ignored on load in torch/serialization.py (in which case we're effectively looking at a core PyTorch issue).

In the meantime, this bandaid just forces the use of CPU in the macOS tests so that they run on CPU, albeit short-circuiting the test/execution of the "fast" device logic. Not ideal, but some testing beats no testing.
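To make the intended behavior concrete, here is a minimal sketch of how a "fast" device alias could resolve to a concrete device, and how forcing a device bypasses that resolution. This is not torchchat's actual get_device_str; the function name `resolve_device`, its parameters, and the `forced` override are hypothetical, chosen only for illustration. The availability probes are passed in as callables so the sketch runs without PyTorch installed; in real code they would presumably be `torch.cuda.is_available` and `torch.backends.mps.is_available`.

```python
from typing import Callable, Optional


def resolve_device(
    requested: str,
    cuda_available: Callable[[], bool],
    mps_available: Callable[[], bool],
    forced: Optional[str] = None,
) -> str:
    """Map a requested device name to a concrete one (hypothetical helper)."""
    # A forced device (the bandaid) wins and skips the "fast" logic entirely,
    # which is why that logic goes untested when CPU is forced.
    if forced is not None:
        return forced
    # Anything other than the "fast" alias is passed through unchanged.
    if requested != "fast":
        return requested
    # "fast" picks the best available accelerator and falls back to CPU.
    if cuda_available():
        return "cuda"
    if mps_available():
        return "mps"
    return "cpu"


# A virtualized macOS runner: neither CUDA nor MPS is available.
print(resolve_device("fast", lambda: False, lambda: False))
# Forcing CPU overrides the result even where MPS is reported as available.
print(resolve_device("fast", lambda: False, lambda: True, forced="cpu"))
```

Under these assumptions, a runner without MPS should already resolve "fast" to CPU. If the tests still load to MPS, the fault would lie either in the resolution itself or in a later step that ignores its result, which matches the two possibilities described above.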


pytorch-bot · 2024-12-11T17:32:40Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1417

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 5130bd4 with merge base 68b8087:

NEW FAILURE - The following job has failed:

Run the aoti runner with CUDA using stories / test-runner-aot-cuda / linux-job (gh)
RuntimeError: Command docker exec -t 0cdeeadd061114f0f7ef4a691ecbb10519702e2ecb37490dcb6ecae85ff1422e /exec failed with exit code 1

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / runner-aoti (16-core-ubuntu) (gh) (trunk failure)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mikekgfb · 2024-12-11T17:34:00Z

PS: This fix should keep #1315 from triggering in this case (the tests should succeed, so we don't run into the quiet failure), but it does nothing to actually address the #1315 issue.

facebook-github-bot added the CLA Signed label Dec 11, 2024

mikekgfb and others added 4 commits December 11, 2024 09:45

Update run-readme-pr-macos.yml

63f8307

Add informational message to MacOS CPU tests

Merge branch 'main' into patch-27

21b1004

Merge branch 'pytorch:main' into patch-27

0fe8a21

Merge branch 'main' into patch-27

5130bd4
