optimizer CPU offload doesn't work outside of CUDA #958

Open · bghira opened this issue Sep 26, 2024 · 3 comments

@bghira commented Sep 26, 2024

The CPUOffloadOptimizer class is very clever, but it relies on CUDA streams, which aren't available without a CUDA device.

It should use torch.cpu.Stream and torch.cpu.current_stream instead, e.g. as in the sketch below.

Additionally, pin_memory should be True if torch.cuda.is_available() else False, since MPS is a unified memory architecture and pinning host memory there buys nothing.
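
A minimal sketch of what I mean, assuming the streams are only used to overlap host/device copies (hypothetical snippet, not the actual torchao code):

```python
import torch

# Pick the stream API from whichever backend is present. torch.cpu.Stream and
# torch.cpu.current_stream are no-op stand-ins, so the same code path keeps
# working on CPU and MPS.
if torch.cuda.is_available():
    stream_mod = torch.cuda
    pin = True   # pinned host memory enables fast async H2D/D2H copies
else:
    stream_mod = torch.cpu
    pin = False  # pinning buys nothing on a unified memory architecture

copy_stream = stream_mod.Stream()
default_stream = stream_mod.current_stream()

# Offloaded optimizer state would then be allocated like this:
state_cpu = torch.zeros(1024, device="cpu", pin_memory=pin)
```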

@gau-nernst (Collaborator)

Yes, the CPU offload optimizer only works on CUDA. There is little point in supporting other devices. Perhaps we should make that clearer 🤔 (e.g. with an explicit check like the sketch after this list):

  • If you train on CPU, well, you don't need CPU offload.
  • If you train on MPS, since it is unified memory, again you don't need CPU offload.

I'm not too familiar with other hardware, so I'm not sure how this works with AMD and Intel GPUs.
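
One way to make it clearer, as a rough sketch (a hypothetical guard, not what the code currently does):

```python
import torch

# Fail fast with an explicit message instead of erroring somewhere deep
# inside the CUDA stream logic, e.g. at the top of __init__:
if not torch.cuda.is_available():
    raise RuntimeError(
        "CPUOffloadOptimizer requires a CUDA device: on CPU there is "
        "nothing to offload, and MPS already uses unified memory."
    )
```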

@bghira (Author) commented Sep 26, 2024

PyTorch isn't taking much advantage of MPS's unified memory anyway, and I agree there isn't a whole lot of point to offloading on MPS, other than being able to run the same code and ensure a consistent experience.

For ROCm, I'm almost certain it just masquerades as CUDA and is invisible to this check. Intel GPUs and other systems like the Ascend NPU rely on the XPU or NPU extensions in PyTorch, and TPUs require XLA.
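
For what it's worth, a quick way to see the masquerading from Python (just an illustration):

```python
import torch

# ROCm builds reuse the torch.cuda namespace, so the existing CUDA code path
# is picked up unchanged; torch.version.hip is how you tell the builds apart.
if torch.version.hip is not None:
    print("ROCm build: torch.cuda is backed by HIP")
elif torch.version.cuda is not None:
    print("CUDA build")
else:
    print("CPU-only or other-backend build")
```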

@msaroufim (Member)

Yeah, I agree with @gau-nernst. I'd maybe just prioritize sanity-checking whether the offloader works on an AMD GPU.
