Pipeline Parallelism + FSDP #562

Open
jeromeku opened this issue Aug 29, 2024 · 1 comment
Labels
question: Further information is requested

Comments

@jeromeku

On PP + FSDP and PP + TP + FSDP:

  • Is there any documentation on how these different parallelisms compose?
  • What are the largest training runs these strategies have been tested on?
  • Are there benchmarks for how these strategies compare against other distributed training frameworks that expose similar parallelisms?

Particularly interested in how PP and FSDP work together, since DeepSpeed appears to explicitly disallow ZeRO 2/3 + PP (see here specifically, and here for discussion).

@wconstab @weifengpy @wanchaol

@wconstab
Contributor

Specifically for ZeRO-3 + PP: we haven't published guides yet, but we are working on it. You can compose them; you just have to schedule the unshard/reshard carefully with respect to peak memory. We don't have out-of-the-box support for this yet, but we plan to offer it.
cc @H-Huang @donglimm about ZeRO-3 + PP and other PP schedule questions.
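To make the unshard/reshard trade-off a bit more concrete, here is a rough back-of-the-envelope model in plain Python. It is only a sketch under stated assumptions, not PyTorch or torchtitan code: `StageConfig` and `estimate_stage` are hypothetical names, it counts parameter bytes only (no gradients, optimizer state, or activations), and the "keep unsharded" policy is an idealized one where each layer is gathered once per step.

```python
from dataclasses import dataclass


@dataclass
class StageConfig:
    """Hypothetical description of one pipeline stage (illustration only)."""
    num_layers: int          # layers owned by this pipeline stage
    layer_param_bytes: int   # unsharded parameter bytes per layer
    dp_shard_size: int       # number of ranks each layer is sharded across
    num_microbatches: int    # microbatches processed per training step


def estimate_stage(cfg: StageConfig, reshard_per_microbatch: bool):
    """Return (peak_param_bytes, all_gather_count) for one rank of the stage.

    Assumptions: the sharded copy is always resident, and unsharded
    parameters are held in a separate buffer on top of it.
    """
    sharded = cfg.num_layers * cfg.layer_param_bytes // cfg.dp_shard_size
    if reshard_per_microbatch:
        # ZeRO-3 style: only one layer is unsharded at a time, but every
        # layer is gathered in forward and again in backward, per microbatch.
        peak = sharded + cfg.layer_param_bytes
        all_gathers = 2 * cfg.num_layers * cfg.num_microbatches
    else:
        # Idealized "keep unsharded" policy: gather each layer once and
        # hold all of them for the whole step.
        peak = sharded + cfg.num_layers * cfg.layer_param_bytes
        all_gathers = cfg.num_layers
    return peak, all_gathers


cfg = StageConfig(num_layers=8, layer_param_bytes=1_000_000,
                  dp_shard_size=4, num_microbatches=16)
zero3_peak, zero3_gathers = estimate_stage(cfg, reshard_per_microbatch=True)
keep_peak, keep_gathers = estimate_stage(cfg, reshard_per_microbatch=False)
print(zero3_peak, zero3_gathers)
print(keep_peak, keep_gathers)
```

In this toy model, resharding after every microbatch keeps peak parameter memory low but multiplies the number of all-gathers by the microbatch count, which is the tension a PP schedule has to manage.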

We do have a guide showing how to compose FSDP + PP without ZeRO-3 in torchtitan:
https://github.com/pytorch/torchtitan/blob/main/torchtitan/parallelisms/parallelize_llama.py
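As a rough illustration of how PP and FSDP can share one set of ranks, the sketch below lays out global ranks on a 2D grid in plain Python. This is an assumption-laden toy, not what the linked file does: `build_mesh` and `split_groups` are hypothetical helpers, and placing the pipeline dimension outermost is just one possible layout.

```python
def build_mesh(world_size: int, pp_size: int) -> list[list[int]]:
    """Arrange global ranks as a [pp_size][dp_size] grid (assumed layout)."""
    if pp_size <= 0 or world_size % pp_size != 0:
        raise ValueError("world_size must be a positive multiple of pp_size")
    dp_size = world_size // pp_size
    return [[pp * dp_size + dp for dp in range(dp_size)]
            for pp in range(pp_size)]


def split_groups(mesh: list[list[int]]):
    """Derive the two kinds of communication groups from the grid.

    Ranks in the same row hold the same pipeline stage and shard its
    parameters with FSDP; ranks in the same column form one pipeline and
    exchange activations between stages.
    """
    fsdp_groups = [list(row) for row in mesh]
    pp_groups = [list(col) for col in zip(*mesh)]
    return fsdp_groups, pp_groups


mesh = build_mesh(world_size=8, pp_size=2)
fsdp_groups, pp_groups = split_groups(mesh)
print("FSDP groups:", fsdp_groups)
print("PP groups:", pp_groups)
```

With 8 ranks and 2 stages, each stage's parameters are sharded across 4 ranks, and there are 4 independent two-stage pipelines running side by side.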

For benchmarks, we do not have large-scale results due to resource constraints, but we are preparing 64-GPU benchmarks for torchtitan. We are happy to collaborate on larger benchmarks if you have the resources to run them and want help with any optimization opportunities.

@tianyu-l added the question label on Aug 29, 2024