GPU Middle Class? #2161
Comments
Hey @EugenHotaj - glad you're checking out torchtune. Up until now, we've managed to provide pretty extensive offerings (long context, large models up to 405B, and RLHF) all on a single node. This has allowed people with smaller GPU budgets to fine-tune some pretty incredible models, and it lets us develop new features faster because single node is much easier to debug. Now, all that said, torchtune technically already supports multi-node for FSDP, and we plan on adding tensor parallel + model parallel very soon. The absolute latest we will have these features in torchtune is end of January, but I would bet on sooner! Would you need anything beyond these parallelism techniques, e.g. pipeline parallel? Are you running on MAST or something like SLURM?
Thanks @joecummings, that's awesome to hear!
Yes, we use SLURM -- I'm currently trying to hack a multi-node run from your suggestions on #2018 and torchtitan, so having some examples in torchtune would be super useful imo. We'd also take all the parallelisms we can get 😃, e.g. model, pipeline, and attention parallelism for longer context.
I second SLURM! I have also been trying to hack this into torchtune since the single-node experience is quite good.
Thanks folks for the interest! We torchtune devs are evidently not in the GPU middle class yet 😅 and I think only @joecummings has access to a multi-node setup as of today. I know he is working on testing this out, but until then @EugenHotaj we would love to include any SLURM scripts you're able to put together as part of our documentation.
@ebsmothers the torchtitan SLURM file worked pretty much out of the box for us since we have a similar cluster setup (p5s on AWS). I was able to run Llama 3.3 70B full finetuning on 16 nodes with no issues 😄.
@EugenHotaj Thanks for the tip. Did you use something like https://github.com/pytorch/torchtune/blob/main/recipes/full_finetune_distributed.py as the entry point to replace "./train.py" on line 63?
@tginart right, you have to replace that.
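For reference, here is a minimal sketch of what that swap can look like, adapted from torchtitan's multinode SLURM script. The node/GPU counts, the config name, and the locally copied recipe/config files (e.g. via `tune cp`) are assumptions about a particular cluster setup, and the exact torchrun/rdzv flags may need adjusting for yours.

```bash
#!/bin/bash
# Sketch of a multi-node torchtune launch via SLURM, based on torchtitan's
# multinode script. Node/GPU counts and the config path are placeholders.
#SBATCH --job-name=torchtune-multinode
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=8

# The first node in the allocation acts as the rendezvous host.
nodes=( $(scontrol show hostnames "$SLURM_JOB_NODELIST") )
head_node=${nodes[0]}
head_node_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)

# Instead of torchtitan's ./train.py, point torchrun at the torchtune recipe,
# e.g. copied locally with `tune cp full_finetune_distributed .` and
# `tune cp llama3_3/70B_full custom_70B_full.yaml` (config name is illustrative).
srun torchrun \
    --nnodes "$SLURM_NNODES" \
    --nproc_per_node 8 \
    --rdzv_id "$SLURM_JOB_ID" \
    --rdzv_backend c10d \
    --rdzv_endpoint "$head_node_ip:29500" \
    full_finetune_distributed.py --config custom_70B_full.yaml
```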
Does torchtune have any plans to support "GPU middle class" users?
We're trying to evaluate using torchtune for post-training, especially since there are many useful features implemented (RLHF, LoRA, etc.). However, one big sticking point is that the system seems heavily geared towards single-node training. Are there plans to support multi-node training (e.g. 16-64 nodes) and things like model parallelism, 128k context training, etc?
If not, is torchtitan the recommended system to use?
Thanks!