Create distributed.md #1438

mikekgfb · 2024-12-22T22:58:25Z

Initial documentation for use of distributed inference w/ torchchat. @mreso @lessw2020 @kwen2501 please review and update as appropriate.


pytorch-bot · 2024-12-22T22:58:31Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1438

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kwen2501

Looks great! Thanks so much for writing it up!

mreso

Thanks @mikekgfb, looks good. The server's response is a bit different, but that's true for the non-distributed case too.

mreso · 2024-12-23T20:51:38Z

docs/distributed.md

+<!--
+[skip default]: begin
+## Generate output (requires testing and review by mreso)


Suggested change

-<!--
-[skip default]: begin
-## Generate output (requires testing and review by mreso)
+<!--
+[skip default]: begin
+## Generate output

yes, that works

mikekgfb · 2024-12-24T20:20:20Z

@mreso @kwen2501 @lessw2020 do you have a perspective on which models will run with distributed? I know we had a separate model dictionary at some point.

Or should we just sweep all the models from README.md and see which work? With this integration this will be pretty easy and cool to do. (yay!) Is there a minimum size for models to run distributed? I know that below a certain size it doesn't make sense to run distributed inference, but just for kicks (or for a test that's quick to download), can it run stories 15M on distributed?

Any attempts at 405B? We should definitely add some of the super large models to models.json so they're accessible for distributed inference.

Also, should we create a separate table here, or could we put a second column in the main README for "distributed inference", next to the "mobile" column?

Since I'm on a roll asking questions: I'll assume it works only on GPUs (only on CUDA?!), not CPU? Should we add that somewhere? Wanna suggest wording to this effect?

Create distributed.md

3be38cc

Initial documentation for use of distributed inference w/ torchchat. @mreso please review and update as appropriate.

facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Dec 22, 2024

Add support for extracting distributed inference tests in run-docs

8e2ca2d

Add support for extracting distributed inference tests in run-docs

Jack-Khuu requested review from kwen2501, mreso and lessw2020 December 23, 2024 01:11

Jack-Khuu added the Distributed label (issues related to all things distributed) Dec 23, 2024

mikekgfb added 4 commits December 22, 2024 23:54

Merge branch 'pytorch:main' into patch-34

1ff7351

Update distributed.md

66dd025

Update distributed.md

8f4b312

Update distributed.md

f3ff014

kwen2501 approved these changes Dec 23, 2024

mreso approved these changes Dec 23, 2024

mikekgfb and others added 6 commits December 24, 2024 03:06

Update docs/distributed.md

3af0f52

Co-authored-by: Matthias Reso <[email protected]>

Update docs/distributed.md

b65f0e4

Co-authored-by: Matthias Reso <[email protected]>

Update distributed.md

f401a2f

Uncommenting section about generate subcommand w/ distributed inference after review by @mreso. Also added HF login to make this fully self-contained.

Update distributed.md

1bb3303

Wording

Update distributed.md

17e2764

Wording and formatting

Merge branch 'main' into patch-34

7aa18e1

Merge branch 'main' into patch-34

8f17a16
