Fix Distributed Checkpoint Tutorial #3068

Open: wants to merge 3 commits into main

@LTMeyer LTMeyer commented Sep 30, 2024

Fixes #3067

Description

Update the code examples in the Distributed Checkpoint tutorial so that they run as written:

  • Fix undefined variables model and optimizer;
  • Fix missing import of Stateful;
  • Fix the state_dict variable for correct loading.
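For reviewers, the loading fix can be pictured with a dependency-free sketch of the wrapper pattern the tutorial relies on. This is not the tutorial's code and does not call `torch.distributed.checkpoint`: `StatefulLike`, `ToyModel`, `ToyOptimizer`, `save_checkpoint` and `load_checkpoint` are hypothetical stand-ins written only to show why `model` and `optimizer` must exist before they are wrapped, and why the dictionary passed to the load call must hold the wrapper object under the same key that was used when saving.

```python
import copy
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class StatefulLike(Protocol):
    """Hypothetical stand-in for a Stateful-style protocol."""

    def state_dict(self) -> Dict[str, Any]: ...
    def load_state_dict(self, state_dict: Dict[str, Any]) -> None: ...


class ToyModel:
    """Hypothetical model: a single list of weights."""

    def __init__(self, weights):
        self.weights = list(weights)


class ToyOptimizer:
    """Hypothetical optimizer: only tracks a step counter."""

    def __init__(self):
        self.step_count = 0


class AppState:
    """Wraps model and optimizer so they are checkpointed as one unit."""

    def __init__(self, model, optimizer):
        self.model = model
        self.optimizer = optimizer

    def state_dict(self):
        return {
            "model": {"weights": list(self.model.weights)},
            "optim": {"step_count": self.optimizer.step_count},
        }

    def load_state_dict(self, state_dict):
        self.model.weights = list(state_dict["model"]["weights"])
        self.optimizer.step_count = state_dict["optim"]["step_count"]


def save_checkpoint(state_dict, storage):
    """Hypothetical save: expand stateful entries into plain data."""
    for key, value in state_dict.items():
        plain = value.state_dict() if isinstance(value, StatefulLike) else value
        storage[key] = copy.deepcopy(plain)


def load_checkpoint(state_dict, storage):
    """Hypothetical load: restore stateful entries in place."""
    for key, value in list(state_dict.items()):
        if isinstance(value, StatefulLike):
            value.load_state_dict(copy.deepcopy(storage[key]))
        else:
            state_dict[key] = copy.deepcopy(storage[key])


# Saving: model and optimizer are defined before being wrapped.
model, optimizer = ToyModel([1.0, 2.0]), ToyOptimizer()
optimizer.step_count = 5
storage = {}
save_checkpoint({"app": AppState(model, optimizer)}, storage)

# Loading: fresh objects are wrapped under the same "app" key.
new_model, new_optimizer = ToyModel([0.0, 0.0]), ToyOptimizer()
state_dict = {"app": AppState(new_model, new_optimizer)}
load_checkpoint(state_dict, storage)
```

In this sketch the restored values land in `new_model` and `new_optimizer` because the load step calls `load_state_dict` on the wrapper it was handed. Passing an empty or differently keyed dictionary would leave both objects unchanged.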

Checklist

  • The issue that is being fixed is referenced in the description (see above "Fixes #ISSUE_NUMBER")
  • Only one issue is addressed in this pull request
  • Labels from the issue that this PR is fixing are added to this pull request
  • No unnecessary issues are included in this pull request.

cc @wconstab @osalpekar @H-Huang @kwen2501

pytorch-bot bot commented Sep 30, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3068

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 1f1aad6 with merge base be7f1b3:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Development

Successfully merging this pull request may close these issues.

[BUG] - Distributed Checkpoint Code Example Is Failing
3 participants