
Moved logits .float() to loss and compiled it if compiling #551

Open · wants to merge 2 commits into base: gh/awgu/16/base

Conversation

awgu (Contributor) commented Aug 21, 2024

Stack from ghstack (oldest at bottom):

Compiling the loss improves performance. Moving the .float() upcast to inside this compiled loss further improves performance.
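As a rough sketch (illustrative, not the PR's exact code), compiling the shared loss function with the bf16 → fp32 upcast moved inside it looks like this. With the `.float()` inside the compiled region, torch.compile can fuse the upcast into the cross-entropy computation instead of materializing a separate fp32 logits tensor:

```python
import torch
import torch.nn.functional as F

def loss_fn(pred, labels):
    # Upcast inside the (to-be-compiled) loss so the fp32 conversion can be
    # fused with cross-entropy rather than allocating fp32 logits up front.
    return F.cross_entropy(pred.flatten(0, 1).float(), labels.flatten(0, 1))

# Compile the loss (in practice, only when the rest of training is compiled).
compiled_loss_fn = torch.compile(loss_fn)
```

The loss value itself comes out in fp32 either way; the win is avoiding the standalone upcast kernel and the extra fp32 activation outside the compiled region.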

awgu added a commit that referenced this pull request Aug 21, 2024
ghstack-source-id: 6a83cae2c00f1384c02925fe686e3b222664b8de
Pull Request resolved: #551
@facebook-github-bot added the "CLA Signed" label (managed by the Meta Open Source bot) Aug 21, 2024
awgu (Contributor Author) commented Aug 21, 2024

@yifuwang I need to fix PP before this is landable 😢

awgu (Contributor Author) commented Aug 21, 2024

@H-Huang @wconstab do you have any idea if the output logits being fp32 is a hard requirement for PP? Is there any way we can leave them as bf16?
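For context on why the logits dtype matters between pipeline stages (illustrative numbers, not from the PR): fp32 elements take twice the bytes of bf16, so keeping the stage-output logits in bf16 halves that tensor's activation and communication footprint:

```python
import torch

# Per-element sizes: bf16 is 2 bytes, fp32 is 4 bytes.
bf16_bytes = torch.tensor([], dtype=torch.bfloat16).element_size()
fp32_bytes = torch.tensor([], dtype=torch.float32).element_size()

# Hypothetical logits shape: batch 8, seq 2048, vocab 32768.
numel = 8 * 2048 * 32768
# 1 GiB in bf16 vs 2 GiB in fp32 for this single activation.
assert numel * bf16_bytes == 1 << 30
assert numel * fp32_bytes == 2 << 30
```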

awgu added a commit that referenced this pull request Aug 23, 2024
ghstack-source-id: 99a696d59af53f173d0af0b5c589056b4d76c7de
Pull Request resolved: #551
```diff
@@ -137,7 +137,7 @@ def main(job_config: JobConfig):
     # loss function to be shared by Pipeline Parallel and SPMD training
     def loss_fn(pred, labels):
         return torch.nn.functional.cross_entropy(
-            pred.flatten(0, 1), labels.flatten(0, 1)
+            pred.flatten(0, 1).float(), labels.flatten(0, 1)
         )
```
awgu (Contributor Author) commented on the diff:

cc: @Chillee torch.compile should respect this upcast numerically?

3 participants