Transformer building blocks tutorial #3075

mikaylagawarecki · 2024-10-04T21:59:01Z

Description

This adds the tutorial for transformer building blocks, following the outline discussed in nn/optim triage on Friday (9/27/24) here: https://docs.google.com/document/d/1TMrd0bDiM9-lcFHi079edkMRP1Ux5MTxt4lI1diiAKI/edit

This tutorial also links to a repo, https://github.com/mikaylagawarecki/temp, which:

has examples of implementing the rest of the nn.Transformer-related layers in PyTorch in an NJT-friendly manner (basically no more *_padding_mask)
notes some cases that we don't intend to demonstrate (e.g. see here)
removes fast path logic from MHA/TEL/TE
sanity checks, for MHA/TEL/TDL over kwargs, that new_layer + NJT + compile gives correctness + perf gains over nn.layer + dense + mask + compile (as we expect :)). (TE, TD and T are just higher-level wrappers, so we didn't test those.)
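
To make the "no more *_padding_mask" point concrete, here is a minimal plain-Python sketch of the kind of correctness check described above. It is not the code from the tutorial or the linked repo and uses no torch APIs: plain lists stand in for tensors, and the names `attend`, `jagged_self_attention` and `padded_self_attention`, as well as the input numbers, are made up for illustration. It compares single-head self-attention over a jagged layout (packed tokens plus offsets) against the same computation over a padded batch with a key padding mask.

```python
import math


def softmax(xs):
    # Numerically stable softmax; exp(-inf) evaluates to 0.0 for masked scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values, key_mask=None):
    """Single-head scaled dot-product attention over lists of vectors.

    key_mask[j] is True where key j is padding and must be ignored.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        if key_mask is not None:
            scores = [-math.inf if masked else s
                      for s, masked in zip(scores, key_mask)]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


# Two sequences of lengths 3 and 2 with embedding size 2 (made-up numbers).
seqs = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, -0.5], [2.0, 1.0]],
]

# Jagged layout: all tokens packed back to back, plus per-sequence offsets.
packed = [tok for seq in seqs for tok in seq]
offsets = [0]
for seq in seqs:
    offsets.append(offsets[-1] + len(seq))


def jagged_self_attention(packed, offsets):
    # Each sequence attends only within its own [start, end) slice: no mask.
    out = []
    for start, end in zip(offsets, offsets[1:]):
        chunk = packed[start:end]
        out.extend(attend(chunk, chunk, chunk))
    return out


# Padded layout: every sequence padded to max_len, plus a boolean padding mask.
max_len = max(len(seq) for seq in seqs)
pad = [0.0, 0.0]
padded = [seq + [pad] * (max_len - len(seq)) for seq in seqs]
padding_mask = [[i >= len(seq) for i in range(max_len)] for seq in seqs]


def padded_self_attention(padded, padding_mask):
    return [attend(seq, seq, seq, key_mask=mask)
            for seq, mask in zip(padded, padding_mask)]


jagged_out = jagged_self_attention(packed, offsets)
padded_out = padded_self_attention(padded, padding_mask)

# Drop the output rows that belong to padded query positions, then compare.
unpadded = [row
            for rows, mask in zip(padded_out, padding_mask)
            for row, is_pad in zip(rows, mask) if not is_pad]
max_diff = max(abs(a - b)
               for r1, r2 in zip(jagged_out, unpadded)
               for a, b in zip(r1, r2))
print("max abs difference:", max_diff)
```

The jagged path never builds a mask and never spends work on padded positions, which is the intuition behind the expected perf gains; the actual layers in the repo would do this with nested tensors rather than Python lists.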

To run this tutorial correctly, we likely need torch 2.6.

A few sections of this tutorial are still pending; they are meant to demonstrate more cool examples of composing features with NJT and are blocked on some PRs. Not sure whether we should consider this a v0 and add those in a v1?

NJT index_put_ (KV caching section): "Add support for index_put_ in NT" (pytorch#135722)
FlexAttention + NJT: "FlexAttention support for NJT" (pytorch#136792)
Grouped Query Attention + NJT (not sure if there is a plan for this yet)

Checklist

The issue that is being fixed is referred to in the description (see above "Fixes #ISSUE_NUMBER")
Only one issue is addressed in this pull request
Labels from the issue that this PR is fixing are added to this pull request
No unnecessary issues are included in this pull request.

pytorch-bot · 2024-10-04T21:59:04Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3075

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 111843b with merge base 97b20b3:

NEW FAILURES - The following jobs have failed:

Build tutorials / pytorch_tutorial_build_worker (14, 15, linux.4xlarge.nvidia.gpu) (gh)
NotImplementedError: aten.to_padded_tensor.default
Check spelling / pyspelling (gh)
##[error]Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

transformer building blocks tutorial

c881238

facebook-github-bot added the cla signed label Oct 4, 2024

mikaylagawarecki requested a review from jbschlosser October 4, 2024 22:01

mikaylagawarecki added 2 commits October 4, 2024 15:09

some wording fixes

71c1bac

Fix wording

111843b
