[float8] Allow specifying arbitrary dtype for each tensor #1326
base: gh/lw/2/base
Conversation
ghstack-source-id: 7dabc91df68ce20e15551c5488071579e49c263c Pull Request resolved: #1326
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1326
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure. As of commit 97c9983 with merge base 1a0dbf1: NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: d9c0e023cb8667d6f13e2845e9b6845e1669f78a Pull Request resolved: #1326
ghstack-source-id: e86e1f6a42610d776bccd3f33044f23e311688eb Pull Request resolved: #1326
ghstack-source-id: aa9f551c8d274f349c4298932fc95c88040abb09 Pull Request resolved: #1326
ghstack-source-id: c339ea060b7062871a5f57a939e8880e5f727de4 Pull Request resolved: #1326
ghstack-source-id: 4b3a2f0007d74e3453cefde1307f2a9c5271e83e Pull Request resolved: #1326
@@ -62,6 +62,7 @@ class CastConfig:
     scaling_type: ScalingType = ScalingType.DYNAMIC
     scaling_granularity: ScalingGranularity = ScalingGranularity.TENSORWISE
     static_scale: Optional[torch.Tensor] = None
+    dtype: Optional[torch.dtype] = None
nit:
- can we add a comment on what this is used for, and that `None` means the default e4m3|e5m2 value will be used?
- optional - thoughts about naming this in a more specific way such as `target_dtype`, `lowp_dtype`, etc? `dtype` is a bit ambiguous across torchao unfortunately :(
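For context, a minimal sketch of what the requested field comment could look like, assuming `CastConfig` stays the dataclass shown in the hunk above; the docstring wording and the enum import path are illustrative, not taken from the PR:

```python
from dataclasses import dataclass
from typing import Optional

import torch

# assumed import path for the existing enums
from torchao.float8.config import ScalingGranularity, ScalingType


@dataclass
class CastConfig:
    scaling_type: ScalingType = ScalingType.DYNAMIC
    scaling_granularity: ScalingGranularity = ScalingGranularity.TENSORWISE
    static_scale: Optional[torch.Tensor] = None
    # Low-precision dtype to cast this tensor to. If None, the per-tensor
    # default is used (e4m3 for input/weight, e5m2 for grad_output).
    dtype: Optional[torch.dtype] = None
```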
@@ -343,12 +367,14 @@ def recipe_name_to_linear_config(
     cc_w = CastConfig(scaling_granularity=ScalingGranularity.AXISWISE)

     # grad_input_hp = grad_output_fp8_axiswise_dim0 @ weight_fp8_tensorwise
-    cc_go = CastConfig(scaling_granularity=ScalingGranularity.AXISWISE)
+    cc_go = CastConfig(
+        scaling_granularity=ScalingGranularity.AXISWISE, dtype=e4m3_dtype
+    )
nit: maybe we can also add some context in the comments on L353:L363 that it also uses e4m3 for grads?
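To make the recipe's intent concrete, here is a hedged sketch of the cast configs this hunk implies — axiswise scaling everywhere, with grad_output pinned to e4m3 instead of the usual e5m2 default. Names mirror the diff; the `e4m3_dtype` binding is an assumption, and the `dtype=` argument only exists on this PR's branch:

```python
import torch

# assumed import path; CastConfig with a dtype field is from this PR's branch
from torchao.float8.config import CastConfig, ScalingGranularity

# assumption: torchao resolves this per-platform; hard-coded here for clarity
e4m3_dtype = torch.float8_e4m3fn

cc_i = CastConfig(scaling_granularity=ScalingGranularity.AXISWISE)
cc_w = CastConfig(scaling_granularity=ScalingGranularity.AXISWISE)
# grad_output is also cast to e4m3, overriding the e5m2 default --
# the behavior the reviewer suggests calling out in the comments
cc_go = CastConfig(
    scaling_granularity=ScalingGranularity.AXISWISE, dtype=e4m3_dtype
)
```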
-    NoopFwToFloat8E5M2BwDelayed,
-    NoopFwToFloat8E5M2BwDynamic,
-    NoopFwToFloat8E5M2BwStatic,
+    NoopFwToFloat8BwDelayed,
thanks for updating these!
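The renamed classes drop the hard-coded E5M2 from their names because the backward cast dtype is now configurable. A minimal sketch of the underlying pattern (no-op forward, cast-on-backward), not torchao's actual implementation:

```python
import torch


class NoopFwCastBw(torch.autograd.Function):
    """No-op in the forward pass; casts the gradient to a configurable
    float8 dtype in the backward pass (previously hard-coded to e5m2)."""

    @staticmethod
    def forward(ctx, x: torch.Tensor, target_dtype: torch.dtype):
        ctx.target_dtype = target_dtype
        return x

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # round-trip through the low-precision dtype to emulate the
        # precision loss of a real float8 cast
        g = grad_output.to(ctx.target_dtype).to(grad_output.dtype)
        return g, None


x = torch.randn(4, requires_grad=True)
y = NoopFwCastBw.apply(x, torch.float8_e4m3fn)
y.sum().backward()
```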
@@ -303,13 +311,16 @@ def inner_func():

     # Calculate the new scales from the updated history stacks
     new_input_scales = amax_history_to_scale_stack(
-        fp8_input_amax_history_stack, e4m3_dtype, x_dtype, scale_fn_recipe
+        fp8_input_amax_history_stack, input_dtype, x_dtype, scale_fn_recipe
     )
will likely have to rebase on top of #1329 which changed this line
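For intuition on why the dtype must be threaded through here: with per-tensor dtypes, the scale has to be derived from the max representable value of whichever float8 dtype was configured, not a hard-coded e4m3. A hedged sketch of the idea (mirroring, not reproducing, `amax_history_to_scale_stack`):

```python
import torch


def amax_to_scale(amax: torch.Tensor, float8_dtype: torch.dtype) -> torch.Tensor:
    # map the observed amax onto the target dtype's representable range
    return torch.finfo(float8_dtype).max / torch.clamp(amax, min=1e-12)


amax = torch.tensor(3.5)
print(amax_to_scale(amax, torch.float8_e4m3fn))  # uses e4m3's max, 448.0
print(amax_to_scale(amax, torch.float8_e5m2))    # uses e5m2's max, 57344.0
```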
@@ -62,6 +62,7 @@ class CastConfig:
     scaling_type: ScalingType = ScalingType.DYNAMIC
     scaling_granularity: ScalingGranularity = ScalingGranularity.TENSORWISE
     static_scale: Optional[torch.Tensor] = None
+    dtype: Optional[torch.dtype] = None

     def short_str(self):
can we also add the dtype here, so it appears when we print an instance of `Float8Linear`? `Float8Linear.__extra_repr__` calls this method.
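A hedged sketch of what including the dtype in the printed config could look like; the free function below is a hypothetical stand-in for `CastConfig.short_str`, and the abbreviation scheme is a guess:

```python
from typing import Optional

import torch


def cast_config_short_str(scaling_abbr: str, dtype: Optional[torch.dtype]) -> str:
    # abbreviate e.g. torch.float8_e4m3fn -> "e4m3fn"; None -> "none"
    dtype_abbr = "none" if dtype is None else str(dtype).replace("torch.float8_", "")
    return f"{scaling_abbr}_{dtype_abbr}"


print(cast_config_short_str("dyn_ten", torch.float8_e4m3fn))  # dyn_ten_e4m3fn
print(cast_config_short_str("dyn_ten", None))                 # dyn_ten_none
```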
This is great! LGTM, had some comments but all are pretty nitty. CI is green - ship it!
ghstack-source-id: d8300e2a07c087f3cd51b03e0e21125a83a29489 Pull Request resolved: #1326
Superseded by #1378
Stack from ghstack (oldest at bottom):