AO and Automated Mixed Precision #1390

Open
bhack opened this issue Dec 8, 2024 · 13 comments
Labels: question, topic: documentation

bhack commented Dec 8, 2024

Can we clarify in the README what the best practices are for using torchao at inference with a model/checkpoint trained with PyTorch AMP?

vkuzo (Contributor) commented Dec 9, 2024

@bhack, I would love to learn some more context: is there an issue when you use the checkpoint directly?

bhack (Author) commented Dec 9, 2024

Yes, exactly: this is the case where we have a checkpoint trained with AMP and we are entering the torchao world to optimize it for inference.

bhack (Author) commented Dec 9, 2024

Generally, in the repo, before torchao, I just see model.eval().to(torch.bfloat16).
Is this the best practice?
If so, we could add some notes to the documentation.

vkuzo (Contributor) commented Dec 16, 2024

Sorry for the late response, I was on holiday last week. There is nothing special you need to do when using AMP + torchao to convert the model for inference; just use torchao according to the documentation. If that does not answer it, would you mind clarifying your question?

bhack (Author) commented Dec 16, 2024

Just to give an example, from the docs it is not clear whether this is the best practice or not:

    model.eval()
    quantize_(model, int8_dynamic_activation_int8_weight())
    unwrap_tensor_subclass(model)
    with torch.amp.autocast(enabled=use_amp, dtype=input_dtype, device_type="cuda"):
        ......
        torch.export.export(....)
        aoti_compile_and_package(...)

@drisspg added the topic: documentation and question labels on Dec 16, 2024
vkuzo (Contributor) commented Dec 17, 2024

I see, it looks like your question is whether using autocast for quantized model inference is supported/recommended. I'll tag @jerryzh168 on this one.

jerryzh168 (Contributor) commented Dec 17, 2024

> Just to give an example, from the docs it is not clear whether this is the best practice or not:
>
>     model.eval()
>     quantize_(model, int8_dynamic_activation_int8_weight())
>     unwrap_tensor_subclass(model)
>     with torch.amp.autocast(enabled=use_amp, dtype=input_dtype, device_type="cuda"):
>         ......
>         torch.export.export(....)
>         aoti_compile_and_package(...)

We haven't tested this, but it seems reasonable to me, and:

  1. the quantized kernel should be able to handle the activation dtype coming from autocast
  2. for inference, people typically already start with a lower precision dtype like bfloat16 (see the sketch below), so I am not sure what the use case is here
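
For illustration, here is a minimal, untested sketch of that autocast-free flow; the model, shapes, and input are placeholders, and it reuses the same quantize_ API from the snippet above:

    import torch
    from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

    # Placeholder model and input for illustration; substitute your own module.
    model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda()
    example_input = torch.randn(1, 1024, device="cuda", dtype=torch.bfloat16)

    # Start inference from a lower precision model directly, no autocast context.
    model = model.eval().to(torch.bfloat16)

    # Quantize weights in place; activations are quantized dynamically at runtime.
    quantize_(model, int8_dynamic_activation_int8_weight())

    with torch.no_grad():
        out = model(example_input)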

bhack (Author) commented Dec 17, 2024

For 2:
Do you mean model.eval().to(torch.bfloat16) without an AMP context?

jerryzh168 (Contributor)

> For 2: Do you mean model.eval().to(torch.bfloat16) without an AMP context?

Yes

bhack (Author) commented Dec 17, 2024

OK, I think it is better to add a note somewhere, as it is quite common to have models trained with AMP.

jerryzh168 (Contributor)

> OK, I think it is better to add a note somewhere, as it is quite common to have models trained with AMP.

Do you know the end-to-end flow for a model trained with AMP? Do people:

  1. train with AMP and save the state_dict (roughly like the sketch below), then
  2. load the state_dict, quantize the model, etc.?
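
If so, step 1 might look roughly like this untested sketch, loosely following the usual torch.amp training pattern and assuming a recent PyTorch where torch.amp.GradScaler exists (the model, optimizer, data, and file name are placeholders):

    import torch

    # Placeholder model, optimizer, and data for illustration.
    model = torch.nn.Linear(1024, 1024).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.amp.GradScaler("cuda")
    inputs = torch.randn(8, 1024, device="cuda")
    targets = torch.randn(8, 1024, device="cuda")

    for _ in range(10):  # toy training loop
        optimizer.zero_grad(set_to_none=True)
        # autocast runs selected ops in float16 during the forward pass.
        with torch.amp.autocast(device_type="cuda", dtype=torch.float16):
            loss = torch.nn.functional.mse_loss(model(inputs), targets)
        # GradScaler guards against float16 gradient underflow.
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

    # The saved state_dict is still float32; AMP does not change the stored dtype.
    torch.save(model.state_dict(), "amp_trained_checkpoint.pt")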

bhack (Author) commented Dec 17, 2024

I think users want to know whether they need to use the AMP context on a quantized model or not, as the documentation still suggests using it for inference in the AMP case:
https://pytorch.org/docs/stable/amp.html

jerryzh168 (Contributor)

> I think users want to know whether they need to use the AMP context on a quantized model or not, as the documentation still suggests using it for inference in the AMP case: pytorch.org/docs/stable/amp.html

I see. I think we could add a note saying that AMP is typically not required in the torchao context anymore, since we typically start with a bfloat16 or float16 model.

But if people want to run inference with a model trained with AMP, I'd suggest starting by saving the trained model and loading it back (https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html#saving-resuming) and then quantizing, as sketched below. We'll need to test this path before updating the README, though.
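
An untested, minimal sketch of that load-and-quantize path, assuming a state_dict saved as in the training sketch earlier in this thread and reusing the same torchao API discussed above (the architecture, shapes, and file name are placeholders):

    import torch
    from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

    # Rebuild the same architecture used during AMP training (placeholder here),
    # then load the saved float32 state_dict.
    model = torch.nn.Linear(1024, 1024).cuda()
    model.load_state_dict(torch.load("amp_trained_checkpoint.pt", weights_only=True))

    # Convert once to bfloat16 for inference; no autocast context is needed afterwards.
    model = model.eval().to(torch.bfloat16)
    quantize_(model, int8_dynamic_activation_int8_weight())

    with torch.no_grad():
        out = model(torch.randn(1, 1024, device="cuda", dtype=torch.bfloat16))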
