AO and Automated Mixed Precision #1390

Open
bhack opened this issue Dec 8, 2024 · 13 comments
Labels: question, topic: documentation

bhack commented Dec 8, 2024

Can we clarify in the README what the best practices are for using torchao at inference with a model/checkpoint trained with PyTorch AMP?

vkuzo (Contributor) commented Dec 9, 2024

@bhack, I would love to learn some more context: is there an issue when you use the checkpoint directly?

bhack (Author) commented Dec 9, 2024

Yes, exactly: this is the case where we have a checkpoint trained with AMP and we are entering the torchao world to optimize it for inference.

bhack (Author) commented Dec 9, 2024

Generally, in the repo, before torchao, I just see model.eval().to(torch.bfloat16).
Is this the best practice?
If so, we could add some notes to the documentation.

vkuzo (Contributor) commented Dec 16, 2024

Sorry for the late response, I was on holiday last week. There is nothing special you need to do when using AMP + torchao to convert the model for inference; just use torchao according to the documentation. If that does not answer it, would you mind clarifying your question?

bhack (Author) commented Dec 16, 2024

Just to give an example, from the docs it is not clear whether this is the best practice or not:

    model.eval()
    quantize_(model, int8_dynamic_activation_int8_weight())
    unwrap_tensor_subclass(model)
    with torch.amp.autocast(enabled=use_amp, dtype=input_dtype, device_type="cuda"):
        ......
        torch.export.export(....)
        aoti_compile_and_package(...)

@drisspg added the topic: documentation and question labels on Dec 16, 2024
vkuzo (Contributor) commented Dec 17, 2024

I see, it looks like your question is whether using autocast for quantized model inference is supported/recommended. I'll tag @jerryzh168 on this one.

jerryzh168 (Contributor) commented Dec 17, 2024

> Just to give an example, from the docs it is not clear whether this is the best practice or not:
>
>     model.eval()
>     quantize_(model, int8_dynamic_activation_int8_weight())
>     unwrap_tensor_subclass(model)
>     with torch.amp.autocast(enabled=use_amp, dtype=input_dtype, device_type="cuda"):
>         ......
>         torch.export.export(....)
>         aoti_compile_and_package(...)

We haven't tested this, but it seems reasonable to me, and:

  1. the quantized kernel should be able to handle the activation dtype coming from autocast
  2. for inference, people typically already start with a lower precision dtype like bfloat16 (see the sketch below), so I am not sure what the use case is here
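
For illustration, here is a minimal, untested sketch of that autocast-free flow; the model, shapes, and input are placeholders, and it reuses the same quantize_ API from the snippet above:

    import torch
    from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

    # Placeholder model and input for illustration; substitute your own module.
    model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda()
    example_input = torch.randn(1, 1024, device="cuda", dtype=torch.bfloat16)

    # Start inference from a lower precision model directly, no autocast context.
    model = model.eval().to(torch.bfloat16)

    # Quantize weights in place; activations are quantized dynamically at runtime.
    quantize_(model, int8_dynamic_activation_int8_weight())

    with torch.no_grad():
        out = model(example_input)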

bhack (Author) commented Dec 17, 2024

For 2:
Do you mean model.eval().to(torch.bfloat16) without an AMP context?

jerryzh168 (Contributor)

> For 2: Do you mean model.eval().to(torch.bfloat16) without an AMP context?

Yes

bhack (Author) commented Dec 17, 2024

OK, I think it is better to add a note somewhere, as it is quite common to have models trained with AMP.

jerryzh168 (Contributor)

> OK, I think it is better to add a note somewhere, as it is quite common to have models trained with AMP.

Do you know the end-to-end flow for a model trained with AMP? Do people:

  1. train with AMP and save the state_dict (roughly like the sketch below), then
  2. load the state_dict, quantize the model, etc.?
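
If so, step 1 might look roughly like this untested sketch, loosely following the usual torch.amp training pattern and assuming a recent PyTorch where torch.amp.GradScaler exists (the model, optimizer, data, and file name are placeholders):

    import torch

    # Placeholder model, optimizer, and data for illustration.
    model = torch.nn.Linear(1024, 1024).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.amp.GradScaler("cuda")
    inputs = torch.randn(8, 1024, device="cuda")
    targets = torch.randn(8, 1024, device="cuda")

    for _ in range(10):  # toy training loop
        optimizer.zero_grad(set_to_none=True)
        # autocast runs selected ops in float16 during the forward pass.
        with torch.amp.autocast(device_type="cuda", dtype=torch.float16):
            loss = torch.nn.functional.mse_loss(model(inputs), targets)
        # GradScaler guards against float16 gradient underflow.
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

    # The saved state_dict is still float32; AMP does not change the stored dtype.
    torch.save(model.state_dict(), "amp_trained_checkpoint.pt")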

bhack (Author) commented Dec 17, 2024

I think users want to know whether they need to use the AMP context on a quantized model or not, as the documentation still suggests using it for inference in the AMP case:
https://pytorch.org/docs/stable/amp.html

jerryzh168 (Contributor)

> I think users want to know whether they need to use the AMP context on a quantized model or not, as the documentation still suggests using it for inference in the AMP case: pytorch.org/docs/stable/amp.html

I see. I think we could add a note saying that AMP is typically not required in the torchao context anymore, since we typically start with a bfloat16 or float16 model.

But if people want to run inference with a model trained with AMP, I'd suggest starting by saving the trained model and loading it back (https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html#saving-resuming) and then quantizing, as sketched below. We'll need to test this path before updating the README, though.
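
An untested, minimal sketch of that load-and-quantize path, assuming a state_dict saved as in the training sketch earlier in this thread and reusing the same torchao API discussed above (the architecture, shapes, and file name are placeholders):

    import torch
    from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

    # Rebuild the same architecture used during AMP training (placeholder here),
    # then load the saved float32 state_dict.
    model = torch.nn.Linear(1024, 1024).cuda()
    model.load_state_dict(torch.load("amp_trained_checkpoint.pt", weights_only=True))

    # Convert once to bfloat16 for inference; no autocast context is needed afterwards.
    model = model.eval().to(torch.bfloat16)
    quantize_(model, int8_dynamic_activation_int8_weight())

    with torch.no_grad():
        out = model(torch.randn(1, 1024, device="cuda", dtype=torch.bfloat16))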
