
Does torch.export preserve the quantize_per_tensor/dequantize_per_tensor ops? #986

Open
justinchuby opened this issue Oct 1, 2024 · 6 comments

@justinchuby

Does torch.export preserve the quantize_per_tensor/dequantize_per_tensor ops? I was testing with:

import torch
from torchao.quantization.quant_api import (
    quantize_,
    int8_dynamic_activation_int8_weight,
    int4_weight_only,
    int8_weight_only,
    unwrap_tensor_subclass,
)

# define a simple floating point model to quantize
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 8)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = self.linear(x)
        x = self.relu(x)
        return x

# create a model instance
model = M()
model.eval()

quantize_(model, int8_weight_only())
model = unwrap_tensor_subclass(model)

input_fp32 = torch.randn(1, 1, 4)

# dynamo export
program = torch.onnx.export(
    model,
    (input_fp32,),
    dynamo=True,
    report=True
)

print(program)

I don't seem to see the quant/dequant ops there. I was hoping they would be preserved so that converting to ONNX is easier. Or is there a different convention for representing the quantized operations?

@supriyar
Contributor

supriyar commented Oct 2, 2024

@jerryzh168 we have a way to preserve these ops in export, right?

@jerryzh168
Contributor

I'll add a tutorial for this, but in the meantime, can you try running torch.export.export before torch.onnx.export? We have a test case here:

self.assertTrue(torch.ops.quant.choose_qparams_affine.default in targets)
self.assertTrue(torch.ops.quant.quantize_affine.default in targets)
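
For context, a minimal sketch of that ordering applied to the repro above (reusing model and input_fp32 from the original snippet; the exact handoff between torch.export.export and torch.onnx.export is an assumption here, not a confirmed recipe):

import torch

# Export first so the torchao quant ops are traced into the graph.
exported = torch.export.export(model, (input_fp32,))

# Inspect which ops ended up in the exported graph.
targets = [node.target for node in exported.graph.nodes if node.op == "call_function"]
print(targets)

# Then convert the exported module to ONNX, mirroring the original call.
onnx_program = torch.onnx.export(exported.module(), (input_fp32,), dynamo=True, report=True)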

@justinchuby
Author

Thanks - are the ops in torch.ops.quantized_decomposed still used, or is torch.ops.quant new/different? Do we need to import torchao to register these ops, or are they native to PyTorch?

@jerryzh168
Contributor

jerryzh168 commented Oct 3, 2024

@justinchuby we tend to move away from torch.ops.quantized_decomposed as it's not general. In the long term we want to use the torch.ops.quant ops, which support all granularities through the block_size argument (we may need to refine this argument a bit as we see different use cases).

The torch.ops.quant ops are defined in torchao (https://github.com/pytorch/ao/blob/main/torchao/quantization/quant_primitives.py), so I think you'll have to import torchao to use them.
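
As a rough illustration of how block_size expresses granularity with these primitives (the import path and argument order below are my reading of quant_primitives.py and may not be exact): a per-tensor scheme uses a block covering the whole tensor, while per-channel uses a block of size 1 along the channel dimension.

import torch
from torchao.quantization.quant_primitives import (
    MappingType,
    choose_qparams_affine,
    quantize_affine,
)

w = torch.randn(8, 4)  # e.g. a linear weight of shape (out_features, in_features)

# Per-tensor: one block covering the whole tensor -> a single scale/zero_point.
# per_tensor_block = (8, 4)
# Per-channel along dim 0: one block per output channel -> one scale/zero_point per row.
per_channel_block = (1, 4)

scale, zero_point = choose_qparams_affine(w, MappingType.SYMMETRIC, per_channel_block, torch.int8)
w_q = quantize_affine(w, per_channel_block, scale, zero_point, torch.int8)
print(w_q.dtype, scale.shape)  # int8 values, one scale per output channel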

@justinchuby
Author

Could you point me to where the torch.ops.quant.* ops are declared, and is there a list of all ops available?

@jerryzh168
Contributor

@justinchuby you can search for the ops annotated with register_custom_op in https://github.com/pytorch/ao/blob/main/torchao/quantization/quant_primitives.py. Specifically, the ops defined are torch.ops.quant.choose_qparams_affine, torch.ops.quant.quantize_affine, and torch.ops.quant.dequantize_affine.
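
For completeness, a small sketch (reusing model and input_fp32 from the repro above) of one way to check whether those ops appear in an exported graph:

import torch
import torchao  # importing torchao registers the torch.ops.quant ops

exported = torch.export.export(model, (input_fp32,))
targets = {node.target for node in exported.graph.nodes if node.op == "call_function"}

for op in (
    torch.ops.quant.choose_qparams_affine.default,
    torch.ops.quant.quantize_affine.default,
    torch.ops.quant.dequantize_affine.default,
):
    print(op, op in targets)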
