Torch-TensorRT Backend for torch.compile #1690
Replies: 1 comment
This won't help much if it continues to be seemingly impossible to install pytorch-tensorrt. I get the same version 0.0.0 installation problem I see opened over and over again here with no resolution. If torch.compile(backend="tensorrt") doesn't yet work, should torch_tensorrt.compile() work? And if it does, how do I install the frustratingly difficult-to-install thing? I keep reading more and more so-called resolutions and none of them actually resolve anything.
Torch-TensorRT Backend for torch.compile
TL;DR
PyTorch 2.0's inference story is centered around TorchDynamo and torch.compile, which provide an interface and compiler stack for optimizing models and are therefore a natural entry point for integrating Torch-TensorRT's FX frontend.
Goal(s)
Provide a natural interface to all of Torch-TensorRT's features through torch.compile.
Usecases
Proposed APIs / UX
Users would interact with torch.compile as the API for Torch-TensorRT; an end-to-end sketch is given under Example Workflow below.
Example Workflow
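A minimal sketch of the intended workflow, assuming the backend is registered under the name "tensorrt" when torch_tensorrt is imported; the option keys shown mirror existing Torch-TensorRT settings but are assumptions rather than settled API for this backend.

```python
import torch
import torch_tensorrt  # noqa: F401  # importing is assumed to register the "tensorrt" backend

# Stand-in model; any nn.Module traceable by TorchDynamo would follow the same path.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
    torch.nn.ReLU(),
).eval().cuda()

inputs = torch.randn(1, 3, 224, 224, device="cuda")

# Compile with the Torch-TensorRT backend; backend name and option keys are assumptions.
trt_model = torch.compile(
    model,
    backend="tensorrt",
    options={"enabled_precisions": {torch.float16}},
)

with torch.no_grad():
    out = trt_model(inputs)  # first call triggers tracing, lowering, and TensorRT engine build
```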
Limitations
There shouldn't be any difference in behavior between torch.compile and torch_tensorrt.compile; the sketch below illustrates the intended equivalence.
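As a rough illustration of that equivalence (the backend name, the ir value, and the argument names here are assumptions, not confirmed API):

```python
import torch
import torch_tensorrt

model = torch.nn.Linear(128, 64).eval().cuda()
x = torch.randn(32, 128, device="cuda")

# Path 1: through torch.compile with the Torch-TensorRT backend (backend name assumed).
compiled_a = torch.compile(model, backend="tensorrt")

# Path 2: through torch_tensorrt.compile directly; ir="torch_compile" is an assumed
# selector for the same torch.compile-based path.
compiled_b = torch_tensorrt.compile(model, ir="torch_compile", inputs=[x])

with torch.no_grad():
    a, b = compiled_a(x), compiled_b(x)
print(torch.allclose(a, b, atol=1e-3))  # expected: True, since both paths share the same lowering
```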
Internal Implementation
Design
The main component of the implementation is the definition of the backend, which invokes intermediary steps such as AOTAutograd.
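A hypothetical sketch of what such a backend definition could look like using TorchDynamo's generic registration hooks; the names _trt_compiler and "my_tensorrt" are illustrative, and the actual TensorRT lowering is elided.

```python
import torch
from torch._dynamo import register_backend
from torch._dynamo.backends.common import aot_autograd


def _trt_compiler(gm: torch.fx.GraphModule, example_inputs):
    # Placeholder for the real lowering: partition the FX graph, convert supported
    # subgraphs to TensorRT engines, and return a callable. Returning the original
    # forward keeps this sketch runnable.
    return gm.forward


# aot_autograd runs AOTAutograd first (decompositions, functionalization) and then
# hands the resulting forward graph to the compiler function above.
tensorrt_backend = aot_autograd(fw_compiler=_trt_compiler)

# Registering under a string name (hypothetical here) is what lets users write
# torch.compile(model, backend="my_tensorrt").
register_backend(compiler_fn=tensorrt_backend, name="my_tensorrt")
```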
Extensions Required to Core API implementations
There should be few changes required to the core API.
Data Structures
There might be ways to make it easier for users to group settings, for example by reusing the compile spec object.
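For instance, the flat options dictionary passed to torch.compile could later be replaced or backed by the compile spec object; the keys below are assumed to mirror existing compile-spec fields and are not settled API for this backend.

```python
import torch

model = torch.nn.Linear(64, 64).cuda().eval()  # stand-in module

# Settings passed as a flat dictionary through torch.compile's options argument;
# the keys are assumed to mirror fields of Torch-TensorRT's compile spec.
settings = {
    "enabled_precisions": {torch.float16},
    "min_block_size": 3,
    "truncate_long_and_double": True,
}

trt_model = torch.compile(model, backend="tensorrt", options=settings)
```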
Details specific for TorchScript Support
N/A
Details specific for FX support
See above
Implementation Phases
Prototype - M
MVP (<TARGET RELEASE VERSION>) - S