Accelerating Performance in Dynamo + Torch-TRT

TL;DR
torch.compile and torch._dynamo.export are two promising new utilities for model compilation that are under active development for integration into the Torch-TRT framework. This is a discussion of methods and suggestions to further accelerate inference along these paths, with a focus on torch.compile.
Goal(s)
The objective of this discussion is to highlight current shortcomings of the torch.compile backend and related issues with the torch._dynamo.export path, and suggest ideas for improved acceleration of these frameworks. Currently, the performance pitfalls of torch.compile originate from three sources: Module-Level Acceleration, Converter Coverage, and Control Flow.
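For reference, the sketch below shows how the two paths are typically invoked. The "torch_tensorrt" backend string is the name Torch-TRT registers with the dynamo backend registry, the toy model is purely illustrative, and the torch._dynamo.export call signature has varied across PyTorch releases.

```python
import torch
import torch_tensorrt  # registers the "torch_tensorrt" dynamo backend

# Illustrative toy model standing in for a real workload
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval().cuda()
inputs = (torch.randn(1, 3, 224, 224, device="cuda"),)

# JIT path: Torch-TRT as a torch.compile backend
compiled = torch.compile(model, backend="torch_tensorrt")
compiled(*inputs)

# Export path: capture an aten-level graph ahead of time
# (the export signature has varied across PyTorch releases)
gm, guards = torch._dynamo.export(model, *inputs)
```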
1. Module-Level Acceleration
Problem Context
Module-level acceleration refers to translating high-level modules, such as Attention, directly into their accelerated counterparts, which is more effective for performance than breaking such modules up into their smaller components. Currently, in both the torch.compile and torch._dynamo.export paths, large modules such as Attention are subdivided into component subgraphs composed of aten operators. Ideally, such modules would instead be replaced directly with their accelerated counterparts.
Proposed Solution
The solution here could take one of two paths. First, one could intercept the Attention module prior to lowering and replace it automatically with its accelerated counterpart. Alternatively, one could lower the Attention module to its aten components and then use subgraph matching to detect the pattern of calls that corresponds to an Attention module. The latter is much easier to implement, since there is no known way to intercept modules before lowering in the current torch.compile framework, but the former would be the cleaner solution. Options for subgraph-matching utilities include the Torch FX subgraph rewriter and the Inductor pattern matcher.
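As an illustration of the subgraph-matching path, the sketch below uses the Torch FX subgraph rewriter (torch.fx.replace_pattern) to swap a naive attention computation for torch.nn.functional.scaled_dot_product_attention. The pattern is deliberately simplified and omits the 1/sqrt(d) scaling for brevity; a faithful pattern would mirror the exact aten sequence the lowering actually produces.

```python
import torch
import torch.fx as fx

# Simplified pattern: the ops a naive attention block might lower to
# (real lowered graphs vary and include the 1/sqrt(d) scaling).
def attention_pattern(q, k, v):
    scores = torch.matmul(q, k.transpose(-2, -1))
    probs = torch.softmax(scores, dim=-1)
    return torch.matmul(probs, v)

# Replacement: a single fused, accelerated operator.
def attention_replacement(q, k, v):
    return torch.nn.functional.scaled_dot_product_attention(q, k, v)

class NaiveAttention(torch.nn.Module):
    def forward(self, q, k, v):
        scores = torch.matmul(q, k.transpose(-2, -1))
        probs = torch.softmax(scores, dim=-1)
        return torch.matmul(probs, v)

gm = fx.symbolic_trace(NaiveAttention())
matches = fx.replace_pattern(gm, attention_pattern, attention_replacement)
print(f"Replaced {len(matches)} attention subgraph(s)")
```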
2. Converter Coverage
Problem Context
Converter coverage is a critical piece of both the torch.compile and torch._dynamo.export paths, as both rely on the aten converters. Improved converter coverage allows more operations in a given model to be accelerated and reduces the graph segmentation caused by partitioning.
Proposed Solution
Currently, we are working to implement the aten converters that are key for certain critical models. This effort should go hand in hand with effective lowering passes, which can both reduce the number of converter implementations needed and improve code performance and readability. An alternative to keep in mind is Prims IR, a lower-level, more restricted version of the aten operator set. The potential utility of Prims IR is that we could implement converters for the entire set of prim operators and thereby support many more models. The drawback is that these decompositions reach a much lower level, so the optimizations we can make are much more limited.
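For concreteness, below is a minimal sketch of what registering an aten converter can look like, using the decorator-style registry from the FX path (torch_tensorrt.fx.converter_registry.tensorrt_converter); the exact registry location and converter signature vary across Torch-TRT versions.

```python
import tensorrt as trt
import torch
from torch_tensorrt.fx.converter_registry import tensorrt_converter

# Sketch: map aten.relu onto a TensorRT activation layer. The
# (network, target, args, kwargs, name) signature follows the FX
# converter registry; args[0] is the ITensor from the upstream op.
@tensorrt_converter(torch.ops.aten.relu.default)
def aten_relu_converter(network, target, args, kwargs, name):
    layer = network.add_activation(args[0], trt.ActivationType.RELU)
    layer.name = name
    return layer.get_output(0)
```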
3. Control Flow
Problem Context
One of the most promising aspects of the torch.compile path is its ability to handle control flow automatically, splitting the graph into subgraphs at control flow branches. This is also one of the drawbacks of the method, as excessive control flow produces many graph breaks, which can deteriorate performance. The torch._dynamo.export path also provides restricted support for control flow when the model uses the experimental Torch conditional operators.
Proposed Solution
On this topic, the solution is more of a trade-off. In the torch.compile path, taking a new branch at a control flow decision triggers recompilation, but no changes to the model code are needed. In the torch._dynamo.export path, the resulting model will not need recompilation, but substantial model rewriting is required to support control flow within the model.
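As an illustration of that rewrite, the sketch below replaces data-dependent Python branching with the experimental conditional operator. At the time of writing the operator lives in functorch.experimental.control_flow (newer releases expose it as torch.cond), and the torch._dynamo.export signature has varied across releases.

```python
import torch
from functorch.experimental.control_flow import cond  # newer releases: torch.cond

def true_fn(x):
    return torch.sin(x)

def false_fn(x):
    return torch.cos(x)

class Branchy(torch.nn.Module):
    def forward(self, x):
        # Instead of `if x.sum() > 0: ...`, which export cannot capture,
        # route the branch through the conditional operator.
        return cond(x.sum() > 0, true_fn, false_fn, [x])

# Both branches are captured into a single graph, so no recompilation
# is needed when the predicate flips at runtime.
gm, guards = torch._dynamo.export(Branchy(), torch.randn(3))
print(gm.graph)
```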