
❓ [Question] TensorRT Export Failure with Large Input Sizes #3307

Open
AndreaBrg opened this issue Nov 29, 2024 · 1 comment
Labels
question Further information is requested

Comments

@AndreaBrg

❓ Question

I'm trying to export a torch model that processes large inputs (e.g., 8192x2048). I have noticed that torch_tensorrt.compile fails with inputs larger than 4096x2048 (I haven't tried every size, only powers of 2). Specifically, the conversion fails for convolution and ReLU operations with "No valid tactics" and "Illegal memory access" errors:

2024-11-29 16:56:42,307 - torch_tensorrt [TensorRT Conversion Context] - ERROR - [scopedCudaResources.cpp::~ScopedCudaStream::55] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
2024-11-29 16:56:42,311 - torch_tensorrt [TensorRT Conversion Context] - ERROR - IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node [CONVOLUTION]-[aten_ops.convolution.default]-[teacher.3/convolution_5] + [RELU]-[aten_ops.relu.default]-[teacher.4/relu_4].)
2024-11-29 16:56:42,312 - [MODEL EXPORT] - ERROR - TensorRT export failed: 
Traceback (most recent call last):
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/tools/launchers.py", line 398, in <module>
    export(
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/tools/launchers.py", line 298, in export
    trt_model = torch_tensorrt.compile(model, **compile_spec)
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/.venv/lib/python3.10/site-packages/torch_tensorrt/_compile.py", line 269, in compile
    trt_graph_module = dynamo_compile(
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/.venv/lib/python3.10/site-packages/torch_tensorrt/dynamo/_compiler.py", line 288, in compile
    trt_gm = compile_module(
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/.venv/lib/python3.10/site-packages/torch_tensorrt/dynamo/_compiler.py", line 464, in compile_module
    trt_module = convert_module(
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/.venv/lib/python3.10/site-packages/torch_tensorrt/dynamo/conversion/_conversion.py", line 142, in convert_module
    interpreter_result = interpret_module_to_result(
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/.venv/lib/python3.10/site-packages/torch_tensorrt/dynamo/conversion/_conversion.py", line 121, in interpret_module_to_result
    interpreter_result = interpreter.run()
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/.venv/lib/python3.10/site-packages/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py", line 635, in run
    assert serialized_engine
AssertionError

Attached are the script and the full output log: issue.zip
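
For context, a minimal sketch of the kind of export call that fails (the model, input shape, and precision below are placeholders; the actual model and export script are in the attached issue.zip):

```python
import torch
import torch_tensorrt

# Placeholder module standing in for the real network (the actual model
# and export script are in the attached issue.zip).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, kernel_size=3, padding=1),
    torch.nn.ReLU(),
).eval().cuda()

# Anything larger than 4096x2048 fails during conversion in my tests.
compile_spec = {
    "inputs": [torch_tensorrt.Input(shape=(1, 3, 8192, 2048), dtype=torch.float32)],
    "enabled_precisions": {torch.float32},
}

trt_model = torch_tensorrt.compile(model, ir="dynamo", **compile_spec)
```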

Environment

  • PyTorch Version (e.g., 1.0): 2.5.1+cu121
  • TorchTensorRT Version: 2.5.0
  • CPU Architecture: AMD EPYC 7543 32-Core Processor
  • OS (e.g., Linux): Ubuntu 22.04.5 LTS
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Python version: 3.10.12
  • CUDA version: Cuda compilation tools, release 12.1, V12.1.66 Build cuda_12.1.r12.1/compiler.32415258_0
  • GPU models and configuration: NVIDIA A100-SXM4-80GB, on SLURM with MIG enabled.

Is there any limit to the input size when converting using torch_tensorrt? Any solution to this problem?

Thanks.

@AndreaBrg AndreaBrg added the question Further information is requested label Nov 29, 2024
@narendasan
Copy link
Collaborator

These sorts of errors do occur when there is high memory pressure; typically, constraining the workspace size setting can help, but I will take a deeper look.
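
For reference, a sketch of how the workspace size can be constrained through the compile API (the workspace_size value and the placeholder model below are illustrative, not a recommendation):

```python
import torch
import torch_tensorrt

# Placeholder module; substitute the actual model being exported.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, kernel_size=3, padding=1),
    torch.nn.ReLU(),
).eval().cuda()

# Constraining the TensorRT builder workspace can reduce memory pressure
# during tactic selection; 4 GiB here is an arbitrary starting point.
trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=[torch_tensorrt.Input(shape=(1, 3, 8192, 2048), dtype=torch.float32)],
    enabled_precisions={torch.float32},
    workspace_size=4 << 30,  # bytes
)
```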
