
❓ [Question] TensorRT Export Failure with Large Input Sizes #3307

Open
AndreaBrg opened this issue Nov 29, 2024 · 1 comment
Labels
question Further information is requested

Comments

@AndreaBrg

❓ Question

I'm trying to export a torch model that processes large inputs (e.g., 8192x2048). I have noticed that torch_tensorrt.compile fails with inputs larger than 4096x2048 (I haven't tried every size, only powers of 2). Specifically, the conversion fails for convolution and ReLU operations with "No valid tactics" and "Illegal memory access" errors:

2024-11-29 16:56:42,307 - torch_tensorrt [TensorRT Conversion Context] - ERROR - [scopedCudaResources.cpp::~ScopedCudaStream::55] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
2024-11-29 16:56:42,311 - torch_tensorrt [TensorRT Conversion Context] - ERROR - IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node [CONVOLUTION]-[aten_ops.convolution.default]-[teacher.3/convolution_5] + [RELU]-[aten_ops.relu.default]-[teacher.4/relu_4].)
2024-11-29 16:56:42,312 - [MODEL EXPORT] - ERROR - TensorRT export failed: 
Traceback (most recent call last):
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/tools/launchers.py", line 398, in <module>
    export(
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/tools/launchers.py", line 298, in export
    trt_model = torch_tensorrt.compile(model, **compile_spec)
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/.venv/lib/python3.10/site-packages/torch_tensorrt/_compile.py", line 269, in compile
    trt_graph_module = dynamo_compile(
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/.venv/lib/python3.10/site-packages/torch_tensorrt/dynamo/_compiler.py", line 288, in compile
    trt_gm = compile_module(
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/.venv/lib/python3.10/site-packages/torch_tensorrt/dynamo/_compiler.py", line 464, in compile_module
    trt_module = convert_module(
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/.venv/lib/python3.10/site-packages/torch_tensorrt/dynamo/conversion/_conversion.py", line 142, in convert_module
    interpreter_result = interpret_module_to_result(
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/.venv/lib/python3.10/site-packages/torch_tensorrt/dynamo/conversion/_conversion.py", line 121, in interpret_module_to_result
    interpreter_result = interpreter.run()
  File "/nfs/home/bragagnolo/qinstinct-fabric-inspection/.venv/lib/python3.10/site-packages/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py", line 635, in run
    assert serialized_engine
AssertionError

Attached are the script and the full output log: issue.zip
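
For context, a minimal sketch of the kind of export call that fails (the model, input shape, and precision below are placeholders; the actual model and export script are in the attached issue.zip):

```python
import torch
import torch_tensorrt

# Placeholder module standing in for the real network (the actual model
# and export script are in the attached issue.zip).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, kernel_size=3, padding=1),
    torch.nn.ReLU(),
).eval().cuda()

# Anything larger than 4096x2048 fails during conversion in my tests.
compile_spec = {
    "inputs": [torch_tensorrt.Input(shape=(1, 3, 8192, 2048), dtype=torch.float32)],
    "enabled_precisions": {torch.float32},
}

trt_model = torch_tensorrt.compile(model, ir="dynamo", **compile_spec)
```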

Environment

  • PyTorch Version (e.g., 1.0): 2.5.1+cu121
  • TorchTensorRT Version: 2.5.0
  • CPU Architecture: AMD EPYC 7543 32-Core Processor
  • OS (e.g., Linux): Ubuntu 22.04.5 LTS
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Python version: 3.10.12
  • CUDA version: Cuda compilation tools, release 12.1, V12.1.66 Build cuda_12.1.r12.1/compiler.32415258_0
  • GPU models and configuration: NVIDIA A100-SXM4-80GB, on SLURM with MIG enabled.

Is there any limit to the input size when converting using torch_tensorrt? Any solution to this problem?

Thanks.

@AndreaBrg AndreaBrg added the question Further information is requested label Nov 29, 2024
@narendasan
Copy link
Collaborator

These sorts of errors do occur when there is high memory pressure; typically, constraining the workspace size setting can help, but I will take a deeper look.
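
For reference, a sketch of how the workspace size can be constrained through the compile API (the workspace_size value and the placeholder model below are illustrative, not a recommendation):

```python
import torch
import torch_tensorrt

# Placeholder module; substitute the actual model being exported.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, kernel_size=3, padding=1),
    torch.nn.ReLU(),
).eval().cuda()

# Constraining the TensorRT builder workspace can reduce memory pressure
# during tactic selection; 4 GiB here is an arbitrary starting point.
trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=[torch_tensorrt.Input(shape=(1, 3, 8192, 2048), dtype=torch.float32)],
    enabled_precisions={torch.float32},
    workspace_size=4 << 30,  # bytes
)
```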
