Torch not compiled with CUDA enabled when deploying T5 using Triton #4651

Open
subhamiitk opened this issue May 4, 2024 · 1 comment

Comments

@subhamiitk

Link to the notebook
https://github.com/aws/amazon-sagemaker-examples/blob/main/inference/nlp/realtime/triton/single-model/t5_pytorch_python-backend/t5_pytorch_python-backend.ipynb

Describe the bug
When following this notebook, endpoint creation fails. CloudWatch shows the following error: creating server: Invalid argument - load failed for model '/opt/ml/model/::t5_pytorch': version 1 is at UNAVAILABLE state: Internal: AssertionError:
To reproduce
Follow the notebook above to deploy the T5 model. The error occurs at the endpoint creation step.

Logs
error: creating server: Invalid argument - load failed for model '/opt/ml/model/::t5_pytorch': version 1 is at UNAVAILABLE state: Internal: AssertionError:
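
The message in the issue title, "Torch not compiled with CUDA enabled", is the AssertionError PyTorch raises when code requests a CUDA device on a CPU-only build. A minimal sketch of a check that could be run in the same environment as the model to confirm this is below. The helper names `describe_torch_cuda` and `pick_device` are hypothetical and are not taken from the notebook; whether the notebook's model code actually requests a CUDA device is an assumption here.

```python
import importlib.util


def describe_torch_cuda():
    """Return a dict describing whether the installed PyTorch can use CUDA."""
    info = {
        "torch_installed": False,
        "torch_version": None,
        "built_with_cuda": False,
        "cuda_available": False,
    }
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not importable in this environment

    import torch

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # torch.version.cuda is None on builds compiled without CUDA support.
    info["built_with_cuda"] = torch.version.cuda is not None
    # Returns False (does not raise) on CPU-only builds or hosts without a GPU.
    info["cuda_available"] = torch.cuda.is_available()
    return info


def pick_device(info):
    """Hypothetical helper: fall back to CPU unless CUDA is usable."""
    return "cuda" if info["cuda_available"] else "cpu"


if __name__ == "__main__":
    report = describe_torch_cuda()
    print(report)
    print("device:", pick_device(report))
```

If `built_with_cuda` comes back `False`, the environment has a CPU-only PyTorch build, and any call that moves a model or tensor to `"cuda"` would be expected to fail with this assertion.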

@HubGab-Git

Hi @subhamiitk,
Could you share what environment you’re using? I ran the setup with the following configuration, and everything worked smoothly:

•	Platform: JupyterLab
•	Instance: ml.t3.medium
•	Image: SageMaker Distribution 2.0.0
•	Storage: 20GB
•	Kernel: Python 3 (default)

Looking forward to hearing from you!
