Add Support for dlami-cloudwatch-agent in ecs-ami for GPU #195

Open
mostafafarzaneh opened this issue Jan 30, 2024 · 1 comment
DESCRIPTION:

We currently use the amzn2-ami-ecs-gpu-hvm-2.0.20240109-x86_64-ebs AMI for our ECS instances. This AMI does not include the dlami-cloudwatch-agent, a component that is present in the DLAMI (Deep Learning AMI GPU TensorFlow 2.12.0 (Ubuntu 20.04) 20230529).

Our specific requirement is to publish GPU utilization metrics to CloudWatch using the dlami-cloudwatch-agent. This capability is essential for monitoring and optimizing our GPU resources effectively.

EXPECTED BEHAVIOR:

We request an update to the amzn2-ami-ecs-gpu-hvm-2.0.20240109-x86_64-ebs AMI to include support for the dlami-cloudwatch-agent. This addition will enable us to seamlessly integrate GPU utilization metrics into our CloudWatch monitoring infrastructure.

ADDITIONAL CONTEXT:

  • Current State: The dlami-cloudwatch-agent is present in DLAMI but absent in the mentioned ECS AMI.

  • Use Case: Our use case involves closely monitoring GPU utilization for better resource management and performance optimization.

IMPACT:

This enhancement will benefit users relying on the amzn2-ami-ecs-gpu-hvm-2.0.20240109-x86_64-ebs AMI, enabling them to leverage CloudWatch for comprehensive GPU monitoring.

@sparrc
Contributor

sparrc commented Apr 19, 2024

Hi @mostafafarzaneh, are you able to install this package in userdata on instance startup? I'd be reluctant to add this package to the default AMI, since opting all customers into additional CloudWatch metrics would increase billing for metrics that they may not use or need.
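
For readers who want GPU utilization in CloudWatch on an opt-in basis, the following is a minimal sketch of one possible approach. It is not the dlami-cloudwatch-agent and not something this thread confirms. It assumes `nvidia-smi` is on the PATH, that `boto3` is installed, and that the instance role allows `cloudwatch:PutMetricData`. The namespace `Custom/GPU`, the metric name `GPUUtilization`, the `GPUIndex` dimension, and all function names are hypothetical choices made for illustration.

```python
import subprocess

# Hypothetical namespace and metric name; pick whatever fits your account.
NAMESPACE = "Custom/GPU"
METRIC_NAME = "GPUUtilization"


def parse_gpu_utilization(csv_text):
    """Parse 'index, utilization' lines into (index, percent) tuples."""
    readings = []
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        readings.append((int(index), float(util)))
    return readings


def build_metric_data(readings, instance_id):
    """Build the MetricData list expected by CloudWatch PutMetricData."""
    return [
        {
            "MetricName": METRIC_NAME,
            "Dimensions": [
                {"Name": "InstanceId", "Value": instance_id},
                {"Name": "GPUIndex", "Value": str(index)},
            ],
            "Value": util,
            "Unit": "Percent",
        }
        for index, util in readings
    ]


def read_gpu_utilization():
    """Query nvidia-smi for per-GPU utilization (requires an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_utilization(result.stdout)


def publish(instance_id):
    """Send one sample per GPU to CloudWatch; each call is billed as custom metrics."""
    import boto3  # imported here so the parsing helpers work without boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=build_metric_data(read_gpu_utilization(), instance_id),
    )
```

A script like this could be written to disk from userdata and run on a timer, so only the instances that opt in incur the extra CloudWatch charges. Note that the parser assumes numeric utilization values; it would raise `ValueError` if `nvidia-smi` reports a non-numeric value for a GPU.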
