Memory leak in CW agent prometheus 1.247348.0b251302 #264
Comments
This issue was marked stale due to lack of activity.
Still happening with 1.247349.0b251399.
Hey Andrey, we believe that we have a fix for this issue in our latest release. It was related to unpaginated data being returned from the k8s API server. Please keep an eye out for the 50 release, which should be coming in mid-February.
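For anyone curious what "paginated" means here, below is a minimal sketch (assuming client-go and in-cluster configuration; this is not the agent's actual code) of listing pods from the Kubernetes API in bounded pages via Limit and a Continue token, instead of pulling everything back in one large response:

```go
// Minimal sketch of paginated listing against the Kubernetes API with client-go.
// Not the agent's actual code; assumes in-cluster configuration.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	continueToken := ""
	total := 0
	for {
		// Request at most 500 pods per call; the API server returns a
		// Continue token whenever more results remain.
		pods, err := client.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
			Limit:    500,
			Continue: continueToken,
		})
		if err != nil {
			panic(err)
		}
		total += len(pods.Items)
		continueToken = pods.Continue
		if continueToken == "" {
			break
		}
	}
	fmt.Println("pods listed:", total)
}
```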
This is happening on the non-EKS CWAgent as well, specifically the Windows agent.
@CraigHead could you describe your issue specifically, including any related errors you are seeing? The issue listed in this ticket was related to containers being killed for OOM in EKS.
This should be resolved with the latest version of the agent.
Still seeing the pod being OOMKilled with a 2500Mi memory limit when using
I removed memory limits to see how much memory it would use. I stopped this experiment after it consumed 20GB. |
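For reference, the growth in that kind of experiment can also be watched from inside the pod. The sketch below is a hypothetical helper, assuming the standard cgroup v2 file with a cgroup v1 fallback; it is not part of the agent:

```go
// Hypothetical helper for watching a container's memory usage from inside the pod.
// Assumes the standard cgroup v2 file, falling back to cgroup v1; not part of the agent.
package main

import (
	"fmt"
	"os"
	"strings"
	"time"
)

func readUsage() (string, error) {
	// cgroup v2 exposes current usage in memory.current; v1 uses memory.usage_in_bytes.
	for _, p := range []string{
		"/sys/fs/cgroup/memory.current",
		"/sys/fs/cgroup/memory/memory.usage_in_bytes",
	} {
		if b, err := os.ReadFile(p); err == nil {
			return strings.TrimSpace(string(b)), nil
		}
	}
	return "", fmt.Errorf("no cgroup memory usage file found")
}

func main() {
	for {
		usage, err := readUsage()
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		fmt.Printf("%s memory usage (bytes): %s\n", time.Now().Format(time.RFC3339), usage)
		time.Sleep(30 * time.Second)
	}
}
```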
Hey @ashevtsov-wawa, for the next course of action, would you help by sharing the following information:
Hi, not sure if it is the same issue, but we faced a memory leak when the endpoint was unreachable. It seems the CW agent would accumulate the connections / not clean everything up, and would finally get OOMKilled after some time. Fixing the network issue resolved our problem, but I believe it could be dealt with in the code, so that an unreachable endpoint does not end in an OOM. Thanks, Nicolas
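To illustrate the kind of safeguard Nicolas is suggesting (a sketch only, not the CloudWatch agent's implementation): a sender with a bounded queue drops batches when the remote endpoint is unreachable, so the backlog cannot grow into an OOM, and draining/closing response bodies keeps connections from piling up. The endpoint URL below is a placeholder:

```go
// Illustrative sketch only (not the CloudWatch agent's code): a sender with a
// bounded queue. When the endpoint is unreachable, new batches are dropped
// instead of being buffered indefinitely, so the backlog cannot grow into an OOM.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"time"
)

const maxQueued = 1000 // upper bound on batches held in memory

type sender struct {
	queue    chan []byte
	endpoint string
	client   *http.Client
}

func newSender(endpoint string) *sender {
	s := &sender{
		queue:    make(chan []byte, maxQueued),
		endpoint: endpoint,
		client:   &http.Client{Timeout: 10 * time.Second},
	}
	go s.loop()
	return s
}

// Enqueue drops the batch when the queue is full rather than blocking or growing.
func (s *sender) Enqueue(batch []byte) {
	select {
	case s.queue <- batch:
	default:
		fmt.Println("queue full, dropping batch")
	}
}

func (s *sender) loop() {
	for batch := range s.queue {
		resp, err := s.client.Post(s.endpoint, "application/octet-stream", bytes.NewReader(batch))
		if err != nil {
			// Endpoint unreachable: log and move on; the batch is not re-queued forever.
			fmt.Println("send failed:", err)
			continue
		}
		// Drain and close the body so idle connections can be reused, not accumulated.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
}

func main() {
	s := newSender("http://example.invalid/write") // placeholder endpoint
	for i := 0; i < 5; i++ {
		s.Enqueue([]byte("metrics batch"))
	}
	time.Sleep(2 * time.Second)
}
```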
@nmamn Can you provide some additional context on the issue you're seeing? Which version of the agent were you seeing this in? Were there any logs indicating that the agent was failing to reach the endpoint? It would help us debug the issue.
We are facing a similar issue and reported it here. Our agent is consuming more than 50Gi (the limit) and getting OOMKilled.
After upgrading the CW agent Prometheus from 1.247347.5b250583 to 1.247348.0b251302, the pod started getting killed by Kubernetes (OOMKilled).
The memory limit is set to 2000m. We tried increasing the limit up to 8000m to no avail.
Downgrading to 1.247347.5b250583 fixes the issue (with the 2000m limit).
We run the agent in EKS 1.19.
We are experiencing this in a couple of environments, each running over 120 pods (including those of DaemonSets). Environments where this is not an issue have ~30-50 pods running.
The last messages in the container logs of the killed pods aren't consistent.
one instance:
another instance (same cluster):
Let me know if you need any other information/logs that will help in troubleshooting.