ServiceExportReconciler keep re registering instances in aws cloudmap #312
Comments
Hello @mukshe01, we don't support exporting or importing custom attributes.
The code compares the endpoints' IP/port with the Cloud Map instances. I don't believe changing the attributes will trigger re-registration. |
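To make the "compares IP/port" point concrete, here is a minimal Go sketch of an address-only comparison. It is not the controller's actual code: the `Endpoint` type, the `key` helper, the `Diff` function, and the `team` attribute are all hypothetical names chosen for illustration. Under this kind of comparison, extra attributes on a registered instance would not by themselves produce a difference.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified stand-in for a registered instance.
type Endpoint struct {
	IP         string
	Port       int32
	Attributes map[string]string // custom attributes; deliberately ignored below
}

// key identifies an endpoint by its address only (IP and port).
func key(e Endpoint) string {
	return fmt.Sprintf("%s:%d", e.IP, e.Port)
}

// Diff returns the endpoints that would need registering (desired but not
// registered) and deregistering (registered but no longer desired).
func Diff(desired, registered []Endpoint) (toRegister, toDeregister []Endpoint) {
	have := make(map[string]bool, len(registered))
	for _, e := range registered {
		have[key(e)] = true
	}
	want := make(map[string]bool, len(desired))
	for _, e := range desired {
		want[key(e)] = true
		if !have[key(e)] {
			toRegister = append(toRegister, e)
		}
	}
	for _, e := range registered {
		if !want[key(e)] {
			toDeregister = append(toDeregister, e)
		}
	}
	return toRegister, toDeregister
}

func main() {
	desired := []Endpoint{{IP: "172.24.125.249", Port: 8080}}
	registered := []Endpoint{{
		IP:         "172.24.125.249",
		Port:       8080,
		Attributes: map[string]string{"team": "payments"}, // illustrative attribute
	}}
	add, remove := Diff(desired, registered)
	fmt.Println(len(add), len(remove))
}
```

If the real comparison behaves like this sketch, a re-registration with an unchanged IP and port would have to be triggered somewhere other than the address comparison itself.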
Hi Akash, many thanks for your response. We are trying to understand what change is triggering the re-registrations. In Kubernetes, the app exposed by the service hasn't restarted for the last 16 hours, which means the IP shouldn't have changed; the port, 8080, also stays constant.
$ date
$ kubectl get po rest-api-develop-fc4d7778c-pqss5 -n participant1-develop -o wide
This is the Kubernetes service config.
$ kubectl describe svc rest-api-develop -n participant1-develop
Below are logs of two occurrences of endpoint re-registration for namespace participant1-develop, service rest-api-develop. As you can see, it is registering instances/endpoints to Cloud Map with the same IP (the IP didn't change); ideally we would expect it not to update Cloud Map, as there is no change. I have attached the full controller logs for your reference.
|
You can use this reference to customize the logging https://sdk.operatorframework.io/docs/building-operators/golang/references/logging/ |
Hi, thank you for that. We changed the log level to debug (--zap-log-level debug) and captured the logs when the re-registration of endpoints happened. Looking at the logs, it is not clear why this happened, as the endpoint app did not restart and the endpoint IPs did not change. Here is the full log. Log snippet from when the endpoints were re-registered:
|
On reviewing the ServiceExport controller code, it seems we are exporting the endpoints based on the k8s events on the Service and EndpointSlice. Updating metadata will trigger the k8s events, thus it's getting overridden. |
Hi Akash, we compared the Service and EndpointSlice k8s object definitions for the affected endpoints before and after the occurrence of the problem, and they are exactly identical; we do not see any metadata difference (command run: kubectl describe endpointslices rest-api-staging-xkqrz -n participant1-staging). Here is an example endpoint slice. We also captured the Kubernetes events for the last 24 hours in the namespace the service is in, and we do not see any k8s event at the exact time the problem occurs. Are we missing anything? On a side note, we created ServiceExport objects for 2 new services (just to test) within the same cluster yesterday. The problem did not occur for the new services; however, for the original service it occurred 8 to 10 times in the last 24 hours. One of the test services runs on the same k8s node (EC2) and the other runs on a different node. Interestingly, the only difference between the test services and the live service is the custom attributes we populated in Cloud Map for the instances using a Lambda. Would you help us get to the bottom of this issue? Please let us know if you need any information from our side. Regards |
The controller watches the kubernetes service and endpoint slice resources. The code for that is abstracted out by operator sdk. Perhaps you can subscribe to the events and debug on what attribute change is triggering reconciliation. |
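One way to act on this suggestion is to snapshot the watched object on each update and report which fields changed. The Go sketch below is only an illustration under stated assumptions: `ChangedKeys` is a hypothetical helper, the objects are assumed to have already been flattened into path/value string maps, and the field paths and values in `main` are made up. Wiring it into the controller's watch (for example from an update handler) is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// ChangedKeys returns, in sorted order, every key whose value differs between
// two flattened snapshots of an object, including keys present in only one.
func ChangedKeys(before, after map[string]string) []string {
	var changed []string
	for k, v := range before {
		if nv, ok := after[k]; !ok || nv != v {
			changed = append(changed, k)
		}
	}
	for k := range after {
		if _, ok := before[k]; !ok {
			changed = append(changed, k)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	// Illustrative snapshots; the paths and values are assumptions.
	before := map[string]string{
		"metadata.resourceVersion":  "1001",
		"endpoints[0].addresses[0]": "172.24.125.249",
	}
	after := map[string]string{
		"metadata.resourceVersion":  "1002",
		"endpoints[0].addresses[0]": "172.24.125.249",
	}
	fmt.Println(ChangedKeys(before, after))
}
```

Logging the result on every update would show whether an update event carried a real field change or arrived with identical content.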
Hi Akash, we captured all k8s events in the cluster for the last few hours and attached the events dump file. We are also attaching the MCS controller log: mcs_controller_log_20231212.txt. As the MCS controller log shows, the controller re-registers instance tcp-172_24_125_249-8080 at 2023-12-12T11:45:55.
We do not see any k8s events at this time. We see some events at 2023-12-12T11:40:42 (5 minutes before the issue) of type SuccessfullyReconciled for targetGroupBinding.
We can confirm these events won't update the Service/EndpointSlice metadata, as we captured and compared the Service/EndpointSlice k8s objects before and after the issue and they are identical. Would you be able to help us identify which event is causing the controller to re-register the endpoints? |
Hi, just to update on this: we are still not sure what is triggering this change. However, we are sure that the issue only occurs for instances to which we add custom attributes in AWS Cloud Map. We have 4 similar applications running on the same k8s node; the issue happens for 3 instances (at around the same time) and does not happen for 1. That 1 instance is the one for which we were not altering the AWS Cloud Map attributes. When we added a custom attribute to this instance in Cloud Map, the issue started happening for it too. Regards |
Hello, there is no code logic within the controller that triggers the re-registration. The controller subscribes to Kubernetes update events for the kinds ServiceExport, Service, EndpointSlice and ClusterProperty. Unfortunately, we don't have enough observability to say what kind of update is triggering an event. You can choose to fork the controller and implement the attributes reconciliation. The way to approach this would be to add the instance attributes as ServiceExport object annotations
and then update serviceexport_controller.go to export the annotations as attributes. You can push the change as a PR, and we can review the code. As a disclaimer, the MCS controller is in alpha stage and should be used in production systems with caution. |
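The annotation-based approach suggested above could start from something like the following Go sketch. It is an assumption-laden illustration, not the controller's API: the `attrPrefix` annotation prefix, the `AttributesFromAnnotations` function, and the example annotation keys are all hypothetical, and the step that passes the resulting map to Cloud Map during registration is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// attrPrefix is a hypothetical annotation prefix that marks an annotation
// as a custom attribute to export; the real project defines no such prefix.
const attrPrefix = "example.com/cloudmap-attr-"

// AttributesFromAnnotations extracts custom attributes from the annotations
// of a ServiceExport object, stripping the prefix from each matching key.
func AttributesFromAnnotations(annotations map[string]string) map[string]string {
	attrs := make(map[string]string)
	for k, v := range annotations {
		if !strings.HasPrefix(k, attrPrefix) {
			continue // unrelated annotation
		}
		name := strings.TrimPrefix(k, attrPrefix)
		if name == "" {
			continue // prefix with no attribute name
		}
		attrs[name] = v
	}
	return attrs
}

func main() {
	annotations := map[string]string{
		"example.com/cloudmap-attr-team": "payments", // illustrative
		"example.com/cloudmap-attr-tier": "backend",  // illustrative
		"unrelated/annotation":           "ignored",
	}
	fmt.Println(AttributesFromAnnotations(annotations))
}
```

Keeping the attributes in the ServiceExport object means the desired state lives in Kubernetes, so the controller and Cloud Map would no longer disagree about instance attributes the way they can when a Lambda writes them out of band.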
Hi,
We are running v0.3.1 of aws-cloud-map-mcs-controller to register services to AWS Cloud Map. We have another application that looks up Cloud Map for service discovery, and we have a requirement to populate some custom attributes for the Cloud Map instances.
Is there a way to define custom attributes in the ServiceExport definition of the Kubernetes object? Currently it is defined like this:
kind: ServiceExport
apiVersion: multicluster.x-k8s.io/v1alpha1
metadata:
  namespace: namespace1
  name: service-name
Currently we are using a Lambda trigger in AWS to populate these custom attributes. This is causing an issue with ServiceExportReconciler: it keeps re-registering instances in AWS Cloud Map, as the endpoint/instance definition in AWS Cloud Map differs from what the controller desires. Is there a way to stop this re-registration from happening?
Thank you.