Connection refused error when calling service endpoint #273

Open
WJay-tec opened this issue Feb 24, 2023 · 2 comments

Comments

@WJay-tec

WJay-tec commented Feb 24, 2023

Problem

Calling a ClusterSet service endpoint after deleting a pod backing that service results in a connection refused error.

Steps to reproduce the connection refused error

  1. Have pods in 2 clusters (for example stag-eks and stag-eks-2)
  2. Create a ServiceExport in stag-eks for the service you are trying to expose
  3. Step 2 automatically creates a ServiceImport on both clusters
  4. Create a dummy pod in stag-eks-2 and exec into it. Run a curl command against the ClusterSet endpoint that was exported in step 2 (the curl command successfully gets a response)
  5. Delete the pod in stag-eks that backs the service exported in step 2
  6. Wait for the pod to be recreated, then run the curl command again (it now fails with a connection refused error)
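
For anyone who wants to repeat steps 4 and 6 from a script rather than with curl, here is a minimal sketch that makes one TCP connection attempt and separates "connection refused" from other failures. The function name `probe` and the host and port in the usage note are hypothetical, not something the controller provides.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Attempt one TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        # The address answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # DNS failure, timeout, unreachable network, and similar errors.
        return f"error: {exc}"
```

Run from the dummy pod, something like `probe("my-service.my-namespace.svc.clusterset.local", 80)` (placeholder name and port) would be expected to return `"ok"` at step 4 and `"refused"` at step 6 if the behaviour matches the report.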

Steps to resolve the issue

  1. Delete the ServiceImport in stag-eks-2 (the cluster the caller runs in)
  2. Rerun the curl command in the dummy pod in stag-eks-2, and you will get a successful response

Based on my current observation, it seems that CoreDNS is not picking up the latest pod IP and is still resolving to the old pod IP.
Once the ServiceImport is recreated, it starts working again, probably because the CoreDNS record is updated as part of the recreation.
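
One way to test this hypothesis is to compare what the resolver returns inside the dummy pod with the addresses that are expected to be live. The sketch below is only an illustration: the helper names are hypothetical, and the set of expected addresses has to be supplied by hand (for example, copied from the cluster after the pod is recreated).

```python
from __future__ import annotations

import socket


def resolve_ipv4(hostname: str) -> set[str]:
    """Return the IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return {info[4][0] for info in infos}


def stale_addresses(resolved: set[str], current: set[str]) -> set[str]:
    """Addresses DNS still returns that are not in the expected set."""
    return resolved - current
```

If `stale_addresses(resolve_ipv4(name), expected)` is non-empty after the pod has been recreated, that would be consistent with DNS still serving an old address; an empty result would point away from DNS as the cause.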

It is also worth adding that removing the readinessProbe from the deployment manifest fixes the issue described above (I don't really understand why that fixes it).

@runakash
Member

@WJay-tec - Thanks for the detailed steps to reproduce.

If I understand correctly, your service is of type ClusterIP?

Can you check whether the ips in the ServiceImport object get updated after deleting the pod?

spec:
  ips:
  - x.x.x.x
  type: "ClusterSetIP"
status:
  clusters:
  - cluster: stag-eks-2
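
A rough sketch of how this check could be scripted: take a snapshot of the ServiceImport as JSON before and after deleting the pod (for example with `kubectl get serviceimport <name> -n <namespace> -o json`) and compare `spec.ips`. The helper names and the addresses in the sample are made up for illustration.

```python
import json


def service_import_ips(obj: dict) -> list:
    """Extract spec.ips from a ServiceImport parsed from JSON."""
    return list((obj.get("spec") or {}).get("ips") or [])


def ips_changed(before: dict, after: dict) -> bool:
    """True if the set of IPs differs between two snapshots."""
    return sorted(service_import_ips(before)) != sorted(service_import_ips(after))


# Sample snapshots with placeholder addresses, shaped like the object above.
before = json.loads('{"spec": {"ips": ["10.0.0.1"], "type": "ClusterSetIP"}}')
after = json.loads('{"spec": {"ips": ["10.0.0.2"], "type": "ClusterSetIP"}}')
print(ips_changed(before, after))
```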

@WJay-tec
Author

WJay-tec commented Feb 27, 2023

Yes, it is ClusterIP, and yes, I can confirm that the ips in the ServiceImport object are updated after deleting the pod.

I can also confirm that after manually deleting the ServiceImport at that point, my curl command successfully gets a response.

Could there be some sort of race condition happening here? @runakash
