federated-learning-job run error: http://yolo-v5-aggregation.default:7363 connection failed #459

Open
victorming666 opened this issue Dec 26, 2024 · 4 comments

Comments

@victorming666

I rebuilt the Docker images for the federated learning job. The pods run on both the cloud node and the edge nodes:
image

The pod on the cloud node:
image

But the pod on the edge node reports errors:
image
Can anybody help? Many thanks!

@victorming666
Author

This is a DNS failure on a k8s + kubeedge + edgemesh + sedna cluster. Cluster details:

  1. kubernetes: 1.24.16
  2. kubeedge: 1.13.0
  3. edgemesh: 1.13.0
  4. sedna: 0.6.0

The cluster has two cloud nodes and two edge nodes:
image
Why can't the edge nodes find the DNS server on the cloud node? I tested with edgemesh's tcp-echo example and it works!
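
To tell a DNS failure apart from a plain connection failure, a small check like the one below could be run from inside the edge pod. This is only a sketch: `check_endpoint` and its `resolve` parameter are names made up for illustration, and the host and port are taken from the error in the issue title.

```python
import socket


def check_endpoint(host, port, timeout=3.0, resolve=socket.getaddrinfo):
    """Return 'dns_failure', 'connect_failure', or 'ok' for host:port.

    `resolve` is a hypothetical hook (defaults to socket.getaddrinfo) so the
    lookup step can be swapped out when experimenting.
    """
    try:
        # Step 1: name resolution. This is the step that fails when the pod
        # cannot reach a working cluster DNS server.
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Step 2: try a TCP connection to each resolved address.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue
    return "connect_failure"


# Example call for the service named in the error message:
#   print(check_endpoint("yolo-v5-aggregation.default", 7363))
```

A result of `dns_failure` would point at name resolution on the edge node, while `connect_failure` would mean the name resolved but the aggregation service was not reachable on that port.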

@victorming666
Author

Is this project dead? Why are there no replies to any of these issues?

@victorming666
Author

By the way, the cluster itself is fine, as edgemesh's 'cloud-edge echo' test case passes.
Cloud calling edge:
image
Edge calling cloud:
image

@victorming666
Author

I finally got this test to run successfully. Here are the logs from the cloud node:
image
And here is the log from one of the edge nodes:
image
It took a lot of tough work...
