Unnecessary access to v1.Pod lead to timeout. #4955

Open
qin-nz opened this issue Dec 16, 2024 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@qin-nz

qin-nz commented Dec 16, 2024

What happened:

My config:

        --source=service
        --service-type-filter=LoadBalancer

My k8s cluster has so many pods that the apiserver can NOT return the pod list within 60 seconds.

external-dns then fails at startup with:

time="2024-12-16T12:43:02Z" level=info msg="Created Kubernetes client https://10.0.0.1:443"
time="2024-12-16T12:44:02Z" level=fatal msg="failed to sync *v1.Pod: context deadline exceeded"

What you expected to happen:

Because I specify source=service and service-type-filter=LoadBalancer, external-dns should NEVER access the v1.Pod API.

In practice, NewServiceSource calls waitForCacheSync, which lists all pods regardless of service-type-filter.

func NewServiceSource(ctx context.Context, kubeClient kubernetes.Interface, namespace, annotationFilter string, fqdnTemplate string, combineFqdnAnnotation bool, compatibility string, publishInternal bool, publishHostIP bool, alwaysPublishNotReadyAddresses bool, serviceTypeFilter []string, ignoreHostnameAnnotation bool, labelSelector labels.Selector, resolveLoadBalancerHostname bool) (Source, error) {
	tmpl, err := parseTemplate(fqdnTemplate)
	if err != nil {
		return nil, err
	}
	// Use shared informers to listen for add/update/delete of services/pods/nodes in the specified namespace.
	// Set resync period to 0, to prevent processing when nothing has changed
	informerFactory := kubeinformers.NewSharedInformerFactoryWithOptions(kubeClient, 0, kubeinformers.WithNamespace(namespace))
	serviceInformer := informerFactory.Core().V1().Services()
	endpointsInformer := informerFactory.Core().V1().Endpoints()
	podInformer := informerFactory.Core().V1().Pods()
	nodeInformer := informerFactory.Core().V1().Nodes()
	// Add default resource event handlers to properly initialize informer.
	serviceInformer.Informer().AddEventHandler(
		cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
			},
		},
	)
	endpointsInformer.Informer().AddEventHandler(
		cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
			},
		},
	)
	podInformer.Informer().AddEventHandler(
		cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
			},
		},
	)
	nodeInformer.Informer().AddEventHandler(
		cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
			},
		},
	)
	informerFactory.Start(ctx.Done())
	// wait for the local cache to be populated.
	if err := waitForCacheSync(context.Background(), informerFactory); err != nil {
		return nil, err
	}

The sync then times out because the timeout is hard-coded:

ctx, cancel := context.WithTimeout(ctx, 60*time.Second)

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • External-DNS version (use external-dns --version): v0.14.2
  • DNS provider: rfc2136
  • Others:

**Related issues**:

@qin-nz added the kind/bug label Dec 16, 2024
@qin-nz changed the title from "Too many pod lead to timeout." to "Unnecessary access to v1.Pod lead to timeout." Dec 16, 2024
@qin-nz
Author

qin-nz commented Dec 17, 2024

I tried deleting this line (and the related lines), and the timeout no longer occurs.

podInformer := informerFactory.Core().V1().Pods()

@dmarkhas
Contributor

I've seen that this can happen due to permissions:
#4960

Make sure the account running external-dns is allowed to list pods.

@qin-nz
Author

qin-nz commented Dec 19, 2024

@dmarkhas

  1. Yes, I am sure external-dns is allowed to list pods: after I deleted thousands of pods, the pod list returned within 50 seconds.
  2. When using --service-type-filter=LoadBalancer, listing pods is unnecessary, so the code should NOT list pods.
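
A rough sketch of the idea in point 2: decide which informers to start from the service type filter, so that a LoadBalancer-only filter never touches pods. This is standalone Go with no client-go dependency. requiredInformers and informerSet are hypothetical names, and the mapping from service type to required resources (ClusterIP needing endpoints and pods, NodePort needing nodes and pods) is my assumption about what the service source uses them for, not something confirmed in this thread.

```go
package main

import "fmt"

// informerSet records which informers a service source would start.
// Hypothetical type, for illustration only.
type informerSet struct {
	Services  bool
	Endpoints bool
	Pods      bool
	Nodes     bool
}

// requiredInformers is a hypothetical helper that maps the values of
// --service-type-filter to the informers that need to be started.
func requiredInformers(serviceTypeFilter []string) informerSet {
	// An empty filter means every service type is processed, so keep
	// today's behaviour and start everything.
	if len(serviceTypeFilter) == 0 {
		return informerSet{Services: true, Endpoints: true, Pods: true, Nodes: true}
	}

	set := informerSet{Services: true}
	for _, t := range serviceTypeFilter {
		switch t {
		case "ClusterIP":
			// Assumption: headless services are resolved via endpoints and pods.
			set.Endpoints = true
			set.Pods = true
		case "NodePort":
			// Assumption: NodePort targets are resolved via nodes and pods.
			set.Nodes = true
			set.Pods = true
		}
		// Assumption: LoadBalancer and ExternalName need only the Service object.
	}
	return set
}

func main() {
	fmt.Printf("%+v\n", requiredInformers([]string{"LoadBalancer"}))
	fmt.Printf("%+v\n", requiredInformers([]string{"LoadBalancer", "NodePort"}))
	fmt.Printf("%+v\n", requiredInformers(nil))
}
```

In NewServiceSource, a check like this could guard the creation of podInformer, endpointsInformer and nodeInformer, so that waitForCacheSync only waits on the informers that were actually requested.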
