
NATS cluster bound tokens randomly being deleted #310

Open
akubala opened this issue Feb 19, 2021 · 4 comments


@akubala

akubala commented Feb 19, 2021

Hello!

I am using the following setup on Kubernetes v1.18.9 (EKS):

nats-operator

  • image: natsio/nats-operator:0.8.2
  • we are using clusterScoped: true option

nats-cluster

  • image: nats:2.1.9
  • we are creating ServiceAccounts for auth
  • we are using config reloader
  • the cluster size is 3

nats-streaming

  • image: nats-streaming:0.19.0
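
For context, the cluster itself is created with a NatsCluster manifest roughly like the one below. This is a sketch reconstructed from the bullet points above; the cluster name and namespace are illustrative, and the field names assume the stock nats-operator CRD:

apiVersion: nats.io/v1alpha2
kind: NatsCluster
metadata:
  name: my-nats-cluster
  namespace: my-nats-io
spec:
  size: 3                        # the cluster size is 3
  version: "2.1.9"               # nats:2.1.9
  pod:
    enableConfigReload: true     # config reloader
  auth:
    enableServiceAccounts: true  # ServiceAccounts for auth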

I am unable to find any error or warning logs in nats-operator, nats-cluster, or nats-streaming.

The deletions appear to be random; to restore a working config, I have to recreate all affected NatsServiceRoles created for my services.
My services use the following NatsServiceRole config:

apiVersion: nats.io/v1alpha2
kind: NatsServiceRole
metadata:
  annotations:
    helm.fluxcd.io/antecedent: my-ns:helmrelease/company
  labels:
    nats_cluster: my-nats-cluster
  name: company
  namespace: my-nats-io
spec:
  permissions:
    publish:
    - '>'
    subscribe:
    - '>'

Moreover, the secrets for my services are being deleted, but the one for nats-streaming is not.
Also, the configuration stored in the nats-cluster secret (nats.conf) is untouched when the bound tokens are deleted.
Please let me know what information I should provide to give a better description of the issue.

Thanks!

@hpdobrica

hpdobrica commented Jun 15, 2021

having the exact same issue with the same config :(

@hpdobrica

Probably worth mentioning that the operator emits these logs when the problem occurs (usually just for one of the many disappeared secrets):

E0630 13:58:30.324121 1 generic.go:108] error syncing "nats-io/nats-cluster": failed to update auth data in config secret: secrets "some-nats-cluster-bound-token" not found

E0630 13:58:50.245055 1 generic.go:108] error syncing "nats-io/nats-cluster": failed to update auth data in config secret: Operation cannot be fulfilled on secrets "some-nats-cluster-bound-token": StorageError: invalid object, Code: 4, Key: /registry/secrets/app/some-nats-cluster-bound-token, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 1cc9f3e7-6791-48f8-8097-419d49a5783b, UID in object meta:

Currently on EKS 1.19

@gaja-hp

gaja-hp commented Jan 26, 2022

Hello @hpdobrica, did you find any solution for this issue? We are still stuck on it. Thanks.

@hpdobrica

Hey @gaja-hp, we didn't exactly "find a solution", but we mitigated the issue by moving away from service account authentication towards using basic authentication.
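
Roughly, the mitigation looks like this (a sketch assuming the operator's clientsAuthSecret option; the secret name, key, username, and password are placeholders, so double-check against the nats-operator README):

apiVersion: v1
kind: Secret
metadata:
  name: nats-clients-auth
  namespace: my-nats-io
stringData:
  # users and their permissions, as plain NATS client auth config
  clients-auth.json: |
    {
      "users": [
        { "username": "company", "password": "change-me",
          "permissions": { "publish": [">"], "subscribe": [">"] } }
      ]
    }
---
apiVersion: nats.io/v1alpha2
kind: NatsCluster
metadata:
  name: my-nats-cluster
  namespace: my-nats-io
spec:
  size: 3
  version: "2.1.9"
  auth:
    # instead of enableServiceAccounts: true
    clientsAuthSecret: "nats-clients-auth"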

However, I might have an idea why the issue is occurring:

The deletion process is driven by Kubernetes ownerReferences: the idea is that once a NatsServiceRole is deleted, its secret is garbage-collected as well, because the NatsServiceRole is its owner.

I believe the problem exists because owner references are not meant to function across different namespaces
(see the note in https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/#owner-references-in-object-specifications), but I'm not totally sure; I might be missing something.
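
Concretely, each bound-token secret carries an ownerReference pointing at its NatsServiceRole, roughly like this (the secret name and UID are illustrative):

apiVersion: v1
kind: Secret
metadata:
  name: company-my-nats-cluster-bound-token   # illustrative name
  namespace: my-nats-io
  ownerReferences:
  - apiVersion: nats.io/v1alpha2
    kind: NatsServiceRole
    name: company
    uid: <uid-of-the-natsservicerole>

Per the linked docs, a namespaced dependent whose owner cannot be resolved in its own namespace is treated as having an invalid owner reference, which would be consistent with the garbage collector deleting these tokens.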
