More than 1 (healthy) master service found in Consul catalog #69

Open
arep opened this issue Dec 10, 2021 · 4 comments

arep commented Dec 10, 2021

We have a Redis replication cluster called redis-app-cache running in Nomad and using Consul.
We use Resec, which registers a service in Consul called redis-app-cache with the tags master and slave.

When all is OK, it looks like this in consul:
redis-node-1 -> slave
redis-node-2 -> master
redis-node-3 -> slave

After a network disconnect of 5 seconds (the master node lost network connectivity for some reason), the status in Consul is now:
redis-node-1 -> not in consul, but thinks it's slave (as reported through the Resec debug USR1 signal)
redis-node-2 -> master in consul, but thinks it's slave
redis-node-3 -> master in consul, and thinks it's master

With two masters registered, the application can connect to the wrong Redis, and then it obviously doesn't work.
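
One way to catch this condition from the outside is to ask Consul for the passing instances of the service and count how many carry the master tag. The following is a minimal Python sketch, not part of Resec. It assumes a Consul agent reachable at http://127.0.0.1:8500 and uses the standard /v1/health/service/<name>?passing=true endpoint; the helper names and the sample entries are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_passing_entries(service="redis-app-cache", consul="http://127.0.0.1:8500"):
    """Fetch passing instances of a service from Consul's health endpoint.

    Assumption: a local Consul agent on the default HTTP port, no ACL token.
    """
    with urlopen(f"{consul}/v1/health/service/{service}?passing=true") as resp:
        return json.load(resp)


def master_entries(health_entries):
    """Return (node, address, port) for every entry tagged 'master'."""
    masters = []
    for entry in health_entries:
        service = entry.get("Service", {})
        if "master" in (service.get("Tags") or []):
            node = entry.get("Node", {}).get("Node")
            masters.append((node, service.get("Address"), service.get("Port")))
    return masters


def master_problem(health_entries):
    """Return None when exactly one master is registered, else a description."""
    masters = master_entries(health_entries)
    if len(masters) == 1:
        return None
    if not masters:
        return "no passing master registered"
    names = ", ".join(str(node) for node, _, _ in masters)
    return "multiple passing masters registered: " + names


# Illustrative entries shaped like the health endpoint's response,
# mirroring the broken state described above (values are examples only).
sample = [
    {"Node": {"Node": "redis-node-2"},
     "Service": {"Tags": ["master"], "Address": "10.0.0.3", "Port": 6380}},
    {"Node": {"Node": "redis-node-3"},
     "Service": {"Tags": ["master"], "Address": "10.0.0.5", "Port": 6380}},
]
print(master_problem(sample))
```

Against a live cluster, `master_problem(fetch_passing_entries())` could be run from a cron job or a monitoring check to alert on the split state before clients are affected.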

It seems that redis-node-2, which was the master, switched to slave mode and tried to update Consul but couldn't
(because the network issue still existed), and then didn't update Consul when it got connectivity back.
In the meantime redis-node-3 became master, resulting in two masters in Consul.

I'm not sure what is wrong, but I suspect a bug in Resec.
In the logs we see that redis-node-2 reports "Consul Lock successfully released", but I'm not sure whether that
is an actual success, whether it only releases the lock, or whether it also tries to deregister the service.
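
If the suspicion above is right, the missing piece is a retry: after a failed Consul update, the node should remember the registration it wants and try again until the update goes through. The sketch below shows that idea in Python. It is not Resec's code and says nothing about how Resec is actually structured; `update` is a hypothetical callable standing in for whatever registers or deregisters the service, and the backoff values are arbitrary.

```python
import time


def sync_until_success(update, desired_tag, max_attempts=None,
                       base_delay=0.5, max_delay=10.0, sleep=time.sleep):
    """Call update(desired_tag) until it stops raising OSError.

    Returns the number of attempts that were needed. `update` is a
    hypothetical function that pushes the desired tag to Consul and
    raises OSError (for example ConnectionError) while Consul is
    unreachable.
    """
    attempt = 0
    delay = base_delay
    while True:
        attempt += 1
        try:
            update(desired_tag)
            return attempt
        except OSError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            # Exponential backoff, capped so recovery is noticed quickly.
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

The point of the design is that the desired state ("slave") outlives the network failure, so the stale master registration gets corrected once connectivity returns instead of staying in the catalog until a restart.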

Then to fix this, I restart Resec on the redis-node-2 (the original master), and all is well after that.

RESEC logs:

redis-node-3
time="2021-12-10T12:10:53Z" level=warning msg="No (healthy) master service found in Consul catalog" system=consul
redis-node-1
time="2021-12-10T12:10:53Z" level=warning msg="No (healthy) master service found in Consul catalog" system=consul
redis-node-2
time="2021-12-10T12:10:54Z" level=warning msg="No (healthy) master service found in Consul catalog" system=consul
redis-node-3
time="2021-12-10T12:10:54Z" level=warning msg="No (healthy) master service found in Consul catalog" system=consul
redis-node-2
time="2021-12-10T12:10:54Z" level=error msg="Consul Lock error channel was closed, we no longer hold the lock" system=consul
redis-node-2
time="2021-12-10T12:10:54Z" level=warning msg="No (healthy) master service found in Consul catalog" system=consul
redis-node-2
time="2021-12-10T12:10:54Z" level=info msg="Consul Lock successfully released" system=consul
redis-node-1
time="2021-12-10T12:10:54Z" level=warning msg="No (healthy) master service found in Consul catalog" system=consul
redis-node-1
time="2021-12-10T12:10:55Z" level=warning msg="No (healthy) master service found in Consul catalog" system=consul
redis-node-3
time="2021-12-10T12:10:55Z" level=warning msg="No (healthy) master service found in Consul catalog" system=consul
redis-node-2
time="2021-12-10T12:10:54Z" level=info msg="Reconfigure Redis as slave" state=consul_update_service system=reconciler
redis-node-2
time="2021-12-10T12:10:54Z" level=info msg="Reconciler state transitioned from 'consul_update_service' to 'run_as_slave'" old_state=consul_update_service state=run_as_slave system=reconciler
redis-node-2
time="2021-12-10T12:10:54Z" level=info msg="Enslaving redis to be slave of 10.0.0.3:6380" redis_addr="10.0.0.3:6380" system=redis
redis-node-2
time="2021-12-10T12:10:54Z" level=info msg="Enslaved redis to be slave of 10.0.0.3:6380" redis_addr="10.0.0.3:6380" role=slave system=redis
redis-node-2
time="2021-12-10T12:10:54Z" level=warning msg="Disconnecting all clients" redis_addr="10.0.0.3:6380" role=slave system=redis
redis-node-2
time="2021-12-10T12:10:54Z" level=warning msg="Disconnected 5 users" redis_addr="10.0.0.3:6380" role=slave system=redis
redis-node-2
time="2021-12-10T12:10:54Z" level=info msg="Reconfigure Redis as slave" state=run_as_slave system=reconciler
redis-node-2
time="2021-12-10T12:10:54Z" level=info msg="Enslaving redis to be slave of 10.0.0.3:6380" redis_addr="10.0.0.3:6380" role=slave system=redis
redis-node-2
time="2021-12-10T12:10:54Z" level=info msg="Enslaved redis to be slave of 10.0.0.3:6380" redis_addr="10.0.0.3:6380" role=slave system=redis
redis-node-2
time="2021-12-10T12:10:54Z" level=warning msg="Disconnecting all clients" redis_addr="10.0.0.3:6380" role=slave system=redis
redis-node-2
time="2021-12-10T12:10:54Z" level=warning msg="Disconnected 0 users" redis_addr="10.0.0.3:6380" role=slave system=redis
redis-node-2
time="2021-12-10T12:10:54Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:54Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:55Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:55Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:55Z" level=info msg="Reconciler state transitioned from 'run_as_slave' to 'consul_update_service'" old_state=run_as_slave state=consul_update_service system=reconciler
redis-node-2
time="2021-12-10T12:10:55Z" level=warning msg="No (healthy) master service found in Consul catalog" system=consul
redis-node-2
time="2021-12-10T12:10:55Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:55Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:55Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:55Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:55Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:55Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:56Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:56Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:56Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:56Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:56Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:56Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:56Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:56Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:57Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:57Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:57Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:57Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:57Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:57Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:58Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:58Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:58Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:58Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:58Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:58Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:58Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:58Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:59Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:59Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:59Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:59Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:59Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:59Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:10:59Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:10:59Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:00Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:00Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:00Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:00Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:00Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:00Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:00Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:00Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:01Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:01Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:01Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:01Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:01Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:01Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:01Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:01Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:02Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:02Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:02Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:02Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:02Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:02Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:02Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:02Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:03Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:03Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:03Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:03Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:03Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:03Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:03Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:03Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:04Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:04Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:04Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:04Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:04Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:04Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-2
time="2021-12-10T12:11:04Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:04Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-1
time="2021-12-10T12:11:05Z" level=warning msg="Master link is down, can't serve traffic" state=consul_update_service system=reconciler
redis-node-1
time="2021-12-10T12:11:05Z" level=info msg="Reconciler state transitioned from 'consul_update_service' to 'master_link_down'" old_state=consul_update_service state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:05Z" level=warning msg="No (healthy) master service found in Consul catalog" system=consul
redis-node-3
time="2021-12-10T12:11:05Z" level=warning msg="No (healthy) master service found in Consul catalog" system=consul
redis-node-2
time="2021-12-10T12:11:05Z" level=warning msg="No (healthy) master service found in Consul catalog" system=consul
redis-node-2
time="2021-12-10T12:11:05Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:11:05Z" level=error msg="Consul error: failed to create session: Unexpected response code: 500 (rpc error making call: Check 'serfHealth' is in critical state)" system=consul
redis-node-1
time="2021-12-10T12:11:05Z" level=warning msg="No (healthy) master service found in Consul catalog" system=consul
redis-node-3
time="2021-12-10T12:11:05Z" level=warning msg="Master link is down, can't serve traffic" state=consul_update_service system=reconciler
redis-node-3
time="2021-12-10T12:11:05Z" level=info msg="Reconciler state transitioned from 'consul_update_service' to 'master_link_down'" old_state=consul_update_service state=master_link_down system=reconciler
redis-node-3
time="2021-12-10T12:11:05Z" level=warning msg="No (healthy) master service found in Consul catalog" system=consul
redis-node-2
time="2021-12-10T12:11:05Z" level=warning msg="No (healthy) master service found in Consul catalog" system=consul
redis-node-2
time="2021-12-10T12:11:05Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-1
time="2021-12-10T12:11:06Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-3
time="2021-12-10T12:11:06Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-3
time="2021-12-10T12:11:06Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:06Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:07Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-3
time="2021-12-10T12:11:07Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:08Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-3
time="2021-12-10T12:11:08Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-3
time="2021-12-10T12:11:08Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:08Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:09Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-3
time="2021-12-10T12:11:09Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-3
time="2021-12-10T12:11:09Z" level=info msg="Lock successfully acquired" system=consul
redis-node-3
time="2021-12-10T12:11:09Z" level=info msg="Configure Redis as master" state=master_link_down system=reconciler
redis-node-3
time="2021-12-10T12:11:09Z" level=info msg="Reconciler state transitioned from 'master_link_down' to 'run_as_master'" old_state=master_link_down state=run_as_master system=reconciler
redis-node-3
time="2021-12-10T12:11:09Z" level=info msg="Registering redis-app-cache service in consul" system=consul
redis-node-3
time="2021-12-10T12:11:09Z" level=info msg="Promoted redis to Master" redis_addr="10.0.0.5:6380" role=master system=redis
redis-node-3
time="2021-12-10T12:11:09Z" level=info msg="Registered service redis-app-cache ([email protected]:6380) with address 10.0.0.5:6380" system=consul
redis-node-3
time="2021-12-10T12:11:09Z" level=error msg="More than 1 (healthy) master service found in Consul catalog" system=consul
redis-node-2
time="2021-12-10T12:11:09Z" level=error msg="More than 1 (healthy) master service found in Consul catalog" system=consul
redis-node-1
time="2021-12-10T12:11:09Z" level=error msg="More than 1 (healthy) master service found in Consul catalog" system=consul
redis-node-1
time="2021-12-10T12:11:10Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-2
time="2021-12-10T12:11:10Z" level=error msg="More than 1 (healthy) master service found in Consul catalog" system=consul
redis-node-1
time="2021-12-10T12:11:10Z" level=error msg="More than 1 (healthy) master service found in Consul catalog" system=consul
redis-node-3
time="2021-12-10T12:11:10Z" level=info msg="Reconciler state transitioned from 'run_as_master' to 'consul_update_service'" old_state=run_as_master state=consul_update_service system=reconciler
redis-node-3
time="2021-12-10T12:11:10Z" level=error msg="More than 1 (healthy) master service found in Consul catalog" system=consul
redis-node-1
time="2021-12-10T12:11:10Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:11Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:12Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:12Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:13Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:14Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:14Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:15Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:16Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:16Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:17Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:18Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:18Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:19Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:20Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:20Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:21Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:22Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:22Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:23Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:24Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:24Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:25Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:26Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:26Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:27Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:28Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:28Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:29Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:30Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:30Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:31Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:32Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:32Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:33Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:34Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:34Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:35Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:36Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
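
The interleaved log above is easier to follow once the reconciler state transitions are pulled out per node. The following is a small Python sketch for that, assuming each entry is a single logrus-style line of key=value pairs as shown above, paired with the node name that precedes it. The function names are made up for illustration, and multi-line messages such as the state dumps would need separate handling.

```python
import shlex


def parse_logrus_line(line):
    """Split a logrus text-format line into a dict of its key=value fields."""
    fields = {}
    # shlex handles the double-quoted values, including single quotes inside.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def state_transitions(records):
    """Yield (time, node, old_state, new_state) for reconciler transitions.

    `records` is an iterable of (node_name, log_line) pairs.
    """
    for node, line in records:
        fields = parse_logrus_line(line)
        if fields.get("system") == "reconciler" and "old_state" in fields:
            yield fields["time"], node, fields["old_state"], fields["state"]


# Two lines taken from the log above.
records = [
    ("redis-node-2",
     """time="2021-12-10T12:10:54Z" level=info msg="Consul Lock successfully released" system=consul"""),
    ("redis-node-2",
     """time="2021-12-10T12:10:54Z" level=info msg="Reconciler state transitioned from 'consul_update_service' to 'run_as_slave'" old_state=consul_update_service state=run_as_slave system=reconciler"""),
]
for transition in state_transitions(records):
    print(transition)
```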

In consul now:
redis-node-1 -> not in consul
redis-node-2 -> master
redis-node-3 -> master

redis-node-3

time="2021-12-10T12:11:36Z" level=warning msg="{
"Ready": true,
"Healthy": true,
"Master": true,
"MasterAddr": "10.0.0.3",
"MasterPort": 6380,
"Stopped": false
}" dump_state=consul state=consul_update_service system=reconciler
redis-node-3
time="2021-12-10T12:11:36Z" level=warning msg="{
"Healthy": true,
"Ready": true,
"Info": {
"Role": "master",
"Loading": false,
"MasterLinkUp": false,
"MasterLinkDownSince": 0,
"MasterSyncInProgress": false,
"MasterHost": "",
"MasterPort": 0
},
"InfoString": "# Server
redis_version:6.2.4
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:8a9acb02365e03e5
redis_mode:standalone
os:Linux 5.4.0-89-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:10.2.1
process_id:1
process_supervised:no
run_id:af48ce2805dbb75e54eaec08214274469588d998
tcp_port:6379
server_time_usec:1639138270257837
uptime_in_seconds:2514
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:11748318
executable:/data/redis-server
config_file:/local/redis.conf
io_threads_active:0

# Clients
connected_clients:1
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:32
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:1926752
used_memory_human:1.84M
used_memory_rss:6066176
used_memory_rss_human:5.79M
used_memory_peak:2028864
used_memory_peak_human:1.93M
used_memory_peak_perc:94.97%
used_memory_overhead:1880024
used_memory_startup:809864
used_memory_dataset:46728
used_memory_dataset_perc:4.18%
allocator_allocated:2004064
allocator_active:2347008
allocator_resident:4792320
total_system_memory:8149606400
total_system_memory_human:7.59G
used_memory_lua:44032
used_memory_lua_human:43.00K
used_memory_scripts:656
used_memory_scripts_human:656B
number_of_cached_scripts:2
maxmemory:520093696
maxmemory_human:496.00M
maxmemory_policy:allkeys-lfu
allocator_frag_ratio:1.17
allocator_frag_bytes:342944
allocator_rss_ratio:2.04
allocator_rss_bytes:2445312
rss_overhead_ratio:1.27
rss_overhead_bytes:1273856
mem_fragmentation_ratio:3.22
mem_fragmentation_bytes:4182176
mem_not_counted_for_evict:0
mem_replication_backlog:1048576
mem_clients_slaves:0
mem_clients_normal:20512
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

# Persistence
loading:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:1660
rdb_bgsave_in_progress:0
rdb_last_save_time:1639135756
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:169
total_commands_processed:6089
instantaneous_ops_per_sec:1
total_net_input_bytes:232910
total_net_output_bytes:12905528
instantaneous_input_kbps:0.03
instantaneous_output_kbps:2.67
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:4598
total_writes_processed:6002
io_threaded_reads_processed:0
io_threaded_writes_processed:0

Replication

role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:b2c01736f8ee3b6b111fefd5cac424e119d8ff04
master_replid2:8ae1b265ce319c1786d32ce56d2764b60fe97a78
master_repl_offset:164207
second_repl_offset:164208
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:959
repl_backlog_histlen:163249

CPU

used_cpu_sys:3.425058
used_cpu_user:3.473360
used_cpu_sys_children:0.000000
used_cpu_user_children:0.002087
used_cpu_sys_main_thread:3.330303
used_cpu_user_main_thread:3.387119

Modules

Errorstats

Cluster

cluster_enabled:0

Keyspace

db0:keys=8,expires=0,avg_ttl=0
",
"Stopped": false
}" dump_state=redis state=consul_update_service system=reconciler
redis-node-1
time="2021-12-10T12:11:36Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:37Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:38Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:38Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:39Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:40Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:40Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:41Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:42Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:42Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:43Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:44Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:44Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:45Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:46Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:46Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:47Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:48Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:48Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:49Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:50Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:50Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:51Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:52Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:52Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:53Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:54Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:54Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:55Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:55Z" level=warning msg="{
"Ready": true,
"Healthy": true,
"Master": false,
"MasterAddr": "10.0.0.3",
"MasterPort": 6380,
"Stopped": false
}" dump_state=consul state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:55Z" level=warning msg="{
"Healthy": true,
"Ready": true,
"Info": {
"Role": "slave",
"Loading": false,
"MasterLinkUp": false,
"MasterLinkDownSince": 61000000000,
"MasterSyncInProgress": false,
"MasterHost": "10.0.0.3",
"MasterPort": 6380
},
"InfoString": "# Server
redis_version:6.2.4
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:8a9acb02365e03e5
redis_mode:standalone
os:Linux 5.4.0-89-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:10.2.1
process_id:1
process_supervised:no
run_id:61ac6daecff818b31d207aa1c544db041a77b989
tcp_port:6379
server_time_usec:1639138315117024
uptime_in_seconds:2591
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:11748363
executable:/data/redis-server
config_file:/local/redis.conf
io_threads_active:0

Clients

connected_clients:1
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:16
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

Memory

used_memory:1990344
used_memory_human:1.90M
used_memory_rss:5959680
used_memory_rss_human:5.68M
used_memory_peak:2030272
used_memory_peak_human:1.94M
used_memory_peak_perc:98.03%
used_memory_overhead:1900504
used_memory_startup:809864
used_memory_dataset:89840
used_memory_dataset_perc:7.61%
allocator_allocated:2044896
allocator_active:2379776
allocator_resident:4747264
total_system_memory:8149614592
total_system_memory_human:7.59G
used_memory_lua:44032
used_memory_lua_human:43.00K
used_memory_scripts:656
used_memory_scripts_human:656B
number_of_cached_scripts:2
maxmemory:520093696
maxmemory_human:496.00M
maxmemory_policy:allkeys-lfu
allocator_frag_ratio:1.16
allocator_frag_bytes:334880
allocator_rss_ratio:1.99
allocator_rss_bytes:2367488
rss_overhead_ratio:1.26
rss_overhead_bytes:1212416
mem_fragmentation_ratio:3.06
mem_fragmentation_bytes:4012104
mem_not_counted_for_evict:0
mem_replication_backlog:1048576
mem_clients_slaves:0
mem_clients_normal:40992
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

Persistence

loading:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:1667
rdb_bgsave_in_progress:0
rdb_last_save_time:1639135724
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

Stats

total_connections_received:175
total_commands_processed:6207
instantaneous_ops_per_sec:0
total_net_input_bytes:235468
total_net_output_bytes:13300147
instantaneous_input_kbps:0.01
instantaneous_output_kbps:2.64
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:4712
total_writes_processed:6114
io_threaded_reads_processed:0
io_threaded_writes_processed:0

Replication

role:slave
master_host:10.0.0.3
master_port:6380
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:164207
master_link_down_since_seconds:61
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:8ae1b265ce319c1786d32ce56d2764b60fe97a78
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:164207
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:164207

CPU

used_cpu_sys:3.603388
used_cpu_user:3.122053
used_cpu_sys_children:0.000000
used_cpu_user_children:0.002097
used_cpu_sys_main_thread:3.493752
used_cpu_user_main_thread:3.055389

Modules

Errorstats

Cluster

cluster_enabled:0

Keyspace

db0:keys=8,expires=0,avg_ttl=0
",
"Stopped": false
}" dump_state=redis state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:56Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:56Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:57Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:58Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:58Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:11:59Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:00Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:00Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:01Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:02Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:02Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:03Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:04Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:04Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:05Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:06Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:06Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:07Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:08Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:08Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-2
time="2021-12-10T12:12:09Z" level=warning msg="{
"Ready": true,
"Healthy": true,
"Master": false,
"MasterAddr": "10.0.0.3",
"MasterPort": 6380,
"Stopped": false
}" dump_state=consul state=consul_update_service system=reconciler
redis-node-2
time="2021-12-10T12:12:09Z" level=warning msg="{
"Healthy": true,
"Ready": true,
"Info": {
"Role": "slave",
"Loading": false,
"MasterLinkUp": false,
"MasterLinkDownSince": -1000000000,
"MasterSyncInProgress": false,
"MasterHost": "10.0.0.3",
"MasterPort": 6380
},
"InfoString": "# Server
redis_version:6.2.4
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:8a9acb02365e03e5
redis_mode:standalone
os:Linux 5.4.0-90-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:10.2.1
process_id:1
process_supervised:no
run_id:eb7dafc6a74dfe01dfb3d37918ee3cdf1f4965d8
tcp_port:6379
server_time_usec:1639138255193239
uptime_in_seconds:2557
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:11748303
executable:/data/redis-server
config_file:/local/redis.conf
io_threads_active:0

Clients

connected_clients:1
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:40
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

Memory

used_memory:878448
used_memory_human:857.86K
used_memory_rss:6205440
used_memory_rss_human:5.92M
used_memory_peak:2173424
used_memory_peak_human:2.07M
used_memory_peak_perc:40.42%
used_memory_overhead:831424
used_memory_startup:809864
used_memory_dataset:47024
used_memory_dataset_perc:68.56%
allocator_allocated:1062472
allocator_active:1404928
allocator_resident:5029888
total_system_memory:8149610496
total_system_memory_human:7.59G
used_memory_lua:40960
used_memory_lua_human:40.00K
used_memory_scripts:656
used_memory_scripts_human:656B
number_of_cached_scripts:2
maxmemory:520093696
maxmemory_human:496.00M
maxmemory_policy:allkeys-lfu
allocator_frag_ratio:1.32
allocator_frag_bytes:342456
allocator_rss_ratio:3.58
allocator_rss_bytes:3624960
rss_overhead_ratio:1.23
rss_overhead_bytes:1175552
mem_fragmentation_ratio:7.43
mem_fragmentation_bytes:5369736
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:20520
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

Persistence

loading:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:996
rdb_bgsave_in_progress:0
rdb_last_save_time:1639135757
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:479232
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

Stats

total_connections_received:1188
total_commands_processed:12018
instantaneous_ops_per_sec:11
total_net_input_bytes:817977
total_net_output_bytes:13500380
instantaneous_input_kbps:0.42
instantaneous_output_kbps:2.77
rejected_connections:0
sync_full:2
sync_partial_ok:0
sync_partial_err:2
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:63
evicted_keys:0
keyspace_hits:1332
keyspace_misses:426
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:525
total_forks:2
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:2
dump_payload_sanitizations:0
total_reads_processed:12177
total_writes_processed:7854
io_threaded_reads_processed:0
io_threaded_writes_processed:0

Replication

role:slave
master_host:10.0.0.3
master_port:6380
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:1
master_link_down_since_seconds:-1
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:8ae1b265ce319c1786d32ce56d2764b60fe97a78
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:164207
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:164207

CPU

used_cpu_sys:4.213468
used_cpu_user:4.802098
used_cpu_sys_children:0.007998
used_cpu_user_children:0.001543
used_cpu_sys_main_thread:3.934551
used_cpu_user_main_thread:4.650770

Modules

Errorstats

errorstat_NOMASTERLINK:count=2

Cluster

cluster_enabled:0

Keyspace

db0:keys=8,expires=0,avg_ttl=0
",
"Stopped": false
}" dump_state=redis state=consul_update_service system=reconciler
redis-node-1
time="2021-12-10T12:12:09Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:10Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:10Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:12:11Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler

Restarting RESEC on the original master node

redis-node-1
time="2021-12-10T12:13:50Z" level=warning msg="Master link is down, can't serve traffic" state=master_link_down system=reconciler
redis-node-2
time="2021-12-10T12:13:50Z" level=warning msg="Caught signal, stopping reconciler loop" state=consul_update_service system=reconciler
redis-node-2
time="2021-12-10T12:13:50Z" level=info msg="Stop command sent to Redis" redis_addr="10.0.0.3:6380" role=slave system=redis
redis-node-2
time="2021-12-10T12:13:50Z" level=info msg="Shutting down Redis command runner" redis_addr="10.0.0.3:6380" role=slave system=redis
redis-node-2
time="2021-12-10T12:13:50Z" level=info msg="Reconfigure Redis as slave" state=consul_update_service system=reconciler
redis-node-2
time="2021-12-10T12:13:50Z" level=info msg="Reconciler state transitioned from 'consul_update_service' to 'run_as_slave'" old_state=consul_update_service state=run_as_slave system=reconciler
redis-node-1
time="2021-12-10T12:13:50Z" level=info msg="Reconfigure Redis as slave" state=master_link_down system=reconciler
redis-node-1
time="2021-12-10T12:13:50Z" level=info msg="Reconciler state transitioned from 'master_link_down' to 'run_as_slave'" old_state=master_link_down state=run_as_slave system=reconciler
redis-node-1
time="2021-12-10T12:13:50Z" level=info msg="Enslaving redis to be slave of 10.0.0.5:6380" redis_addr="10.0.0.4:6380" role=slave system=redis
redis-node-1
time="2021-12-10T12:13:50Z" level=info msg="Enslaved redis to be slave of 10.0.0.5:6380" redis_addr="10.0.0.4:6380" role=slave system=redis
redis-node-1
time="2021-12-10T12:13:50Z" level=warning msg="Disconnecting all clients" redis_addr="10.0.0.4:6380" role=slave system=redis
redis-node-1
time="2021-12-10T12:13:50Z" level=warning msg="Disconnected 0 users" redis_addr="10.0.0.4:6380" role=slave system=redis
redis-node-2
time="2021-12-10T12:13:50Z" level=info msg="Reconfigure Redis as slave" state=run_as_slave system=reconciler
redis-node-1
time="2021-12-10T12:13:50Z" level=info msg="Reconfigure Redis as slave" state=run_as_slave system=reconciler
redis-node-1
time="2021-12-10T12:13:50Z" level=info msg="Enslaving redis to be slave of 10.0.0.5:6380" redis_addr="10.0.0.4:6380" role=slave system=redis
redis-node-1
time="2021-12-10T12:13:50Z" level=info msg="Enslaved redis to be slave of 10.0.0.5:6380" redis_addr="10.0.0.4:6380" role=slave system=redis
redis-node-1
time="2021-12-10T12:13:50Z" level=warning msg="Disconnecting all clients" redis_addr="10.0.0.4:6380" role=slave system=redis
redis-node-1
time="2021-12-10T12:13:50Z" level=warning msg="Disconnected 0 users" redis_addr="10.0.0.4:6380" role=slave system=redis
redis-node-1
time="2021-12-10T12:13:51Z" level=info msg="Registering redis-app-cache service in consul" system=consul
redis-node-1
time="2021-12-10T12:13:51Z" level=info msg="Reconciler state transitioned from 'run_as_slave' to 'consul_update_service'" old_state=run_as_slave state=consul_update_service system=reconciler
redis-node-1
time="2021-12-10T12:13:51Z" level=info msg="Registered service redis-app-cache ([email protected]:6380) with address 10.0.0.4:6380" system=consul
redis-node-2
time="2021-12-10T12:13:51Z" level=info msg="Shutdown requested, stopping state loop" state=run_as_slave system=reconciler
redis-node-2
time="2021-12-10T12:13:51Z" level=info msg="Shutdown requested, stopping reconciler loop" state=run_as_slave system=reconciler
redis-node-2
time="2021-12-10T12:13:52Z" level=info msg="Starting ReSeC v.1.1.2"
redis-node-2
time="2021-12-10T12:13:52Z" level=warning msg="Redis still missing initial state" state=unknown system=reconciler
redis-node-2
time="2021-12-10T12:13:52Z" level=info msg="Reconciler state transitioned from 'unknown' to 'missing_state'" old_state=unknown state=missing_state system=reconciler
redis-node-2
time="2021-12-10T12:13:52Z" level=info msg="Trying to acquire consul lock" system=consul
redis-node-2
time="2021-12-10T12:13:52Z" level=warning msg="Redis still missing initial state" state=missing_state system=reconciler
redis-node-2
time="2021-12-10T12:13:53Z" level=info msg="Reconfigure Redis as slave" state=missing_state system=reconciler
redis-node-2
time="2021-12-10T12:13:53Z" level=info msg="Reconciler state transitioned from 'missing_state' to 'run_as_slave'" old_state=missing_state state=run_as_slave system=reconciler
redis-node-2
time="2021-12-10T12:13:53Z" level=info msg="Enslaving redis to be slave of 10.0.0.5:6380" redis_addr="10.0.0.3:6380" system=redis
redis-node-2
time="2021-12-10T12:13:53Z" level=info msg="Enslaved redis to be slave of 10.0.0.5:6380" redis_addr="10.0.0.3:6380" role=slave system=redis
redis-node-2
time="2021-12-10T12:13:53Z" level=warning msg="Disconnecting all clients" redis_addr="10.0.0.3:6380" role=slave system=redis
redis-node-2
time="2021-12-10T12:13:53Z" level=warning msg="Disconnected 0 users" redis_addr="10.0.0.3:6380" role=slave system=redis
redis-node-2
time="2021-12-10T12:13:54Z" level=info msg="Reconciler state transitioned from 'run_as_slave' to 'consul_update_service'" old_state=run_as_slave state=consul_update_service system=reconciler
redis-node-2
time="2021-12-10T12:13:54Z" level=info msg="Registering redis-app-cache service in consul" system=consul
redis-node-2
time="2021-12-10T12:13:54Z" level=info msg="Registered service redis-app-cache ([email protected]:6380) with address 10.0.0.3:6380" system=consul

@MattRiddell

Hah I'm seeing the same thing and this was the only info I could find on the internet about it.

I don't suppose you managed to resolve it?

@arep
Author

arep commented Jan 12, 2022

No, no solution yet.

@arep
Author

arep commented Aug 23, 2022

We created a workaround that automatically fixes this issue when it happens, so no manual intervention is needed to resolve the multiple-master state.
A script runs beside resec and checks for a multi-master state. If it finds one, the node that was master before issues a restart command to the resec task.

This is the PHP script we use. It should be easy to rewrite in any language.

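<?php
// Watchdog that runs beside resec. When Consul lists more than one master,
// the node that was master before the split restarts its own resec task.
$lastRestart=null;  // time of the last restart issued by this script
$isMaster=false;    // was this node the only master at the last healthy poll?
while (true){
    sleep(5);
    // Look up the A records behind <master tag>.<service>.service.consul
    $dnsArr = dns_get_record(getenv("MASTER_TAGS").".".getenv("CONSUL_SERVICE_NAME").".service.consul", DNS_A);
    if ($dnsArr === false){
        // Lookup failed; skip this round (count(false) throws on PHP 8)
        echo "DNS lookup failed...doing nothing...\n";
        continue;
    }
    $now = time();
    if (count($dnsArr)>1){
        //multi master
        echo "Found multi master configuration...\n";
        print_r($dnsArr);

        if ($lastRestart!==null && $now - $lastRestart < 30){
            echo "Too soon after last restart...doing nothing...\n";
            continue;
        }

        if (!$isMaster){
            echo "I am not master, doing nothing...\n";
            continue;
        }

        //Restart resec task through the Nomad HTTP API
        $url = getenv("NOMAD_ADDR")."/v1/client/allocation/".getenv("NOMAD_ALLOC_ID")."/restart";
        $data = json_encode(
            array('TaskName' => 'resec')
        );
        echo "POST to $url with data\n";
        print_r($data);

        $ch = curl_init();
        curl_setopt($ch,CURLOPT_URL, $url);
        curl_setopt($ch,CURLOPT_POST, true);
        curl_setopt($ch,CURLOPT_POSTFIELDS, $data);
        curl_setopt($ch,CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
        curl_setopt($ch,CURLOPT_RETURNTRANSFER, true);

        //execute post
        $result = curl_exec($ch);
        var_dump($result);

        $lastRestart = time();
    }elseif (count($dnsArr)>0){
        // Exactly one master: remember whether it is this node
        $isMaster = ($dnsArr[0]["ip"] == getenv("MY_OWN_IP"));
    }
}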
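
For anyone porting this, here is a minimal Python sketch of just the decision step from the script above, with the DNS lookup and the Nomad call left out. The function name `decide` and the constant `RESTART_COOLDOWN` are made up for illustration; the 30 second threshold comes from the script and the example IPs from the logs in this issue.

```python
from typing import List, Optional, Tuple

RESTART_COOLDOWN = 30  # seconds; same threshold as the PHP script (hypothetical name)


def decide(master_ips: List[str], own_ip: str, was_master: bool,
           last_restart: Optional[float], now: float) -> Tuple[str, bool]:
    """Return (action, was_master) for one poll of the master DNS name.

    action is "restart", "wait" or "ok"; was_master is the updated flag.
    """
    if len(master_ips) > 1:
        # Multi-master state: keep the flag from the last healthy poll.
        if last_restart is not None and now - last_restart < RESTART_COOLDOWN:
            return "wait", was_master
        if not was_master:
            return "wait", was_master
        return "restart", was_master
    if master_ips:
        # Exactly one master: remember whether it is this node.
        was_master = master_ips[0] == own_ip
    return "ok", was_master


if __name__ == "__main__":
    # Healthy poll on the master node, then a poll that sees two masters.
    action, flag = decide(["10.0.0.3"], "10.0.0.3", False, None, 0.0)
    print(action, flag)
    print(decide(["10.0.0.3", "10.0.0.5"], "10.0.0.3", flag, None, 5.0))
```

Keeping the decision in a pure function like this makes it testable without Consul or Nomad; the caller would do the lookup, call `decide`, and only hit the restart endpoint when the action is "restart".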

@bdossantos

On my side, I had the same issue, and I "fixed" it directly via my Redis health check.

When I detect more than one primary, I kill a Redis instance to force resec to run a new leader election.

      service {
        name = "${JOB}-${TASKGROUP}-${TASK}-stonith"
        port = "redis"
        tags = ["command", "stonith"]

        check {
          type    = "script"
          command = "/bin/bash"
          args = [
            "-c",
            "set -euo pipefail; if [[ $(getent hosts _$replace-by-service-name._primary.service.consul | wc -l) -ge 2 ]]; then sleep $((10 + RANDOM % 5))s; exit 2; else echo 'OK'; fi"
          ]
          interval  = "30s"
          timeout   = "5s"
          on_update = "ignore_warnings"

          check_restart {
            limit           = 50
            grace           = "60s"
            ignore_warnings = true
          }
        }
      }
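
The inline bash in the check is dense, so here is the same decision as a rough Python sketch, assuming the primary addresses have already been resolved (the check itself uses `getent hosts`). `stonith_check` is a hypothetical name; exit code 2 is what the check returns to mark itself as failing.

```python
import random
from typing import List, Optional, Tuple


def stonith_check(primary_addrs: List[str],
                  rng: Optional[random.Random] = None) -> Tuple[int, int]:
    """Return (delay_seconds, exit_code) for one run of the check.

    Exit code 2 marks the check as failing; 0 means passing.
    """
    rng = rng or random.Random()
    if len(primary_addrs) >= 2:
        # Jitter of 10 to 14 seconds, like `10 + RANDOM % 5` in the bash version.
        return 10 + rng.randrange(5), 2
    return 0, 0


if __name__ == "__main__":
    print(stonith_check(["10.0.0.5"]))
    print(stonith_check(["10.0.0.3", "10.0.0.5"]))
```

The random delay is presumably there so that the nodes do not all fail the check at the same moment.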
