scylla_cluster: fix handling of wait_other_notice #462
Conversation
@nyh seems like there are some places in dtest using
Do we run dtests also on Cassandra? If we do, my new function won't work there :-(
we could run with cassandra, but we don't do that regularly. We could overload
Force-pushed from 78fe133 to a5d16bf
Pushed new version with that "_" thing.
Force-pushed from 11acc47 to 3d405ca
Pushed a new version, the REST API now also waits for nodes to know other nodes' tokens.
Ping.
LGTM
@bhalevy, as the one who refactored it a few times, care to also give it a look?
```python
for n in check:
    response = requests.get(url=url_tokens + n)
    if response.text == '[]':
        tofind.add(n)
```
I wonder how this would work with topology-related tests that currently track the log to inject some event during e.g. bootstrap or replace.
Although they can pass wait_other_notice=False and implement their own state tracking, they might break with this change.
I don't know what kind of tests you mean, but we can see if any tests start failing with this patch.
I think that if a certain test wants to capture some exact moment in the middle of the boot process, it can't ask ccm to wait until the boot is over.
@nyh let's do a run of this change without the wait-for-gossip change, to see that we aren't breaking any tests with it. And then we can merge it.
@eliransin - do you need to review this to get this moving forward?
running this with the full dtest suite, to verify it's not causing regressions (or not too many of them)
it's a change that affects all of the tests, so I would want it tested first (@nyh has tested it so far only with the
After starting a multi-node cluster, it's important to wait until all nodes are aware that all other nodes are available, otherwise if the user sends a CL=ALL request to one node, it might not be aware that the other replicas are usable, and fail the request.

For this reason a cluster's start() has a wait_other_notice=True option, and dtests correctly use it. However, the implementation doesn't wait for the right thing... To check if node A thinks that B is available, it checks that A's log contains the message "InetAddress B is now UP". But this message is unreliable - when it is printed, A still doesn't think that B is fully available - it can still think that B is in a "joining" state for a while longer. If wait_other_notice returns at this point, and the user sends a CL=ALL request to node A, it will fail.

The solution I propose in this patch uses the REST API, instead of the log, to wait until node A thinks node B is both live and finished joining.

This patch is needed if Scylla is modified to boot up faster. We start seeing dtests which use RF=ALL in the beginning of a test failing, because the node we contact doesn't know that the other nodes are usable.

Fixes scylladb#461

Signed-off-by: Nadav Har'El <[email protected]>
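The approach described above can be sketched as a small poll loop. This is a minimal illustration, not ccm's exact code: it assumes the Scylla REST resources `/gossiper/endpoint/live/` (nodes the queried node considers live) and `/storage_service/nodes/joining` (nodes it still considers joining); the helper names and polling details are assumptions for illustration.

```python
import json
import time
import urllib.request

def fetch_json(url):
    # Plain-stdlib GET; ccm uses requests, this is just for self-containment.
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode())

def sees_up_normal(api_base, other_ip, fetch=fetch_json):
    # Node behind api_base considers other_ip "UN": live and not joining.
    live = fetch(api_base + "/gossiper/endpoint/live/")
    joining = fetch(api_base + "/storage_service/nodes/joining")
    return other_ip in live and other_ip not in joining

def wait_other_notice(api_base, other_ips, timeout=120, poll=0.5, fetch=fetch_json):
    # Poll until every other node is seen as up-and-normal, or time out.
    deadline = time.monotonic() + timeout
    pending = set(other_ips)
    while pending:
        pending = {ip for ip in pending if not sees_up_normal(api_base, ip, fetch)}
        if not pending:
            return
        if time.monotonic() > deadline:
            raise TimeoutError("nodes not seen as UN: %s" % sorted(pending))
        time.sleep(poll)
```

The key difference from the log-based check is that "live" and "not joining" are queried as current state, rather than inferred from a one-shot "is now UP" log line.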
In the previous patch, we introduced the wait_rest_for_alive() function that uses the REST API to wait for node A to be able to use node B. We waited for node A to consider node B "alive" and "not joining" (in nodetool status, this is called "UN" state), but for some tests this is not enough: If the test sends A a read with CL=ALL, it needs to know B's tokens to send it one of the reads.

It turns out that a node A can think that B is alive before B has gossipped its tokens. So in this patch we modify wait_rest_for_alive() to use the REST API to **also** ensure that node A is aware of B's tokens.

An example dtest that is fixed by this patch is cdc_test.py::TestCdcWithCompactStorage::test_artificial_column_with_type_empty_is_missing

Signed-off-by: Nadav Har'El <[email protected]>
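The token check boils down to asking node A for B's token list and treating an empty reply as "not yet known" (the review hunk in this PR compares the response text against '[]'). A hedged sketch, assuming a tokens-per-endpoint REST resource that returns a JSON array; the URL shape and helper names are illustrative:

```python
import json

def knows_tokens(fetch, api_base, other_ip):
    # True once the queried node has gossip-learned other_ip's tokens,
    # i.e. the REST reply is a non-empty JSON array rather than "[]".
    text = fetch("%s/storage_service/tokens/%s" % (api_base, other_ip))
    return json.loads(text) != []

def tokens_still_missing(fetch, api_base, ips):
    # The subset of ips whose tokens the queried node does not yet know.
    return {ip for ip in ips if not knows_tokens(fetch, api_base, ip)}
```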
When starting a node with "wait_other_notice", the existing code waited until all other nodes think the new node is ready. However, it's safer to also check that the new node thinks all the other nodes are ready. That should usually be the case (the new node gets information on all other nodes in the first gossip), but it doesn't strictly have to be true, so better to ensure that it is.

Signed-off-by: Nadav Har'El <[email protected]>
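The symmetric wait amounts to checking both directions of every (observer, target) pair. A small illustrative helper, where sees_ready stands in for whatever per-pair readiness predicate is used (e.g. the REST-based check from this PR); the function name is an assumption:

```python
def pending_pairs(existing_ips, new_ip, sees_ready):
    # (observer, target) pairs still waiting: every existing node must
    # notice the new node, and the new node must notice every existing one.
    pairs = [(old, new_ip) for old in existing_ips]
    pairs += [(new_ip, old) for old in existing_ips]
    return [(a, b) for (a, b) in pairs if not sees_ready(a, b)]
```

A wait loop would simply poll until this list is empty.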
Force-pushed from 3d405ca to c12ded5
Pushed new version - rebased, and changed {} to set() as @bhalevy noticed.
I think you did this above, but I don't know how to read the results (if I understood correctly, we can't expect zero failures?).
@nyh only a handful of failures; some are hitting the 2min timeout of the newly introduced wait function.
This test expects boot to fail; the test should enable wait_other_notice. Others fail a bit like failures in debug runs... missing data in node3; it doesn't fail like that on master runs.
@nyh gating passed o.k., so we are left with 3 tests that seem to regress (and we didn't test enterprise features, even though most of them are single node). @bhalevy @eliransin, it's your call: are we o.k. with fixing those tests later, or should this be blocked until someone is able to attend to them?
I don't see how it would be ok to introduce regressions.
It's not ok, but to be honest, I don't have the peace of mind to fix random buggy dtests while I keep getting requests to do other things at top priority, so I'm not going to fix this any time soon.
I started a new run with this: https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/fruch/job/new-dtest-pytest-parallel/ The 3 test failures @fruch mentioned are directly related but look quite easy to fix, so we can assign someone to fix them. However, this PR by itself won't improve runtime and may possibly even degrade it.
Thanks.
The idea is to have a dtest PR which runs Scylla without huge sleeps. The branch is https://github.com/nyh/scylla-dtest/commits/quick-boot-2. There is a one-line patch https://github.com/nyh/scylla-dtest/commit/0b23742b4622c9b6e18f4613722ce433640cf41c plus one fix for one test's bug - https://github.com/nyh/scylla-dtest/commit/0b23742b4622c9b6e18f4613722ce433640cf41c. There will need to be more patches because a few dozen tests still fail (not as many as before the ccm patches, but still a non-zero list). The next step can be to make the dtest patch an optional parameter (so developers can use it), but @fruch and @eliransin suggested that it can be the default, and we just add a tag for tests that can't work in this "sleepless" mode, so only they will be run in the slow mode.
I'm handling the tests that need to be fixed, so this last run had 8 failures, 6 of which would be solved by https://github.com/scylladb/scylla-dtest/pull/3313.
I was skeptical, so I went to check: for the last 3 runs, we have only 2 failing tests.
Approved from my side.
We can merge this and the PR for dtest fixes.
@nyh This is incorrect. It may fix some problems but introduces others. Consider the following:
That's why the old check used the log markers: they ensure that we don't depend on obsolete state. We're actually now observing this scenario trying to unflake one dtest with @temichus (https://github.com/scylladb/scylla-dtest/pull/3991)...
I've created a PR where the issue can be reproduced: https://github.com/scylladb/scylla-dtest/pull/4042/files
You can stop A with wait_other_notice=True to ensure node B notices that A went down, before restarting it.
Which is only a workaround for the problem; the implementation of
Then it needs to be fixed. Issue nr?
I opened #563
…r_alive

scylladb#462 introduced watch_rest_for_alive, which replaced the calls to watch_log_for_alive on the scylla node(s) start path. But when a node is killed and then restarted, and the other nodes miss the kill event, `watch_rest_for_alive` will consider that node as already up as seen by the other nodes, while previously watch_log_for_alive waited until the other nodes discovered this node as up again, based on markers taken right before (re)starting that node.

This change brings this call back.

Fixes scylladb#563

Signed-off-by: Benny Halevy <[email protected]>
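The reason a log marker is needed on restart: REST state can be stale (node B may never have noticed that A went down), while a mark taken on B's log just before restarting A guarantees we only accept an "is now UP" event that happened after the restart. A minimal sketch of that idea, with illustrative mark/search helpers standing in for ccm's mark_log/watch_log_for_alive (the log line format matches the one quoted earlier in this thread):

```python
def mark(log_lines):
    # Remember the current end of the log, like taking a mark before restart.
    return len(log_lines)

def saw_up_after(log_lines, from_mark, ip):
    # Only "is now UP" events newer than the mark count; stale pre-kill
    # messages before the mark are ignored.
    needle = "InetAddress %s is now UP" % ip
    return any(needle in line for line in log_lines[from_mark:])
```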