scylla_node: wait also for node to know of another node's token
In the previous patch, we introduced the watch_rest_for_alive() function,
which uses the REST API to wait until node A is able to use node B.
We waited for node A to consider node B "alive" and "not joining" (in
nodetool status, this is called the "UN" state), but for some tests this
is not enough: if the test sends A a read with CL=ALL, A needs to
know B's tokens in order to send B one of the reads. It turns out that
node A can think that B is alive before B has gossiped its tokens.

So in this patch we modify watch_rest_for_alive() to use the REST API
to **also** ensure that node A is aware of B's tokens.
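
The combined condition described above (alive, not joining, and tokens known) can be sketched as a pure function. This is only an illustration and not part of ccm: the function name and its parameters are hypothetical, and the inputs stand in for the decoded JSON responses of the three REST endpoints that the patch queries. It assumes, as the patch's check against '[]' suggests, that an unknown node's token list is empty.

```python
def nodes_still_pending(tofind, live, joining, tokens_by_node):
    """Return the addresses in `tofind` that this node cannot fully use yet.

    Hypothetical helper: `live` and `joining` are lists of addresses, and
    `tokens_by_node` maps an address to the list of tokens known for it.
    """
    usable = set(live) - set(joining)
    pending = set()
    for address in tofind:
        # A node is pending if it is not "UN" yet, or if it is "UN" but
        # no tokens are known for it (missing entry or empty list).
        if address not in usable or not tokens_by_node.get(address):
            pending.add(address)
    return pending


# Node B (127.0.0.2) is alive and not joining, but its tokens are unknown.
before_gossip = nodes_still_pending(
    {"127.0.0.2"},
    live=["127.0.0.1", "127.0.0.2"],
    joining=[],
    tokens_by_node={"127.0.0.2": []},
)

# Once B's tokens have arrived, nothing is pending any more.
after_gossip = nodes_still_pending(
    {"127.0.0.2"},
    live=["127.0.0.1", "127.0.0.2"],
    joining=[],
    tokens_by_node={"127.0.0.2": ["-9100", "42"]},
)
```

Keeping the decision separate from the HTTP calls makes the "alive but tokens not yet gossiped" case easy to exercise without a running cluster.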

An example dtest that is fixed by this patch is cdc_test.py::
TestCdcWithCompactStorage::test_artificial_column_with_type_empty_is_missing

Signed-off-by: Nadav Har'El <[email protected]>
nyh committed Jun 6, 2023
1 parent a5d16bf commit 8ebccab
Showing 1 changed file with 13 additions and 2 deletions.
15 changes: 13 additions & 2 deletions ccmlib/scylla_node.py
@@ -1339,7 +1339,7 @@ def rollback(self, upgrade_to_version):
     def watch_rest_for_alive(self, nodes, timeout=120):
         """
         Use the REST API to wait until this node detects that the nodes listed
-        in "nodes" become fully operational (live and no longer "joining").
+        in "nodes" become fully operational and knows of its tokens.
         This is similar to watch_log_for_alive but uses ScyllaDB's REST API
         instead of the log file and waits for the node to be really useable,
         not just "UP" (see issue #461)
@@ -1348,6 +1348,7 @@ def watch_rest_for_alive(self, nodes, timeout=120):
         tofind = set([node.address() for node in tofind])
         url_live = f"http://{self.address()}:10000/gossiper/endpoint/live"
         url_joining = f"http://{self.address()}:10000/storage_service/nodes/joining"
+        url_tokens = f"http://{self.address()}:10000/storage_service/tokens/"
         endtime = time.time() + timeout
         while time.time() < endtime:
             live = {}
@@ -1357,9 +1358,19 @@ def watch_rest_for_alive(self, nodes, timeout=120):
             response = requests.get(url=url_joining)
             if response.text:
                 live = live - set(response.json())
+            # Verify that node knows not only about the existence of the
+            # other node, but also its tokens:
             if tofind.issubset(live):
                 # This node thinks that all given nodes are alive and not
-                # "joining", we're done.
+                # "joining", we're almost done, but still need to verify
+                # that the node knows the others' tokens.
+                check = tofind
+                tofind = set()
+                for n in check:
+                    response = requests.get(url=url_tokens+n)
+                    if response.text == '[]':
+                        tofind.add(n)
+            if not tofind:
                 return
             time.sleep(0.1)
         raise TimeoutError(f"watch_rest_for_alive() timeout after {timeout} seconds")
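
The loop in the diff above follows a common pattern: poll a condition until a deadline, sleep briefly between attempts, and raise TimeoutError if the deadline passes. A minimal standalone sketch of that pattern is shown below. The name poll_until and its parameters are hypothetical and do not exist in ccm, and the sketch uses time.monotonic() rather than the patch's time.time() so the deadline is unaffected by wall-clock adjustments.

```python
import time


def poll_until(condition, timeout=120.0, interval=0.1):
    """Call `condition()` repeatedly until it returns a truthy value.

    Hypothetical helper illustrating the deadline loop; raises
    TimeoutError if `timeout` seconds pass without success.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(interval)
    raise TimeoutError(f"condition not met after {timeout} seconds")


# Usage: a stand-in condition that only succeeds on its third call,
# mimicking a node whose tokens become known after a short delay.
attempts = []


def ready_on_third_call():
    attempts.append(1)
    return len(attempts) >= 3


poll_until(ready_on_third_call, timeout=5.0, interval=0.01)
```

In a real test, the condition would wrap the REST checks so that the same loop covers liveness, joining state, and token awareness.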