From 67b2877d6edeb6ed633c529afdc40f954714654b Mon Sep 17 00:00:00 2001
From: Gorka Eguileor
Date: Wed, 29 May 2024 12:00:06 +0200
Subject: [PATCH] Wait for DB writes to propagate (causality checks)

Because we deploy the database in multi-master mode, a service can
write something to the database and another service reading from the DB
may not see that data yet.

This is very problematic, because all the Cinder code assumes this can
never happen.

For example, we've seen the following behavior in CI jobs:

- While deleting a volume, cinder-api creates a worker registry in the
  DB on DB node #1.
- cinder-api makes an RPC call to cinder-volume to delete the volume.
- cinder-volume tries to read the worker registry from DB node #2, but
  the data is not there yet, so it misbehaves.

This patch changes the DB engine default, which is not to wait for
writes, so that it waits for them on read, update, and insert
operations.

This will have a performance impact, but the alternative is for Cinder
to misbehave.

We use `mysql_wsrep_sync_wait` from oslo.db [1], setting it to 7 as per
the documented values of this parameter in the DBMS [2][3].
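
As a rough illustration of where the value 7 comes from (a sketch only,
not part of this patch): the MariaDB documentation [2] describes
`wsrep_sync_wait` as a bitmask, where 1 covers reads, 2 covers UPDATE
and DELETE, 4 covers INSERT and REPLACE, and 8 covers SHOW. The enum and
helper names below are hypothetical and exist only for this example.

```python
from enum import IntFlag


class WsrepSyncWait(IntFlag):
    """Bit values as documented for wsrep_sync_wait (names are hypothetical)."""
    READ = 1            # SELECT and BEGIN/START TRANSACTION
    UPDATE_DELETE = 2   # UPDATE and DELETE
    INSERT_REPLACE = 4  # INSERT and REPLACE
    SHOW = 8            # SHOW statements


def combine(*checks):
    """OR the requested causality checks into one integer setting."""
    value = 0
    for check in checks:
        value |= int(check)
    return value


def describe(value):
    """List the names of the checks enabled by an integer setting."""
    return [flag.name for flag in WsrepSyncWait if value & flag]


# Read, update, and insert checks, as chosen in this patch.
chosen = combine(
    WsrepSyncWait.READ,
    WsrepSyncWait.UPDATE_DELETE,
    WsrepSyncWait.INSERT_REPLACE,
)
print(chosen)            # 7
print(describe(chosen))
```

SHOW (8) is left out of the combination, which is why the setting is 7
rather than 15.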

[1]: https://opendev.org/openstack/oslo.db/commit/009d23df45969036c70e4cf59eb4019aaace9a55
[2]: https://mariadb.com/docs/server/ref/mdb/system-variables/wsrep_sync_wait/
[3]: https://galeracluster.com/library/documentation/mysql-wsrep-options.html
---
 templates/cinder/config/00-global-defaults.conf | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/templates/cinder/config/00-global-defaults.conf b/templates/cinder/config/00-global-defaults.conf
index ca115fcb..bdd8086e 100644
--- a/templates/cinder/config/00-global-defaults.conf
+++ b/templates/cinder/config/00-global-defaults.conf
@@ -39,6 +39,11 @@ connection = {{ .DatabaseConnection }}
 max_retries = -1
 db_max_retries = -1
 
+# Wait for writes to complete when doing a read, update, or insert
+# Relevant for multi-master deployments so that workers table works as intended
+# https://mariadb.com/docs/server/ref/mdb/system-variables/wsrep_sync_wait/
+mysql_wsrep_sync_wait = 7
+
 [os_brick]
 lock_path = /var/locks/openstack/os-brick