From c94cc4a61a6c213da21b4f161ec43ce1d6f067e7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Rados=C5=82aw=20Piliszek?= <radoslaw.piliszek@gmail.com>
Date: Fri, 15 Oct 2021 14:38:17 +0000
Subject: [PATCH] [mariadb] Start new nodes serially

There seems to be a bug in Galera 4 that causes
TASK [mariadb : Check MariaDB service WSREP sync status]
to fail.
One node (in a 3-node cluster) or more (possible in larger
clusters) may "lose the race" and get stuck in the "initialized"
state of WSREP.
This is entirely random, as is the case with most race conditions.
Restarting the MariaDB service on that node fixes the situation,
but doing so by hand is unwieldy.
The above may happen because Kolla Ansible starts and waits for
all new nodes at once.
This did not bother the old Galera (Galera 3), which figured out
the ordering for itself and let each node join the cluster properly.
The proposed workaround is to start and wait for new nodes serially,
in up to four rounds selected by each node's list index modulo 4.

Change-Id: I449d4c2073d4e3953e9f09725577d2e1c9d563c9
Closes-Bug: #1947485
---
 ansible/roles/mariadb/handlers/main.yml              | 6 ++++++
 releasenotes/notes/bug-1947485-d059864252fb1813.yaml | 7 +++++++
 2 files changed, 13 insertions(+)
 create mode 100644 releasenotes/notes/bug-1947485-d059864252fb1813.yaml
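
Note on the batching in the handler: the sketch below, written in Python
rather than Ansible, mimics how the added `when` condition and `loop`
appear to group hosts. It assumes each loop item is processed for all
hosts before the next item begins, which is Ansible's usual behaviour
under the default linear strategy. The function `start_rounds` and the
host names are hypothetical and exist only for this illustration.

```python
def start_rounds(hosts, rounds=4):
    """Group hosts the way the handler's loop/when pair selects them.

    `hosts` stands in for groups[mariadb_shard_group + '_port_alive_False'];
    `rounds` matches the four loop items (0..3) in the patch.
    """
    result = []
    for item in range(rounds):
        # Same test as the added `when` line: list index modulo 4 == item.
        selected = [h for i, h in enumerate(hosts) if i % rounds == item]
        result.append(selected)
    return result


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for number, batch in enumerate(start_rounds(["db1", "db2", "db3"])):
        print(number, batch)
```

With up to four hosts in the list, each round holds at most one host, so
the start is strictly serial. With more, hosts that share the same index
modulo 4 land in the same round and would start together.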

diff --git a/ansible/roles/mariadb/handlers/main.yml b/ansible/roles/mariadb/handlers/main.yml
index 95c6b9398..fb76c9643 100644
--- a/ansible/roles/mariadb/handlers/main.yml
+++ b/ansible/roles/mariadb/handlers/main.yml
@@ -68,8 +68,14 @@
     - bootstrap_host is not defined or bootstrap_host != inventory_hostname
     - groups[mariadb_shard_group + '_port_alive_False'] is defined
     - inventory_hostname in groups[mariadb_shard_group + '_port_alive_False']
+    - groups[mariadb_shard_group + '_port_alive_False'].index(inventory_hostname) % 4 == item
     - kolla_action != "config"
   listen: restart mariadb
+  loop:
+    - 0
+    - 1
+    - 2
+    - 3
 
 - name: Ensure MariaDB is running normally on bootstrap host
   include_tasks: 'restart_services.yml'
diff --git a/releasenotes/notes/bug-1947485-d059864252fb1813.yaml b/releasenotes/notes/bug-1947485-d059864252fb1813.yaml
new file mode 100644
index 000000000..453a55f74
--- /dev/null
+++ b/releasenotes/notes/bug-1947485-d059864252fb1813.yaml
@@ -0,0 +1,7 @@
+---
+fixes:
+  - |
+    Fixes an issue with multinode MariaDB deployments which could fail
+    the playbook execution on WSREP check due to the new behaviour of
+    Galera 4.
+    `LP#1947485 <https://launchpad.net/bugs/1947485>`__.
-- 
GitLab