Commits · bdf60961f00531f4ef7e9e56ca114d2cc9988906 · Very Demiurge Very Mindful / Kolla Ansible

May 16, 2024

ceilometer: use template for custom pipeline.yaml · bdf60961

Also rename task to "Copying over custom pipeline.yaml file" for
clarity.

Change-Id: I04e3eb9620830a15781f9bab2549b557a9d1d9cb

bdf60961

Jul 21, 2023

Fix OpenSearch Dashboards health check · bacd6c7f

Doug Szumski authored 1 year ago

The OpenSearch Dashboards container does not have a health
check defined when created. This causes the container to always
restart when reconfigured, even if no change has been made.

Change-Id: I0b437a77aeb61bc5ae9238f900a1fa00cbc34e18
Partial-Bug: #2028362

bacd6c7f

Jun 28, 2023

haproxy: support single external frontend · 4bc410c6

Michal Nasiadka authored 3 years ago

Use case: exposing single external https frontend and
load balancing services using FQDNs.

Support different ports for internal and external endpoints.

Introduced kolla_url filter to normalize urls like:
- https://magnum.external:443/v1
- http://magnum.external:80/v1
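
As a rough illustration of what normalizing such URLs could involve, the
sketch below drops the port when it is the default for the scheme. The
function name normalize_url and its exact behaviour are assumptions made
for this example; it is not the kolla_url filter itself, whose signature
is not shown in this commit message.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports per scheme; a URL carrying one of these is redundant.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Hypothetical helper: strip the port if it is the scheme default."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # urlsplit strips the brackets from IPv6 literals; restore them.
        host = f"[{host}]"
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        host = f"{host}:{port}"
    # Note: any userinfo (user:password@) is intentionally not preserved.
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

With this sketch, both example URLs above lose their port, while a URL on
a non-default port is returned unchanged.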



Change-Id: I9fb03fe1cebce5c7198d523e015280c69f139cd0
Co-Authored-By: Jakub Darmach <jakub@stackhpc.com>

4bc410c6

loadbalancer: Add option to not define track script · a0e614ee

Michal Nasiadka authored 1 year ago

We've seen issues in CI where the keepalived haproxy check script returns
an error and keepalived switches to backup and then back to primary
in a single-node environment.

Closes-Bug: #2025219

Change-Id: Iba62e76b3cf83f3ade6df81288d2d77129ffc725

a0e614ee

Jun 21, 2023

Fix issue with octavia security group rules creation · f1bb97dd

Michal Arbet authored 1 year ago

This patch fixes an issue with Octavia security group
rule creation when using an IPv6 configuration for the Octavia
management network.

Closes-Bug: #2023502
Change-Id: I3f8fbb0632ec6ecdc9f3820ebbcf01480de59e1f

f1bb97dd

Jun 20, 2023

Use friendly prometheus instance labels · eef3ff30

Dawud authored 2 years ago


Replaces the instance label on Prometheus metrics with the inventory
hostname instead of the IP address. The IP address is still used as
the target address, so there is no risk of the hostname being
unresolvable. This can optionally be enabled or set to FQDNs by
changing the prometheus_instance_label variable, as mentioned in the
release notes.

Co-Authored-By: Will Szumski <will@stackhpc.com>
Change-Id: I387c9d8f5c01baf6054381834ecf4e554d0fff35

eef3ff30

Jun 17, 2023

Refactor MariaDB and RabbitMQ restart procedure · 6c037790

Mark Goddard authored 1 year ago

Ansible 2.14.3 introduced a change that broke the method used for
restarting MariaDB and RabbitMQ serially [1][2]. In
I57425680a4cdbf0daeb9b2cc35920f1b933aa4a8 we limited to 2.14.2 to work
around this. Ansible upstream claim this behaviour was unintentional,
and will not fix it.

This change moves to a different approach where we use separate plays
with a 'serial' keyword to execute the restart.

This change also removes the restriction on the maximum supported
version of 2.14.2 on ansible-core - any 2.14 release is now supported.

[1] https://github.com/ansible/ansible/commit/65366f663de7d044f42ae6dd53368fd4c1f88b35
[2] https://github.com/ansible/ansible/issues/80848

Depends-On: https://review.opendev.org/c/openstack/kolla/+/884208

Change-Id: I5a12670d07077d24047aaff57ce8d33ccf7156ff

6c037790

Jun 14, 2023

Add support for multiple ceph files · fdf2385f

Michal Arbet authored 2 years ago

This patch adds an option to copy different Ceph configuration
files and corresponding keyrings for the cinder, glance, manila,
gnocchi and nova services.

This is especially useful when the deployment uses availability
zones, as in the example below.

  - Individual compute can read/write to individual ceph
    cluster in same AZ.
  - Cinder can write to several ceph clusters in several AZs.
  - Glance can use multistore and upload images to
    several ceph clusters in several AZs at once.

Change-Id: Ie4d8ab5a3df748137835cae1c943b9180cd10eb1

fdf2385f

Jun 12, 2023

opensearch-dashboard: fix permissions · 5aaab8dc

Mathias Fechner authored 1 year ago


Fix permissions for opensearch-dashboard data directory.

Closes-bug: #2020152

Change-Id: Ie4cec7649d89df5b8bb306563da2c62ea0cdd2c0
Signed-off-by: Mathias Fechner <fechner@osism.tech>

5aaab8dc

Jun 07, 2023

Fix the Cyborg service · e8250d28

Maksim Malchuk authored 1 year ago

According to the documentation [1], the type of the Cyborg service should
be 'accelerator' and the description 'Acceleration Service'. This change
also fixes incorrect endpoint URLs and does not configure an admin
endpoint [2], because the documentation [1] has not been updated yet.

1. https://docs.openstack.org/cyborg/latest/install/common.html


2. Icf3bf08deab2c445361f0a0124d87ad8b0e4e9d9

Closes-Bug: #2020080
Change-Id: I002db50cbad5a90e479498e605bdeab343e129c7
Signed-off-by: Maksim Malchuk <maksim.malchuk@gmail.com>

e8250d28

May 31, 2023

Fix passwords.yml permissions · 5fd81170

Maksim Malchuk authored 1 year ago


The kolla-genpwd, kolla-mergepwd, kolla-readpwd and kolla-writepwd
commands now create or update passwords.yml with correct
permissions. They also display a warning message about incorrect
permissions.
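
A minimal sketch of the idea, not the code from this change: the helper
name write_private_file and the mode 0o600 are assumptions chosen for
illustration, since the commit message does not state which mode is
considered correct.

```python
import os
import stat
import warnings

# Assumed mode for this sketch: owner read/write only.
EXPECTED_MODE = 0o600


def write_private_file(path: str, content: str, mode: int = EXPECTED_MODE) -> None:
    """Hypothetical helper: write a file, warning about and fixing its mode."""
    if os.path.exists(path):
        current = stat.S_IMODE(os.stat(path).st_mode)
        if current != mode:
            warnings.warn(f"{path} has mode {oct(current)}, expected {oct(mode)}")
    # Passing the mode to os.open keeps a newly created file from being
    # world-readable even briefly (the umask can only remove bits).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    # chmod also tightens a file that already existed with another mode.
    os.chmod(path, mode)
```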

Closes-Bug: #2018338
Change-Id: I4b50053ced9150499d1d09fd4a0ec2e243cf938b
Signed-off-by: Maksim Malchuk <maksim.malchuk@gmail.com>

5fd81170

May 26, 2023

Update master for stable/2023.1 · b26d25eb

OpenStack Release Bot authored 1 year ago

Add file to the reno documentation build to show release notes for
stable/2023.1.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.1.

Sem-Ver: feature
Change-Id: I870c0569a1e175ac5df59fc495812ba81c5147e6

b26d25eb

May 19, 2023

neutron: Add neutron-ovn-agent support · 07815a21

Michal Nasiadka authored 2 years ago

Depends-On: https://review.opendev.org/c/openstack/neutron/+/878535
Change-Id: I05d8b29b59a7de76da488f68775547a8f0f11d0f

07815a21

May 18, 2023

ansible: bump min to 2.13 and max to 2.14.2 · 10fc1b74

Michal Nasiadka authored 1 year ago

We limit to 2.14.2 due to a regression in ansible-core [1] that breaks
conditional include_task loops in handlers. This is used for controlled
restarts of MariaDB and RabbitMQ.

[1]: https://github.com/ansible/ansible/commit/65366f663de7d044f42ae6dd53368fd4c1f88b35



Change-Id: I57425680a4cdbf0daeb9b2cc35920f1b933aa4a8
Co-Authored-By: Michal Nasiadka <michal@stackhpc.com>

10fc1b74

May 16, 2023

always add service_user section to nova.conf · ddadaa28

Sean Mooney authored 1 year ago


As of I3629b84d3255a8fe9d8a7cea8c6131d7c40899e8 nova
now requires the service_user section to be configured
to address CVE-2023-2088. This change adds
the service user section to the nova.conf template in
the nova and nova-cell roles.

Related-Bug: #2004555
Signed-off-by: Sven Kieske <kieske@osism.tech>
Change-Id: I2189dafca070accfd8efcd4b8cc4221c6decdc9f
(cherry picked from commit a77ea13ef1991543df29b7eea14b1f91ef26f858)
(cherry picked from commit 03c12abbcc107bfec451f4558bc97d14facae01c)
(cherry picked from commit cb105dc293ff1cdb11ab63fa3e3bf39fd17e0ee0)
(cherry picked from commit efe6650d09441b02cf93738a94a59723d84c5b19)

ddadaa28

May 04, 2023

Correct ovn-ctl --db-nb-pidfile usage in templates · 46c2b60d

Matt Crees authored 1 year ago

The flags ``--db-nb-pid`` and ``--db-sb-pid`` are corrected to be
``--db-nb-pidfile`` and ``--db-sb-pidfile`` respectively. See here for
reference:
https://github.com/ovn-org/ovn/blob/6c6a7ad1c64a21923dc9b5bea7069fd88bcdd6a8/utilities/ovn-ctl#L1045

Closes-Bug: #2018436
Change-Id: Ic1e8768374566eb2198302807ecc644a19cd3062

46c2b60d

Apr 26, 2023

Deprecate Sahara and Vitrage · c899ff26

Sven Kieske authored 1 year ago

as agreed in the Kolla meeting:

https://meetings.opendev.org/meetings/kolla/2023/kolla.2023-04-19-13.00.html



Signed-off-by: Sven Kieske <kieske@osism.tech>

Change-Id: I099a5328e0837e1f5dcf7f21b7fd7bea1748456d


c899ff26

Apr 20, 2023

Fix faulty precheck for RabbitMQ · fdacf9d1

Magnus Lööf authored 2 years ago

When using externally managed certificates, according to [1],
one should set `kolla_externally_managed_cert: yes` and ensure
that the certificates are in the correct place.

However, RabbitMQ precheck still expects the certificates to be
available on the controller node. This is incorrect.

Fix by not running the tasks in question when `kolla_externally_managed_cert: yes`

[1] https://docs.openstack.org/kolla-ansible/latest/admin/tls.html



Closes-Bug: 1999081
Related-Bug: 1940286
Signed-off-by: Magnus Lööf <magnus.loof@basalt.se>
Change-Id: I9f845a7bdf5055165e199ab1887ed3ccbfb9d808

fdacf9d1

Revert "ansible: bump min to 2.13 and max to 2.14" · b98a71e5

Dr. Jens Harbott authored 1 year ago

This reverts commit 9867060b.

Reason for revert: seems this broke some jobs

Change-Id: I1ca81214ece403351c0a522ea05bf07802e4c4c0

b98a71e5

Apr 17, 2023

Configure coordination in default for masakari-api · 842adf6d

Michal Arbet authored 2 years ago

This patch introduces a distributed lock for the masakari-api
service when handling concurrent notifications for the same
host failure from multiple masakari-hostmonitor services.

Change-Id: I46985202dc8da22601357eefe2727599e7a413e5

842adf6d

Apr 13, 2023

ansible: bump min to 2.13 and max to 2.14 · 9867060b

Michal Nasiadka authored 2 years ago

Change-Id: Ibc9cc91f64b0450de3cae6e2830b4ff2c52c0395

9867060b

Remove RabbitMQ ha-all policy when not required · c85b64d1

Matt Crees authored 2 years ago

With the addition of the variable
`om_enable_rabbitmq_high_availability`, this feature in the upgrade
task should be brought back. It is also now used in the deploy task. The
`ha-all` policy is cleared only when
`om_enable_rabbitmq_high_availability` is set to `false`.

Change-Id: Ia056aa40e996b1f0fed43c0f672466c7e4a2f547

c85b64d1

Apr 12, 2023

RabbitMQ use maintenance mode on container restart · e709599f

Matt Crees authored 2 years ago

Puts the RabbitMQ node into maintenance mode before restarting the
container. This will make the node shutdown less disruptive. For details
on what maintenance mode does, see:
https://www.rabbitmq.com/upgrade.html#maintenance-mode

Change-Id: Ia61573f3fb95fe8fcde6b789ca77ef5b45fe0a65

e709599f

rabbitmq: Do not stop containers on upgrade · b30c7bc8

Michal Nasiadka authored 2 years ago

Since RMQ 3.8 we can use rolling upgrade [1].

Depends-On: https://review.opendev.org/c/openstack/kolla/+/872393

[1]: https://www.rabbitmq.com/upgrade.html#rolling-upgrades

Change-Id: If6a7c6c12d9226a2406728108b3c87b3485ac55f

b30c7bc8

Apr 08, 2023

Fix create sasl account before config file is ready · 46415123

gamerslouis authored 1 year ago

Add a check for container readiness before creating the SASL user.

Closes-Bug: #2015589
Change-Id: Ic650ba6be1f192e3cbeaa94de3d00507636c1c92

46415123

Mar 29, 2023

Add LimitRequestBody configuration for Horizon · d907790f

Maksim Malchuk authored 2 years ago

Since CVE-2022-29404 is fixed [1,2] the default value for the
LimitRequestBody directive in the Apache HTTP Server has been changed
from 0 (unlimited) to 1 GiB. This limits the size of images (for
example) uploaded in Horizon. This change adds the ability to
configure the limit.

1. https://access.redhat.com/articles/6975397
2. https://ubuntu.com/security/CVE-2022-29404



Closes-Bug: #2012588
Change-Id: I4cd9dd088cbcf38ff6f8d188ebcc56be7d9ea1c9
Signed-off-by: Maksim Malchuk <maksim.malchuk@gmail.com>

d907790f

Mar 28, 2023

Use the upgraded image to run Nova upgrade checks · e34fbb17

Matt Crees authored 2 years ago

When upgrading Nova, we sometimes hit an error where an old hypervisor
that hasn’t been upgraded recently (for example due to broken hardware)
is preventing Nova API from starting properly. This can be detected
using the tool ``nova-status upgrade check`` to make sure that there are
no ``nova-compute`` services older than N-1 releases. This is already
used in the Kolla Ansible upgrade task for Nova. However, this task uses
the current ``nova-api`` container, so computes which will be too old
after the upgrade are not caught.

This patch changes Kolla Ansible so that the upgraded ``nova-api`` image
is used to run the upgrade checks, allowing computes that will be too
old to be detected before the upgrades are performed.

Depends-On: https://review.opendev.org/c/openstack/kolla/+/878744



Closes-Bug: #1957080
Co-Authored-By: Pierre Riteau <pierre@stackhpc.com>
Change-Id: I3a899411001834a0c88e37f45a756247ee11563d

e34fbb17

Mar 21, 2023

Set RabbitMQ message TTL and queue expiry · fd30dfb8

John Garbutt authored 3 years ago

Following ideas here:
https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit

Make sure old messages with no consumer are dropped after the message
TTL of 10 mins, longer than the 1 min RPC timeout.
Also ensure queues expire after an hour of inactivity, so queues from
removed nodes or renamed nodes don't grow over time.

Change-Id: Ifb28ac68b6328adb604a7474d01e5f7a47b2e788

fd30dfb8

Add flags for RabbitMQ message TTL & queue expiry · dae2cbca

Matt Crees authored 2 years ago

Adds two new flags to alter behaviour in RabbitMQ:
    * `rabbitmq_message_ttl_ms`, which lets you set a TTL on messages.
    * `rabbitmq_queue_expiry_ms`, which lets you set an expiry time on queues.
See https://www.rabbitmq.com/ttl.html for more information on both.
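
For illustration, the 10 minute message TTL and one hour queue expiry
described in "Set RabbitMQ message TTL and queue expiry" can be converted
to the millisecond values these flags expect. The helper names below are
hypothetical, and the mapping of the flags onto the RabbitMQ policy keys
"message-ttl" and "expires" is an assumption about how they are applied.

```python
from typing import Optional


def minutes_to_ms(minutes: int) -> int:
    """Convert whole minutes to milliseconds."""
    return minutes * 60 * 1000


# Values mirroring the defaults described in the related commit.
rabbitmq_message_ttl_ms = minutes_to_ms(10)   # 10 minutes
rabbitmq_queue_expiry_ms = minutes_to_ms(60)  # 1 hour


def ttl_policy_definition(message_ttl_ms: Optional[int] = None,
                          queue_expiry_ms: Optional[int] = None) -> dict:
    """Hypothetical helper: include only the keys whose flag is set."""
    definition = {}
    if message_ttl_ms is not None:
        definition["message-ttl"] = int(message_ttl_ms)
    if queue_expiry_ms is not None:
        definition["expires"] = int(queue_expiry_ms)
    return definition
```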

Change-Id: I51ca37ffbb1bb5c07f2d39873f0f33ca20263f2a

dae2cbca

Set RabbitMQ ha-promote-on-shutdown=always · a87810db

Matt Crees authored 2 years ago

Changes the default value of `rabbitmq-ha-promote-on-shutdown` to
`"always"`.

We are seeing issues with RabbitMQ automatically recovering when nodes
are restarted. https://www.rabbitmq.com/ha.html#cluster-shutdown

Rather than waiting for operator interventions, it is better we allow
recovery to happen, even if that means we may lose some messages.
A few failed and timed out operations are better than a totally broken
cloud. This is achieved using ha-promote-on-shutdown=always.

Note, when a node failure is detected, this is already the default
behaviour from 3.7.5 onwards:
https://www.rabbitmq.com/ha.html#promoting-unsynchronised-mirrors

Related-Bug: #1954925
Change-Id: I484a81163f703fa27112df22473d657e2a9ab964

a87810db

Mar 06, 2023

mariadb: add mariadb_datadir_volume parameter · b327ae4a

Christian Berendt authored 2 years ago

With the parameter ``mariadb_datadir_volume`` it is possible
to use a directory as volume for the mariadb service. By default,
a volume named mariadb is used (the previous default).

Change-Id: Ic61fe981825c5fa6f50e53c9555b6a102f42f522

b327ae4a

Add neutron_ovn_availability_zones parameter · 6768b760

Christian Berendt authored 2 years ago

With the new ``neutron_ovn_availability_zones`` parameter it is possible
to define network availability zones for OVN. Further details can be found
in the Neutron OVN documentation:
https://docs.openstack.org/neutron/latest/admin/ovn/availability_zones.html#how-to-configure-it

Change-Id: I203e0d400a3218d0b4a41f2a948207032c4febec

6768b760

Mar 02, 2023

Set the etcd internal hostname and cacert for tls internal enabled deployments · 5d3eed23

Matthew N Heler authored 2 years ago

This allows services to work with etcd when coordination is enabled
for TLS internal deployments. Without this fix, we fail to connect to
etcd with the coordination backend and the service itself crashes.

Change-Id: I0c1d6b87e663e48c15a846a2774b0a4531a3ca68

5d3eed23

Feb 14, 2023

Fix deploy/genconfig in check mode · 572ff2f8

Mark Goddard authored 2 years ago

Previously, when running one of the following commands:

  kolla-ansible deploy --check
  kolla-ansible genconfig --check

deployment or configuration generation fails for various reasons.

MariaDB fails to look up the existing cluster.

Keystone fails to generate cron config.

Nova-cell fails to get the cell settings.

Closes-Bug: #2002661
Change-Id: I5e765f498ae86d213d0a4379ca5d473db1499962

572ff2f8

Improve RabbitMQ performance by reducing ha replicas · 6cf22b0c

John Garbutt authored 3 years ago

Currently we do not follow the RabbitMQ advice on replicas here:
https://www.rabbitmq.com/ha.html#replication-factor

Here we reduce the number of replicas to n // 2 + 1 as advised
above. The hope is that this helps speed up recovery from RabbitMQ
issues.
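
The n // 2 + 1 rule is a simple majority of the cluster. A small sketch,
with a hypothetical function name:

```python
def ha_replica_count(cluster_size: int) -> int:
    """Return the majority (quorum) replica count, n // 2 + 1."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1
```

For a three-node cluster this gives 2 replicas, and for a five-node
cluster it gives 3.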

Related-Bug: #1954925
Change-Id: Ib6bcb26c499c9884faa4a0cd51abaec00cacb096

6cf22b0c

Add flag to change RabbitMQ ha-mode definition · e13072a9

Matt Crees authored 2 years ago

Adds the flag `rabbitmq_ha_replica_count` to change how many different
nodes a queue should be mirrored across. If the value is not set, then
it defaults to "ha-mode":"all". This value is unset by default to avoid
any unexpected changes to the RabbitMQ definitions.json file, as that
would trigger an unexpected restart of RabbitMQ during the next deploy.
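
A sketch of the selection logic described above. The unset case follows
the text; for the set case, the use of RabbitMQ's "exactly" ha-mode with
"ha-params" is an assumption about how the replica count is expressed,
and the function name is hypothetical.

```python
from typing import Optional


def ha_policy_definition(replica_count: Optional[int] = None) -> dict:
    """Hypothetical helper: build the mirroring part of a policy."""
    if replica_count is None:
        # Flag unset: keep the existing behaviour so definitions.json
        # does not change and RabbitMQ is not restarted unexpectedly.
        return {"ha-mode": "all"}
    # Flag set (assumed form): mirror each queue across a fixed number of nodes.
    return {"ha-mode": "exactly", "ha-params": int(replica_count)}
```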

Change-Id: Iee98cd937197a73a3b04aa8501fa325e8ecfff24

e13072a9

Use loadbalancer to connect to etcd · e2c7dace

Will Szumski authored 2 years ago

Hardcoding the first etcd host creates a single point of failure.

Change-Id: I0f83030fcd84ddcdc4bf2226e76605c7cab84cbb

e2c7dace

Feb 13, 2023

Put etcd behind HTTP loadbalancer · 6f536a4f

Will Szumski authored 2 years ago


etcd-compatible tooz drivers do not support multiple endpoints via
backend_url. We can put a loadbalancer in front of etcd and configure
backend_url to use the VIP instead. The issue with hard coding the first
host is that we break coordination if we take this host offline. In the
case of cinder, we would not be able to perform any volume related
operations.

Co-Authored-By: Mark Goddard <mark@stackhpc.com>
Change-Id: Ib684501ba03c386dc5ac71e5cbea05c99f191665

6f536a4f

Feb 09, 2023

RabbitMQ: Support setting ha-promote-on-shutdown · 94f3ce0c

John Garbutt authored 3 years ago

By default ha-promote-on-shutdown=when-synced. However we are seeing
issues with RabbitMQ automatically recovering when nodes are restarted.
https://www.rabbitmq.com/ha.html#cluster-shutdown

Rather than waiting for operator interventions, it is better we allow
recovery to happen, even if that means we may lose some messages.
A few failed and timed out operations are better than a totally broken
cloud. This is achieved using ha-promote-on-shutdown=always.

Note, when a node failure is detected, this is already the default
behaviour from 3.7.5 onwards:
https://www.rabbitmq.com/ha.html#promoting-unsynchronised-mirrors

This patch adds the option to change the ha-promote-on-shutdown
definition, using the flag `rabbitmq_ha_promote_on_shutdown`. This
value is unset by default to avoid any unexpected changes to the
RabbitMQ definitions.json file, as that would trigger an unexpected
restart of RabbitMQ during the next deploy.

Related-Bug: #1954925

Change-Id: I2146bda2c72ddac2c9923c6941b0596395fd9ab5

94f3ce0c

Feb 04, 2023

Fix kolla_docker module · 63b9fa56

Michal Arbet authored 2 years ago

This patch fixes the kolla_docker module,
which did not take the common_options
parameter into account. The patchset shows that the module's
default values were always used, even if the user overrode
some parameters in the common_options dict.
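
The intended precedence can be sketched as a layered merge. This is an
illustration only, not the module's code: resolve_params is a hypothetical
name, and the ordering (module defaults, then common_options, then
explicitly passed task parameters) is an assumption about the fix.

```python
def resolve_params(module_defaults: dict, common_options: dict,
                   task_params: dict) -> dict:
    """Hypothetical helper: later layers override earlier ones."""
    resolved = dict(module_defaults)   # lowest priority
    resolved.update(common_options)    # user-wide overrides
    resolved.update(task_params)       # explicit per-task values win
    return resolved


# Illustrative option names and values only.
defaults = {"restart_policy": "unless-stopped", "graceful_timeout": 10}
common = {"graceful_timeout": 60}
print(resolve_params(defaults, common, {}))
```

In this sketch the value from common_options replaces the module default,
which is the behaviour the bug prevented.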

Closes-Bug: #2003079

Change-Id: I677fde708dd004decaff4bd39f2173d8d81052fb

63b9fa56