- Jan 17, 2024
Matt Crees authored
Shard allocation is disabled at the start of the OpenSearch upgrade task. This is set as a transient setting, meaning it will be removed once the containers are restarted. However, if there is no change in the OpenSearch container, it will not be restarted and the cluster is left in a broken state: unable to allocate shards. This patch moves the pre-upgrade tasks to within the handlers, so disabling shard allocation and the flush are only performed when the OpenSearch container is going to be restarted.

Closes-Bug: #2049512
Change-Id: Ia03ba23bfbde7d50a88dc16e4f117dec3c98a448
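For context, this is the standard OpenSearch cluster settings call involved (a minimal sketch of the transient setting, not lines taken from the patch):

    # Disable shard allocation; as a *transient* setting it is cleared
    # when the cluster restarts, which is why a skipped container
    # restart leaves it stuck in place.
    curl -X PUT "http://localhost:9200/_cluster/settings" \
         -H "Content-Type: application/json" \
         -d '{"transient": {"cluster.routing.allocation.enable": "none"}}'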
-
- Jan 11, 2024
wu.chunyang authored
This change fixes Trove failing to discover the Swift endpoint by adding a service_credentials section to guest-agent.conf.

Closes-Bug: #2048829
Change-Id: I185484d2a0d0a2d4016df6acf8a6b0a7f934c237
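For illustration, such a section looks roughly like this (option names follow Trove's service_credentials group; host and values are made up):

    [service_credentials]
    auth_url = http://192.0.2.10:5000
    region_name = RegionOne
    project_name = service
    username = trove
    password = <trove-keystone-password>
    user_domain_name = Default
    project_domain_name = Default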
-
wu.chunyang authored
This change fixes the Trove guest instance failing to connect to RabbitMQ by adding quorum queue support to the oslo_messaging_rabbit section in guest-agent.conf.

Closes-Bug: #2048822
Change-Id: I94908f8e20981f20fbe4dc18e2091d3798f8b801
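A minimal sketch of the setting, assuming the standard oslo.messaging option name:

    [oslo_messaging_rabbit]
    # Declare quorum queues so the guest agent matches the queue type
    # used by the control plane; mismatched declarations fail.
    rabbit_quorum_queue = true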
-
wu.chunyang authored
This change fixes the Trove guest instance failing to connect to RabbitMQ by adding durable queue support to the oslo_messaging_rabbit section in guest-agent.conf.

Partial-Bug: #2048822
Change-Id: I8efc3c92e861816385e6cda3b231a950a06bf57d
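A minimal sketch, again assuming the standard oslo.messaging option name:

    [oslo_messaging_rabbit]
    # Queue durability must agree between client and broker declarations.
    amqp_durable_queues = true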
-
- Jan 08, 2024
Pierre Riteau authored
The addition of an instance resize operation [1] to CI testing is triggering a failure in kolla-ansible-debian-ovn jobs, which are using a nodeset with multiple nodes:

    oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
    Command: scp -r /var/lib/nova/instances/8ca2c7e8-acae-404c-af7d-6cac38e354b8_resize/disk 192.0.2.2:/var/lib/nova/instances/8ca2c7e8-acae-404c-af7d-6cac38e354b8/disk
    Exit code: 255
    Stdout: ''
    Stderr: "Warning: Permanently added '[192.0.2.2]:8022' (ED25519) to the list of known hosts.\r\nsubsystem request failed on channel 0\r\nscp: Connection closed\r\n"

This is not seen on Ubuntu Jammy, which uses OpenSSH 8.9, while Debian Bookworm uses OpenSSH 9.2. This is likely related to this change in OpenSSH 9.0 [2]:

    This release switches scp(1) from using the legacy scp/rcp protocol
    to using the SFTP protocol by default.

Configure the sftp subsystem like on RHEL9 derivatives. Even though it is not yet required for Ubuntu, we also configure it so we are ready for the Noble release.

[1] https://review.opendev.org/c/openstack/kolla-ansible/+/904249
[2] https://www.openssh.com/txt/release-9.0

Closes-Bug: #2048700
Change-Id: I9f1129136d7664d5cc3b57ae5f7e8d05c499a2a5
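The general shape of the fix is a one-line sshd_config entry (the sftp-server path below is the Debian location and varies by distro; this is a sketch rather than the exact patch):

    # Enable the SFTP subsystem so scp from OpenSSH >= 9.0, which speaks
    # SFTP by default, keeps working against this sshd.
    Subsystem sftp /usr/lib/openssh/sftp-server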
-
Michal Arbet authored
This patch sets the URL to the Glance worker. If this is set, other Glance workers will know how to contact this one directly if needed. For image import, a single worker stages the image and other workers need to be able to proxy the import request to the right one. With the current setup, Glance image import simply does not work.

Closes-Bug: #2048525
Change-Id: I4246dc8a80038358cd5b6e44e991b3e2ed72be0e
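The Glance option in question is worker_self_reference_url; a minimal sketch with an illustrative address:

    [DEFAULT]
    # Direct URL at which other glance-api workers can reach this one,
    # so import requests can be proxied to the worker that staged the image.
    worker_self_reference_url = http://192.0.2.11:9292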
-
- Jan 05, 2024
Mark Goddard authored
The prometheus_cadvisor container has high CPU usage. On various production systems I checked, it sits around 13-16% on controllers, averaged over the Prometheus 1m scrape interval. When viewed with top we can see it is a bit spiky and can jump over 100%. There are various bugs about this, but I found https://github.com/google/cadvisor/issues/2523 which suggests reducing the per-container housekeeping interval. This defaults to 1s, which provides far greater granularity than we need with the default Prometheus scrape interval of 60s. Reducing the housekeeping interval to 60s on a production controller reduced the CPU usage from 13% to 3.5% average. This still seems high, but is more reasonable.

Change-Id: I89c62a45b1f358aafadcc0317ce882f4609543e7
Closes-Bug: #2048223
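cAdvisor exposes this as a command-line flag; the relevant argument looks like this:

    # Collect per-container stats every 60s instead of the 1s default,
    # matching the default Prometheus scrape interval.
    --housekeeping_interval=60s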
-
Michal Arbet authored
Some containers exit with 143 instead of 0, but this is still OK. This patch simply allows exit code 143 (SIGTERM) as a fix; see the sketch after the service list. Details are in the bug report. Services which exited with 143 (SIGTERM):

    kolla-cron-container.service
    kolla-designate_producer-container.service
    kolla-keystone_fernet-container.service
    kolla-letsencrypt_lego-container.service
    kolla-magnum_api-container.service
    kolla-mariadb_clustercheck-container.service
    kolla-neutron_l3_agent-container.service
    kolla-openvswitch_db-container.service
    kolla-openvswitch_vswitchd-container.service
    kolla-proxysql-container.service

Partial-Bug: #2048130
Change-Id: Ia8c85d03404cfb368e4013066c67acd2a2f68deb
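In systemd terms this is expressed with SuccessExitStatus; a minimal sketch of the unit-file change (the actual Kolla Ansible unit template may differ):

    [Service]
    # Treat SIGTERM-driven exits (128 + 15 = 143) as a clean shutdown.
    SuccessExitStatus=143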
-
- Jan 04, 2024
Michal Nasiadka authored
These were missed in I081aa1345603fa27c390e4e09231a5ff226bcb39.

Change-Id: I2884bca3c06ff98004e318757a20b60c12375924
-
- Jan 03, 2024
Mark Goddard authored
This reduces code duplication.

Change-Id: Ie529875aaa42435835417468868250bbe4fcf649
-
- Jan 02, 2024
Michal Nasiadka authored
I35317ea0343f0db74ddc0e587862e95408e9e106 changed the certificate path but omitted the single frontend template.

Change-Id: I638ba32e97234900745df62056710dcc37e7db77
-
Michal Nasiadka authored
Closes-Bug: #2047360
Change-Id: I73490d84da39a74ea7ac493c7dd41fe7bfe2f578
-
- Dec 28, 2023
Michal Nasiadka authored
Change-Id: I27028ffae26a57d510e1a78c38ead2f925396e81
-
Michal Nasiadka authored
Change-Id: I081aa1345603fa27c390e4e09231a5ff226bcb39
-
- Dec 21, 2023
Doug Szumski authored
We previously used Elasticsearch Curator for managing log retention. Now that we have moved to OpenSearch, we can use the Index State Management (ISM) plugin which is bundled with OpenSearch. This change adds support for automating the configuration of the ISM plugin via the OpenSearch API. By default, it has similar behaviour to the previous Elasticsearch Curator default policy.

Closes-Bug: #2047037
Change-Id: I5c6d938f2bc380f1575ee4f16fe17c6dca37dcba
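A rough sketch of an ISM policy applied via the API (the index pattern and retention age are illustrative, not the role's defaults):

    curl -X PUT "http://localhost:9200/_plugins/_ism/policies/retention" \
         -H "Content-Type: application/json" -d '
    {
      "policy": {
        "description": "Delete old log indices",
        "default_state": "hot",
        "states": [
          {"name": "hot", "actions": [],
           "transitions": [{"state_name": "delete",
                            "conditions": {"min_index_age": "31d"}}]},
          {"name": "delete", "actions": [{"delete": {}}], "transitions": []}
        ],
        "ism_template": {"index_patterns": ["flog-*"]}
      }
    }'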
-
Alex-Welsh authored
Removed a comment suggesting we use nova-manage db sync --local_cell when bootstrapping the nova service, since that suggestion has now been implemented in Kolla. See [1] for more details.

[1]: https://review.opendev.org/c/openstack/kolla/+/902057

Related-Bug: #2045558
Depends-On: Ic64eb51325b3503a14ebab9b9ff2f4d9caec734a
Change-Id: I591f83c4886f5718e36011982c77c0ece6c4cbd7
-
- Dec 20, 2023
Michal Nasiadka authored
Change-Id: Ia6db7d6a41ddbda8fcbf563dc55a0c65ef8db9be
-
- Dec 19, 2023
Michal Nasiadka authored
Change-Id: Ic9bd25a09b860838910dbe3d55f94421a0461c57
-
Michal Nasiadka authored
Change-Id: Ibf9a9a0c18938f638c8e8b00b6017c64f1523b23
-
- Dec 18, 2023
Sven Kieske authored
Signed-off-by: Sven Kieske <kieske@osism.tech>
Change-Id: I81a9b2dab7e9a4e2c8facaa0f32538f2884e3ca9
-
- Dec 14, 2023
Pierre Riteau authored
The wrong process name was being used.

Closes-Bug: #2046268
Change-Id: I5a5d4f227205e811732331ee6e020ccea67b6fab
-
- Dec 13, 2023
Matt Crees authored
Adds a precheck to fail if non-quorum queues are found in RabbitMQ. Currently excludes fanout and reply queues, pending support in oslo.messaging [1].

[1]: https://review.opendev.org/c/openstack/oslo.messaging/+/888479

Closes-Bug: #2045887
Change-Id: Ibafdcd58618d97251a3405ef9332022d4d930e2b
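One way to spot offending queues by hand, as a sketch (standard rabbitmqctl; not necessarily the precheck's exact command):

    # List queue names and types; 'classic' entries other than
    # reply_* and *_fanout_* queues would fail the precheck.
    rabbitmqctl list_queues name type | grep -v -e '^reply_' -e '_fanout_'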
-
- Dec 05, 2023
Andrey Kurilin authored
Starting with ansible-core 2.13, the list concatenation format has changed and does not support concatenation operations outside of the Jinja template. The format change: "[1] + {{ [2] }}" -> "{{ [1] + [2] }}". This affects the horizon role, which iterates over existing policy files to override and concatenates them into a single variable.

Co-Authored-By: Dr. Jens Harbott <harbott@osism.tech>
Closes-Bug: #2045660
Change-Id: I91a2101ff26cb8568f4615b4cdca52dcf09e6978
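The pattern looks roughly like this in a task (hypothetical variable names, for illustration only):

    # Before (breaks on ansible-core >= 2.13), concatenating outside Jinja:
    #   custom_policy: "{{ custom_policy }} + [{{ item }}]"
    # After, with the whole expression inside one Jinja template:
    - name: Collect policy file overrides
      ansible.builtin.set_fact:
        custom_policy: "{{ custom_policy + [item] }}"
      loop: "{{ found_policy_files }}"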
-
Mark Goddard authored
This allows us to continue execution until a certain proportion of hosts fail. This can be useful at scale, where failures are common and restarting a deployment is time-consuming. The default max failure percentage is 100, keeping the default behaviour. A global max failure percentage may be set via kolla_max_fail_percentage, and individual services may define a max failure percentage via <service>_max_fail_percentage. Note that all hosts in the inventory must be reachable for fact gathering, even those not included in a --limit.

Closes-Bug: #1833737
Change-Id: I808474a75c0f0e8b539dc0421374b06cea44be4f
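In globals.yml this might look as follows (values are illustrative; the variable names are the ones given above):

    # Tolerate up to 20% of hosts failing in any play by default,
    # but be stricter for nova.
    kolla_max_fail_percentage: 20
    nova_max_fail_percentage: 10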
-
- Dec 02, 2023
Maksim Malchuk authored
Follow-up on Id6eae798784126d4dd53adef15bdce6b47b4601f to fix an issue where a client with a port provided tries to connect to 'localhost': since we switch to TCP/IP, we need to explicitly provide the host too.

Partial-Bug: #2024554
Change-Id: Ib08c159dadd69a1f44924d658f4afe1e794a18b0
Signed-off-by: Maksim Malchuk <maksim.malchuk@gmail.com>
-
- Dec 01, 2023
Christian Berendt authored
If a file {{ node_custom_config }}/magnum/kubeconfig exists, it is copied to /var/lib/magnum/.kube/config in all Magnum service containers. This is the location where vexxhost/magnum-cluster-api will look for the kubeconfig configuration file to control the Cluster API control plane. If vexxhost/magnum-cluster-api is installed in the Magnum container images, a Cluster API control plane can then be controlled via the Magnum API.

Depends-On: https://review.opendev.org/c/openstack/kolla/+/902101
Change-Id: I986c5192fe96b9c480a2d8fa87d719a50ce78186
-
Michal Nasiadka authored
podman_image_info returns a Config dict, not ContainerConfig.

Change-Id: I9f813c90b42246c4835d7d7b18476a021d80548b
-
- Nov 30, 2023
Sven Kieske authored
This implements a global toggle `om_enable_rabbitmq_quorum_queues` to enable quorum queues for each service in RabbitMQ, similar to what was done for HA [0]. Quorum queues are enabled by default. Quorum queues are more reliable, safer, simpler and faster than replicated mirrored classic queues [1]. Mirrored classic queues are deprecated and scheduled for removal in RabbitMQ 4.0 [2]. Note that we do not need a new policy in the RabbitMQ definitions template, because quorum queues are enabled on the client side and can't be set using a policy [3]. Note also that quorum queues are not yet enabled in oslo.messaging for the usage of reply_ and fanout_ queues (transient queues). This will change once [4] is merged.

[0]: https://review.opendev.org/c/openstack/kolla-ansible/+/867771
[1]: https://www.rabbitmq.com/quorum-queues.html
[2]: https://blog.rabbitmq.com/posts/2021/08/4.0-deprecation-announcements/
[3]: https://www.rabbitmq.com/quorum-queues.html#declaring
[4]: https://review.opendev.org/c/openstack/oslo.messaging/+/888479

Signed-off-by: Sven Kieske <kieske@osism.tech>
Change-Id: I6c033d460a5c9b93c346e9e47e93b159d3c27830
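A sketch of how the toggle is used, with the rendered result assumed to be oslo.messaging's standard option:

    # globals.yml
    om_enable_rabbitmq_quorum_queues: true

    # which would render into each service's [oslo_messaging_rabbit] as:
    #   rabbit_quorum_queue = true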
-
- Nov 29, 2023
Jan Gutter authored
* Updates etcd to v3.4
* Updates the config to use v3.4's logging mechanism
* Deprecated etcd CA parameters aren't used, so we are not affected by their removal.
* Note that we are not currently guarding against skip-version updates for etcd.

Notable non-voting jobs exercising some of this:

* kolla-ansible-ubuntu-upgrade-cephadm (cinder->tooz->etcd3gw->etcd)
* kolla-ansible-ubuntu-zun (see https://review.opendev.org/c/openstack/openstack-ansible/+/883194)

Depends-On: https://review.opendev.org/c/openstack/kolla/+/890464
Change-Id: I086e7bbc7db64421445731a533265e7056fbdb43
-
Jan Gutter authored
* etcd service containers usually have a set of environment parameters required to boot the container.
* The short-lived etcd bootstrap containers pass extra ETCD_INITIAL_* environment variables, but still need to pass the ones that the service containers use.
* This uses Ansible's `combine` filter to cut down on the duplication, as sketched below.
* This is intended to be just a straightforward refactor.

Change-Id: I04e95f92a8f365553afd618d58b99de595d48312
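A minimal sketch of the `combine` pattern (hypothetical variable names, not the role's actual ones):

    # Environment shared by all etcd containers
    etcd_base_environment:
      ETCD_NAME: "{{ ansible_facts.hostname }}"
      ETCD_DATA_DIR: /var/lib/etcd

    # Bootstrap container: the shared env plus ETCD_INITIAL_* extras
    etcd_bootstrap_environment: >-
      {{ etcd_base_environment | combine({
           'ETCD_INITIAL_CLUSTER_STATE': 'new'
         }) }}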
-
- Nov 28, 2023
Jan Gutter authored
This commit addresses a few shortcomings in the etcd service:

* Adding or removing etcd nodes required manual intervention.
* The etcd service would have brief outages during upgrades or reconfigures because restarts weren't always serialised.

This makes the etcd service follow a similar pattern to mariadb:

* There is now a distinction between bootstrapping the cluster and adding / removing another member.
* This more closely follows etcd's upstream bootstrapping guidelines.
* The etcd role now serialises restarts internally, so the kolla_serial pattern is no longer appropriate (or necessary).

This does not remove the need for manual intervention in all failure modes: the documentation has been updated to address the most common issues.

Note that there's repetition in the container specifications: this is somewhat deliberate. In a future cleanup, it's intended to reduce the duplication.

Change-Id: I39829ba0c5894f8e549f9b83b416e6db4fafd96f
-
Michal Nasiadka authored
Depends-On: https://review.opendev.org/c/openstack/kolla/+/901508
Change-Id: I8c7d3de95d0f1f8e57a993b8c3417d90459e19be
-
Doug Szumski authored
Like other WSGI services in Kolla Ansible, the Horizon WSGI application handles log output via the `wsgi.errors` object. See [1] for further information. The problem is that this log output is written to a file called `horizon.log`, causing it to be processed as an 'Oslo log' in the Fluentd processing pipeline. Since the log format doesn't match the expected format, this results in parsing errors. This fix renames the log file and adjusts the format to match other WSGI applications. The logs are then processed in the same way as other WSGI application logs, resolving the issue.

[1] https://modwsgi.readthedocs.io/en/master/user-guides/debugging-techniques.html

Change-Id: I93777d1c53920f5470c78356e6b3a4064fbe04b4
Closes-Bug: #1898174
-
Matt Crees authored
This reverts commit b86c304a.

Reason for revert: We want to enable quorum queues by default in Caracal, without requiring two queue migrations between releases. See the etherpad for details: https://etherpad.opendev.org/p/kolla-ansible-rmq-quorum-queues-proposal

Change-Id: Ia19ab97f538125475297976347c5da332a7fdda7
-
- Nov 22, 2023
Michal Arbet authored
The patch [1] mentioned below added the jobboard functionality to the octavia role, but unfortunately it incorrectly implemented the functionality of users and rules for ProxySQL. This patch fixes this bug.

[1] https://review.opendev.org/c/openstack/kolla-ansible/+/888588

Closes-Bug: #2044293
Change-Id: I6524fabad19b438113db4affe05f5586db99dff4
-
Will Szumski authored
Closes-Bug: #2043831
Change-Id: I010fabd255d93d5329de82af2b5d21c8fa7d93c4
-
Pierre Riteau authored
Closes-Bug: #2044226
Change-Id: I5e17152584b758c9ca4f1cc14520337f979584b7
-
- Nov 21, 2023
Pierre Riteau authored
This avoids generating an empty [oslo_policy] section in nova.conf when no custom policy file is defined.

Change-Id: I23fae8387573e7f37eda0f2a09cd937239afd93f
-
- Nov 17, 2023
Will Szumski authored
Closes-Bug: #2043829
Change-Id: Ic4cbaf592a2699d9c0312c575f68613c8681239f
-
Will Szumski authored
See: https://grafana.com/docs/grafana/latest/administration/provisioning/

Closes-Bug: #2043828
Change-Id: I9ed07dc8c995adddf6d89838cd515af93d10bd00
-