  1. Jan 17, 2024
    • Fix OpenSearch upgrade tasks idempotency · e502b65b
      Matt Crees authored
      Shard allocation is disabled at the start of the OpenSearch upgrade
      task. This is set as a transient setting, meaning it will be removed
      once the containers are restarted. However, if there is no change in
      the OpenSearch container, it will not be restarted, so the cluster is
      left in a broken state: unable to allocate shards.
      
      This patch moves the pre-upgrade tasks into the handlers, so shard
      allocation is disabled and the flush is performed only when the
      OpenSearch container is going to be restarted.
      
      Closes-Bug: #2049512
      Change-Id: Ia03ba23bfbde7d50a88dc16e4f117dec3c98a448
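A minimal sketch of the idea behind this fix, not the role's actual tasks: the pre-upgrade API calls are produced only when a restart is pending, so a transient setting is never left behind. The function name, the request dictionaries and the value `none` for `cluster.routing.allocation.enable` are assumptions made for illustration.

```python
def pre_upgrade_requests(container_will_restart: bool) -> list:
    """Return the OpenSearch API calls to make before restarting a node."""
    if not container_will_restart:
        # Without a restart the transient setting would never be cleared,
        # so shard allocation must not be disabled at all.
        return []
    return [
        {
            "method": "PUT",
            "path": "/_cluster/settings",
            # Transient settings are dropped when the cluster restarts.
            "body": {"transient": {"cluster.routing.allocation.enable": "none"}},
        },
        # Flush so that less data has to be replayed after the restart.
        {"method": "POST", "path": "/_flush", "body": None},
    ]


if __name__ == "__main__":
    print(pre_upgrade_requests(container_will_restart=False))
    print(pre_upgrade_requests(container_will_restart=True))
```

Tying the calls to the restart decision is what makes a second run with no container change a no-op.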
  2. Jan 11, 2024
  3. Jan 08, 2024
    • Fix Nova scp failures on Debian Bookworm · bfa9dd97
      Pierre Riteau authored
      The addition of an instance resize operation [1] to CI testing is
      triggering a failure in kolla-ansible-debian-ovn jobs, which are using a
      nodeset with multiple nodes:
      
          oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
          Command: scp -r /var/lib/nova/instances/8ca2c7e8-acae-404c-af7d-6cac38e354b8_resize/disk 192.0.2.2:/var/lib/nova/instances/8ca2c7e8-acae-404c-af7d-6cac38e354b8/disk
          Exit code: 255
          Stdout: ''
          Stderr: "Warning: Permanently added '[192.0.2.2]:8022' (ED25519) to the list of known hosts.\r\nsubsystem request failed on channel 0\r\nscp: Connection closed\r\n"
      
      This is not seen on Ubuntu Jammy, which uses OpenSSH 8.9, while Debian
      Bookworm uses OpenSSH 9.2. This is likely related to this change in
      OpenSSH 9.0 [2]:
      
          This release switches scp(1) from using the legacy scp/rcp protocol
          to using the SFTP protocol by default.
      
      Configure sftp subsystem like on RHEL9 derivatives. Even though it is
      not yet required for Ubuntu, we also configure it so we are ready for
      the Noble release.
      
      [1] https://review.opendev.org/c/openstack/kolla-ansible/+/904249
      [2] https://www.openssh.com/txt/release-9.0
      
      Closes-Bug: #2048700
      Change-Id: I9f1129136d7664d5cc3b57ae5f7e8d05c499a2a5
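As a rough illustration of the version boundary described above, the following sketch checks whether a given OpenSSH release uses SFTP for scp by default. The helper name and the version-string parsing are assumptions; only the 9.0 threshold comes from the release notes quoted in the commit message.

```python
import re


def scp_defaults_to_sftp(openssh_version: str) -> bool:
    """Return True if scp in this OpenSSH release uses SFTP by default."""
    match = re.match(r"(\d+)\.(\d+)", openssh_version)
    if match is None:
        raise ValueError(f"unrecognised OpenSSH version: {openssh_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # OpenSSH 9.0 switched scp from the legacy scp/rcp protocol to SFTP.
    return (major, minor) >= (9, 0)


if __name__ == "__main__":
    for version in ("8.9p1", "9.2p1"):
        print(version, scp_defaults_to_sftp(version))
```

Hosts for which this returns True need an sftp subsystem on the SSH server that scp connects to.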
    • Enable glance proxying behaviour · 9ecfcf5a
      Michal Arbet authored
      This patch sets the URL of the Glance worker.
      If this is set, other glance workers will know how to contact this one
      directly if needed. For image import, a single worker stages the image
      and other workers need to be able to proxy the import request to the
      right one.
      
      With the current setup, Glance image import does not work.
      
      Closes-Bug: #2048525
      
      Change-Id: I4246dc8a80038358cd5b6e44e991b3e2ed72be0e
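A sketch of what the resulting configuration fragment could look like, assuming the option involved is Glance's `worker_self_reference_url` in the `[DEFAULT]` section. The commit message does not name the option, and the host and port below are placeholders.

```python
import configparser
import io


def render_glance_worker_url(host: str, port: int, scheme: str = "http") -> str:
    """Render a glance-api.conf fragment with this worker's own URL."""
    config = configparser.ConfigParser(interpolation=None)
    # Assumed option name: each worker advertises the URL at which other
    # workers can reach it to proxy requests for images it has staged.
    config["DEFAULT"] = {"worker_self_reference_url": f"{scheme}://{host}:{port}"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_glance_worker_url("192.0.2.10", 9292))
```

Each worker needs its own distinct value, since the URL identifies the worker that staged the image.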
  4. Jan 05, 2024
    • cadvisor: Set housekeeping interval to Prometheus scrape interval · 97e5c0e9
      Mark Goddard authored
      The prometheus_cadvisor container has high CPU usage. On various
      production systems I checked it sits around 13-16% on controllers,
      averaged over the Prometheus 1m scrape interval. When viewed with top we
      can see it is a bit spiky and can jump over 100%.
      
      There are various bugs about this, but I found
      https://github.com/google/cadvisor/issues/2523 which suggests increasing
      the per-container housekeeping interval. This defaults to 1s, which
      provides far greater granularity than we need with the default
      Prometheus scrape interval of 60s.
      
      Increasing the housekeeping interval to 60s on a production controller
      reduced the CPU usage from 13% to 3.5% average. This still seems high,
      but is more reasonable.
      
      Change-Id: I89c62a45b1f358aafadcc0317ce882f4609543e7
      Closes-Bug: #2048223
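The arithmetic behind this change can be sketched as follows. The helper names are hypothetical, and the use of cAdvisor's `--housekeeping_interval` flag is an assumption about how the interval is passed to the container.

```python
def housekeeping_runs_per_scrape(housekeeping_seconds: int, scrape_seconds: int) -> int:
    """How many times cAdvisor collects stats between two Prometheus scrapes."""
    return scrape_seconds // housekeeping_seconds


def cadvisor_args(scrape_seconds: int) -> list:
    """Build the cAdvisor argument that matches housekeeping to scraping."""
    return [f"--housekeeping_interval={scrape_seconds}s"]


if __name__ == "__main__":
    # With the 1s default, 60 collections happen for every 60s scrape.
    print(housekeeping_runs_per_scrape(1, 60))
    # Matching the scrape interval leaves one collection per scrape.
    print(housekeeping_runs_per_scrape(60, 60))
    print(cadvisor_args(60))
```

Collections made between scrapes are never exported, so matching the two intervals removes that work without losing data points.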
    • Fix long service restarts while using systemd · b1fd2b40
      Michal Arbet authored
      Some containers exit with 143 instead of 0, but
      this is still OK. This patch allows
      exit code 143 (SIGTERM) as a fix. Details are in
      the bug report.
      
      Services which exited with 143 (SIGTERM):
      
      kolla-cron-container.service
      kolla-designate_producer-container.service
      kolla-keystone_fernet-container.service
      kolla-letsencrypt_lego-container.service
      kolla-magnum_api-container.service
      kolla-mariadb_clustercheck-container.service
      kolla-neutron_l3_agent-container.service
      kolla-openvswitch_db-container.service
      kolla-openvswitch_vswitchd-container.service
      kolla-proxysql-container.service
      
      Partial-Bug: #2048130
      Change-Id: Ia8c85d03404cfb368e4013066c67acd2a2f68deb
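For context, 143 is the conventional exit status of a process terminated by SIGTERM: 128 plus the signal number. The sketch below shows that relationship and a hypothetical check that treats it as a clean stop. systemd provides a `SuccessExitStatus=` directive for this kind of allowance, but whether the patch uses it is not stated in the commit message.

```python
import signal


def exit_status_for_signal(sig: signal.Signals) -> int:
    """Conventional exit status reported for a process killed by a signal."""
    return 128 + int(sig)


def is_clean_stop(exit_status: int, extra_success: tuple = (143,)) -> bool:
    """Treat 0 and any explicitly allowed status as a successful stop."""
    return exit_status == 0 or exit_status in extra_success


if __name__ == "__main__":
    print(exit_status_for_signal(signal.SIGTERM))
    print(is_clean_stop(143), is_clean_stop(1))
```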
  5. Jan 04, 2024
  6. Jan 03, 2024
  7. Jan 02, 2024
  8. Dec 28, 2023
  9. Dec 21, 2023
    • Set a log retention policy for OpenSearch · 5e5a2dca
      Doug Szumski authored
      We previously used ElasticSearch Curator for managing log
      retention. Now that we have moved to OpenSearch, we can use
      the Index State Management (ISM) plugin which is bundled with
      OpenSearch.
      
      This change adds support for automating the configuration of
      the ISM plugin via the OpenSearch API. By default, it has
      similar behaviour to the previous ElasticSearch Curator
      default policy.
      
      Closes-Bug: #2047037
      
      Change-Id: I5c6d938f2bc380f1575ee4f16fe17c6dca37dcba
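A rough sketch of what a retention policy document for the ISM plugin can look like. The state names, index pattern and retention period are placeholders rather than the defaults introduced by this change, and the document would typically be sent to the plugin's policies endpoint (`_plugins/_ism/policies/<policy_id>`).

```python
import json


def retention_policy(index_pattern: str, delete_after_days: int) -> dict:
    """Build an ISM policy that deletes indices older than a given age."""
    return {
        "policy": {
            "description": "Delete old log indices (illustrative only)",
            "default_state": "retain",
            "states": [
                {
                    # Indices stay here until they are old enough.
                    "name": "retain",
                    "actions": [],
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {"min_index_age": f"{delete_after_days}d"},
                        }
                    ],
                },
                # Terminal state: remove the index.
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            # Attach the policy automatically to matching new indices.
            "ism_template": {"index_patterns": [index_pattern], "priority": 1},
        }
    }


if __name__ == "__main__":
    print(json.dumps(retention_policy("logs-*", 30), indent=2))
```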
    • Remove nova cell sync comment · e9e7362f
      Alex-Welsh authored
      Removed a comment suggesting we use nova-manage db sync --local_cell
      when bootstrapping the nova service, since that suggestion has now been
      implemented in Kolla. See [1] for more details.
      
      [1]: https://review.opendev.org/c/openstack/kolla/+/902057
      
      Related-Bug: #2045558
      Depends-On: Ic64eb51325b3503a14ebab9b9ff2f4d9caec734a
      Change-Id: I591f83c4886f5718e36011982c77c0ece6c4cbd7
  10. Dec 20, 2023
  11. Dec 19, 2023
  12. Dec 18, 2023
  13. Dec 14, 2023
  14. Dec 13, 2023
  15. Dec 05, 2023
    • Fix broken list concatenation in horizon role · 97cd1731
      Andrey Kurilin authored
      
      Starting with ansible-core 2.13, the list concatenation format has changed
      and concatenation operations outside of the Jinja template are no longer supported.
      
      The format change:
      
        "[1] + {{ [2] }}" -> "{{ [1] + [2] }}"
      
      This affects the horizon role that iterates over existing policy files to
      override and concatenate them into a single variable.
      
      Co-Authored-By: Dr. Jens Harbott <harbott@osism.tech>
      
      Closes-Bug: #2045660
      Change-Id: I91a2101ff26cb8568f4615b4cdca52dcf09e6978
    • Support Ansible max_fail_percentage · af6e1ca4
      Mark Goddard authored
      This allows us to continue execution until a certain proportion of hosts
      have failed. This can be useful at scale, where failures are common, and
      restarting a deployment is time-consuming.
      
      The default max failure percentage is 100, keeping the default
      behaviour. A global max failure percentage may be set via
      kolla_max_fail_percentage, and individual services may define a max
      failure percentage via <service>_max_fail_percentage.
      
      Note that all hosts in the inventory must be reachable for fact
      gathering, even those not included in a --limit.
      
      Closes-Bug: #1833737
      Change-Id: I808474a75c0f0e8b539dc0421374b06cea44be4f
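The behaviour described above can be sketched as follows. The function names are hypothetical, and the precedence of a per-service value over the global one is an assumption. The strict comparison follows Ansible's documented rule that the percentage must be exceeded, not merely equalled.

```python
from typing import Optional


def effective_max_fail_percentage(
    service_value: Optional[float] = None,
    global_value: Optional[float] = None,
    default: float = 100,
) -> float:
    """Pick the per-service value, else the global one, else the default."""
    if service_value is not None:
        return service_value
    if global_value is not None:
        return global_value
    return default


def play_aborts(failed_hosts: int, total_hosts: int, max_fail_percentage: float) -> bool:
    """Return True once the failed share of hosts exceeds the threshold."""
    return failed_hosts * 100 / total_hosts > max_fail_percentage


if __name__ == "__main__":
    limit = effective_max_fail_percentage(global_value=20)
    print(play_aborts(2, 10, limit))  # exactly 20% failed
    print(play_aborts(3, 10, limit))  # 30% failed
```

With the default of 100 the threshold can never be exceeded, which preserves the previous behaviour.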
  16. Dec 02, 2023
  17. Dec 01, 2023
    • magnum: support kubeconfig configuration file · c939504d
      Christian Berendt authored
      If a file {{ node_custom_config }}/magnum/kubeconfig exists, it is
      copied to /var/lib/magnum/.kube/config in all Magnum service containers.
      At this location, vexxhost/magnum-cluster-api will look for the kubeconfig
      configuration file to control the Cluster API control plane. If
      vexxhost/magnum-cluster-api is installed in the Magnum container images,
      the Cluster API control plane can then be controlled via the Magnum API.
      
      Depends-On: https://review.opendev.org/c/openstack/kolla/+/902101
      Change-Id: I986c5192fe96b9c480a2d8fa87d719a50ce78186
    • fluentd: Fix getting podman labels · bdd2aa37
      Michal Nasiadka authored
      podman_image_info returns Config dict, not ContainerConfig.
      
      Change-Id: I9f813c90b42246c4835d7d7b18476a021d80548b
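A small sketch of a label lookup that tolerates both key names. The helper, the `Labels` sub-key and the fallback to `ContainerConfig` for other container engines are assumptions. The commit message only states that `podman_image_info` returns `Config`.

```python
def image_labels(image_info: dict) -> dict:
    """Extract image labels from an image-info result, whichever key is used."""
    # podman_image_info exposes "Config"; fall back to "ContainerConfig"
    # for results that use the other key name (assumed here).
    config = image_info.get("Config") or image_info.get("ContainerConfig") or {}
    return config.get("Labels") or {}


if __name__ == "__main__":
    podman_result = {"Config": {"Labels": {"example_label": "5"}}}
    print(image_labels(podman_result))
```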
  18. Nov 30, 2023
  19. Nov 29, 2023
    • etcd: update to v3.4 · ccfa2a6c
      Jan Gutter authored
      * Updates etcd to v3.4
      * Updated the config to use v3.4's logging mechanism
      * Deprecated etcd CA parameters aren't used, so we are not affected
        by their removal.
      * Note that we are not currently guarding against skip-version updates for
        etcd.
      
      Notable non-voting jobs exercising some of this:
      * kolla-ansible-ubuntu-upgrade-cephadm (cinder->tooz->etcd3gw->etcd)
      * kolla-ansible-ubuntu-zun (see
        https://review.opendev.org/c/openstack/openstack-ansible/+/883194 )
      
      Depends-On: https://review.opendev.org/c/openstack/kolla/+/890464
      Change-Id: I086e7bbc7db64421445731a533265e7056fbdb43
    • etcd: deduplicate environments for containers · ae21f317
      Jan Gutter authored
      * etcd service containers usually have a set of
        environment parameters required to boot the container.
      * The short-lived etcd bootstrap containers pass extra
        ETCD_INITIAL_* environment variables, but still need to
        pass the ones that the service containers use.
      * This uses ansible's `combine` filter to cut down on the
        duplication.
      * This is intended to be just a straightforward refactor.
      
      Change-Id: I04e95f92a8f365553afd618d58b99de595d48312
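The deduplication can be pictured with plain dictionaries. This is a Python analogue of what Ansible's `combine` filter does, not the role's code, and the variable values are placeholders.

```python
def bootstrap_environment(service_env: dict, initial_env: dict) -> dict:
    """Merge the shared service environment with bootstrap-only variables.

    Like Ansible's `combine`, keys in the second mapping win on conflict,
    and neither input is modified.
    """
    return {**service_env, **initial_env}


if __name__ == "__main__":
    service_env = {
        "ETCD_NAME": "controller1",
        "ETCD_DATA_DIR": "/var/lib/etcd",
    }
    initial_env = {
        "ETCD_INITIAL_CLUSTER_STATE": "new",
        "ETCD_INITIAL_CLUSTER_TOKEN": "example-token",
    }
    print(bootstrap_environment(service_env, initial_env))
```

The bootstrap containers then only declare what differs from the service containers.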
  20. Nov 28, 2023
    • etcd: Add support for more scenarios · ed3b27cc
      Jan Gutter authored
      This commit addresses a few shortcomings in the etcd service:
        * Adding or removing etcd nodes required manual intervention.
      
        * The etcd service would have brief outages during upgrades or
          reconfigures because restarts weren't always serialised.
      
      This makes the etcd service follow a similar pattern to mariadb:
        * There is now a distinction between bootstrapping the cluster
          and adding / removing another member.
      
        * This more closely follows etcd's upstream bootstrapping
          guidelines.
      
        * The etcd role now serialises restarts internally so the
          kolla_serial pattern is no longer appropriate (or necessary).
      
      This does not remove the need for manual intervention in all
      failure modes: the documentation has been updated to address the
      most common issues.
      
      Note that there's repetition in the container specifications: this
      is somewhat deliberate. In a future cleanup, it's intended to reduce
      the duplication.
      
      Change-Id: I39829ba0c5894f8e549f9b83b416e6db4fafd96f
    • fluentd: Use labels for transition to v5 · 06baa8f6
      Michal Nasiadka authored
      Depends-On: https://review.opendev.org/c/openstack/kolla/+/901508
      Change-Id: I8c7d3de95d0f1f8e57a993b8c3417d90459e19be
    • Fix Horizon WSGI application log parsing · 4168b46c
      Doug Szumski authored
      Like other WSGI services in Kolla Ansible, the Horizon WSGI application
      handles log output via the `wsgi.errors` object. See [1] for further
      information. The problem is that this log output is written to a file called
      `horizon.log`, causing it to be processed as an 'Oslo log' in the Fluentd
      processing pipeline. Since the log format doesn't match the expected format,
      this results in parsing errors.
      
      This fix renames the log file and adjusts the format to match other WSGI
      applications. The logs are then processed in the same way as other WSGI
      application logs, resolving the issue.
      
      [1] https://modwsgi.readthedocs.io/en/master/user-guides/debugging-techniques.html
      
      Change-Id: I93777d1c53920f5470c78356e6b3a4064fbe04b4
      Closes-Bug: #1898174
    • Revert "Enable RabbitMQ HA queues by default" · cdda49ec
      Matt Crees authored
      This reverts commit b86c304a.
      
      Reason for revert: We want to enable Quorum Queues by default in Caracal, without requiring two queue migrations between releases. See etherpad for details: https://etherpad.opendev.org/p/kolla-ansible-rmq-quorum-queues-proposal
      
      Change-Id: Ia19ab97f538125475297976347c5da332a7fdda7
  21. Nov 22, 2023
  22. Nov 21, 2023
  23. Nov 17, 2023