  1. Aug 19, 2020
    • Add workaround for keystonemiddleware/neutron memcached issue · 5a52d8e4
      Pierre Riteau authored
      There is an issue where keystonemiddleware connections to memcached from
      neutron-server grow beyond configured values [1], eventually reaching
      the maximum number of connections accepted by memcached servers. Other
      services do not appear to be affected by this issue.
      
      A workaround is to use the advanced memcached pool. Despite its
      documentation claiming to only work with Python 2, it appears to work
      fine on Python 3.
      
      [1] https://bugs.launchpad.net/keystonemiddleware/+bug/1883659
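
      As a sketch of what this amounts to, the keystonemiddleware option involved
      is memcache_use_advanced_pool; the exact template wiring in kolla-ansible may
      differ, but the rendered neutron.conf would contain something like:

          [keystone_authtoken]
          memcache_use_advanced_pool = True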
      
      Change-Id: Ifbbc2022839cbc575848d830600241c61603c80b
      Closes-Bug: #1892210
    • Add cinder auth config to nova-cell nova.conf.j2 · de16013b
      Jegor van Opdorp authored
      Fixes an issue when deleting evacuated instances with encrypted block
      devices.
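
      Roughly, this means the cell nova.conf gains a [cinder] auth section along
      these lines (option values are placeholders, not the exact ones used):

          [cinder]
          auth_url = {{ keystone_internal_url }}
          auth_type = password
          project_name = service
          username = cinder
          password = {{ cinder_keystone_password }}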
      
      Change-Id: I9b9b689ef7e1e41b597e2c5f6b96f3ed011193c5
      Closes-Bug: 1891462
      Related-Bug: 1850279
    • Fix ownership and permissions of admin-openrc.sh · 16f97867
      likui authored
      
      Previously the post-deploy.yml playbook was executed with become: true,
      and the admin-openrc.sh file templated without an owner or mode
      specified. This resulted in admin-openrc.sh being owned by root with 644
      permissions.
      
      This change creates the file without become: true, and explicitly sets
      the owner to the user executing Ansible, and the mode to 600.
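
      A minimal sketch of the resulting task; the task name, template source and
      destination path here are illustrative rather than the exact ones used:

          - name: Template admin-openrc.sh
            template:
              src: admin-openrc.sh.j2
              dest: "{{ node_config }}/admin-openrc.sh"  # destination is an assumption
              owner: "{{ ansible_user_id }}"             # the user executing Ansible
              mode: "0600"
            become: false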
      
      Co-Authored-By: Mark Goddard <mark@stackhpc.com>
      
      Closes-Bug: #1891704
      
      Change-Id: Iadf43383a7f2bf377d4666a55a38d92bd70711aa
  2. Aug 17, 2020
  3. Aug 13, 2020
    • Deploy neutron-mlnx-agent and neutron-eswitchd containers · 4809462f
      Bharat Kunwar authored
      Change-Id: I173669bdf92b1f2ea98907ba16808ca3c914944c
    • Prevent overwriting existing Keystone Fernet keys · 8389140f
      Mark Goddard authored
      Steps to reproduce:
      
      * Deploy a cloud
      * Add another controller to the inventory
      * Deploy to the new controller using --limit:
      
      kolla-ansible deploy --limit new-controller
      
      Expected results:
      
      The new controller uses the cluster's existing fernet keys.
      
      Actual results:
      
      New fernet keys are generated on the new controller, and pushed out to
      the existing controllers. This invalidates tokens created from those
      keys.
      
      This change prevents the above scenario from happening, by failing the
      deployment if there are no hosts with existing Fernet keys to
      distribute, and not all Keystone hosts are in the target host list.
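
      A rough sketch of the kind of guard this adds (the variable holding targeted
      hosts with existing keys is hypothetical; the real task names and conditions
      may differ):

          - name: Fail if Fernet keys exist but no key-holding host is targeted
            fail:
              msg: >-
                No host in the target list holds existing Fernet keys, and not all
                Keystone hosts are targeted. Refusing to generate new keys.
            when:
              - fernet_key_hosts | length == 0
              - ansible_play_hosts_all | length < groups['keystone'] | length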
      
      Closes-Bug: #1891364
      
      Change-Id: If0c0e038b77fc010a3a017f9841a674d53b16457
    • Add Keep Alive Timeout for httpd · 19b028e6
      James Kirsch authored
      This patch introduces a global keep alive timeout value for services
      that leverage httpd + wsgi to handle http/https requests. The default
      value is one minute.
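
      In the rendered Apache/WSGI vhost configuration this corresponds to a
      directive along these lines (shown with the one minute default; the exact
      variable plumbing in the templates may differ):

          KeepAlive On
          KeepAliveTimeout 60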
      
      Change-Id: Icf7cb0baf86b428a60a7e9bbed642999711865cd
      Partially-Implements: blueprint add-ssl-internal-network
  4. Aug 12, 2020
  5. Aug 11, 2020
  6. Aug 10, 2020
    • Add trove-guestagent.conf · 38881963
      likui authored
      Add a trove-guestagent.conf template for the trove-guestagent service.
      By default, the Guest Agent config file is now injected during instance creation.
      
      Change-Id: Id0750b84fef8e19658b27f8ae16a857e1394216e
  7. Aug 06, 2020
  8. Jul 30, 2020
  9. Jul 28, 2020
    • Add timesync prechecks · 3018199f
      Radosław Piliszek authored
      If not running containerised chrony, we need to check that the host
      has its own means of system clock synchronization.
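
      A simplified sketch of such a precheck; the enable_chrony flag and the list
      of service names checked here illustrate the idea rather than the exact
      implementation:

          - name: Gather service facts
            service_facts:

          - name: Fail if the host has no clock synchronization service
            fail:
              msg: >-
                No host time synchronization service was found and the chrony
                container is disabled.
            when:
              - not enable_chrony | bool
              - "'chronyd.service' not in ansible_facts.services"
              - "'systemd-timesyncd.service' not in ansible_facts.services"
              - "'ntpd.service' not in ansible_facts.services"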
      
      Change-Id: I31b3e9ed625d63a4bf82c674593522268c20ec4c
      Partial-Bug: #1885689
  10. Jul 27, 2020
    • Remove Hyper-V integration · 6eb02245
      Christian Berendt authored
      Change-Id: I2e22ec47f644de2f1509a0111c9e1fffe8da0a1a
    • [docker] Added a new flag to disable default iptables rules · fc7ce6ca
      Dincer Celik authored
      Docker manipulates iptables rules by default to provide network
      isolation, and this might cause problems if the host already has an
      iptables-based firewall.

      This change introduces docker_disable_default_iptables_rules to
      disable this manipulation by putting "iptables": false [1] in
      daemon.json.
      
      For better defaults, this feature will be enabled by default in
      Victoria.
      
      [1] https://docs.docker.com/network/iptables/
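
      Sketch of the new flag in globals.yml and the daemon.json fragment it
      results in (exact rendering may differ):

          # /etc/kolla/globals.yml
          docker_disable_default_iptables_rules: "yes"

          # resulting fragment of /etc/docker/daemon.json:
          #   "iptables": false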
      
      Closes-Bug: #1849275
      
      Change-Id: I165199fc98fb98f227f2a20284e1bab03ef65b5b
    • Improve Grafana DB bootstrap · 2c730590
      Doug Szumski authored
      This fixes an issue where multiple Grafana instances would race
      to bootstrap the Grafana DB. The following changes are made:
      
      - Only start additional Grafana instances after the DB has been
        configured.
      
      - During upgrade, don't allow old instances to run with an
        upgraded DB schema.
      
      Change-Id: I3e0e077ba6a6f43667df042eb593107418a06c39
      Closes-Bug: #1888681
    • Set Kafka default replication factor · a273e28e
      Doug Szumski authored
      This ensures that when using automatic Kafka topic creation with more than one
      node in the Kafka cluster, all partitions in the topic are automatically
      replicated. When a single node goes down in a cluster of three or more nodes, these
      topics will continue to accept writes provided there are at least two in-sync replicas.
      
      In a two node cluster, no failures are tolerated. In a three node cluster, only a
      single node failure is tolerated. In a larger cluster the configuration may need
      manual tuning.
      
      This configuration follows advice given here:
      
      [1] https://docs.cloudera.com/documentation/kafka/1-2-x/topics/kafka_ha.html#xd_583c10bfdbd326ba-590cb1d1-149e9ca9886--6fec__section_d2t_ff2_lq
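
      For illustration, in broker server.properties terms this corresponds to
      settings along these lines for a three node cluster (the replication factor
      is what this change sets; min.insync.replicas is shown only to illustrate the
      two in-sync replica condition and may not be touched by the change):

          default.replication.factor=3
          min.insync.replicas=2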
      
      Closes-Bug: #1888522
      
      Change-Id: I7d38c6ccb22061aa88d9ac6e2e25c3e095fdb8c3
    • fluentd: log to a file instead of stdout · 696533f2
      Michal Nasiadka authored
      fluentd currently logs to stdout, which is known to produce big docker logs
      in /var/lib/docker. This change makes fluentd log to /var/log/kolla/fluentd instead.
      
      Closes-Bug: #1888852
      Change-Id: I8fe0e54cb764a26d26c6196cef68aadc6fd57b90
  11. Jul 23, 2020
    • Masakari: copy TLS certificates into containers · 0b4c8a3c
      Mark Goddard authored
      From Ussuri, if CA certificates are copied into
      /etc/kolla/certificates/ca/, these should be copied into all containers.
      This is not being done for masakari currently.
      
      Additionally, we are not setting the [DEFAULT] nova_ca_certificates_file
      option in masakari.conf. This depends on masakari bug 1873736 being
      fixed in order to work.
      
      This change fixes these issues.
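
      In masakari.conf terms, the second part corresponds to a setting along these
      lines (the certificate path shown is an example, not the exact one used):

          [DEFAULT]
          nova_ca_certificates_file = /etc/masakari/certs/ca-bundle.crt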
      
      Change-Id: I9a3633f58e5eb734fa32edc03a3022a500761bbb
      Closes-Bug: #1888655
  12. Jul 22, 2020
    • Fix some CloudKitty API responses when behind SSL · cd55c8f4
      Pierre Riteau authored
      Some CloudKitty API responses include a Location header using http
      instead of https. Seen with `openstack rating module enable hashmap`.
      
      Change-Id: I11158bbfd2006e3574e165b6afc9c223b018d4bc
      Closes-Bug: #1888544
  13. Jul 21, 2020
  14. Jul 17, 2020
    • Make /dev/kvm permissions handling more robust · 202365e7
      Radosław Piliszek authored
      This makes use of udev rules to make the handling smarter and to override
      host-level package settings.
      Additionally, this masks an Ubuntu-only service that is another
      pain point in terms of /dev/kvm permissions.
      Fingers crossed for no further surprises.
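
      An illustrative udev rule of the kind involved (file name, group and mode are
      examples, not necessarily the exact values applied):

          # /etc/udev/rules.d/99-kvm.rules
          KERNEL=="kvm", GROUP="kvm", MODE="0666"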
      
      Change-Id: I61235b51e2e1325b8a9b4f85bf634f663c7ec3cc
      Closes-bug: #1681461
  15. Jul 15, 2020
  16. Jul 10, 2020
  17. Jul 09, 2020
  18. Jul 08, 2020
    • Remove the ml2_conf.ini merging for agents · c7d92ed6
      gugug authored
      planned removal
      
      Change-Id: Ib37ea4d42f82096a682cebc724c45c9dd39c8b47
    • Load br_netfilter module in nova-cell role · 2f91be9f
      Mark Goddard authored
      The nova-cell role sets the following sysctls on compute hosts, which
      require the br_netfilter kernel module to be loaded:
      
          net.bridge.bridge-nf-call-iptables
          net.bridge.bridge-nf-call-ip6tables
      
      If it is not loaded, then we see the following errors:
      
          Failed to reload sysctl:
          sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory
          sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-ip6tables: No such file or directory
      
      Loading the br_netfilter module resolves this issue.
      
      Typically we do not see this since installing Docker and configuring it
      to manage iptables rules causes the br_netfilter module to be loaded.
      There are good reasons [1] to disable Docker's iptables management
      however, in which case we are likely to hit this issue.
      
      This change loads the br_netfilter module in the nova-cell role for
      compute hosts.
      
      [1] https://bugs.launchpad.net/kolla-ansible/+bug/1849275
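
      A minimal sketch of the approach; the actual task in the nova-cell role may
      differ in naming and conditions:

          - name: Load br_netfilter kernel module
            modprobe:
              name: br_netfilter
              state: present
            become: true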
      
      
      
      Co-Authored-By: Dincer Celik <hello@dincercelik.com>
      
      Change-Id: Id52668ba8dab460ad4c33fad430fc8611e70825e
  19. Jul 07, 2020
    • Fix incorrect value of [storage]/ceph_keyring in gnocchi.conf · 9a0f8c31
      Pierre Riteau authored
      The value should be the full path to the keyring file, not just the
      name. Without this fix Gnocchi fails to connect to Ceph.
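
      The corrected setting then looks something like this in gnocchi.conf (the
      keyring file name shown is an example):

          [storage]
          ceph_keyring = /etc/ceph/ceph.client.gnocchi.keyring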
      
      Change-Id: Iaa69b2096b09a448345de50911e21436875d48d6
      Closes-Bug: #1886711
    • Performance: Run common role in a separate play · 56ae2db7
      Mark Goddard authored
      The common role was previously added as a dependency to all other roles.
      It would set a fact after running on a host to avoid running twice. This
      had the nice effect that deploying any service would automatically pull
      in the common services for that host. When using tags, any services with
      matching tags would also run the common role. This could be both
      surprising and sometimes useful.
      
      When using Ansible at large scale, there is a penalty associated with
      executing a task against a large number of hosts, even if it is skipped.
      The common role introduces some overhead, just in determining that it
      has already run.
      
      This change extracts the common role into a separate play, and removes
      the dependency on it from all other roles. New groups have been added
      for cron, fluentd, and kolla-toolbox, similar to other services. This
      changes the behaviour in the following ways:
      
      * The common role is now run for all hosts at the beginning, rather than
        prior to their first enabled service
      * Hosts must be in the necessary group for each of the common services
        in order to have that service deployed. This is mostly to avoid
        deploying on localhost or the deployment host
      * If tags are specified for another service e.g. nova, the common role
        will *not* automatically run for matching hosts. The common tag must
        be specified explicitly
      
      The last of these is probably the largest behaviour change. While it
      would be possible to determine which hosts should automatically run the
      common role, it would be quite complex, and would introduce some
      overhead that would probably negate the benefit of splitting out the
      common role.
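
      A rough sketch of the resulting play; the group and tag names follow the
      description above, while the exact layout of the real play may differ:

          - name: Apply role common
            hosts:
              - cron
              - fluentd
              - kolla-toolbox
            tags:
              - common
            roles:
              - role: common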
      
      Partially-Implements: blueprint performance-improvements
      
      Change-Id: I6a4676bf6efeebc61383ec7a406db07c7a868b2a
  20. Jul 03, 2020
    • Remove policy file from nova-conductor config.json template · c40e8065
      Pierre Riteau authored
      Change I810aad7d49db3f5a7fd9a2f0f746fd912fe03917 for supporting multiple
      Nova cells updated the list of containers that require a policy file to
      only include nova-api, nova-compute, and nova-compute-ironic.
      
      The nova-conductor config.json template was left unchanged and fails to
      copy the nova policy file into its container. This can be seen on a
      fresh deployment, but might be missed on an upgrade if an older policy
      file is still available in /etc/kolla/nova-conductor.
      
      This commit removes the nova_policy_file block from the nova-conductor
      config.json template, as it shouldn't be required.
      
      Backport: ussuri, train
      Change-Id: I17256b182d207aeba3f92c65a6d7cf3611180558
      Closes-Bug: #1886170
  21. Jul 02, 2020
  22. Jul 01, 2020
    • Use public interface for Magnum client and trustee Keystone interface · 78bb5942
      Bharat Kunwar authored
      While all other clients should use internalURL, the Magnum client itself
      and the Keystone interface for trustee credentials should be publicly
      accessible (the upstream default when no config is specified), since
      instances need to be able to reach them.
      
      Closes-Bug: #1885420
      Change-Id: I74359cec7147a80db24eb4aa4156c35d31a026bf
  23. Jun 30, 2020
  24. Jun 29, 2020
  25. Jun 27, 2020
    • Fix etcd protocol configuration · a1584322
      James Kirsch authored
      The etcd service protocol is currently configured with internal_protocol.
      The etcd service is not load balanced by an HAProxy container, so
      there is no proxy layer to do TLS termination when internal_protocol
      is configured to be "https".

      Until the etcd service is configured to deploy with native TLS
      termination, the etcd protocol should be independent of
      internal_protocol, and default to "http".
      
      Change-Id: I730c02331514244e44004aa06e9399c01264c65d
      Closes-Bug: 1884137
  26. Jun 25, 2020
    • openvswitch: Use ansible_hostname for system-id · cecdb6a1
      Michal Nasiadka authored
      Currently openvswitch sets the system-id based on inventory_hostname, but when
      the Ansible inventory contains IP addresses it only takes the first
      octet, resulting in multiple OVN chassis being named e.g. "10".
      Neutron and OVN then have problems functioning, because a chassis named "10"
      is created and deleted multiple times per second; this ends up with the
      ovsdb and neutron-server processes using up to 100% CPU.
      
      Adding openvswitch role to ovn CI job triggers.
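
      As a sketch, the fix amounts to using ansible_hostname when setting the OVS
      system-id, e.g. via something like the following (the actual mechanism in the
      role may differ):

          - name: Set OVS system-id from the host name
            command: >-
              ovs-vsctl set Open_vSwitch . external-ids:system-id={{ ansible_hostname }}
            become: true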
      
      Change-Id: Id22eb3e74867230da02543abd93234a5fb12b31d
      Closes-Bug: #1884734