  1. Sep 09, 2019
  2. Aug 08, 2019
    • Fix FWaaS service provider (v2, Stein issue) · 85a5fb55
      Radosław Piliszek authored
      Because we merged both [1] and [2] in master,
      FWaaS was broken.
      This patch unbreaks it and needs to be backported
      to Stein, since the backport of [2] is awaiting merge
      while [1] has already been backported.
      
      [1] https://review.opendev.org/661704
      [2] https://review.opendev.org/668406
      
      
      
      Change-Id: I74427ce9b937c42393d86574614603bd788606af
      Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
    • Support namespacing RabbitMQ logs · 339ea2bd
      Doug Szumski authored
      The RabbitMQ role supports namespacing the service via the
      project_name. For example, if you change the project_name, the
      container name and config directory will be renamed accordingly. However,
      the log directory is currently fixed, even though the service tries to
      write to one named after the project_name. This change fixes that.
      
      Whilst you might generally use vhosts, running multiple RabbitMQ
      services on a single node is useful at the very least for testing,
      or for running 'outward RabbitMQ' on the same node.
      
      This change is part of the work to support Cells v2.
      
      Partially Implements: blueprint support-nova-cells
      Change-Id: Ied2c24c01571327ea532ba0aaf2fc5e89de8e1fb
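
      A minimal sketch of the idea in this commit, using illustrative variable
      names rather than the role's actual ones: derive the log directory from
      project_name so that renaming the project also renames the log path.

        # group_vars sketch (names are illustrative)
        project_name: "outward_rabbitmq"
        rabbitmq_log_dir: "/var/log/kolla/{{ project_name }}"

        # task sketch
        - name: Ensure the namespaced RabbitMQ log directory exists
          file:
            path: "{{ rabbitmq_log_dir }}"
            state: directory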
  3. Aug 07, 2019
    • Add support for sha256 in ceph key distribution · ad9e8786
      Michal Nasiadka authored
      - add support for sha256 in bslurp module
      - change sha1 to sha256 in ceph-mon ansible role
      
      Depends-On: https://review.opendev.org/655623
      Change-Id: I25e28d150f2a8d4a7f87bb119d9fb1c46cfe926f
      Closes-Bug: #1826327
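
      For illustration only (the real change is inside the bslurp Python
      module): a sha256 check of a distributed key can be expressed in plain
      Ansible with the built-in hash filter. The file path and expected digest
      below are hypothetical.

        - name: Verify fetched ceph key against its sha256 checksum (sketch)
          assert:
            that:
              - lookup('file', '/tmp/ceph.client.admin.keyring') | hash('sha256') == expected_sha256
          vars:
            expected_sha256: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"  # hypothetical digest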
    • Stop using MountFlags=shared in Docker configuration · 35941738
      Marcin Juszkiewicz authored
      According to the Docker upstream release notes [1], MountFlags should be
      empty.
      
      1. https://docs.docker.com/engine/release-notes/#18091
      
      "Important notes about this release
      
      In Docker versions prior to 18.09, containerd was managed by the Docker
      engine daemon. In Docker Engine 18.09, containerd is managed by systemd.
      Since containerd is managed by systemd, any custom configuration to the
      docker.service systemd configuration which changes mount settings (for
      example, MountFlags=slave) breaks interactions between the Docker Engine
      daemon and containerd, and you will not be able to start containers.
      
      Run the following command to get the current value of the MountFlags
      property for the docker.service:
      
      sudo systemctl show --property=MountFlags docker.service
      MountFlags=
      
      Update your configuration if this command prints a non-empty value for
      MountFlags, and restart the docker service."
      
      Closes-bug: #1833835
      
      Change-Id: I4f4cbb09df752d00073a606463c62f0a6ca6c067
  4. Aug 06, 2019
  5. Aug 05, 2019
    • Support mon and osd to be named with hostname · cd519db1
      wangwei authored
      In the current deployment of Ceph, the OSD node names and the MON names
      are both IP addresses, while the other daemons use hostnames.
      
      This commit adds support for naming MON and OSD nodes using the hostname,
      and does not change the default IP-based naming.
      
      Change-Id: I22bef72dcd8fc8bcd391ae30e4643520250fd556
    • Add Kafka input to telegraf config · 93e86836
      pangliye authored
      Change-Id: I9a8d3dc5f311d4ea4e5d9b03d522632abc66a7ac
    • Do not require EPEL repo on RHEL-based target hosts · 67cedb7a
      Radosław Piliszek authored
      This change makes kolla-ansible more compatible with
      RHEL, which does not provide the epel-release package.
      
      EPEL was required to install simplejson from RPM,
      which was an Ansible requirement when the Python
      version in use was below 2.5 ([1]). This has been obsolete
      for quite some time, so it's a good idea to get rid of it.
      
      This change also updates the docs to read better.
      
      [1] https://docs.ansible.com/ansible/2.3/intro_installation.html
      
      
      
      Change-Id: I825431d41fbceb824baff27130d64dabe4475d33
      Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
    • ceph: fixes to deployment and upgrade · 826f6850
      Radosław Piliszek authored
      1) ceph-nfs (ganesha-ceph) - use NFSv4 only
      This is recommended upstream.
      v3 and UDP require portmapper (aka rpcbind), which we
      do not want, except where the Ubuntu ganesha version (2.6)
      forces it by requiring UDP to be enabled, see [1].
      The issue has been fixed in 2.8, which is included in CentOS.
      Additionally disable v3 helper protocols and kerberos
      to avoid meaningless warnings.
      
      2) ceph-nfs (ganesha-ceph) - do not export host dbus
      It is not in use. This avoids the temptation to try
      handling it on the host.
      
      3) Properly handle ceph services deploy and upgrade
      Upgrade runs deploy.
      The order has been corrected - nfs goes after mds.
      Additionally upgrade takes care of rgw for keystone
      (for swift emulation).
      
      4) Enhance ceph keyring module with error detection
      Now it does not blindly try to create a keyring after
      any failure. This used to hide the real issue.
      
      5) Retry ceph admin keyring update until cluster works
      Reordering the deployment caused an issue where the Ceph cluster
      was not fully operational before actions were taken on it.
      
      6) CI: Remove osd df from collected logs as it may hang CI
      Hangs are caused by healthy MON and no healthy MGR.
      A descriptive note is left in its place.
      
      7) CI: Add 5s timeout to ceph informational commands
      This decreases the timeout from the default 300s.
      
      [1] https://review.opendev.org/669315
      
      
      
      Change-Id: I1cf0ad10b80552f503898e723f0c4bd00a38f143
      Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
  6. Aug 02, 2019
    • Remove unnecessary option from group_vars/all.yml · a1ab06d2
      chenxing authored
      We often specify the project name after "{{ node_config_directory }}",
      for example,
      ``{{ node_config_directory }}/cinder-api/:{{ container_config_directory }}/:ro``.
      As the "{{ project }}" option is not configured, this line was
      generated as:
      ``/etc/kolla//cinder-api/:...``
      with a double slash. That works, but it is confusing.
      
      Change-Id: I82e6a91b2c541e38cf8e97896842149b31244688
      Closes-Bug: #1838259
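
      A hedged sketch of the pattern in question; with "{{ project }}" unset,
      the first form renders with a double slash, and dropping the unused
      option avoids it.

        # Before (sketch): renders as /etc/kolla/ when "project" is unset
        node_config_directory: "/etc/kolla/{{ project }}"
        # ...so a mount such as this becomes /etc/kolla//cinder-api/:...
        volumes:
          - "{{ node_config_directory }}/cinder-api/:{{ container_config_directory }}/:ro"

        # After (sketch): no trailing template, no double slash
        node_config_directory: "/etc/kolla"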
  7. Jul 30, 2019
  8. Jul 23, 2019
  9. Jul 18, 2019
    • Do not recreate Blazar DB if using preconfigured · 7d284761
      Jason authored
      Most other services already gate the DB bootstrap operations with the
      'use_preconfigured_databases' variable; Blazar did not.
      
      Change-Id: I772b1cb92612c7e6936f052ed9947f93582f264c
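
      A minimal sketch of the gating pattern, assuming an illustrative
      bootstrap task; only the use_preconfigured_databases variable is taken
      from the commit message.

        - name: Creating Blazar database (illustrative task)
          mysql_db:
            name: blazar
            login_host: "{{ database_address | default('localhost') }}"   # illustrative
          when: not use_preconfigured_databases | bool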
    • [gnocchi] Don't recursively modify file perms on start · 464fefb1
      Jason Anderson authored
      For deployments with a lot of Gnocchi data, this is a non-starter
      (literally... the service basically can't start). There may need to
      be a way to configure this, or to only do it during deploy/bootstrap.
      Unclear for now, so it is disabled; users can `chown -R gnocchi:gnocchi`
      themselves in the meantime if need be.
      
      Change-Id: I0bae6dfbbee9f63506c89bd6b392e7be07fd5930
    • Fix glance bootstrap with file backend · 1abd15d4
      Mark Goddard authored
      Change https://review.opendev.org/#/c/670247/ attempted to fix glance
      deployment with the file backend. However, it added a new bug by being
      more strict about only generating configuration where the container will
      be deployed. This means that the current method of running the glance
      bootstrap container on any host in the glance-api group could be broken,
      since that host needs the container configuration.
      
      This change only runs the bootstrap container on hosts in the
      glance_api_hosts list, which in the case of the file backend typically
      only contains one host.
      
      This change also fixes up some logic during rolling upgrade, where we
      might not generate new configuration for the bootstrap host.
      
      Change-Id: I83547cd83b06ddefb3a9e1f39844537bdb32bd7f
      Related-Bug: #1836151
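
      A hedged sketch of limiting the bootstrap container to hosts in
      glance_api_hosts (the list name comes from the commit message; the task
      details are illustrative).

        - name: Running glance bootstrap container (sketch)
          kolla_docker:
            action: "start_container"
            name: "bootstrap_glance"
            image: "{{ glance_api_image_full }}"   # illustrative variable
          when: inventory_hostname in glance_api_hosts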
    • Fix handling of docker restart policy · 6a737b19
      Radosław Piliszek authored
      Docker has no restart policy named 'never'. It has 'no'.
      This has bitten us already (see [1]) and might bite us again whenever
      we want to change the restart policy to 'no'.
      
      This patch makes our docker integration honor all valid restart policies,
      and only valid restart policies.
      All relevant docker restart policy usages are patched as well.
      
      I added some FIXMEs relevant to the kolla-ansible docker
      integration. They are not fixed here so as not to alter behavior.
      
      [1] https://review.opendev.org/667363
      
      
      
      Change-Id: I1c9764fb9bbda08a71186091aced67433ad4e3d6
      Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
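
      For illustration: Docker's valid restart policies are "no", "on-failure",
      "always" and "unless-stopped"; "never" is not one of them. Note that in
      YAML the value must be quoted, otherwise a bare no is parsed as a boolean.

        # sketch of a valid policy value in an Ansible/YAML context
        restart_policy: "no"     # valid: "no", "on-failure", "always", "unless-stopped"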
  10. Jul 16, 2019
    • ceph-nfs: Add rpcbind to Ubuntu host bootstrap · efcaf400
      Michal Nasiadka authored
      * Ubuntu ships with nfs-ganesha 2.6.0, which requires an rpcbind
      UDP test on startup (fixed in later versions)
      * Add the rpcbind package to be installed by the kolla-ansible bootstrap
      when ceph_nfs is enabled (see the sketch below)
      * Update Ceph deployment docs with a note
      
      Change-Id: Ic19264191a0ed418fa959fdc122cef543446fbe5
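
      A minimal sketch of the bootstrap task described above; the
      enable_ceph_nfs variable name is a hypothetical stand-in for the actual
      flag.

        - name: Install rpcbind on Ubuntu hosts when ceph-nfs is enabled (sketch)
          package:
            name: rpcbind
            state: present
          become: true
          when:
            - ansible_distribution == "Ubuntu"
            - enable_ceph_nfs | bool     # hypothetical variable name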
  11. Jul 12, 2019
    • Fix ironic inspector iPXE boot with UEFI · 7b939756
      Mark Goddard authored
      The ironic inspector iPXE configuration includes the following kernel
      argument:
      
      initrd=agent.ramdisk
      
      However, the ramdisk is actually called ironic-agent.initramfs, so the
      argument should be:
      
      initrd=ironic-agent.initramfs
      
      In BIOS boot mode this does not cause a problem, but compute nodes
      with UEFI enabled seem to be stricter about this, and fail to
      boot.
      
      Change-Id: Ic84f3b79fdd3cd1730ca2fb79c11c7a4e4d824de
      Closes-Bug: #1836375
    • During deploy, always sync DB · d5e5e885
      Mark Goddard authored
      A common class of problems goes like this:
      
      * kolla-ansible deploy
      * Hit a problem, often in ansible/roles/*/tasks/bootstrap.yml
      * Re-run kolla-ansible deploy
      * Service fails to start
      
      This happens because the DB is created during the first run, but for some
      reason we fail before performing the DB sync. This means that on the second run
      we don't include ansible/roles/*/tasks/bootstrap_service.yml because the DB
      already exists, and therefore still don't perform the DB sync. However this
      time, the command may complete without apparent error.
      
      We should be less careful about when we perform the DB sync, and do it whenever
      it is necessary. There is an argument for not doing the sync during a
      'reconfigure' command, although we will not change that here.
      
      This change ensures the DB sync is always performed during the 'deploy'
      and 'reconfigure' commands.
      
      Change-Id: I82d30f3fcf325a3fdff3c59f19a1f88055b566cc
      Closes-Bug: #1823766
      Closes-Bug: #1797814
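
      A hedged sketch of the resulting flow: database creation stays
      idempotent, while the DB sync step is included unconditionally for the
      relevant commands. File and variable names are illustrative.

        - include_tasks: bootstrap.yml             # creates the database only if missing
          when: kolla_action == 'deploy'           # illustrative condition

        - include_tasks: bootstrap_service.yml     # always run the DB sync
          when: kolla_action in ['deploy', 'reconfigure']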
  12. Jul 11, 2019
    • Fix glance with file backend · 602f89ba
      Mark Goddard authored
      Since https://review.opendev.org/647699/, we lost the logic to only
      deploy glance-api on a single host when using the file backend.
      
      This code was always a bit custom, and would be better supported by
      using the 'host_in_groups' pattern we have in a few other places where a
      single group name does not describe the placement of containers for a
      service.
      
      Change-Id: I21ce4a3b0beee0009ac69fecd0ce24efebaf158d
      Closes-Bug: #1836151
  13. Jul 10, 2019
  14. Jul 08, 2019
    • Fix nova deploy with Ansible<2.8 · 5be093ac
      Mark Goddard authored
      Due to a bug in ansible, kolla-ansible deploy currently fails in nova
      with the following error when used with ansible earlier than 2.8:
      
      TASK [nova : Waiting for nova-compute services to register themselves]
      *********
      task path:
      /home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml:30
      fatal: [primary]: FAILED! => {
          "failed": true,
          "msg": "The field 'vars' has an invalid value, which
              includes an undefined variable. The error was:
              'nova_compute_services' is undefined\n\nThe error
              appears to have been in
              '/home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml':
              line 30, column 3, but may\nbe elsewhere in the file
              depending on the exact syntax problem.\n\nThe
              offending line appears to be:\n\n\n- name: Waiting
              for nova-compute services to register themselves\n ^
                  here\n"
      }
      
      Example:
      http://logs.openstack.org/00/669700/1/check/kolla-ansible-centos-source/81b65b9/primary/logs/ansible/deploy
      
      This was caused by
      https://review.opendev.org/#/q/I2915e2610e5c0b8d67412e7ec77f7575b8fe9921,
      which hits upon an ansible bug described here:
      https://github.com/markgoddard/ansible-experiments/tree/master/05-referencing-registered-var-do-until.
      
      We can work around this by not using an intermediary variable.
      
      Change-Id: I58f8fd0a6e82cb614e02fef6e5b271af1d1ce9af
      Closes-Bug: #1835817
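
      An illustrative sketch of the workaround: reference the registered
      do-until result directly instead of through an intermediary variable
      defined under vars. The command shown is a stand-in.

        - name: Waiting for nova-compute services to register themselves (sketch)
          command: openstack compute service list --service nova-compute -f json
          register: nova_compute_services_result
          until: (nova_compute_services_result.stdout | from_json) | length > 0
          retries: 20
          delay: 10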
    • Handle more return codes from nova-status upgrade check · c68ed4dd
      Mariusz authored
      In a single controller scenario, the "Upgrade status check result"
      task does nothing because the previous task can only succeed when
      `nova-status upgrade check` returns code 0. This change allows the
      command to fail, so that the return code stored in
      `nova_upgrade_check_stdout` can then be analysed.
      
      This change also allows warnings (rc 1) to pass.
      Closes-Bug: 1834647
      
      Change-Id: I6f5e37832f43f23604920b9d890cc505ca924ff9
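
      A minimal sketch of the pattern: let the command fail, then inspect the
      stored return code (the register name comes from the commit message; the
      rest is illustrative).

        - name: Run nova-status upgrade check (sketch)
          command: nova-status upgrade check
          register: nova_upgrade_check_stdout
          failed_when: false

        - name: Fail on anything worse than a warning (sketch)
          fail:
            msg: "nova-status upgrade check returned rc {{ nova_upgrade_check_stdout.rc }}"
          when: nova_upgrade_check_stdout.rc not in [0, 1]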
  15. Jul 05, 2019
    • Fixes for MariaDB bootstrap and recovery · 86f373a1
      Mark Goddard authored
      * Fix wsrep sequence number detection. Log message format is
        'WSREP: Recovered position: <UUID>:<seqno>' but we were picking out
        the UUID rather than the sequence number. This is as good as random.
      
      * Add become: true to log file reading and removal since
        I4a5ebcedaccb9261dbc958ec67e8077d7980e496 added become: true to the
        'docker cp' command which creates it.
      
      * Don't run handlers during recovery. If the config files change we
        would end up restarting the cluster twice.
      
      * Wait for wsrep recovery container completion (don't detach). This
        avoids a potential race between wsrep recovery and the subsequent
        'stop_container'.
      
      * Finally, we now wait for the bootstrap host to report that it is in
        an OPERATIONAL state. Without this we can see errors where the
        MariaDB cluster is not ready when used by other services.
      
      Change-Id: Iaf7862be1affab390f811fc485fd0eb6879fd583
      Closes-Bug: #1834467
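
      For illustration, extracting the sequence number (not the UUID) from the
      'WSREP: Recovered position: <UUID>:<seqno>' line described in the first
      bullet above; the registered variable name is hypothetical.

        - name: Extract wsrep sequence number from recovery output (sketch)
          set_fact:
            wsrep_seqno: "{{ wsrep_recover_output.stdout | regex_search('Recovered position: .*:(-?[0-9]+)', '\\1') | first }}"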
    • Add mon address to ceph release version check · b2c17d60
      Michal Nasiadka authored
      Change-Id: Ia4801d09ff1165c44561fd286fc57e903da2c602
  16. Jul 04, 2019
    • Deprecate Ceph deployment · e6d0e610
      Mark Goddard authored
      There are now several good tools for deploying Ceph, including Ceph
      Ansible and ceph-deploy. Maintaining our own Ceph deployment is a
      significant maintenance burden, and we should focus on our core mission
      to deploy OpenStack. Given that this is currently a significant part of
      kolla-ansible, we will need a long deprecation period and a migration
      path to another tool.
      
      Change-Id: Ic603c85c04d8794580a19f9efaa7a8589565f4f6
      Partially-Implements: blueprint remove-ceph
    • Add parameters to configure number of processes and threads of horizon · dc3489df
      Christian Berendt authored
      Change-Id: Ib5490d504a5b7c9a37dda7babf1257aa661c11de
    • Wait for all compute services before cell discovery · c38dd767
      Mark Goddard authored
      There is a race condition during nova deploy since we wait for at least
      one compute service to register itself before performing cells v2 host
      discovery.  It's quite possible that other compute nodes will not yet
      have registered and will therefore not be discovered. This leaves them
      not mapped into a cell, and results in the following error if the
      scheduler picks one when booting an instance:
      
      Host 'xyz' is not mapped to any cell
      
      The problem has been exacerbated by merging a fix [1][2] for a nova race
      condition, which disabled the dynamic periodic discovery mechanism in
      the nova scheduler.
      
      This change fixes the issue by waiting for all expected compute services
      to register themselves before performing host discovery. This includes
      both virtualised compute services and bare metal compute services.
      
      [1] https://bugs.launchpad.net/kolla-ansible/+bug/1832987
      [2] https://review.opendev.org/665554
      
      Change-Id: I2915e2610e5c0b8d67412e7ec77f7575b8fe9921
      Closes-Bug: #1835002
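
      A hedged sketch of waiting for every expected compute service before
      running host discovery; the group name and command are illustrative.

        - name: Waiting for all nova-compute services to register (sketch)
          command: openstack compute service list --service nova-compute -f json
          register: compute_services
          until: (compute_services.stdout | from_json) | length >= (groups['compute'] | length)
          retries: 20
          delay: 10
          run_once: true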
  17. Jul 02, 2019
    • Cloudkitty InfluxDB Storage backend via Kolla-ansible · 97cb30cd
      Rafael Weingärtner authored
      This proposal adds support to Kolla-Ansible for deploying CloudKitty
      with the InfluxDB storage system. Support for InfluxDB as the storage
      backend for CloudKitty was introduced with the following commit:
      https://github.com/openstack/cloudkitty/commit/c4758e78b49386145309a44623502f8095a2c7ee
      
      Problem Description
      ===================
      
      With the addition of support for InfluxDB in Cloudkitty, which is
      achieving general availability via Stein release, we need a method to
      easily configure/support this storage backend system via Kolla-ansible.
      
      Kolla-ansible is already able to deploy and configure an InfluxDB
      system. Therefore, this proposal will use the InfluxDB deployment
      configured via Kolla-ansible to connect to CloudKitty and use it as a
      storage backend.
      
      If we do not provide a method for users (operators) to manage the
      Cloudkitty storage backend via Kolla-ansible, they have to make
      these changes/configurations manually (or via some other set of
      automated scripts), which creates a distributed set of configuration
      files and configuration scripts with different versioning schemes
      and life cycles.
      
      Proposed Change
      ===============
      
      Architecture
      ------------
      
      We propose a flag that users can use to make Kolla-ansible configure
      CloudKitty to use InfluxDB as the storage backend system. When
      enabling this flag, Kolla-ansible will also enable the deployment of
      the InfluxDB via Kolla-ansible automatically.
      
      CloudKitty will be configured according to [1] and [2]. We will also
      externalize the "retention_policy", "use_ssl", and "insecure" options
      to allow fine-grained configuration by operators. All of these
      options will only be used when configured; therefore, when they
      are not set, the default value/behavior defined in Cloudkitty will be
      used. Moreover, when "use_ssl" is set to "true", the user will
      be able to set "cafile" to a custom trusted CA file. Again, if these
      variables are not set, the defaults in Cloudkitty will be used.
      
      Implementation
      --------------
      We need to introduce a new variable called
      `cloudkitty_storage_backend`. Valid options are `sqlalchemy` or
      `influxdb`. The default value in Kolla-ansible is `sqlalchemy` for
      backward compatibility. Then, the first step is to change the
      definition for the following variable:
      `/ansible/group_vars/all.yml:enable_influxdb: "{{ enable_monasca |
      bool }}"`
      
      We also need to enable InfluxDB when CloudKitty is configured to use
      it as the storage backend. Afterwards, we need to create tasks in
      CloudKitty configurations to create the InfluxDB schema and configure
      the configuration files accordingly.
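
      A sketch of how this could look in group_vars; only
      cloudkitty_storage_backend and enable_influxdb are named in this spec,
      and the exact wiring shown is illustrative.

        # group_vars/all.yml sketch
        cloudkitty_storage_backend: "sqlalchemy"    # or "influxdb"
        enable_influxdb: "{{ enable_monasca | bool or cloudkitty_storage_backend == 'influxdb' }}"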
      
      Alternatives
      ------------
      The alternative would be to execute the configurations manually or
      handle them via a different set of scripts and configuration files,
      which can become cumbersome over time.
      
      Security Impact
      ---------------
      None identified by the author of this spec
      
      Notifications Impact
      --------------------
      Operators that are already deploying CloudKitty with InfluxDB as
      storage backend would need to convert their configurations to
      Kolla-ansible (if they wish to adopt Kolla-ansible to execute these
      tasks).
      
      Also, deployments (OpenStack environments) that were created with
      Cloudkitty using storage v1 will need to migrate all of their data to
      V2 before enabling InfluxDB as the storage system.
      
      Other End User Impact
      ---------------------
      None.
      
      Performance Impact
      ------------------
      None.
      
      Other Deployer Impact
      ---------------------
      New configuration options will be available for CloudKitty.
      * cloudkitty_storage_backend
      * cloudkitty_influxdb_retention_policy
      * cloudkitty_influxdb_use_ssl
      * cloudkitty_influxdb_cafile
      * cloudkitty_influxdb_insecure_connections
      * cloudkitty_influxdb_name
      
      Developer Impact
      ----------------
      None
      
      Implementation
      ==============
      
      Assignee
      --------
      * `Rafael Weingärtner <rafaelweingartne>`
      
      Work Items
      ----------
       * Extend InfluxDB "enable/disable" variable
       * Add new tasks to configure Cloudkitty according to the new
       variables presented above
       * Write documentation and release notes
      
      Dependencies
      ============
      None
      
      Documentation Impact
      ====================
      New documentation for the feature.
      
      References
      ==========
      [1] `https://docs.openstack.org/cloudkitty/latest/admin/configuration/storage.html#influxdb-v2`
      [2] `https://docs.openstack.org/cloudkitty/latest/admin/configuration/collector.html#metric-collection`
      
      
      
      Change-Id: I65670cb827f8ca5f8529e1786ece635fe44475b0
      Signed-off-by: Rafael Weingärtner <rafael@apache.org>
    • Add upgrade-bifrost command · 9cac1137
      Mark Goddard authored
      This performs the same steps as deploy-bifrost, but first stops the
      bifrost services and container if they are running.
      
      This can help where a docker stop may lead to an ungraceful shutdown,
      possibly due to running multiple services in one container.
      
      Change-Id: I131ab3c0e850a1d7f5c814ab65385e3a03dfcc74
      Implements: blueprint bifrost-upgrade
      Closes-Bug: #1834332
  18. Jul 01, 2019
    • Bump minimum Ansible version to 2.5 · 0a769dc3
      Mark Goddard authored
      This is necessary for some Ansible tests which were renamed in 2.5 -
      including 'version' and 'successful'.
      
      Change-Id: Iacf88ef5589c7571fcf56ba8b99d3dbe76975195
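
      For illustration, the renamed test syntax that requires Ansible >= 2.5
      (the tasks themselves are placeholders).

        - name: Register a result to test (placeholder)
          command: /bin/true
          register: example_result

        - name: Use the 'version' and 'successful' tests (sketch)
          debug:
            msg: "Ansible is new enough and the previous task succeeded"
          when:
            - ansible_version.full is version('2.5.0', '>=')
            - example_result is successful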
  19. Jun 28, 2019
    • Specify endpoint when creating monasca user · 9074da56
      Will Szumski authored
      otherwise I'm seeing:
      
      TASK [monasca : Creating the monasca agent user] ****************************************************************************************************************************
      fatal: [monitor1]: FAILED! => {"changed": false, "module_stderr": "Shared connection to 172.16.3.24 closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n  F
      ile \"/tmp/ansible_I0RmxQ/ansible_module_kolla_toolbox.py\", line 163, in <module>\r\n    main()\r\n  File \"/tmp/ansible_I0RmxQ/ansible_module_kolla_toolbox.py\", line 141,
       in main\r\n    output = client.exec_start(job)\r\n  File \"/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/decorators.py\", line 19, in wrapped\r\n
          return f(self, resource_id, *args, **kwargs)\r\n  File \"/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/api/exec_api.py\", line 165, in exec_start\r\
      n    return self._read_from_socket(res, stream, tty)\r\n  File \"/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/api/client.py\", line 377, in _read_from_
      socket\r\n    return six.binary_type().join(gen)\r\n  File \"/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py\", line 75, in frames_iter\r\
      n    n = next_frame_size(socket)\r\n  File \"/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py\", line 62, in next_frame_size\r\n    data = read_exactly(socket, 8)\r\n  File \"/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py\", line 47, in read_exactly\r\n    next_data = read(socket, n - len(data))\r\n  File \"/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py\", line 31, in read\r\n    return socket.recv(n)\r\nsocket.timeout: timed out\r\n", "msg": "MODULE FAILURE", "rc": 1}
      
      when the monitoring nodes aren't on the public API network.
      
      Change-Id: I7a93f69da0e02c9264da0b081d2e60626f899e3a
  20. Jun 27, 2019
    • Simplify handler conditionals · de00bf49
      Mark Goddard authored
      Currently, we have a lot of logic for checking if a handler should run,
      depending on whether config files have changed and whether the
      container configuration has changed. As rm_work pointed out during
      the recent haproxy refactor, these conditionals are typically
      unnecessary - we can rely on Ansible's handler notification system
      to only trigger handlers when they need to run. This removes a lot
      of error prone code.
      
      This patch removes conditional handler logic for all services. It is
      important to ensure that we no longer trigger handlers unnecessarily,
      because without these checks in place doing so will trigger a restart
      of the containers.
      
      Implements: blueprint simplify-handlers
      
      Change-Id: I4f1aa03e9a9faaf8aecd556dfeafdb834042e4cd
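
      The pattern being relied upon, sketched with illustrative names: notify
      the handler from the task that changes something and let Ansible's
      notification system decide whether it runs, instead of guarding the
      handler with explicit 'config changed' conditionals.

        # tasks (sketch)
        - name: Copying over haproxy.cfg (illustrative)
          template:
            src: haproxy.cfg.j2
            dest: /etc/kolla/haproxy/haproxy.cfg
          notify:
            - Restart haproxy container

        # handlers (sketch)
        - name: Restart haproxy container
          kolla_docker:
            action: "recreate_or_restart_container"   # illustrative action
            name: "haproxy"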
    • Add extra volumes support for services that were not previously supported · e610a73e
      ZijianGuo authored
      Patch [1] did not add extra volumes support for all services.
      In order to unify the management of volumes, we add extra volumes
      support for these remaining services.
      
      [1] https://opendev.org/openstack/kolla-ansible/commit/12ff28a69351cf8ab4ef3390739e04862ba76983
      
      
      
      Change-Id: Ie148accdd8e6c60df6b521d55bda12b850c0d255
      Partially-Implements: blueprint support-extra-volumes
      Signed-off-by: ZijianGuo <guozijn@gmail.com>
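
      A hedged sketch of the extra volumes pattern, with a made-up service
      name: the service's default volume list is concatenated with an
      operator-supplied extra list.

        foo_default_volumes:
          - "{{ node_config_directory }}/foo/:{{ container_config_directory }}/:ro"
          - "/etc/localtime:/etc/localtime:ro"
        foo_extra_volumes: "{{ default_extra_volumes | default([]) }}"   # illustrative

        # in the service definition (sketch)
        volumes: "{{ foo_default_volumes + foo_extra_volumes }}"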
    • Add support for neutron custom dnsmasq.conf · a3f1ded3
      Christian Berendt authored
      Change-Id: Ia7041be384ac07d0a790c2c5c68b1b31ff0e567a
    • Restart all nova services after upgrade · e6d2b922
      Mark Goddard authored
      During an upgrade, nova pins the version of RPC calls to the minimum
      seen across all services. This ensures that old services do not receive
      data they cannot handle. After the upgrade is complete, all nova
      services are supposed to be reloaded via SIGHUP to cause them to check
      again the RPC versions of services and use the new latest version which
      should now be supported by all running services.
      
      Due to a bug [1] in oslo.service, sending services SIGHUP is currently
      broken. We replaced the HUP with a restart for the nova_compute
      container for bug 1821362, but not other nova services. It seems we need
      to restart all nova services to allow the RPC version pin to be removed.
      
      Testing in a Queens to Rocky upgrade, we find the following in the logs:
      
      Automatically selected compute RPC version 5.0 from minimum service
      version 30
      
      However, the service version in Rocky is 35.
      
      There is a second issue in that it takes some time for the upgraded
      services to update the nova services database table with their new
      version. We need to wait until all nova-compute services have done this
      before the restart is performed, otherwise the RPC version cap will
      remain in place. There is currently no interface in nova available for
      checking these versions [2], so as a workaround we use a configurable
      delay with a default duration of 30 seconds. Testing showed it takes
      about 10 seconds for the version to be updated, so this gives us some
      headroom.
      
      This change restarts all nova services after an upgrade, after a 30
      second delay.
      
      [1] https://bugs.launchpad.net/oslo.service/+bug/1715374
      [2] https://bugs.launchpad.net/nova/+bug/1833542
      
      Change-Id: Ia6fc9011ee6f5461f40a1307b72709d769814a79
      Closes-Bug: #1833069
      Related-Bug: #1833542
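
      A sketch of the workaround described above; the delay variable and
      container names are illustrative.

        - name: Wait for nova-compute services to report their new service version (sketch)
          pause:
            seconds: "{{ nova_service_version_delay | default(30) }}"   # hypothetical variable
          run_once: true

        - name: Restart nova services to lift the RPC version pin (sketch)
          kolla_docker:
            action: "restart_container"
            name: "{{ item }}"
          loop:
            - nova_api
            - nova_scheduler
            - nova_conductor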
    • Don't rotate keystone fernet keys during deploy · 09e29d0d
      Mark Goddard authored
      When running deploy or reconfigure for Keystone,
      ansible/roles/keystone/tasks/deploy.yml calls init_fernet.yml,
      which runs /usr/bin/fernet-rotate.sh, which calls keystone-manage
      fernet_rotate.
      
      This means that a token can become invalid if the operator runs
      deploy or reconfigure too often.
      
      This change splits out fernet-push.sh from the fernet-rotate.sh
      script, then calls fernet-push.sh after the fernet bootstrap
      performed in deploy.
      
      Change-Id: I824857ddfb1dd026f93994a4ac8db8f80e64072e
      Closes-Bug: #1833729