Commits · 70b515bf1225e56b7df81677043d75be4bbb1ab4 · Very Demiurge Very Mindful / Kolla Ansible

Sep 16, 2019

Catch errors and changes in kolla_toolbox module · 70b515bf

Mark Goddard authored 5 years ago

The kolla_toolbox Ansible module executes ad-hoc ansible commands in the
kolla_toolbox container, and parses the output to make it look as if
ansible-playbook executed the command. Currently, however, this module
sometimes fails to catch failures of the underlying command, and also
sometimes shows tasks as 'ok' when the underlying command reported a change.
This has been tested both before and after the upgrade to ansible 2.8.

This change fixes this issue by configuring ansible to emit output in
JSON format, to make parsing simpler. We can now pick up errors and
changes, and signal them to the caller.
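
A minimal sketch of the idea, not the module's actual code. It assumes the ad-hoc command was run with Ansible's `json` stdout callback and that the output uses that callback's usual `plays` / `tasks` / `hosts` nesting. The function name `summarise_result` and the sample payload are hypothetical.

```python
import json


def summarise_result(raw_output, host="localhost"):
    """Return (failed, changed) for one host from JSON callback output."""
    data = json.loads(raw_output)
    failed = False
    changed = False
    for play in data.get("plays", []):
        for task in play.get("tasks", []):
            result = task.get("hosts", {}).get(host)
            if result is None:
                continue
            # A host result flags failure or unreachability explicitly.
            if result.get("failed") or result.get("unreachable"):
                failed = True
            if result.get("changed"):
                changed = True
    return failed, changed


# Hypothetical sample output, trimmed to the fields used above.
sample = json.dumps({
    "plays": [{"tasks": [{"hosts": {"localhost": {"changed": True}}}]}]
})
print(summarise_result(sample))
```

With structured output, the caller can signal 'failed' or 'changed' from explicit fields instead of matching on free-form text.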

This change also adds an ansible playbook, tests/test-kolla-toolbox.yml,
that can be executed to test the module. It's not currently integrated
with any CI jobs.

Note that this change cannot be backported as the JSON output callback
plugin was added in Ansible 2.5.

Change-Id: I8236dd4165f760c819ca972b75cbebc62015fada
Closes-Bug: #1844114

70b515bf

Sep 10, 2019

Configure Zun for Placement (Train+) · 0f5e0658

Hongbin Lu authored 5 years ago

After the integration with placement [1], we need to configure how
zun-compute is going to work with nova-compute.

* If zun-compute and nova-compute run on the same compute node,
  we need to set 'host_shared_with_nova' as true so that Zun
  will use the resource provider (compute node) created by nova.
  In this mode, containers and VMs could claim allocations against
  the same resource provider.
* If zun-compute runs on a node without nova-compute, no extra
  configuration is needed. By default, each zun-compute will create
  a resource provider in placement to represent the compute node
  it manages.

[1] https://blueprints.launchpad.net/zun/+spec/use-placement-resource-management

Change-Id: I2d85911c4504e541d2994ce3d48e2fbb1090b813

0f5e0658

Sep 05, 2019

Modernize the way of configuring Docker daemon · a5808ad8

Marcin Juszkiewicz authored 5 years ago

Instead of changing Docker daemon command line let's change config
for Docker instead. In /etc/docker/daemon.json file as it should be.

Custom Docker options can be set with 'docker_custom_config' variable.

Old 'docker_custom_option' is still present but should be avoided.
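
As a rough illustration of the approach (not the role's actual implementation), custom options can be layered over defaults and rendered as the JSON document that would be written to /etc/docker/daemon.json. The helper `build_daemon_config` and the default values are hypothetical; `log-opts` and `storage-driver` are standard dockerd configuration keys.

```python
import json


def build_daemon_config(defaults, docker_custom_config):
    """Merge custom options over defaults; custom values win."""
    config = dict(defaults)
    config.update(docker_custom_config)
    return config


# Hypothetical defaults and overrides, for illustration only.
defaults = {"log-opts": {"max-file": "5", "max-size": "50m"}}
custom = {"storage-driver": "overlay2"}

rendered = json.dumps(build_daemon_config(defaults, custom),
                      indent=4, sort_keys=True)
print(rendered)  # content that would be written to /etc/docker/daemon.json
```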

Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Change-Id: I1215e04ec15b01c0b43bac8c0e81293f6724f278

a5808ad8

Aug 22, 2019

Use fluentd image labels · 4180bee0

Michal Nasiadka authored 5 years ago

In order to orchestrate smooth transition to fluentd 0.14.x
aka 1.0 stable branch aka td-agent 3
from td-agent repository - use image labels (fluentd_version
and fluentd_binary).

Depends-On: https://review.opendev.org/676411
Change-Id: Iab8518c34ef876056c6abcdb5f2e9fc9f1f7dbdd

4180bee0

Aug 16, 2019

Check for CRITICAL, WARNING and ERROR log messages in CI · a14eee24

Mark Goddard authored 5 years ago

At the end of a CI run, check all log files.

Change-Id: I99afc1c5207757e35beabf7daebd86c56151c96d

a14eee24

CI: Zun jobs · d4de1d75

Radosław Piliszek authored 5 years ago

- Test Zun on CentOS too
- Make etcd change also trigger Zun jobs (like kuryr and zun)
- Test multinode Zun deployments instead of AIO
  (more likely to break)
- In Zun scenario, stop configuring docker for legacy swarm mode
  (Zun is no swarm)
- Separate test-zun.sh testing script
- Show appcontainer to see which node it has been started on

Change-Id: I289b1009fe00aedb9b78cbd83298b14da5fd9670
Depends-On: https://review.opendev.org/676736


Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

d4de1d75

CI: Add docker inspect output to docker_info logs · 8cf24bcc

Michal Nasiadka authored 5 years ago

Change-Id: I081f2f4762651bca935f08a67b20f21946aaf051

8cf24bcc

Aug 14, 2019

Testing Masakari role in gate · fbac54c5

Kien Nguyen authored 6 years ago

Add Masakari testing into the Gate.

Change-Id: I52df33f963e7d2ae4059887df3d24d9e6642134e
Depends-On: https://review.opendev.org/#/c/615469/
Depends-On: https://review.opendev.org/#/c/615715


Implements: blueprint ansible-masakari
Co-Authored-By: Gaëtan Trellu <gaetan.trellu@incloudus.com>

fbac54c5

Aug 06, 2019

CI: Sanity check that nodepool.private_ipv4 is assigned · eac1e479

Mark Goddard authored 5 years ago

During the MariaDB testing we saw a number of cases where this IP
address was not assigned to one or more hosts, which caused various
issues later on.

Change-Id: I61b54483e4553b926e9ddc0a8848b2daa6bc49f1

eac1e479

Aug 05, 2019

ceph: fixes to deployment and upgrade · 826f6850

Radosław Piliszek authored 5 years ago

1) ceph-nfs (ganesha-ceph) - use NFSv4 only
This is recommended upstream.
v3 and UDP require portmapper (aka rpcbind) which we
do not want, except where Ubuntu ganesha version (2.6)
forces it by requiring enabled UDP, see [1].
The issue has been fixed in 2.8, included in CentOS.
Additionally disable v3 helper protocols and kerberos
to avoid meaningless warnings.

2) ceph-nfs (ganesha-ceph) - do not export host dbus
It is not in use. This avoids the temptation to try
handling it on host.

3) Properly handle ceph services deploy and upgrade
Upgrade runs deploy.
The order has been corrected - nfs goes after mds.
Additionally upgrade takes care of rgw for keystone
(for swift emulation).

4) Enhance ceph keyring module with error detection
Now it does not blindly try to create a keyring after
any failure. This used to hide the real issue.

5) Retry ceph admin keyring update until cluster works
Reordering deployment caused an issue with the ceph cluster not being
fully operational before taking actions on it.

6) CI: Remove osd df from collected logs as it may hang CI
Hangs are caused by healthy MON and no healthy MGR.
A descriptive note is left in its place.

7) CI: Add 5s timeout to ceph informational commands
This decreases the timeout from the default 300s.

[1] https://review.opendev.org/669315



Change-Id: I1cf0ad10b80552f503898e723f0c4bd00a38f143
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

826f6850

Jul 26, 2019

CI: Fix multinode job glance issues · d0317260

Radosław Piliszek authored 5 years ago


This actually replaces two ad-hoc fixes with a more unified
solution (with comment for posterity).

Change-Id: I62f57cb489c900f68a0c7aeb3e20e4715c0e2661
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

d0317260

CI: fix checks for upgrade and multinode jobs · 93ac16ae

Radosław Piliszek authored 5 years ago


Multinode jobs did not run sanity checks for all the hosts,
only primary. Now they check all.

Additionally, upgrades are now checked using the proper
(pre-upgrade) scripts (not that it matters too much as they
are the same atm), and both checks are run: not only the failures
check but also the config check.

Change-Id: I10552e256edbddd5b1f8a8a7f8805262e72ce8d8
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

93ac16ae

Jul 18, 2019

Fix handling of docker restart policy · 6a737b19

Radosław Piliszek authored 5 years ago

Docker has no restart policy named 'never'. It has 'no'.
This has bitten us already (see [1]) and might bite us again whenever
we want to change the restart policy to 'no'.

This patch makes our docker integration honor all valid restart policies
and only valid restart policies.
All relevant docker restart policy usages are patched as well.
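
A small sketch of this kind of validation, assuming a hypothetical helper `check_restart_policy`. The four names listed are the restart policies Docker accepts.

```python
# The restart policies Docker accepts; note there is no 'never'.
VALID_RESTART_POLICIES = ("no", "on-failure", "always", "unless-stopped")


def check_restart_policy(policy):
    """Return the policy unchanged, or raise if Docker would reject it."""
    if policy not in VALID_RESTART_POLICIES:
        raise ValueError(
            "Invalid restart policy %r, expected one of: %s"
            % (policy, ", ".join(VALID_RESTART_POLICIES))
        )
    return policy


for candidate in ("no", "never"):
    try:
        print(candidate, "->", check_restart_policy(candidate))
    except ValueError as exc:
        print(candidate, "-> rejected:", exc)
```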

I added some FIXMEs that are relevant to the kolla-ansible docker
integration. They are not fixed here so as not to alter behavior.

[1] https://review.opendev.org/667363



Change-Id: I1c9764fb9bbda08a71186091aced67433ad4e3d6
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

6a737b19

Jul 16, 2019

CI: clean up requirements installation · 8a543098

Radosław Piliszek authored 5 years ago


We install kolla-ansible requirements in Zuul's Ansible playbooks.
This patch cleans up the installation in scripts so that they are
only concerned with auxiliary requirements:
- ansible (since we do not track it in requirements)
- ara (for log summaries)
- openstack clients (for first init and tests after deployment)

Additionally this patch installs openstack clients in a separate
virtualenv.
Note that all kolla-ansible requirements, ansible and ara are still
installed system-wide.

Change-Id: Iac04082ad39a9d823c515ba11c5db9af50ed225f
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

8a543098

Add ceph-mds/rgw/nfs to gate · a77b0f62

Michal Nasiadka authored 5 years ago

Depends-On: https://review.opendev.org/669315
Change-Id: I6946290cd890f74c59ed5394e8382a8b75c0c4cd

a77b0f62

Jul 09, 2019

Trivial fix: log stderr of init-runonce as well · 53ea3fe4

Radosław Piliszek authored 5 years ago


Missed by me in a recent merge.

TrivialFix
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

Change-Id: I83b1e84a43f014ce20be8677868be3f66017e3c2

53ea3fe4

Jul 04, 2019

CI: Pull images before upgrade · f11d3c69

Mark Goddard authored 5 years ago

This is the documented procedure.

Change-Id: I09ca99e92b112621d66b564a88b13658632242f5

f11d3c69

Jul 03, 2019

CI: Collect docker and systemd configs · 2430c290

Radosław Piliszek authored 5 years ago


Change-Id: I59a05e8a0a2656596d2cced61bd98f2aa790d60b
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

2430c290

Jul 02, 2019

CI: Keep stderr in ansible logs · b9aa8b38

Radosław Piliszek authored 5 years ago


Otherwise ara had only the stderr part and the logs only the
stdout part, which made ordered analysis harder.

Additionally add -vvv for the bootstrap-servers run.

Change-Id: Ia42ac9b90a17245e9df277c40bda24308ebcd11d
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

b9aa8b38

Jul 01, 2019

CI: Use template-overrides.j2 from kolla · 20ab480c

Radosław Piliszek authored 5 years ago

Some kolla-ansible jobs failed due to using external mirrors
instead of local ones.
This was due to not using the template override provided by kolla.
This patch fixes that.

Depends-On: https://review.opendev.org/668226


Change-Id: I27f714fdf05e521aa8ce25c5683a452ceb35eeb8
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

20ab480c

Add note to CI config regarding registry during upgrade · a0bdc366

Radosław Piliszek authored 5 years ago


Change-Id: Ifc898015b9b523ef4c50fc969e464f05762f2151
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

a0bdc366

Revert "CI - remove unnecessary logic when building images for upgrade" · acac1279

Mark Goddard authored 5 years ago

This reverts commit 8ce5ffd0.

Change-Id: I81ce7c007ff267ebbbb721bcdb7eebc0dd575bf8

acac1279

Jun 28, 2019

Exit on failure in init-runonce · bc08b44f

Mark Goddard authored 5 years ago

Previously we sourced this script in tests/deploy.sh, but this was
recently changed. Following that change we lost the errexit setting,
meaning we ignore errors in init-runonce.

Adding errexit in the script itself means that all callers get error
handling.

Also log init-runonce output.

TrivialFix

Change-Id: I9b35bd5f0f76eec26ddd968d093a3a5fd55a7ce2

bc08b44f

Jun 27, 2019

Fix conditionals in CI playbook · 3b218fd0

Mark Goddard authored 5 years ago

These were not templated, so always evaluated to true. This shouldn't be
causing any issues.

Change-Id: I7b8e407e688ba201c4f7d1a94bbd41af0918e7df

3b218fd0

Jun 21, 2019

CI - remove unnecessary logic when building images for upgrade · 8ce5ffd0

Radosław Piliszek authored 5 years ago


Docker registry being insecure is handled by docker_registry_insecure
which is set to true by default when docker_registry is set.
The removed code had no effect because docker_registry is not changed
anyway for base (pre-upgrade) install.

This change makes config more readable and also prevents a potential
conflict with the zun profile if ever used in upgrade mode.

Change-Id: I9b5ae8c5b534fa6cce9dbaca8af191e2ca79d19f
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

8ce5ffd0

Jun 16, 2019

Remove nova-consoleauth · 4e032923

Jeffrey Zhang authored 5 years ago

The nova-consoleauth service was deprecated during the Rocky release [1]
and has not been necessary since unless you're using cells v1. As Kolla
has never supported cells v1, which is finally being removed during
Train [2], we can get ahead of the curve and stop deploying
nova-consoleauth immediately.

[1] https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/convert-consoles-to-objects.html
[2] https://blueprints.launchpad.net/nova/+spec/remove-cells-v1/

Change-Id: I099080979f5497537e390f531005a517ab12aa7a

4e032923

Jun 11, 2019

Add CI job for ironic · 845040ad

Mark Goddard authored 6 years ago

Adds four new CI jobs for testing centos/ubuntu binary/source deploys
with ironic enabled. These are run only when there are changes to the
ironic role.

Performs some simple testing by creating a node using the fake-hardware
hardware type and creating a server.

Change-Id: Ie669e57ce2af53257b4ca05f45193cb73f48827a
Depends-On: https://review.opendev.org/664011

845040ad

Jun 07, 2019

Remove Neutron LBaaS support · f427920d

Carlos Goncalves authored 5 years ago

The project has been retired and there will be no Train release [1].
This patch removes Neutron LBaaS support in Kolla.

[1] https://review.opendev.org/#/c/658494/

Change-Id: Ic0d3da02b9556a34d8c27ca21a1ebb3af1f5d34c

f427920d

Add support for idempotent container stop and removal · d2ae42ce

Mark Goddard authored 5 years ago

This is useful when removing a container that is no longer supported.
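
The idempotency being described can be sketched as follows. This is only an illustration: a plain dict stands in for Docker's container state, and `stop_and_remove` is a hypothetical name, not the module's API.

```python
def stop_and_remove(containers, name):
    """Stop and remove `name`; return True only if something changed."""
    container = containers.get(name)
    if container is None:
        # Already absent: nothing to do, so report no change.
        return False
    container["running"] = False
    del containers[name]
    return True


# Hypothetical state: one container that is no longer supported.
state = {"old_service": {"running": True}}
first = stop_and_remove(state, "old_service")
second = stop_and_remove(state, "old_service")
print(first, second)
```

Running the same operation twice reports a change only the first time, which is what lets a cleanup task be left in place across repeated deployments.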

Change-Id: I08d79ce7dd2f3d11e466930de85412017cd5f747

d2ae42ce

Jun 03, 2019

Test Ceph upgrade in CI · 78ee0287

Mark Goddard authored 5 years ago

Add CI jobs for testing an upgrade of a multinode system with Ceph
enabled. As for the existing upgrade job, we upgrade from the previous
release to the current release.

Change-Id: I931772ca4c63757769467a57c80dc0726a11167a
Depends-On: https://review.opendev.org/658163

78ee0287

May 31, 2019

Adds Qinling Ansible role · edb34898

Gaetan Trellu authored 5 years ago

Qinling is an OpenStack project to provide "Function as a Service".
This project aims to provide a platform to support serverless functions.

Change-Id: I239a0130f8c8b061b531dab530d65172b0914d7c
Implements: blueprint ansible-qinling-support
Story: 2005760
Task: 33468

edb34898

May 17, 2019

Fix keystone fernet key rotation scheduling · 6c1442c3

Mark Goddard authored 5 years ago

Right now every controller rotates fernet keys. This is nice because
should any controller die, we know the remaining ones will rotate the
keys. However, we are currently over-rotating the keys.

When we over-rotate keys, we get logs like this:

 This is not a recognized Fernet token <token> TokenNotFound

Most clients can recover and get a new token, but some clients (like
Nova passing tokens to other services) can't do that because they don't
have the password to generate a new token.

With three controllers, the keystone-fernet crontab shows the once-a-day
rotation correctly staggered across the three controllers:

ssh ctrl1 sudo cat /etc/kolla/keystone-fernet/crontab
0 0 * * * /usr/bin/fernet-rotate.sh
ssh ctrl2 sudo cat /etc/kolla/keystone-fernet/crontab
0 8 * * * /usr/bin/fernet-rotate.sh
ssh ctrl3 sudo cat /etc/kolla/keystone-fernet/crontab
0 16 * * * /usr/bin/fernet-rotate.sh
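
The staggering shown above can be reproduced with a short sketch. This is an illustration only, not the kolla-ansible cron generator; `staggered_daily_crontab` is a hypothetical name, and it handles only a once-a-day job spread evenly over the hosts.

```python
def staggered_daily_crontab(num_hosts, command):
    """Return one daily crontab line per host, evenly spaced over 24 hours."""
    minutes_per_day = 24 * 60
    step = minutes_per_day // num_hosts
    lines = []
    for index in range(num_hosts):
        hour, minute = divmod(index * step, 60)
        lines.append("%d %d * * * %s" % (minute, hour, command))
    return lines


lines = staggered_daily_crontab(3, "/usr/bin/fernet-rotate.sh")
for line in lines:
    print(line)
```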

Currently with three controllers we have this keystone config:

[token]
expiration = 86400 (although, keystone default is one hour)
allow_expired_window = 172800 (this is the keystone default)

[fernet_tokens]
max_active_keys = 4

Currently, kolla-ansible configures key rotation according to the following:

   rotation_interval = token_expiration / num_hosts

This means we rotate keys more quickly the more hosts we have, which doesn't
make much sense.

Keystone docs state:

   max_active_keys =
     ((token_expiration + allow_expired_window) / rotation_interval) + 2

For details see:
https://docs.openstack.org/keystone/stein/admin/fernet-token-faq.html
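
As a worked example, the formula can be applied to the three-controller settings quoted above. The function below is a hypothetical helper that transcribes the formula, rounding the ratio up when it is not a whole number (an assumption of this sketch).

```python
import math


def max_active_keys(token_expiration, allow_expired_window, rotation_interval):
    """Apply the quoted Keystone formula; all arguments are in seconds."""
    lifetime = token_expiration + allow_expired_window
    return math.ceil(lifetime / rotation_interval) + 2


token_expiration = 86400        # from the [token] section above
allow_expired_window = 172800   # from the [token] section above
num_hosts = 3

# Current behaviour: rotation_interval = token_expiration / num_hosts
rotation_interval = token_expiration // num_hosts
needed = max_active_keys(token_expiration, allow_expired_window,
                         rotation_interval)
print(rotation_interval, needed)  # 28800 11
```

Under these numbers the formula asks for 11 active keys, while the configuration above sets max_active_keys to 4.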

Rotation is based on pushing out a staging key, so should any server
start using that key, other servers will consider that valid. Then each
server in turn starts using the staging key, each in turn demoting the
existing primary key to a secondary key. Eventually you prune the
secondary keys when there is no token in the wild that would need to be
decrypted using that key. So this all makes sense.

This change adds new variables for fernet_token_allow_expired_window and
fernet_key_rotation_interval, so that we can calculate the correct
number of active keys. We now set the default rotation interval
so as to minimise the number of active keys to 3 - one primary, one
secondary, one buffer.

This change also fixes the fernet cron job generator, which was broken
in the following cases:

* requesting an interval of more than 1 day resulted in no jobs
* requesting an interval of more than 60 minutes, unless an exact
  multiple of 60 minutes, resulted in no jobs

It should now be possible to request any interval up to a week divided
by the number of hosts.

Change-Id: I10c82dc5f83653beb60ddb86d558c5602153341a
Closes-Bug: #1809469

6c1442c3

Add unit test for keystone fernet cron generator · 25ac955a

Mark Goddard authored 5 years ago

Before making changes to this script, document its behaviour with a unit
test.

There are two major issues:

* requesting an interval of more than 1 day results in no jobs
* requesting an interval of more than 60 minutes, unless an exact
  multiple of 60 minutes, results in no jobs

Change-Id: I655da1102dfb4ca12437b7db0b79c9a61568f79e
Related-Bug: #1809469

25ac955a

Apr 19, 2019

OpenDev Migration Patch · 92d8d22c

OpenDev Sysadmins authored 5 years ago

This commit was bulk generated and pushed by the OpenDev sysadmins
as a part of the Git hosting and code review systems migration
detailed in these mailing list posts:

http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003603.html
http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004920.html

Attempts have been made to correct repository namespaces and
hostnames based on simple pattern matching, but it's possible some
were updated incorrectly or missed entirely. Please reach out to us
via the contact information listed at https://opendev.org/ with any
questions you may have.

92d8d22c

Apr 14, 2019

Fix periodic CI jobs · 2b7a9dc2

Mark Goddard authored 5 years ago

Periodic jobs don't have zuul.change defined, since there is no change
being tested. This causes an early failure when referencing zuul.change
to set the image tag for built images. In periodic jobs we'll never need
to build images because there is no dependent kolla change under test.

Change-Id: I6d9d81cf17b7d0d7aaf87cd96418c904c46681f2

2b7a9dc2

Apr 10, 2019

Remove RabbitMQ support from Bifrost · 33564a00

Mark Goddard authored 5 years ago

During the Train cycle, Bifrost switched to using JSON-RPC by default
for Ironic's internal communication [1], avoiding the need to install
RabbitMQ. This simplifies things, so we may as well remove our custom
configuration of RabbitMQ.

[1] https://review.openstack.org/645093

Change-Id: I3107349530aa753d68fd59baaf13eb7dd5485ae6

33564a00

Apr 08, 2019

Do some Train TODOs · bb9d51e2

Mark Goddard authored 5 years ago

Make an early start on the TODOs for the Train cycle.

1. Remove the task that removes the vitrage_collector container, which
   was added in the Stein cycle to clean up this container which is no
   longer deployed.

2. Remove globals.yml configuration in CI to disable Heat for upgrade
   jobs. Heat is now enabled in the previous release (Stein).

3. Remove the deprecated variable cinder_iscsi_helper, which was renamed
   to cinder_target_helper in Stein.

Change-Id: I774bf395e0bdd4db9c20c6289a22cf059fa42e1a

bb9d51e2

Apr 03, 2019

Check configuration file permissions in CI · 8c4ab41f

Mark Goddard authored 6 years ago

Typically, non-executable files should have mode 660 or 600, and
executable files and directories should have mode 770. All should be
owned by the 'config_owner_user' and 'config_owner_group' variables.

This change adds a script to check the owner and permissions of config
files under /etc/kolla, and runs it at the end of CI jobs.
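
A sketch of such a check, under stated assumptions: it is not the script added by this change, it checks only modes (not ownership), and it treats any file with an execute bit set as executable. The name `find_bad_permissions` is hypothetical, and the demonstration runs on a temporary directory rather than /etc/kolla.

```python
import os
import stat
import tempfile

# Modes taken from the commit message above; ownership checks are omitted.
FILE_MODES = {0o600, 0o660}
EXEC_OR_DIR_MODES = {0o770}


def find_bad_permissions(root):
    """Return (path, mode) pairs under root whose mode is not allowed."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlink modes are not meaningful
            st = os.stat(path)
            mode = stat.S_IMODE(st.st_mode)
            needs_exec = stat.S_ISDIR(st.st_mode) or bool(mode & 0o111)
            allowed = EXEC_OR_DIR_MODES if needs_exec else FILE_MODES
            if mode not in allowed:
                bad.append((path, oct(mode)))
    return bad


# Demonstration on a throwaway tree rather than the real /etc/kolla.
root = tempfile.mkdtemp()
good = os.path.join(root, "good.conf")
loose = os.path.join(root, "loose.conf")
for path, mode in ((good, 0o600), (loose, 0o644)):
    with open(path, "w"):
        pass
    os.chmod(path, mode)
problems = find_bad_permissions(root)
print(problems)
```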

Change-Id: Icdbabf36e284b9030017a0dc07b9dc81a37758ab
Related-Bug: #1821579

8c4ab41f

Mar 27, 2019

Test upgrades in CI · c23c9b2c

Mark Goddard authored 6 years ago

This patch adds two new jobs:

* kolla-ansible-centos-source-upgrade
* kolla-ansible-ubuntu-source-upgrade

These jobs first deploy a control plane using the previous release of
Kolla Ansible, then upgrade to the current release.

Because we can't change the branch of the git repository on the Zuul
executor, we change the branch of the kolla-ansible repository on the
primary node to the branch of the previous release, in this case
stable/rocky. A new remote-template role has been added that supports
generating templates using a remote template source, to generate config
files using the previous kolla-ansible branch.

If the change being tested depends on a kolla change for the current
branch, then we build images. Rather than using the current
kolla-ansible version to tag the images, we now tag them with
change_<gerrit change ID>. This is because the version of kolla-ansible
will change from the previous release to the current one as we upgrade
the system.

Finally, it should be noted that the 'previous_release' variable in the
Zuul config needs to be updated with each release, since this sets the
release of kolla-ansible that is installed initially.

Depends-On: https://review.openstack.org/645089/
Depends-On: https://review.openstack.org/644250/
Depends-On: https://review.openstack.org/645816/
Depends-On: https://review.openstack.org/645840/
Change-Id: If301e0affcd55360fefe3b105f023ae5c47b0853

c23c9b2c

Mar 21, 2019

Wait for cinder volume to become available in CI · e956cd87

Mark Goddard authored 6 years ago

Fixes a race condition where sometimes a volume would still be in the
'creating' state when trying to attach it to a server.

Invalid volume: Volume <id> status must be available or downloading to
reserve, but the current status is creating.
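
The fix amounts to polling until the volume leaves the 'creating' state. A generic sketch follows; `wait_for_status` and the `get_status` callable are hypothetical stand-ins for whatever the CI script uses to query the volume, and treating 'error' as fatal is an assumption of this example.

```python
import time


def wait_for_status(get_status, wanted="available", timeout=60.0,
                    interval=1.0):
    """Poll get_status() until it returns `wanted` or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == wanted:
            return status
        if status == "error":
            raise RuntimeError("volume entered the error state")
        if time.monotonic() >= deadline:
            raise TimeoutError("timed out waiting for status %r" % wanted)
        time.sleep(interval)


# Simulated status sequence standing in for real API calls.
statuses = iter(["creating", "creating", "available"])
final = wait_for_status(lambda: next(statuses), interval=0)
print(final)
```

Only once the status is 'available' would the script go on to attach the volume to a server.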

Change-Id: I0687ddfd78c384650cb361ff07aa64c5c3806a93

e956cd87