Commits · bb7fc2e5b1120d0b8e72843b0dc6d010e93c6b09 · Very Demiurge Very Mindful / Kolla Ansible

Sep 20, 2019

CI: collect more system configs (name resolution) · e7d5c584


This patch adds configs relevant to name resolution.

Change-Id: I7ebc2409e9ec0bd875abf0bf4e452bc89efe940d
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

e7d5c584

Sep 19, 2019

CI: Use VXLAN overlay network · 8e406291

Mark Goddard authored 5 years ago


VXLAN is necessary to run HA in CI (due to floating VIP
address handled by keepalived).
It also turned out to be required to have private
IPv6 address assignments.
This patch is based on linux bridge rather than OVS
to avoid problems with OVS deployed in containers.

This patch enables haproxy in multinode jobs.

Includes saving of linux networking details.

Makes DASHBOARD_URL agree with OS_AUTH_URL - properly uses the
pre-upgrade value for testing.

Co-authored-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Depends-on: https://review.opendev.org/683068
Depends-on: https://review.opendev.org/682957
Change-Id: I66888712da80c3d6f84ee4949762961664d3adea

8e406291

Sep 18, 2019

CI: Configure the upgrade jobs from the current branch · e2f511b7

Radosław Piliszek authored 5 years ago


This lets us control the upgrade process entirely from the
current branch.

Change-Id: Ic8c39e415846596c23dae93c2839375a24e8b888
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

e2f511b7

Adding Prometheus blackbox exporter · b22375eb

Scott Solkhon authored 5 years ago

This commit follows up the work in Kolla to provide deploy and configure the
Prometheus blackbox exporter.

An example blackbox-exporter module has been added (disabled by default)
called os_endpoint. This allows for the probing of endpoints over HTTP
and HTTPS. This can be used to monitor that OpenStack endpoints return a status
code of either 200 or 300, and the word 'versions' in the payload.

This change introduces a new variable `prometheus_blackbox_exporter_endpoints`.
Currently no defaults are specified because the configuration is heavily
dependent on the deployment.

Co-authored-by: Jack Heskett <Jack.Heskett@gresearch.co.uk>
Change-Id: I36ad4961078d90e2fd70c9a3368f5157d6fd89cd

b22375eb

Sep 14, 2019

CI: Test accessing dashboard · 8722c787

Mark Goddard authored 5 years ago

Also slightly refactor test-openstack.sh script.

Change-Id: I7f10f073e89d2b66367bbb700201b3cd412fc433
Depends-On: https://review.opendev.org/#/c/674241
Depends-On: https://review.opendev.org/#/c/668410
Depends-On: https://review.opendev.org/#/c/668409

8722c787

Sep 10, 2019

Configure Zun for Placement (Train+) · 0f5e0658

Hongbin Lu authored 5 years ago

After the integration with placement [1], we need to configure how
zun-compute is going to work with nova-compute.

* If zun-compute and nova-compute run on the same compute node,
  we need to set 'host_shared_with_nova' as true so that Zun
  will use the resource provider (compute node) created by nova.
  In this mode, containers and VMs could claim allocations against
  the same resource provider.
* If zun-compute runs on a node without nova-compute, no extra
  configuration is needed. By default, each zun-compute will create
  a resource provider in placement to represent the compute node
  it manages.

[1] https://blueprints.launchpad.net/zun/+spec/use-placement-resource-management

Change-Id: I2d85911c4504e541d2994ce3d48e2fbb1090b813

0f5e0658

Sep 05, 2019

Modernize the way of configuring Docker daemon · a5808ad8

Marcin Juszkiewicz authored 5 years ago

Instead of changing Docker daemon command line let's change config
for Docker instead. In /etc/docker/daemon.json file as it should be.

Custom Docker options can be set with 'docker_custom_config' variable.

Old 'docker_custom_option' is still present but should be avoided.

Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Change-Id: I1215e04ec15b01c0b43bac8c0e81293f6724f278

a5808ad8

Aug 22, 2019

Use fluentd image labels · 4180bee0

Michal Nasiadka authored 5 years ago

In order to orchestrate smooth transition to fluentd 0.14.x
aka 1.0 stable branch aka td-agent 3
from td-agent repository - use image labels (fluentd_version
and fluentd_binary).

Depends-On: https://review.opendev.org/676411
Change-Id: Iab8518c34ef876056c6abcdb5f2e9fc9f1f7dbdd

4180bee0

Aug 16, 2019

Check for CRITICAL, WARNING and ERROR log messages in CI · a14eee24
Mark Goddard authored 5 years ago
```
At the end of a CI run, check all log files.

Change-Id: I99afc1c5207757e35beabf7daebd86c56151c96d
```
a14eee24

CI: Zun jobs · d4de1d75

Radosław Piliszek authored 5 years ago

- Test Zun on CentOS too
- Make etcd change also trigger Zun jobs (like kuryr and zun)
- Test multinode Zun deployments instead of AIO
  (more likely to break)
- In Zun scenario, stop configuring docker for legacy swarm mode
  (Zun is no swarm)
- Separate test-zun.sh testing script
- Show appcontainer to see which node it has been started on

Change-Id: I289b1009fe00aedb9b78cbd83298b14da5fd9670
Depends-On: https://review.opendev.org/676736


Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

d4de1d75

CI: Add docker inspect output to docker_info logs · 8cf24bcc
Michal Nasiadka authored 5 years ago
```
Change-Id: I081f2f4762651bca935f08a67b20f21946aaf051
```
8cf24bcc

Aug 14, 2019

Testing Masakari role in gate · fbac54c5

Kien Nguyen authored 6 years ago

Add Masakari testing into the Gate.

Change-Id: I52df33f963e7d2ae4059887df3d24d9e6642134e
Depends-On: https://review.opendev.org/#/c/615469/
Depends-On: https://review.opendev.org/#/c/615715


Implements: blueprint ansible-masakari
Co-Authored-By: Gaëtan Trellu <gaetan.trellu@incloudus.com>

fbac54c5

Aug 06, 2019

CI: Sanity check that nodepool.private_ipv4 is assigned · eac1e479

Mark Goddard authored 5 years ago

During the MariaDB testing we saw a number of cases where this IP
address was not assigned to one or more hosts, which caused various
issues later on.

Change-Id: I61b54483e4553b926e9ddc0a8848b2daa6bc49f1

eac1e479

Aug 05, 2019

ceph: fixes to deployment and upgrade · 826f6850

Radosław Piliszek authored 5 years ago

1) ceph-nfs (ganesha-ceph) - use NFSv4 only
This is recommended upstream.
v3 and UDP require portmapper (aka rpcbind) which we
do not want, except where Ubuntu ganesha version (2.6)
forces it by requiring enabled UDP, see [1].
The issue has been fixed in 2.8, included in CentOS.
Additionally disable v3 helper protocols and kerberos
to avoid meaningless warnings.

2) ceph-nfs (ganesha-ceph) - do not export host dbus
It is not in use. This avoids the temptation to try
handling it on host.

3) Properly handle ceph services deploy and upgrade
Upgrade runs deploy.
The order has been corrected - nfs goes after mds.
Additionally upgrade takes care of rgw for keystone
(for swift emulation).

4) Enhance ceph keyring module with error detection
Now it does not blindly try to create a keyring after
any failure. This used to hide real issue.

5) Retry ceph admin keyring update until cluster works
Reordering deployment caused issue with ceph cluster not being
fully operational before taking actions on it.

6) CI: Remove osd df from collected logs as it may hang CI
Hangs are caused by healthy MON and no healthy MGR.
A descriptive note is left in its place.

7) CI: Add 5s timeout to ceph informational commands
This decreases the timeout from the default 300s.

[1] https://review.opendev.org/669315



Change-Id: I1cf0ad10b80552f503898e723f0c4bd00a38f143
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

826f6850

Jul 26, 2019

CI: Fix multinode job glance issues · d0317260

Radosław Piliszek authored 5 years ago


This actually replaces two ad-hoc fixes with a more unified
solution (with comment for posterity).

Change-Id: I62f57cb489c900f68a0c7aeb3e20e4715c0e2661
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

d0317260

CI: fix checks for upgrade and multinode jobs · 93ac16ae

Radosław Piliszek authored 5 years ago


Multinode jobs did not run sanity checks for all the hosts,
only primary. Now they check all.

Additionally upgrades are now checked using the proper
(pre-upgrade) scripts (not that it matters too much as they
are the same atm) and both checks are done, not only failures,
but also config.

Change-Id: I10552e256edbddd5b1f8a8a7f8805262e72ce8d8
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

93ac16ae

Jul 18, 2019

Fix handling of docker restart policy · 6a737b19

Radosław Piliszek authored 5 years ago

Docker has no restart policy named 'never'. It has 'no'.
This has bitten us already (see [1]) and might bite us again whenever
we want to change the restart policy to 'no'.

This patch makes our docker integration honor all valid restart policies
and only valid restart policies.
All relevant docker restart policy usages are patched as well.

I added some FIXMEs around which are relevant to kolla-ansible docker
integration. They are not fixed in here to not alter behavior.

[1] https://review.opendev.org/667363



Change-Id: I1c9764fb9bbda08a71186091aced67433ad4e3d6
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

6a737b19

Jul 16, 2019

CI: clean up requirements installation · 8a543098

Radosław Piliszek authored 5 years ago


We install kolla-ansible requirements in Zuul's Ansible playbooks.
This patch cleans up the installation in scripts so that they are
only concerned with auxiliary requirements:
- ansible (since we do not track it in requirements)
- ara (for log summaries)
- openstack clients (for first init and tests after deployment)

Additionally this patch installs openstack clients in a separate
virtualenv.
Note that all kolla-ansible requirements, ansible and ara are still
installed system-wide.

Change-Id: Iac04082ad39a9d823c515ba11c5db9af50ed225f
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

8a543098

Add ceph-mds/rgw/nfs to gate · a77b0f62

Michal Nasiadka authored 5 years ago

Depends-On: https://review.opendev.org/669315
Change-Id: I6946290cd890f74c59ed5394e8382a8b75c0c4cd

a77b0f62

Jul 09, 2019

Trivial fix: log stderr of init-runonce as well · 53ea3fe4

Radosław Piliszek authored 5 years ago


Missed by me in a recent merge.

TrivialFix
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

Change-Id: I83b1e84a43f014ce20be8677868be3f66017e3c2

53ea3fe4

Jul 04, 2019

CI: Pull images before upgrade · f11d3c69

Mark Goddard authored 5 years ago

This is the documented procedure.

Change-Id: I09ca99e92b112621d66b564a88b13658632242f5

f11d3c69

Jul 03, 2019

CI: Collect docker and systemd configs · 2430c290

Radosław Piliszek authored 5 years ago


Change-Id: I59a05e8a0a2656596d2cced61bd98f2aa790d60b
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

2430c290

Jul 02, 2019

CI: Keep stderr in ansible logs · b9aa8b38

Radosław Piliszek authored 5 years ago


Otherwise ara had only the stderr part and logs only the
stdout part which made ordered analysis harder.

Additionally add -vvv for the bootstrap-servers run.

Change-Id: Ia42ac9b90a17245e9df277c40bda24308ebcd11d
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

b9aa8b38

Jul 01, 2019

CI: Use template-overrides.j2 from kolla · 20ab480c

Radosław Piliszek authored 5 years ago

Some kolla-ansible jobs failed due to using external mirrors
instead of local ones.
This was due to not using the template override provided by kolla.
This patch fixes that.

Depends-On: https://review.opendev.org/668226


Change-Id: I27f714fdf05e521aa8ce25c5683a452ceb35eeb8
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

20ab480c

Add note to CI config regarding registry during upgrade · a0bdc366

Radosław Piliszek authored 5 years ago


Change-Id: Ifc898015b9b523ef4c50fc969e464f05762f2151
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

a0bdc366

Revert "CI - remove unnecessary logic when building images for upgrade" · acac1279
Mark Goddard authored 5 years ago
```
This reverts commit 8ce5ffd0.

Change-Id: I81ce7c007ff267ebbbb721bcdb7eebc0dd575bf8
```
acac1279

Jun 28, 2019

Exit on failure in init-runonce · bc08b44f

Mark Goddard authored 5 years ago

Previously we sourced this script in tests/deploy.sh, but this was
recently changed. Following that change we lost the errexit setting,
meaning we ignore errors in init-runonce.

Adding errexit in the script itself means that all callers get error
handling.

Also log init-runonce output.

TrivialFix

Change-Id: I9b35bd5f0f76eec26ddd968d093a3a5fd55a7ce2

bc08b44f

Jun 27, 2019

Fix conditionals in CI playbook · 3b218fd0

Mark Goddard authored 6 years ago

These were not templated, so always evaluated to true. This shouldn't be
causing any issues.

Change-Id: I7b8e407e688ba201c4f7d1a94bbd41af0918e7df

3b218fd0

Jun 21, 2019

CI - remove unnecessary logic when building images for upgrade · 8ce5ffd0

Radosław Piliszek authored 5 years ago


Docker registry being insecure is handled by docker_registry_insecure
which is set to true by default when docker_registry is set.
The removed code had no effect because docker_registry is not changed
anyway for base (pre-upgrade) install.

This change makes config more readable and also prevents a potential
conflict with the zun profile if ever used in upgrade mode.

Change-Id: I9b5ae8c5b534fa6cce9dbaca8af191e2ca79d19f
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>

8ce5ffd0

Jun 16, 2019

Remove nova-consoleauth · 4e032923

Jeffrey Zhang authored 5 years ago

The nova-consoleauth service was deprecated during the Rocky release [1]
and has not been necessary since unless you're using cells v1. As Kolla
has never supported cells v1, which is finally being removed during
Train [2], we can get ahead of the curve and stop deploying
nova-consoleauth immediately.

[1] https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/convert-consoles-to-objects.html
[2] https://blueprints.launchpad.net/nova/+spec/remove-cells-v1/

Change-Id: I099080979f5497537e390f531005a517ab12aa7a

4e032923

Jun 11, 2019

Add CI job for ironic · 845040ad

Mark Goddard authored 6 years ago

Adds four new CI jobs for testing centos/ubuntu binary/source deploys
with ironic enabled. These are run only when there are changes to the
ironic role.

Performs some simple testing by creating a node using the fake-hardware
hardware type and creating a server.

Change-Id: Ie669e57ce2af53257b4ca05f45193cb73f48827a
Depends-On: https://review.opendev.org/664011

845040ad

Jun 07, 2019

Remove Neutron LBaaS support · f427920d

Carlos Goncalves authored 5 years ago

The project has been retired and there will be no Train release [1].
This patch removes Neutron LBaaS support in Kolla.

[1] https://review.opendev.org/#/c/658494/

Change-Id: Ic0d3da02b9556a34d8c27ca21a1ebb3af1f5d34c

f427920d

Add support for idempotent container stop and removal · d2ae42ce

Mark Goddard authored 5 years ago

This is useful when removing a container that is no longer supported.

Change-Id: I08d79ce7dd2f3d11e466930de85412017cd5f747

d2ae42ce

Jun 03, 2019

Test Ceph upgrade in CI · 78ee0287

Mark Goddard authored 5 years ago

Add CI jobs for testing an upgrade of a multinode system with Ceph
enabled. As for the existing upgrade job, we upgrade from the previous
release to the current release.

Change-Id: I931772ca4c63757769467a57c80dc0726a11167a
Depends-On: https://review.opendev.org/658163

78ee0287

May 31, 2019

Adds Qinling Ansible role · edb34898

Gaetan Trellu authored 5 years ago

Qinling is an OpenStack project to provide "Function as a Service".
This project aims to provide a platform to support serverless functions.

Change-Id: I239a0130f8c8b061b531dab530d65172b0914d7c
Implements: blueprint ansible-qinling-support
Story: 2005760
Task: 33468

edb34898

May 17, 2019

Fix keystone fernet key rotation scheduling · 6c1442c3

Mark Goddard authored 5 years ago

Right now every controller rotates fernet keys. This is nice because
should any controller die, we know the remaining ones will rotate the
keys. However, we are currently over-rotating the keys.

When we over rotate keys, we get logs like this:

 This is not a recognized Fernet token <token> TokenNotFound

Most clients can recover and get a new token, but some clients (like
Nova passing tokens to other services) can't do that because it doesn't
have the password to regenerate a new token.

With three controllers, in crontab in keystone-fernet we see the once a day
correctly staggered across the three controllers:

ssh ctrl1 sudo cat /etc/kolla/keystone-fernet/crontab
0 0 * * * /usr/bin/fernet-rotate.sh
ssh ctrl2 sudo cat /etc/kolla/keystone-fernet/crontab
0 8 * * * /usr/bin/fernet-rotate.sh
ssh ctrl3 sudo cat /etc/kolla/keystone-fernet/crontab
0 16 * * * /usr/bin/fernet-rotate.sh

Currently with three controllers we have this keystone config:

[token]
expiration = 86400 (although, keystone default is one hour)
allow_expired_window = 172800 (this is the keystone default)

[fernet_tokens]
max_active_keys = 4

Currently, kolla-ansible configures key rotation according to the following:

   rotation_interval = token_expiration / num_hosts

This means we rotate keys more quickly the more hosts we have, which doesn't
make much sense.

Keystone docs state:

   max_active_keys =
     ((token_expiration + allow_expired_window) / rotation_interval) + 2

For details see:
https://docs.openstack.org/keystone/stein/admin/fernet-token-faq.html

Rotation is based on pushing out a staging key, so should any server
start using that key, other servers will consider that valid. Then each
server in turn starts using the staging key, each in term demoting the
existing primary key to a secondary key. Eventually you prune the
secondary keys when there is no token in the wild that would need to be
decrypted using that key. So this all makes sense.

This change adds new variables for fernet_token_allow_expired_window and
fernet_key_rotation_interval, so that we can correctly calculate the
correct number of active keys. We now set the default rotation interval
so as to minimise the number of active keys to 3 - one primary, one
secondary, one buffer.

This change also fixes the fernet cron job generator, which was broken
in the following cases:

* requesting an interval of more than 1 day resulted in no jobs
* requesting an interval of more than 60 minutes, unless an exact
  multiple of 60 minutes, resulted in no jobs

It should now be possible to request any interval up to a week divided
by the number of hosts.

Change-Id: I10c82dc5f83653beb60ddb86d558c5602153341a
Closes-Bug: #1809469

6c1442c3

Add unit test for keystone fernet cron generator · 25ac955a

Mark Goddard authored 5 years ago

Before making changes to this script, document its behaviour with a unit
test.

There are two major issues:

* requesting an interval of more than 1 day results in no jobs
* requesting an interval of more than 60 minutes, unless an exact
  multiple of 60 minutes, results in no jobs

Change-Id: I655da1102dfb4ca12437b7db0b79c9a61568f79e
Related-Bug: #1809469

25ac955a

Apr 19, 2019

OpenDev Migration Patch · 92d8d22c

OpenDev Sysadmins authored 5 years ago

This commit was bulk generated and pushed by the OpenDev sysadmins
as a part of the Git hosting and code review systems migration
detailed in these mailing list posts:

http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003603.html
http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004920.html

Attempts have been made to correct repository namespaces and
hostnames based on simple pattern matching, but it's possible some
were updated incorrectly or missed entirely. Please reach out to us
via the contact information listed at https://opendev.org/ with any
questions you may have.

92d8d22c

Apr 14, 2019

Fix periodic CI jobs · 2b7a9dc2

Mark Goddard authored 5 years ago

Periodic jobs don't have zuul.change defined, since there is no change
being tested. This causes an early failure when referencing zuul.change
to set the image tag for built images. In periodic jobs we'll never need
to build images because there is no dependent kolla change under test.

Change-Id: I6d9d81cf17b7d0d7aaf87cd96418c904c46681f2

2b7a9dc2

Apr 10, 2019

Remove RabbitMQ support from Bifrost · 33564a00

Mark Goddard authored 5 years ago

During the Train cycle, Bifrost switched to using JSON-RPC by default
for Ironic's internal communication [1], avoiding the need to install
RabbitMQ. This simplifies things, so we may as well remove our custom
configuration of RabbitMQ.

[1] https://review.openstack.org/645093

Change-Id: I3107349530aa753d68fd59baaf13eb7dd5485ae6

33564a00