  1. Dec 09, 2019
    • CI: Use python 3 for local kolla-ansible execution · a5408f42
      Mark Goddard authored
      This change switches the CI jobs to use python 3 for local execution of
      the kolla-ansible commands.
      
      For upgrades, we use python 2 for the previous (Train) deploy, then
      reinstall using python 3 for the (Ussuri) upgrade.
      
      NOTE: This is separate from the python interpreter used on remote hosts,
      which is configured via ansible_python_interpreter.
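
      As a hedged illustration (the file and path are assumptions, not
      part of this change), the remote interpreter is typically pinned
      like so:

          # e.g. in globals.yml or inventory group_vars (illustrative):
          ansible_python_interpreter: /usr/bin/python3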
      
      Partially Implements: blueprint python-3
      Related: blueprint drop-py2-support
      
      Change-Id: I5bdc165f68b7bde1f9ef30fe8216f2a44e6d4706
    • CI: Move ansible installation & configuration to Ansible · c320077f
      Mark Goddard authored
      Continue to reduce the scope of setup_gate.sh. This allows us to more
      easily select python 2 or 3.
      
      Change-Id: If2eeeacbbbdf58afb765b4a39772b5a1af7b952b
      Partially Implements: blueprint python-3
  2. Nov 28, 2019
    • Support configuration of Docker client timeout · 01050dc0
      Mark Goddard authored
      Adds support for configuration of the Docker client timeout via
      'docker_client_timeout'.
      
      This change also increases the default timeout to 120 seconds, as we
      sometimes see timeouts in CI and heavily loaded or underpowered
      environments. Increasing 'docker_client_timeout' further may be helpful
      in cases where Docker reports 'Read timed out'.
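
      As a hedged example (the value is illustrative), the timeout can be
      raised in globals.yml:

          # globals.yml: raise the Docker client timeout above the
          # 120-second default for heavily loaded environments.
          docker_client_timeout: 180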
      
      Change-Id: I73745771078cb2c0ebae2b1d87ba2c4c12958d82
      Closes-Bug: #1809844
  3. Nov 26, 2019
    • CI: Refactor a lot · a2fc6841
      Radosław Piliszek authored
      Separate upgrade logic into an is_upgrade job var and rename
      scenarios to match.
      
      Rename "ACTION" to "SCENARIO" (as it is a scenario).
      
      Separate testing of the dashboard (aka Horizon) and increase
      its timeout to 5 minutes (CentOS 7 is slow as always).
      
      Separate initialization of core OpenStack.
      
      Use gate setup script from ./tests/
      
      Remove useless tox setupenv.
      
      Do not deploy Heat when it is not really necessary.
      
      Change-Id: I4fca319ccc3de7188f8b7b44c9c71321e3899467
  4. Nov 21, 2019
    • CI: Wait for Zun to delete the test container · a3c8a848
      Radosław Piliszek authored
      We fail randomly in check-failure.sh, which checks for
      containers being down.
      Since we share Docker with Zun, the script sees the Zun test container
      and may fail when it is stopped but not yet removed.
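
      A minimal sketch of the kind of wait this adds (the task and
      container name are illustrative):

          - name: Wait for the Zun test container to be removed
            command: docker ps -aq --filter name=zun-test
            register: zun_ps
            until: zun_ps.stdout == ''
            retries: 30
            delay: 2
            changed_when: false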
      
      Change-Id: If8b001f7507663e49e8e535f1889592e5f428ab5
      Closes-bug: #1853452
  5. Nov 14, 2019
    • Attempt to pull image before stopping and removing container · 64d07c0b
      Mark Goddard authored
      Steps to reproduce:

      * Deploy services using kolla-ansible deploy
      * Reconfigure the image for one or more services to use an invalid
        config
      * Deploy/reconfigure services using kolla-ansible reconfigure

      The invalid config could be a wrong docker registry, wrong image name,
      wrong tag, etc.

      Expected results:

      The restart handler for the service fails, and the old container is
      left running.

      Actual results:

      The restart handler for the service fails, and the old container is
      stopped and removed. This leaves the service in a broken state.
      
      This change fixes the issue by pulling the image if necessary prior to
      stopping and removing the container.
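
      A minimal sketch of the fix, assuming a kolla_docker task ordered
      before the stop/remove (variable names are illustrative):

          - name: Pull the new image before stopping the old container
            kolla_docker:
              action: pull_image
              common_options: "{{ docker_common_options }}"
              image: "{{ service.image }}"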
      
      Change-Id: I85b2a1b224d4c4d85c32c4922a2cd2c41171a1dc
      Closes-Bug: #1852572
    • CI: Remove Stein upgrade support from CI · 6f876254
      Mark Goddard authored
      Resolves a number of TODOs in the CI configuration that provide support
      for upgrading from the Stein release.
      
      Change-Id: I9bac5c230b82ac7c097fe6ca2556e428abda31a1
      Depends-On: https://review.opendev.org/694254
  6. Oct 16, 2019
    • Support multiple nova cells · 78a828ef
      Doug Szumski authored
      
      This patch adds initial support for deploying multiple Nova cells.
      
      Splitting a nova-cell role out from the Nova role allows a more granular
      approach to deploying and configuring Nova services.
      
      A new enable_cells flag has been added that enables support for
      multiple cells via the introduction of a super conductor in addition to
      cell-specific conductors. When this flag is not set (the default), nova
      is configured in the same manner as before - with a single conductor.
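
      For example (a hedged globals.yml sketch; only the flag name is from
      this change):

          # Opt in to multi-cell support; defaults to "no", which keeps
          # the previous single-conductor layout.
          enable_cells: "yes"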
      
      The nova role now deploys the global services:
      
      * nova-api
      * nova-scheduler
      * nova-super-conductor (if enable_cells is true)
      
      The nova-cell role handles services specific to a cell:
      
      * nova-compute
      * nova-compute-ironic
      * nova-conductor
      * nova-libvirt
      * nova-novncproxy
      * nova-serialproxy
      * nova-spicehtml5proxy
      * nova-ssh
      
      This patch does not support using a single cell controller for managing
      more than one cell. Support for sharing a cell controller will be added
      in a future patch.
      
      This patch should be backwards compatible and is tested by existing CI
      jobs. A new CI job has been added that tests a multi-cell environment.
      
      ceph-mon has been removed from the play hosts list as it is not
      necessary - delegate_to does not require the host to be in the play.
      
      Documentation will be added in a separate patch.
      
      Partially Implements: blueprint support-nova-cells
      Co-Authored-By: Mark Goddard <mark@stackhpc.com>
      Change-Id: I810aad7d49db3f5a7fd9a2f0f746fd912fe03917
    • Implement IPv6 support in the control plane · bc053c09
      Radosław Piliszek authored
      Introduce kolla_address filter.
      Introduce put_address_in_context filter.
      
      Add AF config to vars.
      
      Address contexts:
      - raw (default): <ADDR>
      - memcache: inet6:[<ADDR>]
      - url: [<ADDR>]
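
      As a hedged illustration of the url context (the address and task are
      illustrative):

          - name: Show an IPv6 address rendered for a URL context
            debug:
              msg: "{{ 'fd00::10' | put_address_in_context('url') }}"
            # prints "[fd00::10]"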
      
      Other changes:
      
      globals.yml - mention just IP in comment
      
      prechecks/port_checks (api_intf) - kolla_address handles validation
      
      3x interface conditional (swift configs: replication/storage)
      
      2x interface variable definition with hostname
      (haproxy listens; api intf)
      
      1x interface variable definition with hostname with bifrost exclusion
      (baremetal pre-install /etc/hosts; api intf)
      
      neutron's ml2 'overlay_ip_version' set to 6 for IPv6 on tunnel network
      
      basic multinode source CI job for IPv6
      
      prechecks for rabbitmq and qdrouterd use proper NSS database now
      
      MariaDB Galera Cluster WSREP SST mariabackup workaround
      (socat and IPv6)
      
      Ceph naming workaround in CI
      TODO: probably needs documenting
      
      RabbitMQ IPv6-only proto_dist
      
      Ceph ms switch to IPv6 mode
      
      Remove neutron-server ml2_type_vxlan/vxlan_group setting
      as it is not used (let's avoid any confusion)
      and could break setups without proper multicast routing
      if it started working (also IPv4-only)
      
      haproxy upgrade checks for slaves based on ipv6 addresses
      
      TODO:
      
      ovs-dpdk grabs an IPv4 network address (w/ prefix len / submask);
      not supported, invalid by default because neutron_external has no
      address. No idea whether ovs-dpdk works at all atm.
      
      ml2 for xenapi
      Xen is not supported too well.
      This would require working with XenAPI facts.
      
      rp_filter setting
      This would require meddling with ip6tables (there is no sysctl param).
      By default nothing is dropped.
      Unlikely we really need it.
      
      ironic dnsmasq is configured IPv4-only
      dnsmasq needs DHCPv6 options and testing in vivo.
      
      KNOWN ISSUES (beyond us):
      
      One cannot use IPv6 address to reference the image for docker like we
      currently do, see: https://github.com/moby/moby/issues/39033
      (docker_registry; docker API 400 - invalid reference format)
      workaround: use hostname/FQDN
      
      RabbitMQ may fail to bind to IPv6 if hostname resolves also to IPv4.
      This is due to old RabbitMQ versions available in images.
      IPv4 is preferred by default and may fail in the IPv6-only scenario.
      This should be no problem in real life as IPv6-only is indeed IPv6-only.
      Also, when new RabbitMQ (3.7.16/3.8+) makes it into images, this will
      no longer be relevant as we supply all the necessary config.
      See: https://github.com/rabbitmq/rabbitmq-server/pull/1982
      
      For reliable runs, at least Ansible 2.8 is required (2.8.5 confirmed
      to work well). Older Ansible versions are known to miss IPv6 addresses
      in interface facts. This may affect redeploys, reconfigures and
      upgrades which run after the VIP address is assigned.
      See: https://github.com/ansible/ansible/issues/63227
      
      Bifrost Train does not support IPv6 deployments.
      See: https://storyboard.openstack.org/#!/story/2006689
      
      
      
      Change-Id: Ia34e6916ea4f99e9522cd2ddde03a0a4776f7e2c
      Implements: blueprint ipv6-control-plane
      Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
  7. Oct 15, 2019
    • Fix CI failures · e3e5f7f2
      Mark Goddard authored
      1. Fix yamllint errors in .yamllint file(!)

      YAML lint is currently failing on its own configuration file,
      .yamllint. This change fixes the issues.
      
      2. Run bindep role in Zuul jobs
      
      This fixes an issue where libffi is not available.
      
      Change-Id: Ic08a8e53a6905a68f0fe26d4b28184e62a64324f
  8. Oct 07, 2019
    • CI: Use any_errors_fatal in pre.yml and run.yml · fac16704
      Mark Goddard authored
      This ensures that failure of a single host fails the whole play at that
      task. This can avoid confusing errors such as when the task
      "Assert that the nodepool private IPv4 address is assigned" fails on one
      host, causing subsequent errors on other hosts.
      
      Note that this only affects the Zuul playbooks, not Kolla Ansible's
      playbooks.
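
      A minimal sketch of the play-level setting (play content elided):

          # In the Zuul pre.yml/run.yml plays: abort the whole play at
          # the failing task instead of continuing on other hosts.
          - hosts: all
            any_errors_fatal: true
            tasks: []  # tasks elided for brevity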
      
      Change-Id: I77a6534dd2ddd188f795e17d17a44be249d01f31
  9. Sep 23, 2019
    • CI: Reinstate use of Docker registry mirror · 5c9a7983
      Mark Goddard authored
      After modernising docker configuration
      (I1215e04ec15b01c0b43bac8c0e81293f6724f278), we lost our
      registry-mirrors configuration in CI that lets us use a mirror of
      Dockerhub.
      
      This change uses the new docker_custom_config variable to configure the
      registry mirror.
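
      A hedged example (globals.yml; the mirror URL is illustrative):

          # docker_custom_config is rendered into the Docker daemon.json.
          docker_custom_config:
            registry-mirrors:
              - "https://registry-mirror.example.org"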
      
      Change-Id: I1430413c12e9d0b59e4f216ff66372de0f3a4f21
  10. Sep 18, 2019
    • CI: Configure the upgrade jobs from the current branch · e2f511b7
      Radosław Piliszek authored
      
      This lets us control the upgrade process entirely from the
      current branch.
      
      Change-Id: Ic8c39e415846596c23dae93c2839375a24e8b888
      Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
    • Adding Prometheus blackbox exporter · b22375eb
      Scott Solkhon authored
      
      This commit follows up on the work in Kolla to deploy and configure the
      Prometheus blackbox exporter.

      An example blackbox-exporter module called os_endpoint has been added
      (disabled by default). This allows for the probing of endpoints over
      HTTP and HTTPS. It can be used to monitor that OpenStack endpoints
      return a status code of either 200 or 300, and contain the word
      'versions' in the payload.
      
      This change introduces a new variable `prometheus_blackbox_exporter_endpoints`.
      Currently no defaults are specified because the configuration is heavily
      dependent on the deployment.
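
      A hedged sketch of supplying endpoints (the entry format and values
      are assumptions for illustration, since no defaults ship with this
      change):

          # globals.yml: illustrative probe list for the os_endpoint module.
          prometheus_blackbox_exporter_endpoints:
            - "keystone:os_endpoint:https://192.0.2.10:5000"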
      
      Co-authored-by: Jack Heskett <Jack.Heskett@gresearch.co.uk>
      Change-Id: I36ad4961078d90e2fd70c9a3368f5157d6fd89cd
  11. Sep 16, 2019
    • Catch errors and changes in kolla_toolbox module · 70b515bf
      Mark Goddard authored
      The kolla_toolbox Ansible module executes ad-hoc ansible commands in the
      kolla_toolbox container, and parses the output to make it look as if
      ansible-playbook executed the command. Currently, however, this module
      sometimes fails to catch failures of the underlying command, and also
      sometimes shows tasks as 'ok' when the underlying command made changes.
      This has been tested both before and after the upgrade to ansible 2.8.
      
      This change fixes this issue by configuring ansible to emit output in
      JSON format, to make parsing simpler. We can now pick up errors and
      changes, and signal them to the caller.
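
      For reference, a minimal sketch of calling the module from a playbook
      (the module arguments are illustrative):

          - name: Run an ad-hoc module via the kolla_toolbox container
            kolla_toolbox:
              module_name: mysql_db
              module_args:
                name: test_database
                state: present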
      
      This change also adds an ansible playbook, tests/test-kolla-toolbox.yml,
      that can be executed to test the module. It's not currently integrated
      with any CI jobs.
      
      Note that this change cannot be backported as the JSON output callback
      plugin was added in Ansible 2.5.
      
      Change-Id: I8236dd4165f760c819ca972b75cbebc62015fada
      Closes-Bug: #1844114
    • Add custom filters for checking services · af2e7fd7
      Mark Goddard authored
      These filters can be used to capture a lot of the logic that we
      currently have in 'when' statements, about which services are enabled
      for a particular host.
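
      A hedged sketch of the intended usage (the filter and service names
      are assumptions for illustration):

          - name: Run a task only where the service is enabled and mapped
            debug:
              msg: "glance-api runs on this host"
            when: glance_services['glance-api'] | service_enabled_and_mapped_to_host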
      
      In order to use these filters, it is necessary to install the
      kolla_ansible python module, and not just the dependencies listed in
      requirements.txt. The CI test and quickstart install from source
      documentation has been updated accordingly.
      
      Ansible is not currently in OpenStack global requirements, so for unit
      tests we avoid a direct dependency on Ansible and provide fakes where
      necessary.
      
      Change-Id: Ib91cac3c28e2b5a834c9746b1d2236a309529556
  12. Sep 10, 2019
    • Configure Zun for Placement (Train+) · 0f5e0658
      Hongbin Lu authored
      After the integration with placement [1], we need to configure how
      zun-compute is going to work with nova-compute.
      
      * If zun-compute and nova-compute run on the same compute node,
        we need to set 'host_shared_with_nova' as true so that Zun
        will use the resource provider (compute node) created by nova.
        In this mode, containers and VMs could claim allocations against
        the same resource provider.
      * If zun-compute runs on a node without nova-compute, no extra
        configuration is needed. By default, each zun-compute will create
        a resource provider in placement to represent the compute node
        it manages.
      
      [1] https://blueprints.launchpad.net/zun/+spec/use-placement-resource-management
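
      A hedged example of the shared-node case (a zun.conf override; only
      the option name is from this change):

          [compute]
          host_shared_with_nova = true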
      
      Change-Id: I2d85911c4504e541d2994ce3d48e2fbb1090b813