- Jul 23, 2019
Jeffrey Zhang authored
Neutron FWaaS v1 was deprecated and removed in the Stein cycle by [0], so remove the related options from Kolla. [0] https://review.opendev.org/616410 Change-Id: Ia03e7979dd48bafb34c11edd08c2a2a87b949e0e
- Jul 18, 2019
Jason authored
Most other services already gate the DB bootstrap operations with the 'use_preconfigured_databases' variable; Blazar did not. Change-Id: I772b1cb92612c7e6936f052ed9947f93582f264c
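A rough sketch of the gating pattern described above, assuming the usual kolla-ansible task layout (module arguments and group name are illustrative, not the exact change):

```yaml
- name: Creating Blazar database
  kolla_toolbox:
    module_name: mysql_db
    module_args:
      login_host: "{{ database_address }}"
      login_user: "{{ database_user }}"
      login_password: "{{ database_password }}"
      name: "{{ blazar_database_name }}"
  run_once: True
  delegate_to: "{{ groups['blazar-api'][0] }}"
  # Skip DB creation entirely when the operator manages databases themselves.
  when: not use_preconfigured_databases | bool
```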
Mark Goddard authored
Change https://review.opendev.org/#/c/670247/ attempted to fix glance deployment with the file backend. However, it added a new bug by being more strict about only generating configuration where the container will be deployed. This means that the current method of running the glance bootstrap container on any host in glance-api group could be broken, since it needs the container configuration. This change only runs the bootstrap container on hosts in the glance_api_hosts list, which in the case of the file backend typically only contains one host. This change also fixes up some logic during rolling upgrade, where we might not generate new configuration for the bootstrap host. Change-Id: I83547cd83b06ddefb3a9e1f39844537bdb32bd7f Related-Bug: #1836151
- Jul 16, 2019
Michal Nasiadka authored
* Ubuntu ships with nfs-ganesha 2.6.0, which requires an rpcbind UDP test on startup (fixed in later releases) * Add the rpcbind package to the kolla-ansible bootstrap when ceph_nfs is enabled * Update the Ceph deployment docs with a note Change-Id: Ic19264191a0ed418fa959fdc122cef543446fbe5
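A minimal sketch of the bootstrap task implied above; the variable and condition names are assumptions, not the exact change:

```yaml
- name: Install rpcbind for nfs-ganesha
  package:
    name: rpcbind
    state: present
  become: true
  when:
    - enable_ceph_nfs | bool
    - ansible_os_family == 'Debian'
```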
- Jul 12, 2019
Mark Goddard authored
The ironic inspector iPXE configuration includes the following kernel argument: initrd=agent.ramdisk However, the ramdisk is actually called ironic-agent.initramfs, so the argument should be: initrd=ironic-agent.initramfs In BIOS boot mode this does not cause a problem, but compute nodes booting with UEFI appear to be stricter about the mismatch and fail to boot. Change-Id: Ic84f3b79fdd3cd1730ca2fb79c11c7a4e4d824de Closes-Bug: #1836375
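For illustration, a trimmed iPXE fragment of the relationship (URLs and the other arguments are placeholders, not the actual inspector template): the `initrd=` kernel argument must name the same file that the `initrd` command fetches, and UEFI boots appear stricter about a mismatch.

```
kernel http://<inspector-host>:8089/ironic-agent.kernel <other-args> initrd=ironic-agent.initramfs
initrd http://<inspector-host>:8089/ironic-agent.initramfs
boot
```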
Mark Goddard authored
A common class of problems goes like this: * kolla-ansible deploy * Hit a problem, often in ansible/roles/*/tasks/bootstrap.yml * Re-run kolla-ansible deploy * Service fails to start This happens because the DB is created during the first run, but for some reason we fail before performing the DB sync. This means that on the second run we don't include ansible/roles/*/tasks/bootstrap_service.yml because the DB already exists, and therefore still don't perform the DB sync. However, this time the command may complete without apparent error. We should be less conservative about when we perform the DB sync, and do it whenever it may be necessary. There is an argument for not doing the sync during a 'reconfigure' command, although we will not change that here. This change always performs the DB sync during the 'deploy' and 'reconfigure' commands. Change-Id: I82d30f3fcf325a3fdff3c59f19a1f88055b566cc Closes-Bug: #1823766 Closes-Bug: #1797814
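A hedged sketch of the before/after pattern; the task file names follow kolla-ansible conventions and the condition shown is illustrative:

```yaml
# Before: the DB sync only ran when the database had just been created, so a
# failed first run left it permanently skipped on re-runs.
- include_tasks: bootstrap_service.yml
  when: database.changed | bool

# After: run the sync unconditionally during deploy/reconfigure; repeating
# '<service>-manage db sync' is a harmless no-op when the schema is current.
- include_tasks: bootstrap_service.yml
```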
- Jul 11, 2019
Mark Goddard authored
Since https://review.opendev.org/647699/, we lost the logic to only deploy glance-api on a single host when using the file backend. This code was always a bit custom, and would be better supported by using the 'host_in_groups' pattern we have in a few other places where a single group name does not describe the placement of containers for a service. Change-Id: I21ce4a3b0beee0009ac69fecd0ce24efebaf158d Closes-Bug: #1836151
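A sketch of the 'host_in_groups' pattern referred to above, with illustrative variable names rather than the exact change:

```yaml
# group_vars: with the file backend, glance-api must run on exactly one host.
glance_api_hosts: >-
  {{ [groups['glance-api'] | first] if glance_backend_file | bool
     else groups['glance-api'] }}

# service definition: placement is driven by an explicit host predicate
# instead of plain group membership.
glance-api:
  container_name: glance_api
  group: glance-api
  host_in_groups: "{{ inventory_hostname in glance_api_hosts }}"
```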
- Jul 10, 2019
Radosław Piliszek authored
Controllers lacking compute should not be required to provide a valid migration_interface, as it is not used there (and prechecks do not check it either). Inclusion of the libvirt conf section is now conditional on the service type. The libvirt conf section has been moved to a separate included file to avoid evaluation of the undefined variable (a conditional block did not prevent it, and using the 'default' filter may hide future issues). See https://github.com/ansible/ansible/issues/58835 Additionally, this fixes the improper nesting of 'if' blocks for libvirt. Change-Id: I77af534fbe824cfbe95782ab97838b358c17b928 Closes-Bug: #1835713 Signed-off-by:
Radosław Piliszek <radoslaw.piliszek@gmail.com>
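An illustrative Jinja2 fragment of the approach (the template and variable names are assumptions): the [libvirt] block lives in its own file and is included only where nova-compute actually runs, so the migration address variable is never evaluated on controller-only hosts.

```jinja
{# nova.conf.j2 (sketch) #}
{% if service_name == 'nova-compute' %}
{% include 'nova.conf.d/libvirt.conf.j2' %}
{% endif %}

{# nova.conf.d/libvirt.conf.j2 (sketch) #}
[libvirt]
live_migration_inbound_addr = {{ migration_interface_address }}
```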
Radosław Piliszek authored
This mimics behavior of core 'template' module to allow relative includes from the same dir as merged template, base dir of playbook/role (usually role for us) and its 'templates' subdir. Additionally old unused code was removed. Change-Id: I83804d3cf5f17eb2302a2dfe49229c6277b1e25f Signed-off-by:
Radosław Piliszek <radoslaw.piliszek@gmail.com>
Michal Nasiadka authored
* Sometimes getting/creating ceph mds keyring fails, similar to https://tracker.ceph.com/issues/16255 Change-Id: I47587cbeb8be0e782c13ba7f40367409e2daa8a8
- Jul 08, 2019
Mark Goddard authored
Due to a bug in ansible, kolla-ansible deploy currently fails in nova with the following error when used with ansible earlier than 2.8: TASK [nova : Waiting for nova-compute services to register themselves] ********* task path: /home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml:30 fatal: [primary]: FAILED! => { "failed": true, "msg": "The field 'vars' has an invalid value, which includes an undefined variable. The error was: 'nova_compute_services' is undefined\n\nThe error appears to have been in '/home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml': line 30, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Waiting for nova-compute services to register themselves\n ^ here\n" } Example: http://logs.openstack.org/00/669700/1/check/kolla-ansible-centos-source/81b65b9/primary/logs/ansible/deploy This was caused by https://review.opendev.org/#/q/I2915e2610e5c0b8d67412e7ec77f7575b8fe9921, which hits upon an ansible bug described here: https://github.com/markgoddard/ansible-experiments/tree/master/05-referencing-registered-var-do-until. We can work around this by not using an intermediary variable. Change-Id: I58f8fd0a6e82cb614e02fef6e5b271af1d1ce9af Closes-Bug: #1835817
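A hedged sketch of the workaround (the command and thresholds are illustrative): the registered variable is referenced directly in `until` rather than through an intermediary defined under `vars:`, which trips the Ansible bug on versions earlier than 2.8.

```yaml
- name: Waiting for nova-compute services to register themselves
  command: >-
    docker exec kolla_toolbox openstack compute service list
    --service nova-compute --format json
  register: nova_compute_services_raw
  changed_when: false
  # No 'vars: nova_compute_services: ...' indirection here.
  until: nova_compute_services_raw.stdout | from_json | length > 0
  retries: 20
  delay: 10
```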
- Jul 05, 2019
Mark Goddard authored
* Fix wsrep sequence number detection. Log message format is 'WSREP: Recovered position: <UUID>:<seqno>' but we were picking out the UUID rather than the sequence number. This is as good as random. * Add become: true to log file reading and removal since I4a5ebcedaccb9261dbc958ec67e8077d7980e496 added become: true to the 'docker cp' command which creates it. * Don't run handlers during recovery. If the config files change we would end up restarting the cluster twice. * Wait for wsrep recovery container completion (don't detach). This avoids a potential race between wsrep recovery and the subsequent 'stop_container'. * Finally, we now wait for the bootstrap host to report that it is in an OPERATIONAL state. Without this we can see errors where the MariaDB cluster is not ready when used by other services. Change-Id: Iaf7862be1affab390f811fc485fd0eb6879fd583 Closes-Bug: #1834467
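For illustration of the first bullet (the log path and task shape are assumptions): the recovered-position line ends in `<UUID>:<seqno>`, and it is the field after the final colon that must be captured.

```yaml
- name: Extract wsrep recovered sequence number
  become: true
  shell: >-
    awk '/WSREP: Recovered position:/ { split($NF, p, ":"); print p[2] }'
    /var/log/kolla/mariadb/mariadb_recovery.log
  register: wsrep_recovery_seqno
  changed_when: false
# 'WSREP: Recovered position: 6f9814c4-...-a2c7:1234' -> '1234' (the seqno),
# not '6f9814c4-...-a2c7' (the cluster UUID).
```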
- Jul 04, 2019
Mark Goddard authored
There are now several good tools for deploying Ceph, including Ceph Ansible and ceph-deploy. Maintaining our own Ceph deployment is a significant maintenance burden, and we should focus on our core mission to deploy OpenStack. Given that this is a significant part of kolla ansible currently we will need a long deprecation period and a migration path to another tool. Change-Id: Ic603c85c04d8794580a19f9efaa7a8589565f4f6 Partially-Implements: blueprint remove-ceph
Christian Berendt authored
Change-Id: Ib5490d504a5b7c9a37dda7babf1257aa661c11de
Mark Goddard authored
There is a race condition during nova deploy since we wait for at least one compute service to register itself before performing cells v2 host discovery. It's quite possible that other compute nodes will not yet have registered and will therefore not be discovered. This leaves them not mapped into a cell, and results in the following error if the scheduler picks one when booting an instance: Host 'xyz' is not mapped to any cell The problem has been exacerbated by merging a fix [1][2] for a nova race condition, which disabled the dynamic periodic discovery mechanism in the nova scheduler. This change fixes the issue by waiting for all expected compute services to register themselves before performing host discovery. This includes both virtualised compute services and bare metal compute services. [1] https://bugs.launchpad.net/kolla-ansible/+bug/1832987 [2] https://review.opendev.org/665554 Change-Id: I2915e2610e5c0b8d67412e7ec77f7575b8fe9921 Closes-Bug: #1835002
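A sketch of the stricter wait, assuming a hypothetical `expected_compute_hosts` list built from the inventory (and, for bare metal, from the expected ironic compute services):

```yaml
- name: Waiting for all nova-compute services to register themselves
  command: >-
    docker exec kolla_toolbox openstack compute service list
    --service nova-compute --format json
  register: nova_compute_services
  changed_when: false
  run_once: true
  retries: 20
  delay: 10
  until: >-
    nova_compute_services.stdout | from_json | map(attribute='Host') | list
    is superset(expected_compute_hosts)

- name: Discover nova hosts
  command: docker exec nova_api nova-manage cell_v2 discover_hosts
  changed_when: false
  run_once: true
```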
- Jul 02, 2019
Rafael Weingärtner authored
This proposal adds support to Kolla-Ansible for deploying Cloudkitty with the InfluxDB storage system. InfluxDB support as a storage backend for Cloudkitty was introduced by https://github.com/openstack/cloudkitty/commit/c4758e78b49386145309a44623502f8095a2c7ee

Problem Description
===================
With the addition of support for InfluxDB in Cloudkitty, which reached general availability in the Stein release, we need a method to easily configure/support this storage backend via Kolla-Ansible. Kolla-Ansible is already able to deploy and configure an InfluxDB system, so this proposal uses the InfluxDB deployment configured via Kolla-Ansible as the CloudKitty storage backend. If we do not provide a method for users (operators) to manage the Cloudkitty storage backend via Kolla-Ansible, they have to apply these configurations manually (or via some other set of automated scripts), which creates a distributed set of configuration files and scripts with different versioning schemes and life cycles.

Proposed Change
===============

Architecture
------------
We propose a flag that users can set to make Kolla-Ansible configure CloudKitty to use InfluxDB as the storage backend. Enabling this flag also enables deployment of InfluxDB via Kolla-Ansible automatically. CloudKitty will be configured according to [1] and [2]. We also externalize "retention_policy", "use_ssl", and "insecure" to allow fine-grained configuration by operators. All of these options are only used when explicitly configured; otherwise the default value/behaviour defined in Cloudkitty applies. Moreover, when "use_ssl" is set to "true", the user can point "cafile" at a custom trusted CA file. Again, if these variables are not set, the Cloudkitty defaults are used.

Implementation
--------------
We introduce a new variable called `cloudkitty_storage_backend`. Valid options are `sqlalchemy` or `influxdb`. The default value in Kolla-Ansible is `sqlalchemy` for backward compatibility. The first step is then to change the definition of the following variable: `/ansible/group_vars/all.yml:enable_influxdb: "{{ enable_monasca | bool }}"` so that InfluxDB is also enabled when CloudKitty is configured to use it as the storage backend. Afterwards, we add tasks to the CloudKitty role to create the InfluxDB schema and render the configuration files accordingly.

Alternatives
------------
The alternative would be to apply the configuration manually or handle it via a different set of scripts and configuration files, which can become cumbersome over time.

Security Impact
---------------
None identified by the author of this spec.

Notifications Impact
--------------------
Operators that already deploy CloudKitty with InfluxDB as the storage backend would need to convert their configuration to Kolla-Ansible (if they wish to adopt Kolla-Ansible for these tasks). Also, deployments (OpenStack environments) that were created with Cloudkitty using storage v1 will need to migrate all of their data to v2 before enabling InfluxDB as the storage system.

Other End User Impact
---------------------
None.

Performance Impact
------------------
None.

Other Deployer Impact
---------------------
New configuration options will be available for CloudKitty:
* cloudkitty_storage_backend
* cloudkitty_influxdb_retention_policy
* cloudkitty_influxdb_use_ssl
* cloudkitty_influxdb_cafile
* cloudkitty_influxdb_insecure_connections
* cloudkitty_influxdb_name

Developer Impact
----------------
None.

Implementation
==============

Assignee
--------
* `Rafael Weingärtner <rafaelweingartne>`

Work Items
----------
* Extend the InfluxDB "enable/disable" variable
* Add new tasks to configure Cloudkitty according to the new variables presented above
* Write documentation and release notes

Dependencies
============
None.

Documentation Impact
====================
New documentation for the feature.

References
==========
[1] `https://docs.openstack.org/cloudkitty/latest/admin/configuration/storage.html#influxdb-v2`
[2] `https://docs.openstack.org/cloudkitty/latest/admin/configuration/collector.html#metric-collection`

Change-Id: I65670cb827f8ca5f8529e1786ece635fe44475b0
Signed-off-by: Rafael Weingärtner <rafael@apache.org>
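For example, a globals.yml fragment using the options introduced above (the values are illustrative):

```yaml
cloudkitty_storage_backend: "influxdb"
cloudkitty_influxdb_name: "cloudkitty"
cloudkitty_influxdb_retention_policy: "autogen"
cloudkitty_influxdb_use_ssl: "true"
cloudkitty_influxdb_cafile: "/etc/ssl/certs/ca-bundle.crt"
cloudkitty_influxdb_insecure_connections: "false"
```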
Mark Goddard authored
This performs the same as a deploy-bifrost, but first stops the bifrost services and container if they are running. This can help where a docker stop may lead to an ungraceful shutdown, possibly due to running multiple services in one container. Change-Id: I131ab3c0e850a1d7f5c814ab65385e3a03dfcc74 Implements: blueprint bifrost-upgrade Closes-Bug: #1834332
- Jul 01, 2019
Mark Goddard authored
This is necessary for some Ansible tests which were renamed in 2.5 - including 'version' and 'successful'. Change-Id: Iacf88ef5589c7571fcf56ba8b99d3dbe76975195
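A small sketch of the renamed test syntax that this minimum enables (the tasks themselves are illustrative):

```yaml
- name: Check the Ansible version
  fail:
    msg: "Ansible 2.5 or later is required"
  when: not ansible_version.full is version('2.5', '>=')

- name: Act on a registered result
  debug:
    msg: "previous task succeeded"
  when: previous_result is successful
```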
- Jun 28, 2019
Will Szumski authored
otherwise I'm seeing: TASK [monasca : Creating the monasca agent user] **************************************************************************************************************************** fatal: [monitor1]: FAILED! => {"changed": false, "module_stderr": "Shared connection to 172.16.3.24 closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File \"/tmp/ansible_I0RmxQ/ansible_module_kolla_toolbox.py\", line 163, in <module>\r\n main()\r\n File \"/tmp/ansible_I0RmxQ/ansible_module_kolla_toolbox.py\", line 141, in main\r\n output = client.exec_start(job)\r\n File \"/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/decorators.py\", line 19, in wrapped\r\n return f(self, resource_id, *args, **kwargs)\r\n File \"/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/api/exec_api.py\", line 165, in exec_start\r\n return self._read_from_socket(res, stream, tty)\r\n File \"/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/api/client.py\", line 377, in _read_from_socket\r\n return six.binary_type().join(gen)\r\n File \"/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py\", line 75, in frames_iter\r\n n = next_frame_size(socket)\r\n File \"/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py\", line 62, in next_frame_size\r\n data = read_exactly(socket, 8)\r\n File \"/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py\", line 47, in read_exactly\r\n next_data = read(socket, n - len(data))\r\n File \"/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py\", line 31, in read\r\n return socket.recv(n)\r\nsocket.timeout: timed out\r\n", "msg": "MODULE FAILURE", "rc": 1} when the monitoring nodes aren't on the public API network. Change-Id: I7a93f69da0e02c9264da0b081d2e60626f899e3a
- Jun 27, 2019
Mark Goddard authored
Currently, we have a lot of logic for checking if a handler should run, depending on whether config files have changed and whether the container configuration has changed. As rm_work pointed out during the recent haproxy refactor, these conditionals are typically unnecessary - we can rely on Ansible's handler notification system to only trigger handlers when they need to run. This removes a lot of error prone code. This patch removes conditional handler logic for all services. It is important to ensure that we no longer trigger handlers when unnecessary, because without these checks in place it will trigger a restart of the containers. Implements: blueprint simplify-handlers Change-Id: I4f1aa03e9a9faaf8aecd556dfeafdb834042e4cd
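A hedged sketch of the simplified pattern (haproxy used as an example; argument lists are trimmed and illustrative): tasks simply notify, and the handler restarts without re-checking what changed.

```yaml
# tasks/config.yml
- name: Copying over haproxy.cfg
  template:
    src: haproxy.cfg.j2
    dest: "{{ node_config_directory }}/haproxy/haproxy.cfg"
  notify:
    - Restart haproxy container

# handlers/main.yml
- name: Restart haproxy container
  kolla_docker:
    action: recreate_or_restart_container
    name: haproxy
  # no 'when: <config>.changed or <container>.changed' guards any more
```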
Christian Berendt authored
Change-Id: Ia7041be384ac07d0a790c2c5c68b1b31ff0e567a
Mark Goddard authored
During an upgrade, nova pins the version of RPC calls to the minimum seen across all services. This ensures that old services do not receive data they cannot handle. After the upgrade is complete, all nova services are supposed to be reloaded via SIGHUP to cause them to check again the RPC versions of services and use the new latest version which should now be supported by all running services. Due to a bug [1] in oslo.service, sending services SIGHUP is currently broken. We replaced the HUP with a restart for the nova_compute container for bug 1821362, but not other nova services. It seems we need to restart all nova services to allow the RPC version pin to be removed. Testing in a Queens to Rocky upgrade, we find the following in the logs: Automatically selected compute RPC version 5.0 from minimum service version 30 However, the service version in Rocky is 35. There is a second issue in that it takes some time for the upgraded services to update the nova services database table with their new version. We need to wait until all nova-compute services have done this before the restart is performed, otherwise the RPC version cap will remain in place. There is currently no interface in nova available for checking these versions [2], so as a workaround we use a configurable delay with a default duration of 30 seconds. Testing showed it takes about 10 seconds for the version to be updated, so this gives us some headroom. This change restarts all nova services after an upgrade, after a 30 second delay. [1] https://bugs.launchpad.net/oslo.service/+bug/1715374 [2] https://bugs.launchpad.net/nova/+bug/1833542 Change-Id: Ia6fc9011ee6f5461f40a1307b72709d769814a79 Closes-Bug: #1833069 Related-Bug: #1833542
Mark Goddard authored
When running deploy or reconfigure for Keystone, ansible/roles/keystone/tasks/deploy.yml calls init_fernet.yml, which runs /usr/bin/fernet-rotate.sh, which calls keystone-manage fernet_rotate. This means that a token can become invalid if the operator runs deploy or reconfigure too often. This change splits out fernet-push.sh from the fernet-rotate.sh script, then calls fernet-push.sh after the fernet bootstrap performed in deploy. Change-Id: I824857ddfb1dd026f93994a4ac8db8f80e64072e Closes-Bug: #1833729
- Jun 26, 2019
Radosław Piliszek authored
They are used only to obtain keys for the next task. Change-Id: I2fac22af4710b70e4df8e3a272bcfb6cc8b8532e Signed-off-by:
Radosław Piliszek <radoslaw.piliszek@gmail.com>
- Jun 24, 2019
chenxing authored
The Hitachi NAS Platform iSCSI driver was marked as not supported by Cinder in the Ocata release [1]. [1] https://review.opendev.org/#/c/444287/ Change-Id: I1a25789374fddaefc57bc59badec06f91ee6a52a Closes-Bug: #1832821
ZijianGuo authored
In some cases, we can mount extra volumes for gnocchi to facilitate integration. Change-Id: Ife475ca7d0555562f6e3ef0867835d69d288c8c4 Signed-off-by:
ZijianGuo <guozijn@gmail.com>
- Jun 21, 2019
Radosław Piliszek authored
"Check if policies shall be overwritten" already exists in its newer form. The removed one had no effect on play. Change-Id: I48ed6c1c71c4162a3ab28ab2b51dc1e02932dfef Signed-off-by:
Radosław Piliszek <radoslaw.piliszek@gmail.com>
ZijianGuo authored
'mongodb.conf' is actually a YAML-format configuration file, so do not use merge_configs to merge it. Change-Id: Id3c006df00c1e2d66472c2195781e01c640cab22 Signed-off-by:
ZijianGuo <guozijn@gmail.com>
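A sketch of the distinction (the source paths follow the usual kolla-ansible conventions and are shown illustratively): merge_configs understands INI-style files, whereas a YAML file such as mongodb.conf should go through merge_yaml (or a plain template).

```yaml
- name: Copying over mongodb.conf
  merge_yaml:
    sources:
      - "{{ role_path }}/templates/mongodb.conf.j2"
      - "{{ node_custom_config }}/mongodb.conf"
      - "{{ node_custom_config }}/mongodb/{{ inventory_hostname }}/mongodb.conf"
    dest: "{{ node_config_directory }}/mongodb/mongodb.conf"
```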
Doug Szumski authored
The TSI is recommended for all users. Some of the key benefits are a reduction in memory requirements and an increase in the maximum number of time series. For more information see this link: https://docs.influxdata.com/influxdb/v1.7/concepts/tsi-details/ Change-Id: I4b29eb5a4ae82f6c39059d0b6de41debdfd75508
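Enabling TSI amounts to a one-line change in the [data] section of influxdb.conf; a minimal illustrative fragment (paths are the common defaults, not necessarily those rendered by kolla-ansible):

```ini
[data]
  dir = "/var/lib/influxdb/data"
  wal-dir = "/var/lib/influxdb/wal"
  # "inmem" is the pre-TSI default; "tsi1" enables the disk-based index.
  index-version = "tsi1"
```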
Gaëtan Trellu authored
Since this review[1], Qinling supports WSGI execution. From a production perspective, Qinling should be deployed using Apache and mod_wsgi. The "api_worker" option is no longer needed because processes will be handled by Apache mod_wsgi. A Qinling Docker image review[2] has been created. [1] https://review.opendev.org/661851 [2] https://review.opendev.org/666647 Change-Id: I9aaee4c2932f1e4ea9fe780a64e96a28fa6bccfb Story: 2005920 Task: 34181
- Jun 19, 2019
Gaëtan Trellu authored
The "environment" variable set in config.yml and handlers/main.yml has been removed to fix de deployment and the reconfigure. Change-Id: I912cadb5113d5572235731863825588b2eb12759
Tatsuma Matsuki authored
Change-Id: I97263385372a28204c0ae81373836a2d6292f3bd Closes-Bug: #1833336
- Jun 18, 2019
Marek Svensson authored
This change makes MariaDB the default database backend for Freezer, and adds Elasticsearch as an optional backend. Freezer requires Elasticsearch 2.3.0, while the default Elasticsearch in kolla-ansible is 5.6.x, which does not work with Freezer. The options needed by the Elasticsearch backend have been added: protocol, address, port and number of replicas. Change-Id: I88616c285bdb297fd1f738846ddffe1b08a7a827 Signed-off-by:
Marek Svensson <marek@marex.st>
Doug Szumski authored
This change formats internal Fluent logs in a similar way to other logs. It makes it easier for a user to identify issues with Fluent parsing logs. Any failure to parse a log will be ingested into the logging framework and can easily be located by searching for 'pattern not match' or by filtering for Fluent log warnings. Change-Id: Iea6d12c07a2f4152f2038d3de2ef589479b3332b
ZijianGuo authored
* When using Redis as the osprofiler backend, osprofiler cannot connect to Redis because redis_connection_string is incorrect. * Let other places that use Redis also use this variable. Change-Id: I14de6597932d05cd7f804a35c6764ba4ae9087cd Closes-Bug: #1833200 Signed-off-by:
ZijianGuo <guozijn@gmail.com>
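For reference, a hedged example of what a rendered [profiler] section can look like when Redis is the osprofiler backend (the address and key are placeholders):

```ini
[profiler]
enabled = true
trace_sqlalchemy = true
hmac_keys = secret_key
connection_string = redis://10.0.0.10:6379
```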
Doug Szumski authored
Kolla service logs which don't match a Fluentd rewriterule get dropped. This change prevents that by tagging them with 'unmatched'. Change-Id: I0a2484d878d5c86977fb232a57c52f874ca7a34c
Doug Szumski authored
Monasca Python service logs prior to this change were being dropped due to missing entries in the Fluent record_transformer config file. This change adds support for ingesting those logs, and explicitly removes support for ingesting Monasca Log API logs to reduce the risk of feedback, for example if debug logging is turned on in the Monasca Log API. Change-Id: I9e3436a8f946873867900eed5ff0643d84584358
Doug Szumski authored
Presently, errors can appear in Fluentd and Monasca Log API logs due to log output from some Monasca services, which do not use Oslo log, being processed alongside other OpenStack logs which do. This change parses these log files separately to prevent these errors. Change-Id: Ie3cbb51424989b01727b5ebaaeba032767073462
Radosław Piliszek authored
Since we have different upgrade paths, we must use the actually installed Ceph release name when doing require-osd-release Closes-Bug: #1832989 Change-Id: I6aaa4b4ac0fb739f7ad885c13f55b6db969996a2 Signed-off-by:
Radosław Piliszek <radoslaw.piliszek@gmail.com>
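A rough sketch of the idea (the command wrapping and parsing are illustrative): query the release actually installed and feed that codename to require-osd-release, rather than hard-coding the release this branch targets.

```yaml
- name: Get installed Ceph release
  command: docker exec ceph_mon ceph -v
  register: ceph_release
  changed_when: false
  run_once: true

- name: Set require-osd-release to the installed release
  command: >-
    docker exec ceph_mon ceph osd require-osd-release
    {{ ceph_release.stdout | regex_search('(luminous|mimic|nautilus)') }}
  run_once: true
```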
- Jun 17, 2019
Radosław Piliszek authored
The task does not change any state but is used to set a fact from parsed output. Also adjust task name. Change-Id: I5fe322546d82a373522645485be18fe7bfc57999 Signed-off-by:
Radosław Piliszek <radoslaw.piliszek@gmail.com>
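A minimal sketch of the pattern (the inspected object is illustrative): a read-only command that only feeds set_fact is marked changed_when: false so it never reports a change.

```yaml
- name: Collect Ceph OSD list            # read-only query
  command: docker exec ceph_mon ceph osd ls --format json
  register: ceph_osd_ls
  changed_when: false

- name: Set fact with parsed output
  set_fact:
    ceph_osd_ids: "{{ ceph_osd_ls.stdout | from_json }}"
```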