SUSE CaaS Platform/Cluster Update

Fonte: https://wiki.microfocus.com/index.php/SUSE_CaaS_Platform/Cluster_Update

Requirements

For the Cluster Update to work, exactly two channels must be connected to zypper on all servers (admin, master and worker):

For SUSE CaaS Platform up to 2.0 these channels are required:

 SUSE-CAASP-ALL-Pool
 SUSE-CAASP-ALL-Updates

For SUSE CaaS Platform 3.0 these channels are required:

 SUSE-CAASP-3.0-Pool
 SUSE-CAASP-3.0-Updates

Optionally, the following channels may also be available for SUSE CaaS Platform:

 SUSE-CAASP-Toolchain-3-Pool
 SUSE-CAASP-Toolchain-3-Updates

This can be achieved by registering the servers to a local SMT server, directly to the SUSE Customer Center, or during AutoYaST by specifying the add-on channels, e.g. to get the channels from an SMT or SUSE Manager server via:

    zypper ar -f http://<URL-to-pool-channel> SUSE-CAASP-ALL-Pool
    zypper ar -f http://<URL-to-update-channel> SUSE-CAASP-ALL-Updates

Note for Channels provided by SUSE Manager:

By default SUSE Manager does not expose repositories for direct access. In order to access them via HTTP(S), you need to create a Distribution for Autoinstallation for the SUSE CaaS Platform (x86_64) product on SUSE Manager; see the section "Creating Distribution for Autoinstallation" in the SUSE Manager documentation. During the distribution setup you need to provide a label for the distribution. This label will be part of the URL under which the repositories are available.

See also https://www.suse.com/documentation/suse-manager-3/singlehtml/book_suma_advanced_topics_31/book_suma_advanced_topics_31.html

    zypper ar -f http://$SUSE_MANAGER/ks/dist/child/$(pool-channel-label)/$(distribution-label) SUSE-CAASP-ALL-Pool
    zypper ar -f http://$SUSE_MANAGER/ks/dist/child/$(updates-channel-label)/$(distribution-label) SUSE-CAASP-ALL-Updates


Both the pool and the updates channel must be attached with Refresh set to "Yes", as both channels can and will receive updates. Therefore make sure to use the "-f" parameter when adding repositories manually via zypper addrepo / zypper ar.
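
If the repositories were added manually without "-f", autorefresh can also be enabled afterwards. A minimal sketch, assuming the channel aliases from the examples above (adjust them to what zypper lr shows on your system):

    # Check the "Refresh" column for the pool and updates channels
    zypper lr
    # Enable autorefresh for both channels if it shows "No"
    zypper mr -f SUSE-CAASP-ALL-Pool SUSE-CAASP-ALL-Updates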

If the source of the channels uses staging, the cluster update will only apply patches during the next run after the connected stage has been updated on the staging system.

Cluster Update

How does the update process of SUSE CaaS Platform work?

The Administration Node and the Cluster Nodes check daily whether updates are pending and whether they can be applied. The update order is fixed: the Administration Node must be updated first, before the Cluster Nodes can be updated.
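
To see when the daily check last ran and when it will run next, the corresponding systemd units can be inspected on a node. A minimal sketch, assuming the check is driven by the default transactional-update.timer / transactional-update.service units:

    # Show the last and next run of the daily update check
    systemctl list-timers transactional-update.timer
    # Inspect the output of the last run
    journalctl -u transactional-update.service -n 50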

Administration Node

transactional-update was successful

[Image: CaaSP-Velum-Admin-Reboot.png]

If you get a blue message stating that the Administration Node is running outdated software, without any error text, press "UPDATE ADMIN NODE" and reboot the Administration Node.

transactional-update failed

[Image: CaaSP-Velum-Admin-Failed.png]

If you get a red message stating that the Administration Node is running outdated software, and this message includes "failed to update", do not update the admin node by pressing the "UPDATE ADMIN NODE" button. Instead, log in, read /var/log/transactional-update.log and look for the cause of the error. Most likely the installation source was a DVD that is still enabled, but the DVD was removed. In this case, disable the repository of the installation source. After the cause of the error is fixed, wait until the scripts run the next time (by default within the next 24 hours) or run transactional-update dup reboot. After some time (10 minutes to 1 hour) the message in Velum should change.
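
A minimal sketch of this fix on the Administration Node; the repository alias is a placeholder and depends on how the system was installed:

    # Identify the (removed) DVD installation source
    zypper lr
    # Disable it; <alias-of-installation-source> is a placeholder
    zypper mr -d <alias-of-installation-source>
    # Either wait for the next scheduled run or trigger the update now
    transactional-update dup reboot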

Cluster Node

transactional-update was successful

1. The cluster node tells Velum via Salt that this machine has pending updates and needs a reboot. Velum/Salt look for the flag set by transactional-update every 10 minutes, so there is a delay of up to 10 minutes between transactional-update completing and the Velum UI showing updates as available.

Hint: Velum gets this information from the Salt grain "tx_update_reboot_needed", which is either "True" or "False".
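
To check this grain for all nodes by hand, it can be queried from the Salt master. A minimal sketch, assuming the salt-master runs in a container on the Administration Node (the container name filter is an assumption):

    # Run on the Administration Node; adjust the filter if the container is named differently
    docker exec -it $(docker ps -q --filter name=salt-master) \
        salt '*' grains.get tx_update_reboot_needed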

2. Velum shows the admin that a node has pending updates and needs a reboot.

[Image: CaaSP-Velum-Update-Available.png]

=> If you see the button to start the update manually ("update all nodes"), do not run transactional-update by hand on the cluster nodes, even if something goes wrong during the following update. transactional-update itself was already successful; running it again cannot fix the problems Velum runs into, and all that can happen is that the cluster is completely broken afterwards.

3. The "update all nodes" button can be pressed whenever it is a good time for it. Nothing bad will happen if the update is not applied immediately but only during a maintenance window.

4. Velum will, via Salt, stop and disable the services on that cluster node, reboot the cluster node to activate the update and apply missing patches, adjust the configuration, and enable and start the services again.

=> If this goes wrong, Velum will notify you. To get this fixed, you need to find out which of the steps in 4. went wrong. Hint: it is not transactional-update that went wrong; we already know that it succeeded.
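
A possible starting point for narrowing down the failing step, assuming the salt-master runs in a container on the Administration Node (the container name filter is an assumption):

    # On the Administration Node: look at the orchestration output in the salt-master container
    docker logs $(docker ps -q --filter name=salt-master) 2>&1 | tail -n 200
    # On the affected cluster node: check whether all services came back up
    systemctl --failed
    journalctl -b -p err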

Hint 2: if you think the "Failed" status is wrong: you cannot fix a wrong "Failed" by calling transactional-update, since this status is not coming from transactional-update.

transactional-update failed

[Image: CaaSP-Velum-Update-Failed.png]

If the initial check whether updates are available and can be applied goes wrong, there will be an "Update Failed" message for one or more nodes in Velum. In this case, read the error message in /var/log/transactional-update.log on those cluster nodes; most likely you have to disable your installation source or fix your update channel configuration.

Hint: if the system was registered during installation, all pending updates are already applied during installation, the installation source is disabled and the repositories are set up correctly. If the system is registered manually later, you have to fix all of this yourself: disable the installation sources and make sure that autorefresh is enabled for the pool and updates channel. "autorefresh" should always be enabled for repositories that can change over time, like the pool and update repositories. It should be disabled for repositories that do not change over time, like the installation source (which needs to be disabled anyway). This option tells the tools whether a refresh of the repository is needed or not; the tools cannot detect this automatically. If this option is set wrong, the tools will either assume that no update is available, or try to download something that is no longer available, and the update will fail.
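
A minimal sketch of this manual fix, assuming the channel aliases from the examples above; the installation source alias is a placeholder:

    # Disable the installation source
    zypper mr -d <alias-of-installation-source>
    # Make sure autorefresh is enabled for the pool and updates channels
    zypper mr -f SUSE-CAASP-ALL-Pool SUSE-CAASP-ALL-Updates
    # Verify: the "Refresh" column should show "Yes" for both channels
    zypper lr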