OpenShift Plattform Monitoring Skript

This script will run some basic tests on a given openshift platform. The eintension is to have a first state of the platform after having done updates, etc.

Bei jedem Start wird eine kurze Beschreibung Anleitung angezeigt. Man muss sich jedes Mal auf der entsprechenden OSE Plattform einloggen. Es braucht für den Betrieb ein gültiges Host Inventory File. Es werden Angaben zum Login etc aus dem angegebenen Host Inventory File ausgelesen und angewendet, - so z.B. openshift_logging_master_public_url.

u229044@oseadmin01az1 admin$ ./ose_monitoring_checks.sh ../ansible/inventory/tss/
==========================================================
This script does some basic checks of a given OSE platform.

you need to enter the path to the host inventory file

Please be aware, that this script is only to be run on the two
bastion hosts oseadmin01 and 02 from your personal account.
You need OSE cluster admin privileges to run this script
Please keep in mind, that your local ~/.kube/config will
be erased on each run. The reason is to make sure you are
working only on the OSE cluster of the given host file
This is also the reason why you have to oc login on each run

If you entcounter many ssh missing fingerprint messages:
EXECUTE: # export ANSIBLE_HOST_KEY_CHECKING=False

==========================================================
Please enter ansible vault password:
Getting master cluster public hostname from ansible..
Log in on: https://master.ose.sbb-cloud.net:8443
Authentication required for https://master.ose.sbb-cloud.net:8443 (openshift)
Username: USER
Password: PASSWORD

Login successful.
. . .

Filename: ose_monitoring_checks.sh

#!/bin/bash
#
# abstract: This script will run some basic tests on a given openshift
#           platform. Th eintension is to have a first state of the
#           platform after having done updates, etc.
#
#           To be able to have a defined context, when working on the bastion host,
#           one needs the input of a hosts inventory file.
#
#
# run this script on bastion host
#
# author:   M.Reber - Cloud Platform Team
# date:     2019.08.19
#===================================================================
#
#===================================================================
#
tmpfile=$(mktemp)
ansible_vault=$(mktemp)
#
# if script is interupted execute the cleanup function
trap 'cleanup_function' exit
# Function that gets executed on script interruption.
cleanup_function () {
        # cleanup
        rm -f ${tmpfile}
        rm -f ${ansible_vault}
}
#
echo "=========================================================="
echo "This script does some basic checks of a given OSE platform."
echo ""
echo "you need to enter the path to the host inventory file"
echo ""
echo "Please be aware, that this script is only to be run on the two"
echo "bastion hosts oseadmin01 and 02 from your personal account."
echo "You need OSE cluster admin privileges to run this script"
echo "Please keep in mind, that your local ~/.kube/config will"
echo "be erased on each run. The reason is to make sure you are"
echo "working only on the OSE cluster of the given host file"
echo "This is also the reason why you have to oc login on each run"
echo ""
echo "If you entcounter many ssh missing fingerprint messages:"
echo "EXECUTE: # export ANSIBLE_HOST_KEY_CHECKING=False"
echo ""
echo "=========================================================="
#
#
#------------------------------------------------------------------------------------
#                           INITIAL SKRIPT ENV CHECKS:
#------------------------------------------------------------------------------------
# check if the input parameter is the host inventory file of the platform
invfile="${1}"
if [[ ! -d "${invfile}" ]] ; then
  echo "Usage $0 ../ansible/inventory/otc_test03/"
  exit 1
fi
if ! hostname --short | grep --silent oseadmin ; then
  echo "Run this script on bastion host oseadmin01 or oseadmin02 only, exit"
  exit 1
fi
#
echo Please enter ansible vault password:
read -s ansible_vault_pw
echo $ansible_vault_pw > $ansible_vault
#
echo "Getting master cluster public hostname from ansible.."
master_url=https://$(ansible masters -i ${invfile} -m debug -a "var=openshift_master_cluster_public_hostname" --vault-password-file=${ansible_vault} | grep -v SUCCESS | awk '{print $2}' | head -1 | sed 's/"//g'):8443
#
# we need to get the cluster name out of the master_url and there
# replace the "." (dot) with "-" (dash)
# there is also need to cut the https:// off
cluster_name=$(echo $master_url | tr . - | cut -d"/" -f3)
#
# this script should only run as unprivileged user
if [[ "$EUID" -eq 0 ]]; then
  echo "Error, run this script as unprivileged user."
  exit 1
fi
#
# here we log in to the cluster
echo "Log in on: ${master_url}"
oc login ${master_url} || exit $?
#
clear
#------------------------------------------------------------------------------------
#                           START WITH CLUSTER CHECKS:
#------------------------------------------------------------------------------------
#
# check openshift_logging_master_public_url
echo "==== Checking master_public_url health=="
health=$(curl --silent ${master_url}/healthz)
if [[ "${health}" != "ok" ]] ; then
  echo "[ NOK ] master_url ${master_url} health problem ${health}"
else
  echo "[ OK ] master_url ${master_url}  health is ${health}"
fi
echo ""
#
echo "==== Checking cluster globaly ==="
cmd_status=`ansible masters -i ${invfile} -m shell -a "ps -ef | grep etcd | grep -v grep" --vault-password-file=${ansible_vault} | grep -v SUCCESS | grep -v etcd`
if [[ "${cmd_status}" != "" ]] ; then
  echo "[ NOK ] not all etcd on masters present: ${cmd_status}"
else
  echo "[ OK ] all etcd processes present"
fi
#
first_master=$(ansible masters -i ${invfile} --list-host --vault-password-file=${ansible_vault} | grep -v hosts | head -n1 | awk '{print $1}')
master_nodes=$(ansible ${first_master} -i ${invfile} -m shell -a "oc get nodes | grep master" --vault-password-file=${ansible_vault} | grep -v SUCCESS | awk '{print $1}')
#
# here we check the health status of the etcd cluster
# this can be done by selecting the etcd container and passing
etccmd_string=""
for node in ${master_nodes}
  do
    if [[ "${etccmd_string}" == "" ]] ; then
      etccmd_string="https://${node}:2379"
    else
      etccmd_string="${etccmd_string},https://${node}:2379"
    fi
done
etccmd_string="${etccmd_string} --ca-file=/etc/etcd/ca.crt --key-file=/etc/etcd/peer.key --cert-file=/etc/etcd/peer.crt cluster-health"
#
# Checks OpenShift Cluster minor version.. 3.X. (Workaround if OpenShift is on 3.9..)
if [[ $(ansible ${first_master} -i ${invfile} -m shell -a "oc version | grep openshift" --vault-password-file=${ansible_vault} | grep -v SUCCESS | awk '{print $2}' | cut -d '.' -f2) != "9" ]] ; then
  # get docker container ID
  docker_id=$(ansible ${first_master} -i ${invfile} -m shell -a "docker ps | grep etcd_master-etcd" --vault-password-file=${ansible_vault} | grep -v SUCCESS| awk '{print $1}')
  clus_stat=$(ansible ${first_master} -i ${invfile} -m shell -a "docker exec -t ${docker_id} etcdctl -C ${etccmd_string}" --vault-password-file=${ansible_vault} | grep cluster | grep healthy)
#
  if [[ "${clus_stat}" == "" ]] ; then
    # we got a negative result, cluster is not healthy
    echo "[ NOK ] etcd cluster does not show healthy result on master ${first_master}"
    ansible ${first_master} -i ${invfile} -m shell -a "docker exec -t ${docker_id} etcdctl -C ${etccmd_string}" --vault-password-file=${ansible_vault}
  else
    echo "[ OK ] etcd cluster is healthy on master ${first_master}"
  fi
else
  echo "In OpenShift 3.9 etcd service is a systemd service (not dockerised..) - checking service status"
  clus_stat=$(ansible ${first_master} -i ${invfile} -m shell -a "systemctl status etcd" --vault-password-file=${ansible_vault} | grep -v SUCCESS | grep "Active:" | awk '{print $2}')
  if [[ "${clus_stat}" == "active" ]] ; then
    echo "[ OK ] etcd cluster is healthy on master ${first_master}"
  else
    echo "[ NOK ] etcd cluster does not show healthy result on master ${first_master} error was:"
    ansible ${first_master} -i ${invfile} -m shell -a "systemctl status etcd" --vault-password-file=${ansible_vault} | grep -v SUCCESS
  fi
fi
#
# here we check the reachability of all nodes
cmd_status=$(ansible nodes -i ${invfile} --timeout=40 -a 'echo "success"' --vault-password-file=${ansible_vault} | grep -vi SUCCESS)
if [[ "${cmd_status}" != "" ]] ; then
  echo "[ NOK ] not all nodes reachable: ${cmd_status}"
else
  echo "[ OK ] all nodes reachable by ansible and ssh"
fi
cmd_status=$(ansible ${first_master} -i ${invfile} -m shell -a "oc get nodes | grep -v Ready" --vault-password-file=${ansible_vault} | grep -v SUCCESS | grep -v NAME)
if [[ "${cmd_status}" != "" ]] ; then
  echo "[ NOK ] review nodes, not all nodes are shedulable: ${cmd_status}"
  ansible ${first_master} -i ${invfile} -m shell -a "oc get nodes | grep -v Ready" --vault-password-file=${ansible_vault} | grep -v SUCCESS
else
  echo "[ OK ] all nodes present and ready"
fi
cmd_status=$(oc get pods --all-namespaces | grep -v Running |grep -v Completed | grep -v NAMESPACE)
if [[ "${cmd_status}" != "" ]] ; then
  echo "[ NOK ] there are Pods in not-ok status, review"
  oc get pods --all-namespaces | grep -v Running | grep -v Completed
  pod_errors=true
  fixable_errors=true
else
  echo "[ OK ] all Pods running OK"
fi
echo ""
#
echo "==== Checking ha proxy pod presence  ==="
cmd_status=$(oc get pods --no-headers -n default | grep -v "1/1")
if [[ "${cmd_status}" != "" ]] ; then
  echo "[ NOK ] review ha pods not all running: ${cmd_status}"
  oc get pods --no-headers -n default | grep -v "1/1"
else
  echo "[ OK ] all needed Pods are runnig as desired READY 1/1"
fi
echo ""
#
echo "==== Checking HA Router Status ==="
# we need to have all DC running the same number desired and current
#oc get dc -n default
for res in $(oc get dc --no-headers -n default | awk '{print $1}');do
  desnr=$(oc get dc --no-headers -n default | grep "^${res}" | awk '{print $3}')
  curnr=$(oc get dc --no-headers -n default | grep "^${res}" | awk '{print $4}')
  if [[ ${desnr} -ne ${curnr} ]] ; then
    echo "[ NOK ] not equal DESIRED ${desnr} and CURRENT ${curnr} of DC Name ${res}"
  else
    echo "[ OK ] runnig DESIRED ${desnr} and CURRENT ${curnr} of DC Name ${res}"
  fi
done
#
# here we need to check the image version if it is on the same level as the RPM package
package_version=$(ansible ${first_master} -i ${invfile} -m shell -a "warn=false rpm -qa | grep atomic-openshift-node | cut -d '-' -f4" --vault-password-file=${ansible_vault} | grep -v 'SUCCESS')
echo "Checking image version of ha-router"
router_name=$(oc get dc -n default | grep router | head -1 | awk '{print $1}')
oc get dc -n default ${router_name} -o yaml | grep "image:"
check_vers=$(oc get dc -n default ${router_name} -o yaml | grep "image:" | grep ${package_version})
if [[ "${check_vers}" == "" ]] ; then
  echo "[ NOK ] OSE version of HA router is not equal like version ${package_version} in hosts file"
else
  echo "[ OK ] OSE version of HA router is equal like version ${package_version} in hosts file"
fi
echo ""
#
echo "==== Checking openshift-monitoring ==="
# OSE monitoring need to be running on each node
# on each node we have a node-exporter
# we have one pod alertmanager-main-1 and two prometheus-k8s
oc get pods -n openshift-monitoring -o wide | sort -k 7 > ${tmpfile}
for node in $(ansible ${first_master} -i ${invfile} -m shell -a "oc get nodes --no-headers" --vault-password-file=${ansible_vault} | grep -v SUCCESS | awk '{print $1}')
  do
    podfnd=$(grep ${node} ${tmpfile} | grep node-exporter| grep "2/2")
    if [[ "${podfnd}" == "" ]] ; then
      echo "[ NOK ] pod node-exporter is not running 2/2 on node ${node}"
    else
      echo "[ OK ] pod node-exporter is running 2/2 on node ${node}"
    fi
done
echo ""
prom_master_ste_1=$(grep prometheus-k ${tmpfile}| awk '{print $2}'| cut -d"/" -f1)
prom_master_ste_2=$(grep prometheus-k ${tmpfile}| awk '{print $2}'| cut -d"/" -f1)
prom_master_node=$(grep prometheus-k ${tmpfile}| awk '{print $7}')
if [[ "${prom_master_ste_1}" != "${prom_master_ste_2}" ]] ; then
  echo "[ NOK ] pod prometheus-k8s is not running as desired ${prom_master_ste_1}/${prom_master_ste_2} on node ${prom_master_node}"
else
  echo "[ OK ] pod prometheus-k8s is running as desired ${prom_master_ste_1}/${prom_master_ste_2} on node ${prom_master_node}"
fi
rm -f ${tmpfile}
echo ""
# We need to have a sematext-agent pod running on each node
logging_proj=$(oc get projects| grep logging| awk '{print $1}'  | tail -1)
echo "==== Checking ${logging_proj} ==="
oc get pods -n ${logging_proj} -o wide > ${tmpfile}
for node in $(ansible ${first_master} -i ${invfile} -m shell -a "oc get nodes --no-headers" --vault-password-file=${ansible_vault} | grep -v SUCCESS | awk '{print $1}')
  do
    podfnd=$(grep ${node} ${tmpfile} | grep sematext-agent | grep "1/1")
    if [[ "${podfnd}" == "" ]] ; then
      echo "[ NOK ] pod sematext-agent is not running 1/1 on node ${node}"
      docker_pod_id=$(ansible ${node} -i ${invfile} -m shell -a "docker ps -a | grep sematext-agent-docker" --vault-password-file=${ansible_vault} | grep -v SUCCESS | grep -v FAILED | grep -v non-zero | awk '{print $1}')
      if [[ "${docker_pod_id}" != "" ]] ; then
        ansible ${node} -i ${invfile} -m shell -a "docker logs ${docker_pod_id}" --vault-password-file=${ansible_vault}
      else
        echo "no 'sematext-agent deployed'"
      fi
    else
      echo "[ OK ] pod sematext-agent is running 1/1 on node ${node}"
    fi
done
echo ""
#for node in $(grep logging-es ${tmpfile}|awk '{print $7}')
#  do
#    pod_st_1=$(grep ${node} ${tmpfile} | grep logging-es| awk '{print $2}'| cut -d"/" -f1)
#    pod_st_2=$(grep ${node} ${tmpfile} | grep logging-es| awk '{print $2}'| cut -d"/" -f2)
#    if [ "${pod_st_1}" != "${pod_st_2}" ] ; then
#      echo "[ NOK ] pod logging-es is not running as desired ${pod_st_1}/${pod_st_2} on node ${node}"
#    else
#      echo "[ OK ] pod logging-es is running as desired ${pod_st_1}/${pod_st_2} on node ${node}"
#    fi
#done
#echo ""
#for pds in curator kibana
#  do
#    pod_curator_ste_1=$(grep ${pds} ${tmpfile}| awk '{print $2}'| cut -d"/" -f1)
#    pod_curator_ste_2=$(grep ${pds} ${tmpfile}| awk '{print $2}'| cut -d"/" -f1)
#    pod_curator_node=$(grep ${pds} ${tmpfile}| awk '{print $7}')
#    if [ "${pod_curator_ste_1}" != "${pod_curator_ste_2}" ] ; then
#      echo "[ NOK ] pod ${pds} is not running as desired ${pod_curator_ste_1}/${pod_curator_ste_2} on node ${pod_curator_node}"
#    else
#      echo "[ OK ] pod ${pds} is running as desired ${pod_curator_ste_1}/${pod_curator_ste_2} on node ${pod_curator_node}"
#    fi
#done
rm -f ${tmpfile}
echo ""
#
echo "===== Checking infra pods on each node, openshift-node and openshift-sdn ===="
# on each node we need to have a logging, sync, ovs and snd pod running
# the logging pods have already been tested, so one may test only
# the projects openshift-node and openshift-sdn
# we do have one pod in openshift-node but two pods per node in openshift-sdn
for prj in openshift-node openshift-sdn
  do
    echo ""
    echo "==== Checking infra project ${prj} ==="
    oc get pods -n ${prj} -o wide > ${tmpfile}
    for node in $(ansible ${first_master} -i ${invfile} -m shell -a "oc get nodes --no-headers" --vault-password-file=${ansible_vault} | awk '{print $1}')
      do
        pod_name=$(grep ${node} ${tmpfile} | awk '{print $1}')
        for pdn in ${pod_name}
          do
            pod_st_1=$(grep ${node} ${tmpfile}| grep ${pdn} | awk '{print $2}'| cut -d"/" -f1)
            pod_st_2=$(grep ${node} ${tmpfile}| grep ${pdn} | awk '{print $2}'| cut -d"/" -f2)
            if [[ "${pod_st_1}" != "${pod_st_2}" ]] ; then
              echo "[ NOK ] pod ${pdn} is not running as desired ${pod_st_1}/${pod_st_2} on node ${node}"
            else
              echo "[ OK ] pod ${pdn} is running as desired ${pod_st_1}/${pod_st_2} on node ${node}"
            fi
        done
    done
    rm -f ${tmpfile}
done
echo ""
# searching for errors on masters messages log
echo "==== Checking log errors on masters ==="
# we need to check the syslog on each master for error messages
mymonth=$(date +%b)
myday=$(date +%d)
for master_node in $(ansible ${first_master} -i ${invfile} -m shell -a "oc get nodes | grep master" --vault-password-file=${ansible_vault} | grep -v SUCCESS | awk '{print $1}')
  do
    ansible ${master_node} -i ${invfile} -m shell -a "grep error /var/log/messages | tail -n 10| grep ${mymonth} | grep ${myday} | grep -v ansible-command" --vault-password-file=${ansible_vault} > ${tmpfile}
    msg_fnd=$(grep error ${tmpfile})
    if [[ "${msg_fnd}" != "" ]] ; then
      echo "[ NOK ] there are error messages in syslog in master ${master_node}, please review"
      ansible ${master_node} -i ${invfile} -m shell -a "grep error /var/log/messages | tail -n 10| grep ${mymonth} | grep ${myday} | grep -v ansible-command" --vault-password-file=${ansible_vault}
    else
      echo "[ OK ] no problems on master node ${master_node} found"
    fi
done
echo""
echo "==== Check for recent problems in systemd service logs ==="
cmd_status=$(ansible ${first_master} -i ${invfile} -m shell -a "oc adm diagnostics analyzelogs" --vault-password-file=${ansible_vault} | grep 'Completed' | grep -v 'no errors')
if [[ "${cmd_status}" != "" ]] ; then
  echo "[ NOK ] one of the following services have a problem: ${cmd_status}"
else
  echo "[ OK ] The service 'atomic-openshift-node' and 'docker' are working fine"
fi
echo ""
# testing nfs cluster storage
echo "==== Checking cluster persistent storage ==="
# we need to check the syslog on each node for refused messages
for node in $(ansible ${first_master} -i ${invfile} -m shell -a "oc get nodes --no-headers" --vault-password-file=${ansible_vault} | grep -v SUCCESS | awk '{print $1}' | grep -v master)
  do
    ansible ${node} -i ${invfile} -m shell -a "grep refused /var/log/messages | tail -n 10| grep ${mymonth} | grep ${myday} | grep -v ansible-command | grep -v 'dial tcp'" --vault-password-file=${ansible_vault} > ${tmpfile}
    msg_fnd=$(grep refused ${tmpfile})
    if [[ "${msg_fnd}" != "" ]] ; then
      echo "[ NOK ] there are storage refused messages in syslog on node ${node}"
      echo "${msg_fnd}"
    else
      echo "[ OK ] no storage connection problems on node ${node} found"
    fi
    ansible ${node} -i ${invfile} -m shell -a "systemctl is-active rpcbind.service" --vault-password-file=${ansible_vault} > ${tmpfile}
    rpcbind_stat=$(grep -v active ${tmpfile} | grep -v SUCCESS)
    if [[ "${rpcbind_stat}" != "" ]] ; then
      echo "[ NOK ] service rpcbind is not active on node: ${node}"
      ansible ${node} -i ${invfile} -m shell -a "systemctl status rpcbind.service" --vault-password-file=${ansible_vault}
      fixable_errors=true
      rpcbind_errors=true
      echo "------------------------------------------------------------"
    else
      echo "[ OK ] service rpcbind is active and running - ${node}"
      echo "------------------------------------------------------------"
    fi
done
echo ""
#
echo "==== Checking openshift-web-console ==="
oc get pods --no-headers -n openshift-web-console -o wide | grep -vi completed > ${tmpfile}
for pde in $(cat ${tmpfile} | awk '{print $1}')
  do
    pod_st_1=$(grep ${pde} ${tmpfile}| awk '{print $2}'| cut -d"/" -f1)
    pod_st_2=$(grep ${pde} ${tmpfile}| awk '{print $2}'| cut -d"/" -f2)
    pod_node=$(grep ${pde} ${tmpfile}| awk '{print $7}')
    if [[ "${pod_st_1}" != "${pod_st_2}" ]] ; then
      echo "[ NOK ] pod ${pde} is not running as desired ${pod_st_1}/${pod_st_2} on node ${pod_node}"
    else
      echo "[ OK ] pod ${pde} is running as desired ${pod_st_1}/${pod_st_2} on node ${pod_node}"
    fi
done
echo ""
#echo "==== Create an application and test that it deploys correctly ==="
# TODO
# Step 1 - create new test namespace:
#ansible ${first_master} -i ${invfile} -m shell -a "oc create namespace ose-platform-test" --vault-password-file=${ansible_vault} | grep -v SUCCESS
# Step 2 - add new test app
#ansible ${first_master} -i ${invfile} -m shell -a "oc new-app --template=deploy-image -p DEPLOYMENT=ose-plattform-test -p IMAGE_NAME=marthaler/demoapp -p NETWORK_PORT=8080 -n ose-platform-test" --vault-password-file=${ansible_vault} | grep -v SUCCESS

## Check here if Pod is deployed correctly.

# Expose service
#ansible ${first_master} -i ${invfile} -m shell -a "oc expose svc/ose-plattform-test -n ose-platform-test" --vault-password-file=${ansible_vault} | grep -v SUCCESS

#app_route=$(ansible ${first_master} -i ${invfile} -m shell -a "oc get route ose-plattform-test --no-headers -n ose-platform-test" --vault-password-file=${ansible_vault} | grep -v SUCCESS | awk '{print $2}')

#diag_result=$(cat ${tmpfile} | grep Completed | grep "no errors")
#if [[ "${diag_result}" != "" ]] ; then
#  echo "[ OK ] application test deploy was sucessfuly"
#else
#  echo "[ NOK ] diagnositcs application test deploy problem, please review"
#  cat ${tmpfile}
#fi
#else
#  echo "sorry no internet access from cluster. - can't deploy test-image from redhat"
#fi

#echo ""
#echo "==== Check the integrated heapster metrics can be reached via the API proxy ==="
#ansible ${first_master} -i ${invfile} -m shell -a "oc adm diagnostics metricsapiproxy" --vault-password-file=${ansible_vault} > ${tmpfile} 2>&1
#diag_result=`cat ${tmpfile} | grep Completed | grep "no errors"`
#if [ "${diag_result}" != "" ] ; then
#  echo "[ OK ] diagnositcs MetricsApiProxy"
#else
#  echo "[ NOK ] diagnositcs MetricsApiProxy problem, please review"
#fi
#
#------------------------------------------------------------------------------------
#                           OPTIONAL MAINTENANCE TASKS:
#------------------------------------------------------------------------------------
echo "######################################################################################################"
echo ""
echo "===== Maintenance Tasks ===="
echo ""
echo "######################################################################################################"
echo ""
if [[ "${fixable_errors}" ]] ; then
#
  if [[ "${pod_errors}" ]] ; then
    echo "==== Delete failed and crashed pods? ==="
    echo "y/n:"
    read delete_pods_now
    if [[ "${delete_pods_now}" == "y" ]] ; then
      oc get pods --no-headers --all-namespaces | grep -v Running | grep -v Completed > ${tmpfile}
      for pod_name in $(cat ${tmpfile} | awk '{print $2}')
        do
          pod_namespace=$(grep ${pod_name} ${tmpfile}| awk '{print $1}')
          oc delete pod $pod_name -n $pod_namespace
        done
    fi
  fi
#
  if [[ "${rpcbind_errors}" ]] ; then
    echo "==== Try to restart all failed rpcbind services? ==="
    echo "y/n:"
    read restart_rpcbind_now
    if [[ "${restart_rpcbind_now}" == "y" ]] ; then
      for node in $(ansible ${first_master} -i ${invfile} -m shell -a "oc get nodes --no-headers" --vault-password-file=${ansible_vault} | grep -v SUCCESS | awk '{print $1}' | grep -v master)
        do
        rpcbind_node_status=$(ansible ${node} -i ${invfile} -m shell -a "systemctl is-active rpcbind.service" --vault-password-file=${ansible_vault} | grep -v SUCCESS)
        if [[ $rpcbind_node_status != "active" ]] ; then
          ansible ${node} -i ${invfile} -m shell -a "systemctl restart rpcbind.service && systemctl enable rpcbind.service" --vault-password-file=${ansible_vault}
          ansible ${node} -i ${invfile} -m shell -a "systemctl status rpcbind.service" --vault-password-file=${ansible_vault}
        fi
      done
    fi
  fi
#
else
  echo "notihing to do.. Everyting is all right. ;)"
fi

OpenShift Plattform Monitoring Skript

How to Use

Skript Sourcecode