
OpenShift - Resolve node status NotReady

by Jeremy Canfield | Updated: December 23 2020 | OpenShift articles

In this example, the oc get nodes command returns the list of nodes, and one of them (node002) has a status of NotReady.

oc get nodes -o wide
. . .
NAME     STATUS    ROLES    AGE   VERSION          INTERNAL-IP    EXTERNAL-IP  OS-IMAGE                  KERNEL-VERSION              CONTAINER-RUNTIME
node001  Ready     infra    273d  v1.11.0+d4cacc0  10.141.115.11  <none>       Red Hat Enterprise Linux  3.10.0-1127.8.2.el7.x86_64  docker://1.13.1
node002  NotReady  infra    273d  v1.11.0+d4cacc0  10.141.115.12  <none>       Red Hat Enterprise Linux  3.10.0-1127.8.2.el7.x86_64  docker://1.13.1
node003  Ready     infra    273d  v1.11.0+d4cacc0  10.141.115.13  <none>       Red Hat Enterprise Linux  3.10.0-1127.8.2.el7.x86_64  docker://1.13.1
node004  Ready     compute  273d  v1.11.0+d4cacc0  10.141.115.14  <none>       Red Hat Enterprise Linux  3.10.0-1127.8.2.el7.x86_64  docker://1.13.1
node005  Ready     compute  273d  v1.11.0+d4cacc0  10.141.115.15  <none>       Red Hat Enterprise Linux  3.10.0-1127.8.2.el7.x86_64  docker://1.13.1
node006  Ready     master   273d  v1.11.0+d4cacc0  10.141.115.16  <none>       Red Hat Enterprise Linux  3.10.0-1127.8.2.el7.x86_64  docker://1.13.1
node007  Ready     master   273d  v1.11.0+d4cacc0  10.141.115.17  <none>       Red Hat Enterprise Linux  3.10.0-1127.8.2.el7.x86_64  docker://1.13.1
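
On a cluster with many nodes, it can be easier to print only the nodes that are not Ready. The following is a minimal sketch, assuming the default column layout shown above, where NAME is the first column and STATUS is the second. The function name not_ready_nodes is hypothetical.

```shell
# Print the NAME of every node whose STATUS column is not exactly "Ready".
# Reads "oc get nodes" output on stdin and skips the header line.
# Assumption: NAME is column 1 and STATUS is column 2 (the default layout).
not_ready_nodes() {
  awk 'NF >= 2 && $1 != "NAME" && $2 != "Ready" { print $1 }'
}

# Usage:
#   oc get nodes | not_ready_nodes
```

Because the comparison is exact, a node with a status such as Ready,SchedulingDisabled would also be printed, which is usually worth knowing about.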

The oc describe node command can be used to view the events recorded for a node, which could return something like this.

Events:
  Type    Reason                   Age                  From              Message
  ----    ------                   ----                 ----              -------
  Normal  Starting                 112s                 kubelet, node001  Starting kubelet.
  Normal  NodeHasSufficientMemory  112s (x2 over 112s)  kubelet, node001  Node node001 status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    112s (x2 over 112s)  kubelet, node001  Node node001 status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     112s (x2 over 112s)  kubelet, node001  Node node001 status is now: NodeHasSufficientPID
  Normal  NodeNotReady             112s                 kubelet, node001  Node node001 status is now: NodeNotReady
  Normal  NodeAllocatableEnforced  110s                 kubelet, node001  Updated Node Allocatable limit across pods
  Normal  NodeReady                110s                 kubelet, node001  Node node001 status is now: NodeReady

If the Conditions section of the oc describe node output contains "Kubelet stopped posting node status", refer to OpenShift - Resolve "Kubelet stopped posting node status".

Conditions:
  Type            Status   LastHeartbeatTime                 LastTransitionTime               Reason             Message
  MemoryPressure  Unknown  Wed, 11 Nov 2020 20:47:37 -0600   Wed, 11 Nov 2020 20:50:40 -0600  NodeStatusUnknown  Kubelet stopped posting node status
  DiskPressure    Unknown  Wed, 11 Nov 2020 20:47:37 -0600   Wed, 11 Nov 2020 20:50:40 -0600  NodeStatusUnknown  Kubelet stopped posting node status
  PIDPressure     Unknown  Wed, 11 Nov 2020 20:47:37 -0600   Wed, 11 Nov 2020 20:50:40 -0600  NodeStatusUnknown  Kubelet stopped posting node status
  Ready           Unknown  Wed, 11 Nov 2020 20:47:37 -0600   Wed, 11 Nov 2020 20:50:40 -0600  NodeStatusUnknown  Kubelet stopped posting node status
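
The check for this message can be scripted rather than read by eye. This is a small sketch, assuming the message text appears exactly as in the output above. The function name kubelet_stopped_posting is hypothetical.

```shell
# Succeed (exit status 0) when the text on stdin contains the message
# that the node controller sets after the kubelet stops reporting.
kubelet_stopped_posting() {
  grep -q 'Kubelet stopped posting node status'
}

# Usage (node002 is the NotReady node in this example):
#   oc describe node node002 | kubelet_stopped_posting && echo "kubelet is not reporting"
```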

From the OpenShift jump box, try to SSH onto the node that has a status of NotReady. If you are not able to SSH onto the node, you will likely need to restart the node (that is, reboot its operating system).

If you are able to SSH onto the node, use the df command (for example, df -h) to determine if any file systems have run out of space.

Filesystem                            Size  Used Avail Use% Mounted on
/dev/sda1                             364M  167M  174M  50% /boot
/dev/sda2                             127M  6.9M  120M   6% /opt
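
To flag only the file systems that are nearly full, the df output can be filtered. This is a rough sketch, assuming the POSIX layout produced by df -P, where column 5 is the usage percentage and column 6 is the mount point. The function name full_filesystems and the default threshold of 90 are hypothetical choices.

```shell
# Print the mount point and usage of every file system at or above a
# threshold (first argument, default 90 percent).
# Reads "df -P" output on stdin and skips the header line.
full_filesystems() {
  awk -v limit="${1:-90}" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)              # strip the percent sign before comparing
      if (use + 0 >= limit) print $6, $5
    }'
}

# Usage:
#   df -P | full_filesystems 90
```

Mount points that contain spaces would not be parsed correctly by this sketch.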

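Use the free command to determine if the VM has run out of memory. The available column is the most useful figure, since memory counted under buff/cache can usually be reclaimed when processes need it.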

~]# free -h
             total      used    free   shared  buff/cache  available
Mem:           19G       19G    510M      16K        3.6G       8.0G
Swap:         2.0G      380M    1.6G
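
The available memory can also be expressed as a percentage of the total, which is easier to compare against a limit. This is a minimal sketch, assuming the column layout shown above, where on the Mem: line column 2 is total and column 7 is available. The function name available_memory_percent is hypothetical.

```shell
# Print the percentage of total memory that is still available,
# truncated to a whole number.
# Reads "free -m" output on stdin.
# Assumption: on the "Mem:" line, column 2 is total and column 7 is available.
available_memory_percent() {
  awk '$1 == "Mem:" && $2 > 0 { printf "%d\n", ($7 * 100) / $2 }'
}

# Usage:
#   free -m | available_memory_percent
```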

Use the top command to determine if there is a process using the majority of the CPU.
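
The top command is interactive, so for a quick, scriptable snapshot the ps command can be used instead. This is a sketch under that substitution, assuming ps output with the columns PID, %CPU, and command name. The function name busiest_processes is hypothetical.

```shell
# Print the N processes using the most CPU (first argument, default 5),
# highest first.
# Reads "ps -eo pid,pcpu,comm" output on stdin: PID, %CPU, command name.
busiest_processes() {
  awk 'NR > 1' | sort -k2,2nr | head -n "${1:-5}"
}

# Usage:
#   ps -eo pid,pcpu,comm | busiest_processes 5
```

Note that ps reports CPU usage averaged over the lifetime of each process, while top shows usage over its most recent refresh interval, so the two can disagree for short bursts.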

After the issue is resolved and the node is Ready, use the oc get pods command to ensure that all of the pods on the node are running.

~]# oc get pods --all-namespaces -o wide --field-selector spec.nodeName=node001
NAMESPACE    NAME     READY   STATUS    RESTARTS   AGE
project001   pod001   1/1     Running   0          84d
project001   pod002   1/1     Running   0          84d
project002   pod001   1/1     Running   0          84d
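
If the node hosts many pods, the output can be filtered so that only pods needing attention are shown. This is a small sketch, assuming the column order shown above (NAMESPACE, NAME, READY, STATUS). The function name unhealthy_pods is hypothetical.

```shell
# Print "namespace/name status" for every pod whose STATUS is neither
# Running nor Completed.
# Reads "oc get pods --all-namespaces" output on stdin and skips the header.
# Assumption: STATUS is column 4, as in the output above.
unhealthy_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Usage:
#   oc get pods --all-namespaces --field-selector spec.nodeName=node001 | unhealthy_pods
```

No output means every pod on the node is either Running or Completed.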
