
In this example, the oc get nodes command returns the list of nodes, and one of the nodes (node002) has a status of NotReady.
~]# oc get nodes -o wide
. . .
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node001 Ready infra 273d v1.11.0+d4cacc0 10.141.115.11 <none> Red Hat Enterprise Linux 3.10.0-1127.8.2.el7.x86_64 docker://1.13.1
node002 NotReady infra 273d v1.11.0+d4cacc0 10.141.115.12 <none> Red Hat Enterprise Linux 3.10.0-1127.8.2.el7.x86_64 docker://1.13.1
node003 Ready infra 273d v1.11.0+d4cacc0 10.141.115.13 <none> Red Hat Enterprise Linux 3.10.0-1127.8.2.el7.x86_64 docker://1.13.1
node004 Ready compute 273d v1.11.0+d4cacc0 10.141.115.14 <none> Red Hat Enterprise Linux 3.10.0-1127.8.2.el7.x86_64 docker://1.13.1
node005 Ready compute 273d v1.11.0+d4cacc0 10.141.115.15 <none> Red Hat Enterprise Linux 3.10.0-1127.8.2.el7.x86_64 docker://1.13.1
node006 Ready master 273d v1.11.0+d4cacc0 10.141.115.16 <none> Red Hat Enterprise Linux 3.10.0-1127.8.2.el7.x86_64 docker://1.13.1
node007 Ready master 273d v1.11.0+d4cacc0 10.141.115.17 <none> Red Hat Enterprise Linux 3.10.0-1127.8.2.el7.x86_64 docker://1.13.1
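On a cluster with many nodes it can be easier to filter the list than to scan it by eye. The following is a minimal sketch, not part of the original procedure: not_ready_nodes is a hypothetical helper name that reads oc get nodes output on standard input and prints the name of every node whose STATUS column contains NotReady.

```shell
# Hypothetical helper (not an oc subcommand): reads `oc get nodes` output on
# stdin and prints the name of every node whose STATUS column contains NotReady.
not_ready_nodes() {
  # $1 is NAME and $2 is STATUS; the header row is skipped by its NAME label.
  awk '$1 != "NAME" && $2 ~ /NotReady/ { print $1 }'
}

# Example usage (requires a logged-in oc session):
#   oc get nodes | not_ready_nodes
```

Matching on "contains NotReady" rather than "equals NotReady" also catches combined statuses such as NotReady,SchedulingDisabled.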
Use the oc describe node command to check whether the node has recorded any events. The Events section of the output could look something like this.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 112s kubelet, node001 Starting kubelet.
Normal NodeHasSufficientMemory 112s (x2 over 112s) kubelet, node001 Node node001 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 112s (x2 over 112s) kubelet, node001 Node node001 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 112s (x2 over 112s) kubelet, node001 Node node001 status is now: NodeHasSufficientPID
Normal NodeNotReady 112s kubelet, node001 Node node001 status is now: NodeNotReady
Normal NodeAllocatableEnforced 110s kubelet, node001 Updated Node Allocatable limit across pods
Normal NodeReady 110s kubelet, node001 Node node001 status is now: NodeReady
If the Conditions section of the oc describe node output contains Kubelet stopped posting node status, refer to OpenShift - Resolve "Kubelet stopped posting node status".
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
MemoryPressure Unknown Wed, 11 Nov 2020 20:47:37 -0600 Wed, 11 Nov 2020 20:50:40 -0600 NodeStatusUnknown Kubelet stopped posting node status
DiskPressure Unknown Wed, 11 Nov 2020 20:47:37 -0600 Wed, 11 Nov 2020 20:50:40 -0600 NodeStatusUnknown Kubelet stopped posting node status
PIDPressure Unknown Wed, 11 Nov 2020 20:47:37 -0600 Wed, 11 Nov 2020 20:50:40 -0600 NodeStatusUnknown Kubelet stopped posting node status
Ready Unknown Wed, 11 Nov 2020 20:47:37 -0600 Wed, 11 Nov 2020 20:50:40 -0600 NodeStatusUnknown Kubelet stopped posting node status
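The check for this message can be scripted. This is a rough sketch under the assumption that the message text matches the output above exactly. The function name kubelet_stopped_posting is made up for illustration.

```shell
# Hypothetical helper: reads `oc describe node <name>` output on stdin and
# succeeds (exit status 0) only if the kubelet heartbeat message is present.
kubelet_stopped_posting() {
  grep -q 'Kubelet stopped posting node status'
}

# Example usage (requires a logged-in oc session):
#   if oc describe node node002 | kubelet_stopped_posting; then
#     echo "kubelet on node002 has stopped reporting"
#   fi
```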
From the OpenShift jump box, use the ssh command to try to connect to the node that has a status of NotReady. If you are not able to SSH onto the node, you will likely want to restart the node (that is, reboot its operating system).
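A quick, non-interactive reachability test might look like the sketch below. It assumes key-based SSH authentication from the jump box. With password authentication, BatchMode makes the check fail even when an interactive login would work. The function name can_ssh is hypothetical.

```shell
# Hypothetical helper: succeeds if a non-interactive SSH login to host $1 works.
can_ssh() {
  # BatchMode=yes fails instead of prompting for a password.
  # ConnectTimeout=5 gives up after five seconds if the host does not answer.
  # The remote command `true` does nothing and exits 0 on a successful login.
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Example usage:
#   can_ssh node002 || echo "node002 is not reachable over SSH"
```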
If you are able to SSH onto the node, use the df command to determine whether any filesystems have run out of space.
~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 364M 167M 174M 50% /boot
/dev/sda2 127M 6.9M 120M 6% /opt
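To flag nearly full filesystems without reading every row, something like the following could be used. It is a sketch only: filesystems_over is a hypothetical name, the threshold is whatever percentage you choose, and the parsing assumes the six-column layout shown above with no spaces in mount points.

```shell
# Hypothetical helper: reads `df` output on stdin and prints the mount point
# and Use% of every filesystem at or above the percentage given as $1.
filesystems_over() {
  awk -v limit="$1" 'NR > 1 {
    pct = $5                 # Use% column, for example "50%"
    sub(/%/, "", pct)        # strip the percent sign
    if (pct + 0 >= limit + 0) print $6, $5
  }'
}

# Example usage (-P keeps each filesystem on a single line):
#   df -hP | filesystems_over 90
```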
Use the free command to determine if the VM has run out of memory.
~]# free -h
total used free shared buff/cache available
Mem: 19G 19G 510M 16K 3.6G 8.0G
Swap: 2.0G 380M 1.6G
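The available column is usually the most useful figure, because it estimates how much memory can be handed to new processes without swapping. As an illustrative sketch, the hypothetical helper mem_available_pct below turns that column into a percentage of total memory. It assumes a version of free that prints an available column, as in the output above, and it expects free -m input so that the values are plain numbers.

```shell
# Hypothetical helper: reads `free -m` output on stdin and prints available
# memory as a whole-number percentage of total memory.
mem_available_pct() {
  # On the Mem: row, $2 is total and $7 is available.
  awk '$1 == "Mem:" && NF >= 7 { printf "%d\n", $7 * 100 / $2 }'
}

# Example usage:
#   free -m | mem_available_pct
```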
Use the top command to determine if there is a process using the majority of the CPU.
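Because top is interactive, a scripted check may be easier with ps. The sketch below is an alternative to top, not what the article itself uses: busy_processes is a hypothetical helper that reads ps -eo pid,pcpu,comm output and prints processes at or above a chosen CPU percentage. Note that ps reports CPU usage averaged over the lifetime of the process, so the numbers can differ from the live figures in top.

```shell
# Hypothetical helper: reads `ps -eo pid,pcpu,comm` output on stdin and prints
# the PID, %CPU, and command of every process at or above the percentage in $1.
busy_processes() {
  # NR > 1 skips the header row; $2 is the %CPU column.
  awk -v limit="$1" 'NR > 1 && $2 + 0 >= limit + 0 { print $1, $2, $3 }'
}

# Example usage:
#   ps -eo pid,pcpu,comm | busy_processes 50
```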
After the issue is resolved and the node is Ready, use the oc get pods command to ensure all of the pods on the node are running.
~]# oc get pods --all-namespaces -o wide --field-selector spec.nodeName=node001
NAMESPACE NAME READY STATUS RESTARTS AGE
project001 pod001 1/1 Running 0 84d
project001 pod002 1/1 Running 0 84d
project002 pod001 1/1 Running 0 84d
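If the node hosts many pods, the list can be narrowed to the ones that still need attention. The following sketch assumes the column order shown above (NAMESPACE, NAME, READY, STATUS) and treats Running and Completed as healthy. The helper name pods_not_running is hypothetical.

```shell
# Hypothetical helper: reads `oc get pods --all-namespaces` output on stdin and
# prints namespace/name and status for pods that are not Running or Completed.
pods_not_running() {
  # $1 is NAMESPACE, $2 is NAME, $4 is STATUS; the header row is skipped.
  awk '$1 != "NAMESPACE" && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Example usage (requires a logged-in oc session):
#   oc get pods --all-namespaces --field-selector spec.nodeName=node001 | pods_not_running
```

No output means every pod scheduled on the node reports Running or Completed.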