OpenShift - Resolve "Kubelet stopped posting node status"


Let's say the output of the oc describe node command contains the following conditions.

Conditions:
  Type            Status   LastHeartbeatTime                 LastTransitionTime               Reason             Message
  MemoryPressure  Unknown  Wed, 11 Nov 2020 20:47:37 -0600   Wed, 11 Nov 2020 20:50:40 -0600  NodeStatusUnknown  Kubelet stopped posting node status
  DiskPressure    Unknown  Wed, 11 Nov 2020 20:47:37 -0600   Wed, 11 Nov 2020 20:50:40 -0600  NodeStatusUnknown  Kubelet stopped posting node status
  PIDPressure     Unknown  Wed, 11 Nov 2020 20:47:37 -0600   Wed, 11 Nov 2020 20:50:40 -0600  NodeStatusUnknown  Kubelet stopped posting node status
  Ready           Unknown  Wed, 11 Nov 2020 20:47:37 -0600   Wed, 11 Nov 2020 20:50:40 -0600  NodeStatusUnknown  Kubelet stopped posting node status
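
If you only want the Ready condition for a specific node, a jsonpath query along these lines can be used (node002 here is just a placeholder for whichever node is reporting Unknown).

oc get node node002 --output jsonpath='{.status.conditions[?(@.type=="Ready")].status}{"\n"}'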

 

Use the oc get nodes --output wide or oc describe node command to determine the IP address of the affected node.

oc get nodes --output wide
. . .
NAME         STATUS    ROLES     AGE       VERSION          INTERNAL-IP    EXTERNAL-IP  OS-IMAGE                 KERNEL-VERSION              CONTAINER-RUNTIME
node001    Ready     infra     273d      v1.11.0+d4cacc0    10.141.115.11  <none>       Red Hat Enterprise Linux 3.10.0-1127.8.2.el7.x86_64  docker://1.13.1
node002    NotReady  infra     273d      v1.11.0+d4cacc0    10.141.115.12  <none>       Red Hat Enterprise Linux 3.10.0-1127.8.2.el7.x86_64  docker://1.13.1
node003    Ready     infra     273d      v1.11.0+d4cacc0    10.141.115.13  <none>       Red Hat Enterprise Linux 3.10.0-1127.8.2.el7.x86_64  docker://1.13.1
node004    Ready     compute   273d      v1.11.0+d4cacc0    10.141.115.14  <none>       Red Hat Enterprise Linux 3.10.0-1127.8.2.el7.x86_64  docker://1.13.1
node005    Ready     compute   273d      v1.11.0+d4cacc0    10.141.115.15  <none>       Red Hat Enterprise Linux 3.10.0-1127.8.2.el7.x86_64  docker://1.13.1
node006    Ready     master    273d      v1.11.0+d4cacc0    10.141.115.16  <none>       Red Hat Enterprise Linux 3.10.0-1127.8.2.el7.x86_64  docker://1.13.1
node007    Ready     master    273d      v1.11.0+d4cacc0    10.141.115.17  <none>       Red Hat Enterprise Linux 3.10.0-1127.8.2.el7.x86_64  docker://1.13.1
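
If you just want the internal IP address of a single node, a jsonpath query along these lines should also work (again, node002 is a placeholder).

oc get node node002 --output jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}'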

 

You can start a debug pod on one of the nodes.

~]# oc debug node/my-node-5n4fj
Starting pod/my-node-5n4fj-debug ...
sh-4.4#

 

Typically you will first issue the chroot /host command to set /host as the root directory, because the node's root file system is mounted at /host in the debug pod.

sh-4.4# chroot /host
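
To confirm you are now working against the node's root file system rather than the debug pod's, you can inspect a file that lives on the host, such as /etc/os-release (the output will vary depending on the node's operating system image).

sh-4.4# cat /etc/os-release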

 

systemctl can be used to determine if the kubelet service is running.

sh-5.1# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
     Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: disabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─01-kubens.conf, 10-mco-default-madv.conf, 20-aws-node-name.conf, 20-aws-providerid.conf, 20-logging.conf
     Active: active (running) since Tue 2024-04-23 04:14:54 UTC; 1 day 1h ago
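
If you just want a terse up/down check instead of the full status output, systemctl is-active prints a single word and returns a nonzero exit code when the service is not running.

sh-5.1# systemctl is-active kubelet
active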

 

Here is a one-liner that I use to loop through each node and return the status of the kubelet service.

for node in $(oc get nodes | grep -v ^NAME | awk '{print $1}'); do echo $node; oc debug node/$node -- chroot /host /usr/bin/systemctl status kubelet | grep Active:; done;

 

This should return something like the following.

infra-node-1
     Active: active (running) since Thu 2024-04-04 16:31:39 UTC; 2 weeks 5 days ago
infra-node-2
     Active: active (running) since Thu 2024-04-04 16:18:48 UTC; 2 weeks 5 days ago
infra-node-3
     Active: active (running) since Thu 2024-04-04 16:26:03 UTC; 2 weeks 5 days ago

master-node-1
     Active: active (running) since Thu 2024-04-04 16:34:01 UTC; 2 weeks 5 days ago
master-node-2
     Active: active (running) since Thu 2024-04-04 16:21:05 UTC; 2 weeks 5 days ago
master-node-3
     Active: active (running) since Thu 2024-04-04 16:27:32 UTC; 2 weeks 5 days ago

worker-node-1
     Active: active (running) since Tue 2024-04-23 16:34:01 UTC; 1 day 1h ago
worker-node-2
     Active: active (running) since Tue 2024-04-23 16:21:05 UTC; 1 day 1h ago
worker-node-3
     Active: active (running) since Tue 2024-04-23 16:27:32 UTC; 1 day 2h ago

 

If you want to try restarting the kubelet service to see if that resolves the issue, this one-liner restarts the kubelet service on each node.

for node in $(oc get nodes | grep -v ^NAME | awk '{print $1}'); do echo $node; oc debug node/$node -- chroot /host /usr/bin/systemctl restart kubelet; done;
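
After restarting the kubelet service, you can watch the nodes until the NotReady node (hopefully) transitions back to Ready. Press ctrl-c to stop watching.

oc get nodes --watch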

 

Use the journalctl command to check the journal for any kubelet events at log levels emerg, alert, crit, warning, and notice. There are usually no results at these log levels, but many results at log level info.

journalctl -p emerg | grep -i kubelet
journalctl -p alert | grep -i kubelet
journalctl -p crit | grep -i kubelet
journalctl -p warning | grep -i kubelet
journalctl -p notice | grep -i kubelet
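
As an aside, journalctl -p shows messages at the given level and every more severe level, so the single command journalctl -p notice covers emerg through notice. The --unit flag scopes the query to the kubelet service, which avoids the need for grep. And if you cannot get a shell on the node at all, oc adm node-logs should be able to pull the same journal remotely (node002 is a placeholder).

journalctl --unit kubelet -p notice
oc adm node-logs node002 -u kubelet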

 



