Let's say the ThanosQueryHttpRequestQueryRangeErrorRateHigh alert is firing for a Thanos pod, which is typically found in the openshift-monitoring namespace.
The oc describe command can be used to view the events of the Thanos pods. The events may also show that the Readiness and Liveness probes have failed.
~]$ oc describe pod/thanos-querier-549f6dc744-7xxlp --namespace openshift-monitoring
Events:
  Type     Reason     Age                  From     Message
  ----     ------     ----                 ----     -------
  Warning  Unhealthy  16m (x217 over 53d)  kubelet  Readiness probe failed: command timed out
  Warning  Unhealthy  15m (x189 over 53d)  kubelet  Liveness probe failed: command timed out
The oc get pods command may show that the Thanos pods have restarted a number of times.
~]$ oc get pods --namespace openshift-monitoring
NAME                              READY   STATUS    RESTARTS   AGE
thanos-querier-549f6dc744-7xxlp   5/5     Running   15         53d
thanos-querier-549f6dc744-dt2ld   5/5     Running   11         53d
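If there are many pods in the namespace, it may help to filter the oc get pods output down to the pods that have restarted more than some number of times. The following is only a sketch. The function name pods_over_restart_limit and the limit of 12 are made up for illustration, and the sample input is copied from the output above so the snippet can be run without a cluster.

```shell
# Hypothetical helper (not part of oc): reads "oc get pods" output on stdin
# and prints the name and restart count of each pod whose RESTARTS column
# (the 4th column) is greater than the limit passed as $1.
pods_over_restart_limit() {
  awk -v limit="$1" 'NR > 1 && $4 + 0 > limit + 0 { print $1, $4 }'
}

# Sample input copied from the oc get pods output above.
sample_output='NAME                              READY   STATUS    RESTARTS   AGE
thanos-querier-549f6dc744-7xxlp   5/5     Running   15         53d
thanos-querier-549f6dc744-dt2ld   5/5     Running   11         53d'

# The limit of 12 is an arbitrary value chosen for this example.
flagged="$(printf '%s\n' "$sample_output" | pods_over_restart_limit 12)"
echo "$flagged"
```

Against a live cluster, the same function could be fed from oc get pods --namespace openshift-monitoring instead of the sample text.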
The oc version command can be used to display the client, server and Kubernetes versions. When we opened a case with Red Hat about this, we were told that this is a known bug that should be fixed in version 4.8.2, so if your cluster is below version 4.8.2, you may be running into this bug.
~]$ oc version
Client Version: 4.6.8
Server Version: 4.6.8
Kubernetes Version: v1.19.0+7070803
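To check whether a server version is below 4.8.2, the two version strings can be compared with a small script. This is a minimal sketch, not an official check. The function name version_lt is made up for illustration, it assumes a sort command that supports -V (version sort, as in GNU coreutils), and the server version is hard coded from the oc version output above.

```shell
# Hypothetical helper: succeeds (exit status 0) when version $1 is strictly
# lower than version $2. Relies on "sort -V", a GNU coreutils extension.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

server_version="4.6.8"   # Server Version from the oc version output above
fixed_version="4.8.2"    # version Red Hat said should contain the fix

if version_lt "$server_version" "$fixed_version"; then
  status="below $fixed_version, the known bug may apply"
else
  status="at or above $fixed_version"
fi
echo "Server version $server_version is $status"
```

Version sort is used rather than a plain string comparison because a string comparison would rank 4.10.1 below 4.8.2.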