OpenShift - Resolve "Prometheus has failed to evaluate rules in the last 5m"


Prometheus and ELK (Elasticsearch, Logstash, Kibana) are similar tools used to gather, display, and analyze data. Gathering the data is sometimes referred to as "scraping metrics". For example, both Prometheus and ELK can be used to display data about servers, virtual machines (VMs), databases, containers (e.g. Docker, OpenShift), messaging (e.g. IBM MQ, RabbitMQ), and the list goes on.

The event "Prometheus has failed to evaluate rules in the last 5m" means that one of the Prometheus pods, which are probably in the openshift-monitoring namespace, could not evaluate one of its rules. The oc get pods command can be used to list the Prometheus pods in the openshift-monitoring namespace.

~]$ oc get pods --namespace openshift-monitoring | grep -i prometheus
prometheus-adapter-8559d6b5fb-42mng            1/1     Running   0          13h
prometheus-adapter-8559d6b5fb-ppcxf            1/1     Running   0          13h
prometheus-k8s-0                               6/6     Running   1          67d
prometheus-k8s-1                               6/6     Running   1          67d
prometheus-operator-5956c5d77-84qzq            2/2     Running   0          68d
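
Before reading logs, it can help to confirm that every listed pod is fully ready. The following is a minimal sketch, not part of oc itself: unhealthy_pods is a hypothetical helper that reads oc get pods output on stdin and prints the name of any pod whose READY counts differ or whose STATUS is not Running. Note that under this simple check a pod in the Completed state would also be reported.

```shell
# Hypothetical helper: print pods that are not fully ready or not Running.
# Expects "oc get pods" style columns: NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk '
    $1 == "NAME" { next }                  # skip the header row if present
    {
      split($2, ready, "/")                # "6/6" -> ready[1]=6, ready[2]=6
      if (ready[1] != ready[2] || $3 != "Running") print $1
    }
  '
}

# Typical use (requires access to the cluster):
#   oc get pods --namespace openshift-monitoring | grep -i prometheus | unhealthy_pods
```

With the listing above, the helper would print nothing, since every pod is Running with all containers ready.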

 

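Use the oc logs command to look for warnings and errors in the logs of the Prometheus pods. Each prometheus-k8s pod runs several containers, so the --container option is used to select the prometheus container.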

~]$ oc logs pod/prometheus-k8s-0 --namespace openshift-monitoring --container prometheus
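
The log can be long, so it may be easier to keep only the rule evaluation failures. This is a minimal sketch: rule_failures is a hypothetical helper name, and it assumes the failures are logged with the message "Evaluating rule failed", as in the event shown in this article.

```shell
# Hypothetical helper: keep only rule-evaluation failures from
# Prometheus log lines read on stdin.
rule_failures() {
  grep -F 'msg="Evaluating rule failed"'
}

# Typical use (requires access to the cluster):
#   oc logs pod/prometheus-k8s-0 --namespace openshift-monitoring --container prometheus | rule_failures
```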

 

Here is one such event. Notice that the rule evaluation failed because the query "timed out". This may be related to a Red Hat Bugzilla report, which indicates the rule has been removed starting with version 4.6.9 of OpenShift.

~]$ oc logs pod/prometheus-k8s-0 --namespace openshift-monitoring --container prometheus
level=warn ts=2021-09-14T09:20:29.235Z caller=manager.go:598 component="rule manager" group=kube-apiserver-availability.rules msg="Evaluating rule failed" rule="record: apiserver_request:availability30d\nexpr: 1 - ((sum(increase(apiserver_request_duration_seconds_count{verb=~\"POST|PUT|PATCH|DELETE\"}[30d])) - sum(increase(apiserver_request_duration_seconds_bucket{le=\"1\",verb=~\"POST|PUT|PATCH|DELETE\"}[30d]))) + (sum(increase(apiserver_request_duration_seconds_count{verb=~\"LIST|GET\"}[30d])) - ((sum(increase(apiserver_request_duration_seconds_bucket{le=\"0.1\",scope=~\"resource|\",verb=~\"LIST|GET\"}[30d])) or vector(0)) + sum(increase(apiserver_request_duration_seconds_bucket{le=\"0.5\",scope=\"namespace\",verb=~\"LIST|GET\"}[30d])) + sum(increase(apiserver_request_duration_seconds_bucket{le=\"5\",scope=\"cluster\",verb=~\"LIST|GET\"}[30d])))) + sum(code:apiserver_request_total:increase30d{code=~\"5..\"} or vector(0))) / sum(code:apiserver_request_total:increase30d)\nlabels:\n verb: all\n" err="query timed out in expression evaluation"
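
Because the full rule expression makes the line hard to read, the rule group and the error can be pulled out on their own. This is a sketch under the assumption that the log lines have the same field order as the event above (group, then msg, with err as the last field); summarize_rule_failures is a hypothetical helper name, and an err value containing escaped quotes would not match.

```shell
# Hypothetical helper: reduce each "Evaluating rule failed" line read on
# stdin to "<rule group>: <error>". Lines that do not match are dropped.
summarize_rule_failures() {
  sed -n 's/.* group=\([^ ]*\) msg="Evaluating rule failed".* err="\([^"]*\)"$/\1: \2/p'
}

# Typical use (requires access to the cluster):
#   oc logs pod/prometheus-k8s-0 --namespace openshift-monitoring --container prometheus | summarize_rule_failures
```

Piping the result through sort and uniq -c gives a count per rule group, which shows whether a single rule accounts for the failures.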

 

The oc version command can be used to display the version of OpenShift. If the Server Version is below 4.6.9, as in this example, the event can be ignored, since the rule has been removed in version 4.6.9 of OpenShift.

~]$ oc version
Client Version: 4.5.6
Server Version: 4.6.8
Kubernetes Version: v1.19.0+7070803
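
The version check can also be scripted. This is a minimal sketch: parse_server_version and version_lt are hypothetical helper names, the output format is assumed to match the oc version output above, and sort -V (version sort) is assumed to be available, as it is in GNU coreutils.

```shell
# Hypothetical helper: read "oc version" output on stdin and print the
# server version, e.g. 4.6.8.
parse_server_version() {
  awk '/^Server Version:/ { print $3 }'
}

# Hypothetical helper: succeed when version $1 is strictly lower than $2.
# Relies on sort -V ordering the lower version first.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Typical use (requires access to the cluster):
#   server=$(oc version | parse_server_version)
#   if version_lt "$server" 4.6.9; then
#     echo "Server $server is below 4.6.9"
#   fi
```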

 



