OpenShift - Reload Prometheus Configurations


Prometheus and ELK (Elasticsearch, Logstash, Kibana) are similar tools used to gather, display, and analyze data. Gathering the data is sometimes referred to as "scraping metrics". For example, both Prometheus and ELK can be used to display data about servers, virtual machines (VMs), databases, containers (e.g. Docker, OpenShift), messaging (e.g. IBM MQ, RabbitMQ), and so on.

Often, Prometheus or ELK sends alerts to some other system, such as Alertmanager, and then Alertmanager routes the alerts to certain targets, such as an SMTP email server or OpsGenie.
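
For example, an Alertmanager configuration could contain a route and receivers along these lines. This is only an illustrative sketch, not the configuration of this cluster; the receiver names, email addresses, SMTP server, and OpsGenie API key are placeholders.

route:
  receiver: team-email                 # placeholder default receiver
  routes:
  - matchers:
    - severity="critical"
    receiver: team-opsgenie            # placeholder - send critical alerts to OpsGenie

receivers:
- name: team-email
  email_configs:
  - to: team@example.com               # placeholder address
    from: alertmanager@example.com     # placeholder address
    smarthost: smtp.example.com:587    # placeholder SMTP server
- name: team-opsgenie
  opsgenie_configs:
  - api_key: <opsgenie api key>        # placeholder

 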

The oc get PrometheusRules command can be used to list the Prometheus Rules in the openshift-monitoring namespace.

~]$ oc get PrometheusRules --namespace openshift-monitoring
NAME                                           AGE
alertmanager-main-rules                        692d
cluster-monitoring-operator-prometheus-rules   692d
kube-state-metrics-rules                       692d
kubernetes-monitoring-rules                    692d
node-exporter-rules                            692d
prometheus-k8s-prometheus-rules                692d
prometheus-k8s-thanos-sidecar-rules            692d
prometheus-operator-rules                      692d
telemetry                                      692d
thanos-querier                                 692d

 

For example, let's say you want to make a change to one of the Alertmanager rules. Let's redirect the alertmanager-main-rules PrometheusRule to a YAML file.

oc get PrometheusRule alertmanager-main-rules --namespace openshift-monitoring --output yaml > alertmanager-main-rules.yaml

 

And then let's make a change to the YAML file. For example, updating AlertmanagerFailedToSendAlerts from 5m (five minutes) to 10m (ten minutes), both in the rate windows in the expression and in the for duration.

    - alert: AlertmanagerFailedToSendAlerts
      annotations:
        description: Alertmanager {{ $labels.namespace }}/{{ $labels.pod}} failed
          to send {{ $value | humanizePercentage }} of notifications to {{ $labels.integration
          }}.
        runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/AlertmanagerFailedToSendAlerts.md
        summary: An Alertmanager instance failed to send notifications.
      expr: |
        (
          rate(alertmanager_notifications_failed_total{job=~"alertmanager-main|alertmanager-user-workload"}[10m])
        /
          ignoring (reason) group_left rate(alertmanager_notifications_total{job=~"alertmanager-main|alertmanager-user-workload"}[10m])
        )
        > 0.01
      for: 10m
      labels:
        severity: warning
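
Optionally, before applying the change, the oc diff command can be used to preview what will be changed on the cluster. This is just an optional sanity check.

oc diff --filename alertmanager-main-rules.yaml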

 

Let's apply the updated YAML to update the alertmanager-main-rules PrometheusRule.

oc apply --filename alertmanager-main-rules.yaml

 

This should cause the prometheus-k8s-rulefiles-0 configmap to be updated so that AlertmanagerFailedToSendAlerts now has 10m (ten minutes).

~]$ oc get configmap prometheus-k8s-rulefiles-0 --namespace openshift-monitoring --output yaml | grep AlertmanagerFailedToSendAlerts -A 11
      - alert: AlertmanagerFailedToSendAlerts
        annotations:
          description: Alertmanager {{ $labels.namespace }}/{{ $labels.pod}} failed to
            send {{ $value | humanizePercentage }} of notifications to {{ $labels.integration
            }}.
          runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/AlertmanagerFailedToSendAlerts.md
          summary: An Alertmanager instance failed to send notifications.
        expr: |
          (
            rate(alertmanager_notifications_failed_total{job=~"alertmanager-main|alertmanager-user-workload"}[10m])
          /
            ignoring (reason) group_left rate(alertmanager_notifications_total{job=~"alertmanager-main|alertmanager-user-workload"}[10m])
          )
          > 0.01
        for: 10m
        labels:
          severity: warning

 

The oc get pods command can be used to list the Prometheus pods, which by default are in the openshift-monitoring namespace.

~]$ oc get pods --namespace openshift-monitoring
NAME                                                     READY   STATUS    RESTARTS        AGE
prometheus-adapter-6b98c646c7-m4g76                      1/1     Running   0               8d
prometheus-adapter-6b98c646c7-tczr2                      1/1     Running   0               8d
prometheus-k8s-0                                         6/6     Running   0               11d
prometheus-k8s-1                                         6/6     Running   0               11d
prometheus-operator-6766f68555-mkfv9                     2/2     Running   0               11d
prometheus-operator-admission-webhook-8589888cbc-mq2jx   1/1     Running   0               11d
prometheus-operator-admission-webhook-8589888cbc-t62mt   1/1     Running   0               11d
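
Notice that the prometheus-k8s pods have multiple containers (6/6 READY). Since the subsequent oc exec commands target the prometheus container, it can be handy to list the container names in the pod, for example with a jsonpath query along these lines (just a sketch; the container names will depend on your cluster).

oc get pod prometheus-k8s-0 --namespace openshift-monitoring --output jsonpath='{.spec.containers[*].name}'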

 

There should be a directory /etc/prometheus/rules/prometheus-k8s-rulefiles-0 in the prometheus pod.

~]$ oc exec pod/prometheus-k8s-0 --container prometheus --namespace openshift-monitoring -- ls -l /etc/prometheus/rules/
total 12
drwxrwsrwx. 3 root nobody 8192 May 12 20:30 prometheus-k8s-rulefiles-0

 

And there should be a YAML file in the pod that contains AlertmanagerFailedToSendAlerts. Notice that in this example the rule file in the pod still contains the old 5m values, meaning Prometheus is still running with the old version of the rule.

~]$ oc exec pod/prometheus-k8s-0 --container prometheus --namespace openshift-monitoring -- cat /etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-monitoring-alertmanager-main-rules-1b98ab31-7439-4f52-9f48-c04a696979c3.yaml
- name: alertmanager.rules
  rules:
  - alert: AlertmanagerFailedToSendAlerts
    annotations:
      description: Alertmanager {{ $labels.namespace }}/{{ $labels.pod}} failed to
        send {{ $value | humanizePercentage }} of notifications to {{ $labels.integration
        }}.
      runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/AlertmanagerFailedToSendAlerts.md
      summary: An Alertmanager instance failed to send notifications.
    expr: |
      (
        rate(alertmanager_notifications_failed_total{job=~"alertmanager-main|alertmanager-user-workload"}[5m])
      /
        ignoring (reason) group_left rate(alertmanager_notifications_total{job=~"alertmanager-main|alertmanager-user-workload"}[5m])
      )
      > 0.01
    for: 5m
    labels:
      severity: warning

 

The oc exec and curl commands can be used to issue a POST request inside of your Prometheus pod to the /-/reload endpoint to reload the Prometheus configurations.

oc exec prometheus-k8s-0 --container prometheus --namespace openshift-monitoring -- curl --request POST --url http://localhost:9090/-/reload
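
To confirm that Prometheus has picked up the change, the Prometheus rules API can be queried from inside the pod, for example with something like the following. This is just a sketch of one way to filter the JSON; /api/v1/rules returns the rules Prometheus currently has loaded, and for alerting rules the duration should be reported in seconds, so 10m would appear as 600. The same reload and check can be repeated for the prometheus-k8s-1 pod.

oc exec pod/prometheus-k8s-0 --container prometheus --namespace openshift-monitoring -- curl --silent http://localhost:9090/api/v1/rules | tr ',' '\n' | grep AlertmanagerFailedToSendAlerts -A 2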

 



