OpenShift - Create Alert Manager alerts in OpsGenie


At a high level, the alerts are created like this: Prometheus > Alert Manager > OpsGenie

Alert Manager, as the name implies, does things like receiving alerts from a client (such as Prometheus) and routing them to certain targets, such as an SMTP email server or OpsGenie. Alert Manager also has features such as the ability to silence alerts for a period of time.
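
For example, here is a minimal sketch of silencing an alert for two hours with amtool (covered later in this article), assuming the Watchdog alert and the default Alert Manager port of 9093 (the comment and author values are just placeholders):

oc exec pod/alertmanager-main-0 \
--namespace openshift-monitoring -- \
amtool --alertmanager.url="http://localhost:9093" silence add alertname=Watchdog --duration="2h" --comment="silenced while patching" --author="jane.doe"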

The Alert Manager stateful set should have a config volume that references a secret named alertmanager-main-generated.

~]$ oc get statefulset alertmanager-main --namespace openshift-monitoring --output yaml
spec:
  template:
    spec:
      volumes:
      - name: config-volume
        secret:
          defaultMode: 420
          secretName: alertmanager-main-generated
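
Or, as a minimal sketch, the secret name can be pulled out directly with a JSONPath filter, assuming the volume is named config-volume as shown above:

oc get statefulset alertmanager-main --namespace openshift-monitoring --output jsonpath='{.spec.template.spec.volumes[?(@.name=="config-volume")].secret.secretName}'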

 

And the alertmanager-main-generated secret should contain something like this. Notice the secret contains the alertmanager.yaml.gz file.

~]$ oc get secret alertmanager-main-generated --namespace openshift-monitoring --output yaml
apiVersion: v1
data:
  alertmanager.yaml.gz: H4sIAAAAAAAA/6xSwY7TMBC95yvmjJTuFlpt5RN3JK4cEEQTZ5xa2BkzHqf071GSTTdVkeDAzTN+8+b5+fWBWwymAhDKHEZq1EfiogaOsQLIUVPjhKMBDCQaccCe5KOexY806M7yDZYjip45q5nr/rLbosz74woU+lm8UKMhG3AYMlUAnHJPg6cGk2+KBANn1ZTN0xMmv1tvX/fdgX/Q1cAztgdH7Uu9x87Wh6Nrazx0WHdkD+7lhO502lfCRWl5rCU/khjoyGEJWgH0wiU17XW6r8GGkpVkPg8YKSe0NFeJu4lhosoL9o3tgmrPHfcVAECcCjMfYXFvIjLw5Q20nbXi1VsMj7OZRhKv1zvMdtQPjh/HHKEWIbO+BYtytpOQv25uhP4g/Pun0tJn7mj37p8EzI4amCLhh762JSvHeibMN8Mv6NewLQ0/KMmIwcB+7golQt20PzzHal2dTbX8z00DRfShsTw437/+j7KBln/NDAPpIsCxYKT7IK9UG0P+B92asbW+pUQppoBTjuDrt+p3AAAA//8S4NTEjAMAAA==
kind: Secret
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"alertmanager.yaml.gz":"H4sICDuirWcAA2FsZXJ0bWFuYWdlci55YW1sAK1TPU/DQAzd+RW3VUJKodCKKhM7EisbkXPxpSfuI/iclv57fElTAmVBIlP8/Pz87DitizW48kopwhTdHiu2HmPPpdp4QZPnrkoeiHcxCbjIQHtY8o7sHgMvdfTl3WYxUQ1JrBbgkNhDgBbpcc49Ewnfe0vSzqVSGXAJJRO71GKwWEFnqzc8itIt1GuD9UOxgkYX642pC1g3UDSo1+ZhC2a7XS1+lvbkpHTH3KXy5kaQ5ZQePRBqFEuU8uCFCuCxVDaYKKFS6MG6SsdgbDswMoejKNbxwwZGCsjDhMlEktqLCSdJTZatBvfPsg0a6B3PkAOw3jWxvSL5dJjFW3nrqvo4NdKuT9LiFOWq1IHGU9zF5lwzWNnLUajVcAIjegA73cS0vbkRwg6BZ7X3twM12zkP67PLMcjPMOvo/2XyP2a+OhzmiQuFhEKyfPyx6bnAt8SFgBHTPeUvNa4Heo5JZ2MXQufzOInIBf86yutTX+NzbHB5/Uczw6JLlX80G9pCi6Poi/EifrfzCRxqUwa+AwAA"},"kind":"Secret","metadata":{"annotations":{},"creationTimestamp":"2025-02-13T07:24:51Z","labels":{"managed-by":"prometheus-operator"},"name":"alertmanager-main-generated","namespace":"openshift-monitoring","ownerReferences":[{"apiVersion":"monitoring.coreos.com/v1","blockOwnerDeletion":true,"controller":true,"kind":"Alertmanager","name":"main","uid":"e6edf69d-2220-44f7-b60d-83ec2d4c83ae"}],"resourceVersion":"448712910","uid":"b9a5ca4c-518f-4868-bb4b-f0d60a4adc0d"},"type":"Opaque"}
  creationTimestamp: "2025-02-13T07:24:51Z"
  labels:
    managed-by: prometheus-operator
  name: alertmanager-main-generated
  namespace: openshift-monitoring
  ownerReferences:
  - apiVersion: monitoring.coreos.com/v1
    blockOwnerDeletion: true
    controller: true
    kind: Alertmanager
    name: main
    uid: e6edf69d-2220-44f7-b60d-83ec2d4c83ae
  resourceVersion: "448713417"
  uid: b9a5ca4c-518f-4868-bb4b-f0d60a4adc0d
type: Opaque
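
If you want to see the alertmanager.yaml inside the alertmanager-main-generated secret, the alertmanager.yaml.gz value can be base64 decoded and then gunzip decompressed, something like this:

oc get secret alertmanager-main-generated --namespace openshift-monitoring --output jsonpath="{.data.alertmanager\.yaml\.gz}" | base64 --decode | gunzip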

 

For example, let’s say alertmanager.yaml has the following and you want to change priority from P3 to P1.

global:
  resolve_timeout: 5m
  opsgenie_api_key: '3756622d-bd21-4c3c-9127-e92f11007b3a'
  opsgenie_api_url: 'https://api.opsgenie.com'
receivers:
  - name: info
    email_configs:
    - to: 'no-reply@example.com'
  - name: critical
    opsgenie_configs:
    - message: "{{ range .Alerts }}{{ .Labels.alertname }}{{ end }}"
      description: "{{ range .Alerts }}{{ .Labels.severity }} - {{ .Annotations.description }}{{ end }}"
      priority: "P3"
      http_config:
        proxy_url: "http://proxy.example.com"
  - name: default
  - name: watchdog
route:
  group_by:
    - cluster
    - namespace
    - pod
  group_interval: 15m
  group_wait: 5m
  receiver: default
  repeat_interval: 30m
  routes:
    - match:
        alertname: Watchdog
      receiver: watchdog
    - match:
        severity: critical
      receiver: critical

    - match_re:
        alertname: ^KubeNode.*
      receiver: critical

 

Notice the above YAML has this.

- message: "{{ range .Alerts }}{{ .Labels.alertname }}{{ end }}"
  description: "{{ range .Alerts }}{{ .Labels.severity }} - {{ .Annotations.description }}{{ end }}"

 

amtool, which is available in the alertmanager pods in the openshift-monitoring namespace on on-prem OpenShift, can be used to display the current active alerts in JSON.

oc exec pod/alertmanager-main-0 \
--namespace openshift-monitoring -- \
amtool --alertmanager.url="http://localhost:9093" alert --output json | jq

 

The amtool command should return something like this. Notice one of the JSON keys is annotations.description. This is the value that .Annotations.description in alertmanager.yaml refers to.

[
  {
    "annotations": {
      "description": "Insights recommendation \"The user workloads will be scheduled on infra nodes when the infrastructure nodes are not configured for taints with the \"NoSchedule\" effect\" with total risk \"Low\" was detected on the cluster. More information is available at https://console.redhat.com/openshift/insights/advisor/clusters/da49be12-f6de-42ab-9f34-ed5e9ab5e17a?first=ccx_rules_ocp.external.rules.check_infra_nodes_configurations_taints%7CINFRA_NODES_NOT_CONFIGURE_TAINTS.",
      "summary": "An Insights recommendation is active for this cluster."
    },
    "endsAt": "2025-02-13T10:02:40.965Z",
    "fingerprint": "363207909f3e8d5d",
    "receivers": [
      {
        "name": "default"
      }
    ],
    "startsAt": "2025-02-10T15:54:10.965Z",
    "status": {
      "inhibitedBy": [],
      "silencedBy": [],
      "state": "active"
    },
    "updatedAt": "2025-02-13T09:58:40.970Z",
    "generatorURL": "https://console-openshift-console.apps.lab.op.example.com/monitoring/graph?g0.expr=insights_recommendation_active+%3D%3D+1&g0.tab=1",
    "labels": {
      "alertname": "InsightsRecommendationActive",
      "container": "insights-operator",
      "description": "The user workloads will be scheduled on infra nodes when the infrastructure nodes are not configured for taints with the \"NoSchedule\" effect",
      "endpoint": "https",
      "info_link": "https://console.redhat.com/openshift/insights/advisor/clusters/da49be12-f6de-42ab-9f34-ed5e9ab5e17a?first=ccx_rules_ocp.external.rules.check_infra_nodes_configurations_taints%7CINFRA_NODES_NOT_CONFIGURE_TAINTS",
      "instance": "10.129.0.29:8443",
      "job": "metrics",
      "namespace": "openshift-insights",
      "openshiftCluster": "lab.op.example.com",
      "openshift_io_alert_source": "platform",
      "pod": "insights-operator-5cdb7ccdd4-nlrn6",
      "prometheus": "openshift-monitoring/k8s",
      "service": "metrics",
      "severity": "info",
      "total_risk": "Low"
    }
  }
]
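
As a minimal sketch, jq can also be used to pull out just the fields that the opsgenie_configs templates reference:

oc exec pod/alertmanager-main-0 \
--namespace openshift-monitoring -- \
amtool --alertmanager.url="http://localhost:9093" alert --output json | jq '.[] | {alertname: .labels.alertname, severity: .labels.severity, description: .annotations.description}'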

 

Let's say you update priority from P3 to P1 in alertmanager.yaml.

global:
  resolve_timeout: 5m
  opsgenie_api_key: '3756622d-bd21-4c3c-9127-e92f11007b3a'
  opsgenie_api_url: 'https://api.opsgenie.com'
receivers:
  - name: info
    email_configs:
    - to: 'no-reply@example.com'
  - name: critical
    opsgenie_configs:
    - message: "{{ range .Alerts }}{{ .Labels.alertname }}{{ end }}"
      description: "{{ range .Alerts }}{{ .Labels.severity }} - {{ .Annotations.description }}{{ end }}"
      priority: "P1"
      http_config:
        proxy_url: "http://proxy.example.com"
  - name: default
  - name: watchdog
route:
  group_by:
    - cluster
    - namespace
    - pod
  group_interval: 15m
  group_wait: 5m
  receiver: default
  repeat_interval: 30m
  routes:
    - match:
        alertname: Watchdog
      receiver: watchdog
    - match:
        severity: critical
      receiver: critical

    - match_re:
        alertname: ^KubeNode.*
      receiver: critical

 

Let's get the base64 encoded string of alertmanager.yaml.

~]$ cat alertmanager.yaml | base64 | sed ':label; N; $! b label; s|\n||g'
Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0KICBvcHNnZW5pZV9hcGlfa2V5OiAnMzc1NjYyMmQtYmQyMS00YzNjLTkxMjctZTkyZjExMDA3YjNhJwogIG9wc2dlbmllX2FwaV91cmw6ICdodHRwczovL2FwaS5vcHNnZW5pZS5jb20nCnJlY2VpdmVyczoKICAtIG5hbWU6IGluZm8KICAgIGVtYWlsX2NvbmZpZ3M6CiAgICAtIHRvOiAnbm8tcmVwbHlAZXhhbXBsZS5jb20nCiAgLSBuYW1lOiBjcml0aWNhbAogICAgb3BzZ2VuaWVfY29uCiAgICBmaWdzOgogICAgLSBtZXNzYWdlOiAie3sgcmFuZ2UgLkFsZXJ0cyB9fXt7IC5sYWJlbHMuYWxlcnRuYW1lIH19e3sgZW5kIH19IgogICAgICBkZXNjcmlwdGlvbjogInt7IHJhbmdlIC5BbGVydHMgfX17eyAubGFiZWxzLnNldmVyaXR5IH19IC0ge3sgLmFubm90YXRpb25zLmRlc2NyaXB0aW9uIH19e3sgZW5kIH19CiAgICAgIHByaW9yaXR5OiAiUDMiCiAgICAgIGh0dHBfY29uZmlnOgogICAgICAgIHByb3h5X3VybDogImh0dHA6Ly9wcm94eS5leGFtcGxlLmNvbSIKICAtIG5hbWU6IGRlZmF1bHQKICAtIG5hbWU6IHdhdGNoZG9nCnJvdXRlOgogIGdyb3VwX2J5OgogICAgLSBjbHVzdGVyCiAgICAtIG5hbWVzcGFjZQogICAgLSBwb2QKICBncm91cF9pbnRlcnZhbDogMTVtCiAgZ3JvdXBfd2FpdDogNW0KICByZWNlaXZlcjogZGVmYXVsdAogIHJlcGVhdF9pbnRlcnZhbDogMzBtCiAgcm91dGVzOgogICAgLSBtYXRjaDoKICAgICAgICBhbGVydG5hbWU6IFdhdGNoZG9nCiAgICAgIHJlY2VpdmVyOiB3YXRjaGRvZwogICAgLSBtYXRjaDoKICAgICAgIHNldmVyaXR5OiBjcml0aWNhbAogICAgICByZWNlaXZlcjogY3JpdGljYWwKCiAgICAtIG1hdGNoX3JlOgogICAgICAgIGFsZXJ0bmFtZTogXkt1YmVOb2RlLioKICAgICAgcmVjZWl2ZXI6IGNyaXRpY2FsCg==
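
Alternatively, if your version of base64 supports the --wrap option (GNU coreutils does), the sed can be skipped:

base64 --wrap=0 alertmanager.yaml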

 

Let’s get the current YAML of the alertmanager-main secret.

oc get secret alertmanager-main --namespace openshift-monitoring --output yaml > alertmanager-main.yml

 

Let's update alertmanager-main.yml, replacing the value of the alertmanager.yaml key under data with the new base64 string.

vim alertmanager-main.yml
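
After the edit, the data section of alertmanager-main.yml should look something like this (base64 string truncated here for readability):

apiVersion: v1
data:
  alertmanager.yaml: Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0KICBvcHNnZW5pZV9hcGlfa2V5OiAn...
kind: Secret
metadata:
  name: alertmanager-main
  namespace: openshift-monitoring
type: Opaque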

 

Let’s update the alertmanager-main secret.

oc apply -f alertmanager-main.yml

 

Now the alertmanager.yaml file in the alertmanager-main secret should contain whatever change we made to alertmanager.yaml, which is priority P1 in this example.

oc get secret alertmanager-main --namespace openshift-monitoring --output jsonpath="{.data.alertmanager\.yaml}" | base64 --decode

 

But the /etc/alertmanager/config_out/alertmanager.env.yaml file in the Alert Manager pod may not be immediately updated because the pod uses the alertmanager-main-generated secret.

oc exec pod/alertmanager-main-0 --namespace openshift-monitoring -- cat /etc/alertmanager/config_out/alertmanager.env.yaml

 

Typically, you just need to wait a few minutes for Alert Manager to do its thing. Alert Manager should gzip compress the new alertmanager.yaml, update the alertmanager-main-generated secret to contain the gzip-compressed file, and then /etc/alertmanager/config_out/alertmanager.env.yaml in the Alert Manager pod should contain the new alertmanager.yaml. After waiting a few minutes, check the file again.

oc exec pod/alertmanager-main-0 --namespace openshift-monitoring -- cat /etc/alertmanager/config_out/alertmanager.env.yaml
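
As a quick check, grep for just the priority line to confirm the change from P3 to P1 made it into the pod:

oc exec pod/alertmanager-main-0 --namespace openshift-monitoring -- cat /etc/alertmanager/config_out/alertmanager.env.yaml | grep priority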

 

Take a look at the pod logs and ensure there are no errors.

oc logs pod/alertmanager-main-0 --namespace openshift-monitoring

 

You'll want to see events like this.

ts=2025-03-07T04:12:32.199Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2025-03-07T04:12:32.199Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml

 

Now, when an alert with severity critical fires (in this example), Alert Manager should create the alert in OpsGenie. Nice!




