
At a high level, the alerts are created like this: Prometheus > Alert Manager > OpsGenie
Alert Manager, as the name implies, receives alerts from a client (such as Prometheus) and routes them to certain targets, such as an SMTP email server or OpsGenie. Alert Manager also has features such as the ability to silence alerts for a period of time.
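For example, a command like the following (just a sketch, using the Watchdog alert and an arbitrary duration as placeholders) could be used to silence an alert using the amtool command in one of the alertmanager pods.
oc exec pod/alertmanager-main-0 --namespace openshift-monitoring -- amtool --alertmanager.url="http://localhost:9093" silence add alertname=Watchdog --duration=2h --comment="silenced during maintenance"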
The Alert Manager stateful set should reference a secret named alertmanager-main-generated.
~]$ oc get statefulset alertmanager-main --namespace openshift-monitoring --output yaml
spec:
  template:
    spec:
      volumes:
      - name: config-volume
        secret:
          defaultMode: 420
          secretName: alertmanager-main-generated
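Or, something like this could be used to return just the secret name (assuming the volume is named config-volume, as shown above).
~]$ oc get statefulset alertmanager-main --namespace openshift-monitoring --output jsonpath='{.spec.template.spec.volumes[?(@.name=="config-volume")].secret.secretName}'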
And the alertmanager-main-generated secret should contain something like this. Notice the secret contains the alertmanager.yaml.gz file.
~]$ oc get secret alertmanager-main-generated --namespace openshift-monitoring --output yaml
apiVersion: v1
data:
  alertmanager.yaml.gz: H4sIAAAAAAAA/6xSwY7TMBC95yvmjJTuFlpt5RN3JK4cEEQTZ5xa2BkzHqf071GSTTdVkeDAzTN+8+b5+fWBWwymAhDKHEZq1EfiogaOsQLIUVPjhKMBDCQaccCe5KOexY806M7yDZYjip45q5nr/rLbosz74woU+lm8UKMhG3AYMlUAnHJPg6cGk2+KBANn1ZTN0xMmv1tvX/fdgX/Q1cAztgdH7Uu9x87Wh6Nrazx0WHdkD+7lhO502lfCRWl5rCU/khjoyGEJWgH0wiU17XW6r8GGkpVkPg8YKSe0NFeJu4lhosoL9o3tgmrPHfcVAECcCjMfYXFvIjLw5Q20nbXi1VsMj7OZRhKv1zvMdtQPjh/HHKEWIbO+BYtytpOQv25uhP4g/Pun0tJn7mj37p8EzI4amCLhh762JSvHeibMN8Mv6NewLQ0/KMmIwcB+7golQt20PzzHal2dTbX8z00DRfShsTw437/+j7KBln/NDAPpIsCxYKT7IK9UG0P+B92asbW+pUQppoBTjuDrt+p3AAAA//8S4NTEjAMAAA==
kind: Secret
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"alertmanager.yaml.gz":"H4sICDuirWcAA2FsZXJ0bWFuYWdlci55YW1sAK1TPU/DQAzd+RW3VUJKodCKKhM7EisbkXPxpSfuI/iclv57fElTAmVBIlP8/Pz87DitizW48kopwhTdHiu2HmPPpdp4QZPnrkoeiHcxCbjIQHtY8o7sHgMvdfTl3WYxUQ1JrBbgkNhDgBbpcc49Ewnfe0vSzqVSGXAJJRO71GKwWEFnqzc8itIt1GuD9UOxgkYX642pC1g3UDSo1+ZhC2a7XS1+lvbkpHTH3KXy5kaQ5ZQePRBqFEuU8uCFCuCxVDaYKKFS6MG6SsdgbDswMoejKNbxwwZGCsjDhMlEktqLCSdJTZatBvfPsg0a6B3PkAOw3jWxvSL5dJjFW3nrqvo4NdKuT9LiFOWq1IHGU9zF5lwzWNnLUajVcAIjegA73cS0vbkRwg6BZ7X3twM12zkP67PLMcjPMOvo/2XyP2a+OhzmiQuFhEKyfPyx6bnAt8SFgBHTPeUvNa4Heo5JZ2MXQufzOInIBf86yutTX+NzbHB5/Uczw6JLlX80G9pCi6Poi/EifrfzCRxqUwa+AwAA"},"kind":"Secret","metadata":{"annotations":{},"creationTimestamp":"2025-02-13T07:24:51Z","labels":{"managed-by":"prometheus-operator"},"name":"alertmanager-main-generated","namespace":"openshift-monitoring","ownerReferences":[{"apiVersion":"monitoring.coreos.com/v1","blockOwnerDeletion":true,"controller":true,"kind":"Alertmanager","name":"main","uid":"e6edf69d-2220-44f7-b60d-83ec2d4c83ae"}],"resourceVersion":"448712910","uid":"b9a5ca4c-518f-4868-bb4b-f0d60a4adc0d"},"type":"Opaque"}
  creationTimestamp: "2025-02-13T07:24:51Z"
  labels:
    managed-by: prometheus-operator
  name: alertmanager-main-generated
  namespace: openshift-monitoring
  ownerReferences:
  - apiVersion: monitoring.coreos.com/v1
    blockOwnerDeletion: true
    controller: true
    kind: Alertmanager
    name: main
    uid: e6edf69d-2220-44f7-b60d-83ec2d4c83ae
  resourceVersion: "448713417"
  uid: b9a5ca4c-518f-4868-bb4b-f0d60a4adc0d
type: Opaque
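The alertmanager.yaml.gz key contains the gzip compressed Alert Manager config, so something like this could be used to view the config the secret currently contains.
~]$ oc get secret alertmanager-main-generated --namespace openshift-monitoring --output jsonpath="{.data.alertmanager\.yaml\.gz}" | base64 --decode | gunzip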
For example, let’s say alertmanager.yaml has the following and you want to change priority from P3 to P1.
global:
  resolve_timeout: 5m
  opsgenie_api_key: '3756622d-bd21-4c3c-9127-e92f11007b3a'
  opsgenie_api_url: 'https://api.opsgenie.com'
receivers:
  - name: info
    email_configs:
    - to: 'no-reply@example.com'
  - name: critical
    opsgenie_configs:
    - message: "{{ range .Alerts }}{{ .labels.alertname }}{{ end }}"
      description: "{{ range .Alerts }}{{ .labels.severity }} - {{ .annotations.description }}{{ end }}"
      priority: "P3"
      http_config:
        proxy_url: "http://proxy.example.com"
  - name: default
  - name: watchdog
route:
  group_by:
    - cluster
    - namespace
    - pod
  group_interval: 15m
  group_wait: 5m
  receiver: default
  repeat_interval: 30m
  routes:
    - match:
        alertname: Watchdog
      receiver: watchdog
    - match:
        severity: critical
      receiver: critical
    - match_re:
        alertname: ^KubeNode.*
      receiver: critical
Notice the above YAML has this.
- message: "{{ range .Alerts }}{{ .labels.alertname }}{{ end }}"
description: "{{ range .Alerts }}{{ .labels.severity }} - {{ .annotations.description }}{{ end }}"
The amtool command in one of the alertmanager pods in the openshift-monitoring namespace on an on-prem OpenShift cluster can be used to display the current active alerts in JSON.
oc exec pod/alertmanager-main-0 \
--namespace openshift-monitoring -- \
amtool --alertmanager.url="http://localhost:9093" alert --output json | jq
The amtool command should return something like this. Notice one of the JSON keys is annotations.description. This is why alertmanager.yaml has .annotations.description.
[
  {
    "annotations": {
      "description": "Insights recommendation \"The user workloads will be scheduled on infra nodes when the infrastructure nodes are not configured for taints with the \"NoSchedule\" effect\" with total risk \"Low\" was detected on the cluster. More information is available at https://console.redhat.com/openshift/insights/advisor/clusters/da49be12-f6de-42ab-9f34-ed5e9ab5e17a?first=ccx_rules_ocp.external.rules.check_infra_nodes_configurations_taints%7CINFRA_NODES_NOT_CONFIGURE_TAINTS.",
      "summary": "An Insights recommendation is active for this cluster."
    },
    "endsAt": "2025-02-13T10:02:40.965Z",
    "fingerprint": "363207909f3e8d5d",
    "receivers": [
      {
        "name": "default"
      }
    ],
    "startsAt": "2025-02-10T15:54:10.965Z",
    "status": {
      "inhibitedBy": [],
      "silencedBy": [],
      "state": "active"
    },
    "updatedAt": "2025-02-13T09:58:40.970Z",
    "generatorURL": "https://console-openshift-console.apps.lab.op.example.com/monitoring/graph?g0.expr=insights_recommendation_active+%3D%3D+1&g0.tab=1",
    "labels": {
      "alertname": "InsightsRecommendationActive",
      "container": "insights-operator",
      "description": "The user workloads will be scheduled on infra nodes when the infrastructure nodes are not configured for taints with the \"NoSchedule\" effect",
      "endpoint": "https",
      "info_link": "https://console.redhat.com/openshift/insights/advisor/clusters/da49be12-f6de-42ab-9f34-ed5e9ab5e17a?first=ccx_rules_ocp.external.rules.check_infra_nodes_configurations_taints%7CINFRA_NODES_NOT_CONFIGURE_TAINTS",
      "instance": "10.129.0.29:8443",
      "job": "metrics",
      "namespace": "openshift-insights",
      "openshiftCluster": "lab.op.example.com",
      "openshift_io_alert_source": "platform",
      "pod": "insights-operator-5cdb7ccdd4-nlrn6",
      "prometheus": "openshift-monitoring/k8s",
      "service": "metrics",
      "severity": "info",
      "total_risk": "Low"
    }
  }
]
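As an aside, amtool can also filter the active alerts by matcher. For example, something like this could be used to list only the alerts with severity critical.
oc exec pod/alertmanager-main-0 \
--namespace openshift-monitoring -- \
amtool --alertmanager.url="http://localhost:9093" alert query severity=critical --output json | jq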
Let's say you update priority from P3 to P1 in alertmanager.yaml.
global:
  resolve_timeout: 5m
  opsgenie_api_key: '3756622d-bd21-4c3c-9127-e92f11007b3a'
  opsgenie_api_url: 'https://api.opsgenie.com'
receivers:
  - name: info
    email_configs:
    - to: 'no-reply@example.com'
  - name: critical
    opsgenie_configs:
    - message: "{{ range .Alerts }}{{ .labels.alertname }}{{ end }}"
      description: "{{ range .Alerts }}{{ .labels.severity }} - {{ .annotations.description }}{{ end }}"
      priority: "P1"
      http_config:
        proxy_url: "http://proxy.example.com"
  - name: default
  - name: watchdog
route:
  group_by:
    - cluster
    - namespace
    - pod
  group_interval: 15m
  group_wait: 5m
  receiver: default
  repeat_interval: 30m
  routes:
    - match:
        alertname: Watchdog
      receiver: watchdog
    - match:
        severity: critical
      receiver: critical
    - match_re:
        alertname: ^KubeNode.*
      receiver: critical
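Before applying the change, it may be a good idea to validate the updated file with amtool check-config (this assumes amtool is installed on your workstation; you could also copy the file into one of the alertmanager pods and run amtool there).
~]$ amtool check-config alertmanager.yaml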
Let’s get the base64 encoded string of alertmanager.yaml.
~]$ cat alertmanager.yaml | base64 | sed ':label; N; $! b label; s|\n||g'
Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0KICBvcHNnZW5pZV9hcGlfa2V5OiAnMzc1NjYyMmQtYmQyMS00YzNjLTkxMjctZTkyZjExMDA3YjNhJwogIG9wc2dlbmllX2FwaV91cmw6ICdodHRwczovL2FwaS5vcHNnZW5pZS5jb20nCnJlY2VpdmVyczoKICAtIG5hbWU6IGluZm8KICAgIGVtYWlsX2NvbmZpZ3M6CiAgICAtIHRvOiAnbm8tcmVwbHlAZXhhbXBsZS5jb20nCiAgLSBuYW1lOiBjcml0aWNhbAogICAgb3BzZ2VuaWVfY29uCiAgICBmaWdzOgogICAgLSBtZXNzYWdlOiAie3sgcmFuZ2UgLkFsZXJ0cyB9fXt7IC5sYWJlbHMuYWxlcnRuYW1lIH19e3sgZW5kIH19IgogICAgICBkZXNjcmlwdGlvbjogInt7IHJhbmdlIC5BbGVydHMgfX17eyAubGFiZWxzLnNldmVyaXR5IH19IC0ge3sgLmFubm90YXRpb25zLmRlc2NyaXB0aW9uIH19e3sgZW5kIH19CiAgICAgIHByaW9yaXR5OiAiUDMiCiAgICAgIGh0dHBfY29uZmlnOgogICAgICAgIHByb3h5X3VybDogImh0dHA6Ly9wcm94eS5leGFtcGxlLmNvbSIKICAtIG5hbWU6IGRlZmF1bHQKICAtIG5hbWU6IHdhdGNoZG9nCnJvdXRlOgogIGdyb3VwX2J5OgogICAgLSBjbHVzdGVyCiAgICAtIG5hbWVzcGFjZQogICAgLSBwb2QKICBncm91cF9pbnRlcnZhbDogMTVtCiAgZ3JvdXBfd2FpdDogNW0KICByZWNlaXZlcjogZGVmYXVsdAogIHJlcGVhdF9pbnRlcnZhbDogMzBtCiAgcm91dGVzOgogICAgLSBtYXRjaDoKICAgICAgICBhbGVydG5hbWU6IFdhdGNoZG9nCiAgICAgIHJlY2VpdmVyOiB3YXRjaGRvZwogICAgLSBtYXRjaDoKICAgICAgIHNldmVyaXR5OiBjcml0aWNhbAogICAgICByZWNlaXZlcjogY3JpdGljYWwKCiAgICAtIG1hdGNoX3JlOgogICAgICAgIGFsZXJ0bmFtZTogXkt1YmVOb2RlLioKICAgICAgcmVjZWl2ZXI6IGNyaXRpY2FsCg==
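As an aside, if your system has GNU coreutils, base64 --wrap 0 should produce the same single line output without the sed.
~]$ base64 --wrap 0 alertmanager.yaml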
Let’s get the current YAML of the alertmanager-main secret.
oc get secret alertmanager-main --namespace openshift-monitoring --output yaml > alertmanager-main.yml
Let’s update alertmanager-main.yml, replacing the value of the alertmanager.yaml key under data with the new base64 encoded string.
vim alertmanager-main.yml
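After the edit, the data section of alertmanager-main.yml should look something like this (the base64 string is replaced with a placeholder here for readability).
apiVersion: v1
data:
  alertmanager.yaml: <the base64 encoded string from above>
kind: Secret
metadata:
  name: alertmanager-main
  namespace: openshift-monitoring
type: Opaque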
Let’s update the alertmanager-main secret.
oc apply -f alertmanager-main.yml
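As an alternative to editing the secret YAML by hand, something like the following sketch could be used to regenerate the secret directly from alertmanager.yaml and replace it in one step.
oc create secret generic alertmanager-main --namespace openshift-monitoring --from-file=alertmanager.yaml=alertmanager.yaml --dry-run=client --output yaml | oc replace -f -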
Now the alertmanager.yaml file in the alertmanager-main secret should contain whatever change we made to alertmanager.yaml, which is priority P1 in this example.
oc get secret alertmanager-main --namespace openshift-monitoring --output jsonpath="{.data.alertmanager\.yaml}" | base64 --decode
But the /etc/alertmanager/config_out/alertmanager.env.yaml file in the Alert Manager pod may not be immediately updated because the pod uses the alertmanager-main-generated secret.
oc exec pod/alertmanager-main-0 --namespace openshift-monitoring -- cat /etc/alertmanager/config_out/alertmanager.env.yaml
Typically, you just need to wait a few minutes for Alert Manager to do its thing. Alert Manager should gzip compress the new alertmanager.yaml file and update the alertmanager-main-generated secret to contain the gzip compressed alertmanager.yaml file. After waiting a few minutes, the /etc/alertmanager/config_out/alertmanager.env.yaml file in the Alert Manager pod should contain the new alertmanager.yaml.
oc exec pod/alertmanager-main-0 --namespace openshift-monitoring -- cat /etc/alertmanager/config_out/alertmanager.env.yaml
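For example, something like this could be used to confirm the priority change made it into the rendered config.
oc exec pod/alertmanager-main-0 --namespace openshift-monitoring -- grep priority /etc/alertmanager/config_out/alertmanager.env.yaml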
Take a look at the pod logs and ensure there are no errors.
oc logs pod/alertmanager-main-0 --namespace openshift-monitoring
You'll want to see events like this.
ts=2025-03-07T04:12:32.199Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2025-03-07T04:12:32.199Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
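Optionally, amtool config show can be used to display the configuration Alert Manager is currently running with, which should now include the updated priority.
oc exec pod/alertmanager-main-0 \
--namespace openshift-monitoring -- \
amtool --alertmanager.url="http://localhost:9093" config show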
Now when there are critical alerts (in this example), Prometheus Alert Manager should create the alert in OpsGenie with priority P1. Nice!