
Let's say you get an alert like this.
alertname = FluentDVeryHighErrorRate
instance = 10.131.17.130:24231
namespace = openshift-logging
openshiftCluster = stg001.op.thrivent.com
openshift_io_alert_source = platform
prometheus = openshift-monitoring/k8s
severity = critical
message = +Inf% of records have resulted in an error by fluentd 10.131.17.130:24231.
summary = FluentD output errors are very high
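This alert is fired by a Prometheus alerting rule. As a quick sanity check, something like the following could be used to view the rule's expression via the Prometheus rules API. This is just a sketch; the amount of grep context needed to see the query field may differ between Prometheus versions.
~]$ oc exec pod/prometheus-k8s-0 --container prometheus --namespace openshift-monitoring -- curl -s 'http://localhost:9090/api/v1/rules' | python -m json.tool | grep --after-context 5 FluentDVeryHighErrorRate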
These alerts come from the alertmanager pods in the openshift-monitoring namespace. The oc get pods command can be used to list those pods.
~]$ oc get pods --namespace openshift-monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 6/6 Running 1 (3d19h ago) 3d19h
alertmanager-main-1 6/6 Running 1 (3d19h ago) 3d19h
The oc exec command can be used to run the amtool CLI in one of the alertmanager pods. The amtool alert command lists the active alerts. If FluentDVeryHighErrorRate is not in the output, the alert is no longer active.
~]$ oc exec pod/alertmanager-main-0 --namespace openshift-monitoring -- amtool --alertmanager.url="http://localhost:9093" alert
Alertname Starts At Summary State
FluentDVeryHighErrorRate 2024-01-12 23:12:17 UTC FluentD output errors are very high. active
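If you only care about this particular alert, a matcher can be passed to amtool alert query to narrow the output, something like this.
~]$ oc exec pod/alertmanager-main-0 --namespace openshift-monitoring -- amtool --alertmanager.url="http://localhost:9093" alert query alertname="FluentDVeryHighErrorRate"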
The amtool alert command with the --silenced flag can be used to list silenced alerts.
~]$ oc exec pod/alertmanager-main-0 --namespace openshift-monitoring -- amtool --alertmanager.url="http://localhost:9093" alert --silenced
ID Matchers Ends At Created By Comment
d86e8aa8-91f3-463d-a34a-daf530682f38 alertname="FluentdNodeDown" 2024-01-20 09:29:29 UTC john.doe temporarily silencing this alert for 1 day
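If the errors are expected for a while (a planned Elasticsearch outage, for example), a silence could be created the same way. This is a sketch; adjust the matcher, duration, author, and comment to suit.
~]$ oc exec pod/alertmanager-main-0 --namespace openshift-monitoring -- amtool --alertmanager.url="http://localhost:9093" silence add alertname="FluentDVeryHighErrorRate" --duration="24h" --author="john.doe" --comment="silencing while the Elasticsearch issue is being worked"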
The following command queries the Prometheus targets API and returns the URLs Prometheus uses to scrape metrics from the Fluentd collectors (port 24231).
~]$ oc exec pod/prometheus-k8s-0 --container prometheus --namespace openshift-monitoring -- curl -s 'http://localhost:9090/api/v1/targets' | python -m json.tool | grep scrapeUrl | grep 24231 | sort
"scrapeUrl": "https://10.128.0.52:24231/metrics",
"scrapeUrl": "https://10.128.17.147:24231/metrics",
"scrapeUrl": "https://10.128.19.37:24231/metrics",
"scrapeUrl": "https://10.128.2.102:24231/metrics",
"scrapeUrl": "https://10.128.21.49:24231/metrics",
"scrapeUrl": "https://10.128.22.185:24231/metrics",
"scrapeUrl": "https://10.128.25.124:24231/metrics",
"scrapeUrl": "https://10.128.27.102:24231/metrics",
"scrapeUrl": "https://10.128.5.230:24231/metrics",
"scrapeUrl": "https://10.128.6.50:24231/metrics",
"scrapeUrl": "https://10.129.0.27:24231/metrics",
"scrapeUrl": "https://10.129.17.141:24231/metrics",
"scrapeUrl": "https://10.129.18.155:24231/metrics",
"scrapeUrl": "https://10.129.20.213:24231/metrics",
"scrapeUrl": "https://10.129.22.243:24231/metrics",
"scrapeUrl": "https://10.129.25.147:24231/metrics",
"scrapeUrl": "https://10.129.26.247:24231/metrics",
"scrapeUrl": "https://10.129.4.33:24231/metrics",
"scrapeUrl": "https://10.129.6.13:24231/metrics",
"scrapeUrl": "https://10.130.0.51:24231/metrics",
"scrapeUrl": "https://10.130.16.195:24231/metrics",
"scrapeUrl": "https://10.130.19.4:24231/metrics",
"scrapeUrl": "https://10.130.21.199:24231/metrics",
"scrapeUrl": "https://10.130.22.213:24231/metrics",
"scrapeUrl": "https://10.130.2.33:24231/metrics",
"scrapeUrl": "https://10.130.24.212:24231/metrics",
"scrapeUrl": "https://10.130.26.222:24231/metrics",
"scrapeUrl": "https://10.130.5.191:24231/metrics",
"scrapeUrl": "https://10.130.6.13:24231/metrics",
"scrapeUrl": "https://10.131.1.0:24231/metrics",
"scrapeUrl": "https://10.131.14.158:24231/metrics",
"scrapeUrl": "https://10.131.17.130:24231/metrics",
"scrapeUrl": "https://10.131.19.49:24231/metrics",
"scrapeUrl": "https://10.131.20.228:24231/metrics",
"scrapeUrl": "https://10.131.23.27:24231/metrics",
"scrapeUrl": "https://10.131.2.38:24231/metrics",
"scrapeUrl": "https://10.131.25.171:24231/metrics",
"scrapeUrl": "https://10.131.4.29:24231/metrics",
Here is a one-liner to loop through each /metrics endpoint and return the current fluentd_output_status_num_errors counts.
for scrapeurl in $(oc exec pod/prometheus-k8s-0 --container prometheus --namespace openshift-monitoring -- curl -s 'http://localhost:9090/api/v1/targets' | python -m json.tool | grep scrapeUrl | grep 24231 | sort | sed 's|.*"scrapeUrl": "||g' | sed 's|".*||g'); do oc exec pod/prometheus-k8s-0 --container prometheus --namespace openshift-monitoring -- curl --silent --insecure --request GET --url "$scrapeurl" | grep -i ^fluentd_output_status_num_errors; done;
This should return something like the following for each collector. Notice the counts are all zero except for the elasticsearch output.
fluentd_output_status_num_errors{hostname="collector-pn2ck",plugin_id="object:7bc",type="relabel"} 0.0
fluentd_output_status_num_errors{hostname="collector-pn2ck",plugin_id="object:80c",type="rewrite_tag_filter"} 0.0
fluentd_output_status_num_errors{hostname="collector-pn2ck",plugin_id="object:c3dc",type="relabel"} 0.0
fluentd_output_status_num_errors{hostname="collector-pn2ck",plugin_id="object:c3f0",type="relabel"} 0.0
fluentd_output_status_num_errors{hostname="collector-pn2ck",plugin_id="object:c404",type="relabel"} 0.0
fluentd_output_status_num_errors{hostname="collector-pn2ck",plugin_id="object:c418",type="stdout"} 0.0
fluentd_output_status_num_errors{hostname="collector-pn2ck",plugin_id="object:c454",type="relabel"} 0.0
fluentd_output_status_num_errors{hostname="collector-pn2ck",plugin_id="object:c47c",type="relabel"} 0.0
fluentd_output_status_num_errors{hostname="collector-pn2ck",plugin_id="object:c4a4",type="relabel"} 0.0
fluentd_output_status_num_errors{hostname="collector-pn2ck",plugin_id="object:c4b8",type="relabel"} 0.0
fluentd_output_status_num_errors{hostname="collector-pn2ck",plugin_id="object:c4cc",type="relabel"} 0.0
fluentd_output_status_num_errors{hostname="collector-pn2ck",plugin_id="auditlogs_arcsight",type="remote_syslog"} 0.0
fluentd_output_status_num_errors{hostname="collector-pn2ck",plugin_id="retry_default",type="elasticsearch"} 0.0
fluentd_output_status_num_errors{hostname="collector-pn2ck",plugin_id="default",type="elasticsearch"} 67.0 <- Elastic Search error count