
Prometheus is a system used to
- Collect metrics (e.g. memory, CPU usage) - this is often referred to as "scraping" metrics
- Store the metrics in its local time series database; for long-term retention, the metrics can also be shipped to object storage, such as an Amazon Web Services S3 bucket, typically by a companion tool such as Thanos
- Evaluate conditions that should create an alert (e.g. high CPU or high memory usage)
For example, Prometheus can be used to gather metrics from servers, virtual machines (VMs), databases, containers (e.g. Docker, OpenShift), messaging systems (e.g. IBM MQ, RabbitMQ), and so on. The metrics could then be shipped to object storage, such as an Amazon Web Services (AWS) S3 bucket, for long-term retention.
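Outside of Kubernetes, the targets that Prometheus scrapes are defined in Prometheus's own configuration file. As a minimal sketch (the hostnames are hypothetical; 9100 is the default node_exporter port), a prometheus.yml that scrapes two servers might look like this.

```yaml
# prometheus.yml - minimal sketch; the hostnames are hypothetical
global:
  scrape_interval: 15s   # how often to scrape each target

scrape_configs:
  - job_name: node       # job label applied to all metrics from these targets
    static_configs:
      - targets:
          - server1.example.com:9100   # node_exporter default port
          - server2.example.com:9100
```

On an OpenShift or Kubernetes cluster, this scrape configuration is instead generated for you from resources such as ServiceMonitor and PodMonitor, described later in this article.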
Often, an observability tool such as Grafana is used to display and query the metrics in a UI, for example, to find systems with high CPU or memory usage.
Also, an alerting system such as Alertmanager is often used to create alerts when a certain condition is met, such as a system with high CPU or high memory usage. The alerting system routes alerts to certain targets, such as an SMTP email server or Opsgenie.
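For example, a minimal Alertmanager configuration that routes all alerts to an email address might look like the following sketch (the SMTP host and email addresses are hypothetical).

```yaml
# alertmanager.yaml - minimal sketch; the SMTP host and addresses are hypothetical
global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'

route:
  receiver: email-team          # default receiver for all alerts
  group_by: ['alertname']       # batch alerts that share the same name
  repeat_interval: 4h           # resend still-firing alerts every 4 hours

receivers:
  - name: email-team
    email_configs:
      - to: 'oncall@example.com'
```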
Metrics are collected by creating
- a ServiceMonitor that targets one or more Services exposing a /metrics endpoint
- a PodMonitor that targets one or more Pods exposing a /metrics endpoint
The Monitoring Cluster Operator
By default, an OpenShift cluster includes the monitoring Cluster Operator, so the oc get clusteroperators command should list "monitoring". Note that Cluster Operators are an OpenShift concept; on a vanilla Kubernetes cluster, the Prometheus Operator would instead be installed separately (for example, via the kube-prometheus-stack Helm chart).
~]$ oc get clusteroperators
NAME         VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
monitoring   4.16.30   True        False         False      204d
To configure the monitoring Cluster Operator's Prometheus stack, let's create a file named config.yaml that contains something like this.
prometheusK8s:
  resources:
    limits:
      cpu: 1
      memory: 25Gi
    requests:
      cpu: 200m
      memory: 2Gi
  externalLabels:
    openshiftCluster: my-cluster.op.example.com
  nodeSelector:
    monitoringstack: ""
  volumeClaimTemplate:
    metadata:
      name: prometheusk8s-db
    spec:
      storageClassName: thin-csi
      resources:
        requests:
          storage: 400Gi
alertmanagerMain:
  nodeSelector:
    monitoringstack: ""
  volumeClaimTemplate:
    metadata:
      name: alertmanagermain-db
    spec:
      storageClassName: thin
      resources:
        requests:
          storage: 10Gi
prometheusOperator:
  nodeSelector:
    monitoringstack: ""
kubeStateMetrics:
  nodeSelector:
    monitoringstack: ""
grafana:
  nodeSelector:
    monitoringstack: ""
telemeterClient:
  nodeSelector:
    monitoringstack: ""
k8sPrometheusAdapter:
  nodeSelector:
    monitoringstack: ""
openshiftStateMetrics:
  nodeSelector:
    monitoringstack: ""
thanosQuerier:
  nodeSelector:
    monitoringstack: ""
monitoringPlugin:
  nodeSelector:
    monitoringstack: ""
And then use the kubectl create configmap (Kubernetes) or oc create configmap (OpenShift) command to create a config map named cluster-monitoring-config that contains the config.yaml file.
oc create configmap cluster-monitoring-config --namespace openshift-monitoring --from-file config.yaml
The kubectl get configmaps (Kubernetes) or oc get configmaps (OpenShift) command can then be used to verify that the config map was created.
~]$ oc get configmaps --namespace openshift-monitoring
NAME                        DATA   AGE
cluster-monitoring-config   1      1m
And the config map should contain the config.yaml file.
~]$ oc get configmap cluster-monitoring-config --namespace openshift-monitoring --output yaml
apiVersion: v1
data:
  config.yaml: |
    prometheusK8s:
      resources:
        limits:
          cpu: 1
          memory: 25Gi
        requests:
          cpu: 200m
          memory: 2Gi
      externalLabels:
        openshiftCluster: my-cluster.op.example.com
      nodeSelector:
        monitoringstack: ""
      volumeClaimTemplate:
        metadata:
          name: prometheusk8s-db
        spec:
          storageClassName: thin-csi
          resources:
            requests:
              storage: 400Gi
    alertmanagerMain:
      nodeSelector:
        monitoringstack: ""
      volumeClaimTemplate:
        metadata:
          name: alertmanagermain-db
        spec:
          storageClassName: thin
          resources:
            requests:
              storage: 10Gi
    prometheusOperator:
      nodeSelector:
        monitoringstack: ""
    kubeStateMetrics:
      nodeSelector:
        monitoringstack: ""
    grafana:
      nodeSelector:
        monitoringstack: ""
    telemeterClient:
      nodeSelector:
        monitoringstack: ""
    k8sPrometheusAdapter:
      nodeSelector:
        monitoringstack: ""
    openshiftStateMetrics:
      nodeSelector:
        monitoringstack: ""
    thanosQuerier:
      nodeSelector:
        monitoringstack: ""
    monitoringPlugin:
      nodeSelector:
        monitoringstack: ""
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
Prometheus Pods
There should now be Prometheus pods. The kubectl get pods (Kubernetes) or oc get pods (OpenShift) command can be used to list the Prometheus pods. The prometheus-k8s pods are the pods that are responsible for scraping metrics, such as CPU and memory.
~]$ oc get pods --namespace openshift-monitoring
NAME                                                     READY   STATUS    RESTARTS   AGE
prometheus-adapter-6b98c646c7-m4g76                      1/1     Running   0          8d
prometheus-adapter-6b98c646c7-tczr2                      1/1     Running   0          8d
prometheus-k8s-0                                         6/6     Running   0          11d
prometheus-k8s-1                                         6/6     Running   0          11d
prometheus-operator-6766f68555-mkfv9                     2/2     Running   0          11d
prometheus-operator-admission-webhook-8589888cbc-mq2jx   1/1     Running   0          11d
prometheus-operator-admission-webhook-8589888cbc-t62mt   1/1     Running   0          11d
PodMonitor and ServiceMonitor
As noted above, metrics are collected by creating
- a ServiceMonitor that targets one or more Services exposing a /metrics endpoint
- a PodMonitor that targets one or more Pods exposing a /metrics endpoint
For example, you could have the following ServiceMonitor. Notice in this example that the ServiceMonitor scrapes the /metrics path on the Service port named metrics (the endpoints port field refers to the Service port by name, not by number).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-service-monitor
  namespace: my-project
spec:
  endpoints:
  - interval: 10s
    path: /metrics
    port: metrics
    scheme: http
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    any: true
  sampleLimit: 1000
  selector:
    matchExpressions:
    - key: app.kubernetes.io/name
      operator: Exists
Then you could configure one or more Services with a port named metrics, since the ServiceMonitor references the Service port by that name, which ultimately is how Prometheus scrapes the Service's metrics, such as CPU and memory usage.
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: my-service
  name: my-service
  namespace: my-project
spec:
  ports:
  - name: metrics
    port: 9100
    protocol: TCP
    targetPort: 9100
  selector:
    app.kubernetes.io/name: my-service
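Note that by default, OpenShift's monitoring stack only scrapes ServiceMonitor and PodMonitor resources in the cluster's own openshift-* namespaces. For resources in your own namespaces (such as my-project above) to be scraped, user workload monitoring typically needs to be enabled, which can be done by setting enableUserWorkload: true in the cluster-monitoring-config config map, something like this sketch.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # starts the prometheus-user-workload pods in the
    # openshift-user-workload-monitoring namespace
    enableUserWorkload: true
```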
Or you could have the following PodMonitor resource. Notice in this example that the PodMonitor podMetricsEndpoints references the pod port named metrics (like the ServiceMonitor, the port field is a port name, not a number), and that the selector matches pods labeled app: my-app.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-pod
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: my-app
  podMetricsEndpoints:
  - port: metrics
Then you could configure one or more containers in Pods with a containerPort named metrics and the app: my-app label, since the PodMonitor selects pods by that label and references the port by that name, which ultimately is how Prometheus scrapes the container metrics, such as CPU and memory usage.
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  labels:
    app: my-app
spec:
  containers:
  - image: registry.redhat.io/openshift4/my-image
    name: my-container
    ports:
    - containerPort: 9100
      name: metrics
      protocol: TCP
Prometheus Rules
The kubectl get PrometheusRules (Kubernetes) or oc get PrometheusRules (OpenShift) command can be used to list the Prometheus Rules.
~]$ oc get PrometheusRules --namespace openshift-monitoring
NAME                                           AGE
alertmanager-main-rules                        692d
cluster-monitoring-operator-prometheus-rules   692d
kube-state-metrics-rules                       692d
kubernetes-monitoring-rules                    692d
node-exporter-rules                            692d
prometheus-k8s-prometheus-rules                692d
prometheus-k8s-thanos-sidecar-rules            692d
prometheus-operator-rules                      692d
telemetry                                      692d
thanos-querier                                 692d
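You can also define your own alerts by creating a PrometheusRule resource. As a sketch (the rule name, expression, and threshold here are illustrative examples, not rules from the cluster above), a rule that fires when a node has been low on memory for 15 minutes might look like this. The expression uses the node_exporter metrics node_memory_MemAvailable_bytes and node_memory_MemTotal_bytes.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-memory-rules
  namespace: openshift-monitoring
spec:
  groups:
    - name: memory
      rules:
        - alert: NodeHighMemoryUsage
          # fires when available memory stays below 10% of total for 15 minutes
          expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 < 10
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Node {{ $labels.instance }} is low on memory"
```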
The kubectl exec (Kubernetes) or oc exec (OpenShift) and curl commands can be used to issue a GET request from inside your Prometheus pod to the /metrics endpoint of one of the collector pods running on the worker node, for example to verify that the endpoint is reachable and returning metrics. In this example, IP address 10.129.4.2 is the IP of the collector-26tk2 pod.
This often outputs a slew of data, far too much to read on stdout, so the output is almost always redirected to a file.
oc exec pod/prometheus-k8s-0 --container prometheus --namespace openshift-monitoring -- curl --insecure --request GET --url 'http://10.129.4.2:24231/metrics' | tee --append port_24231_metrics.out
Similarly, you may also want to get the metrics on port 2112.
oc exec pod/prometheus-k8s-0 --container prometheus --namespace openshift-monitoring -- curl --insecure --request GET --url 'http://10.129.4.2:2112/metrics' | tee --append port_2112_metrics.out