Prometheus - Getting Started with Prometheus


Prometheus is a system used to

  • Collect metrics (e.g. memory, CPU) - This is often referred to as "scraping metrics"
  • Store the metrics, for example, in object storage, such as an Amazon Web Services S3 Bucket
  • Configure conditions that should create an alert (e.g. high CPU or high memory usage)

For example, Prometheus can be used to gather metrics from servers, virtual machines (VMs), databases, containers (e.g. Docker, OpenShift), messaging (e.g. IBM MQ, RabbitMQ), and the list goes on. Then the metrics could be stored in object storage, such as an Amazon Web Services (AWS) S3 Bucket.

Often, an observability system such as Kibana is used to display the metrics in a UI that can be used to query the metrics, for example, to find systems with high CPU or memory usage.

Also, an alerting system such as Alertmanager is often used to create alerts when a certain condition is met, such as a system with high CPU or high memory usage. The alerting system routes alerts to certain targets, such as an SMTP email server or Opsgenie.
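As a rough sketch of what that routing looks like (the SMTP host, email addresses, and receiver names here are hypothetical), an Alertmanager configuration that sends critical alerts to a separate email receiver might look something like this.

```yaml
# alertmanager.yaml - hypothetical routing sketch
global:
  smtp_smarthost: 'smtp.example.com:25'   # assumed SMTP relay
  smtp_from: 'alertmanager@example.com'
route:
  receiver: default-email                 # fallback receiver for all alerts
  routes:
    - matchers:
        - severity = critical             # route critical alerts separately
      receiver: oncall-email
receivers:
  - name: default-email
    email_configs:
      - to: 'team@example.com'
  - name: oncall-email
    email_configs:
      - to: 'oncall@example.com'
```

The route tree is evaluated top down, so alerts that do not match the `severity = critical` matcher fall through to the default receiver.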

Metrics are collected by:

  • creating a ServiceMonitor and configuring one or more Services to expose the /metrics endpoint
  • creating a PodMonitor and configuring one or more Pods to expose the /metrics endpoint

 

The Monitoring Cluster Operator

By default, an OpenShift cluster will include the monitoring Cluster Operator, which the oc get clusteroperators command should list. Note that Cluster Operators are specific to OpenShift; a vanilla Kubernetes cluster does not have them.

~]$ oc get clusteroperators
NAME         VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
monitoring   4.16.30   True        False         False      204d

 

To configure the Prometheus stack managed by the monitoring Cluster Operator, let's create a file named config.yaml that contains something like this.

prometheusK8s:
  resources:
    limits:
      cpu: 1
      memory: 25Gi
    requests:
      cpu: 200m
      memory: 2Gi
  externalLabels:
    openshiftCluster: my-cluster.op.example.com
  nodeSelector:
    monitoringstack: ""
  volumeClaimTemplate:
    metadata:
      name: prometheusk8s-db
    spec:
      storageClassName: thin-csi
      resources:
        requests:
          storage: 400Gi
alertmanagerMain:
  nodeSelector:
    monitoringstack: ""
  volumeClaimTemplate:
    metadata:
      name: alertmanagermain-db
    spec:
      storageClassName: thin
      resources:
        requests:
          storage: 10Gi
prometheusOperator:
  nodeSelector:
    monitoringstack: ""
kubeStateMetrics:
  nodeSelector:
    monitoringstack: ""
grafana:
  nodeSelector:
    monitoringstack: ""
telemeterClient:
  nodeSelector:
    monitoringstack: ""
k8sPrometheusAdapter:
  nodeSelector:
    monitoringstack: ""
openshiftStateMetrics:
  nodeSelector:
    monitoringstack: ""
thanosQuerier:
  nodeSelector:
    monitoringstack: ""
monitoringPlugin:
  nodeSelector:
    monitoringstack: ""

 

And then use the kubectl create configmap (Kubernetes) or oc create configmap (OpenShift) command to create a config map named cluster-monitoring-config that contains the config.yaml file.

oc create configmap cluster-monitoring-config --namespace openshift-monitoring --from-file config.yaml

 

The kubectl get configmaps (Kubernetes) or oc get configmaps (OpenShift) command can then be used to verify that the config map was created.

~]$ oc get configmaps --namespace openshift-monitoring
NAME                              DATA   AGE
cluster-monitoring-config         1      1m

 

And the config map should contain the config.yaml file.

~]$ oc get configmap cluster-monitoring-config --namespace openshift-monitoring --output yaml
apiVersion: v1
data:
  config.yaml: |
    prometheusK8s:
      resources:
        limits:
          cpu: 1
          memory: 25Gi
        requests:
          cpu: 200m
          memory: 2Gi
      externalLabels:
        openshiftCluster: my-cluster.op.example.com
      nodeSelector:
        monitoringstack: ""
      volumeClaimTemplate:
        metadata:
          name: prometheusk8s-db
        spec:
          storageClassName: thin-csi
          resources:
            requests:
              storage: 400Gi
    alertmanagerMain:
      nodeSelector:
        monitoringstack: ""
      volumeClaimTemplate:
        metadata:
          name: alertmanagermain-db
        spec:
          storageClassName: thin
          resources:
            requests:
              storage: 10Gi
    prometheusOperator:
      nodeSelector:
        monitoringstack: ""
    kubeStateMetrics:
      nodeSelector:
        monitoringstack: ""
    grafana:
      nodeSelector:
        monitoringstack: ""
    telemeterClient:
      nodeSelector:
        monitoringstack: ""
    k8sPrometheusAdapter:
      nodeSelector:
        monitoringstack: ""
    openshiftStateMetrics:
      nodeSelector:
        monitoringstack: ""
    thanosQuerier:
      nodeSelector:
        monitoringstack: ""
    monitoringPlugin:
      nodeSelector:
        monitoringstack: ""
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring

 

Prometheus Pods

There should now be Prometheus pods in the openshift-monitoring namespace. The kubectl get pods (Kubernetes) or oc get pods (OpenShift) command can be used to list them. The prometheus-k8s pods are the pods responsible for scraping metrics, such as CPU and memory.

~]$ oc get pods --namespace openshift-monitoring
NAME                                                     READY   STATUS    RESTARTS        AGE
prometheus-adapter-6b98c646c7-m4g76                      1/1     Running   0               8d
prometheus-adapter-6b98c646c7-tczr2                      1/1     Running   0               8d
prometheus-k8s-0                                         6/6     Running   0               11d
prometheus-k8s-1                                         6/6     Running   0               11d
prometheus-operator-6766f68555-mkfv9                     2/2     Running   0               11d
prometheus-operator-admission-webhook-8589888cbc-mq2jx   1/1     Running   0               11d
prometheus-operator-admission-webhook-8589888cbc-t62mt   1/1     Running   0               11d

 

PodMonitor and ServiceMonitor

Metrics are collected by:

  • creating a ServiceMonitor and configuring one or more Services to expose the /metrics endpoint
  • creating a PodMonitor and configuring one or more Pods to expose the /metrics endpoint

For example, you could have the following ServiceMonitor. Notice in this example that the ServiceMonitor scrapes the /metrics endpoint on the Service port named metrics. (The port field of a ServiceMonitor endpoint refers to a Service port by name, not by number.)

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-service-monitor
  namespace: my-project
spec:
  endpoints:
    - interval: 10s
      path: /metrics
      port: metrics
      scheme: http
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    any: true
  sampleLimit: 1000
  selector:
    matchExpressions:
      - key: app.kubernetes.io/name
        operator: Exists
 

Then you could configure one or more Services to expose port 9100 with the port name metrics, since the ServiceMonitor references the Service port by name, which ultimately is how Prometheus scrapes the Service's metrics, such as CPU and memory.

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: my-service
  name: my-service
  namespace: my-project
spec:
  ports:
    - name: metrics
      port: 9100
      protocol: TCP
      targetPort: 9100
  selector:
    app.kubernetes.io/name: my-service

 

Or you could have the following PodMonitor resource. Notice in this example that the PodMonitor podMetricsEndpoints references the container port named metrics. (Like a ServiceMonitor, the port field refers to a named port, not a port number.)

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-pod-monitor
  namespace: my-project
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: my-app
  podMetricsEndpoints:
    - port: metrics
 

Then you could configure one or more containers in Pods to expose port 9100 with the port name metrics, since this is the port the PodMonitor references, which ultimately is how Prometheus scrapes the container metrics, such as CPU and memory. Notice that the Pod also has the app: my-app label, which matches the PodMonitor's selector.

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  namespace: my-project
  labels:
    app: my-app
spec:
  containers:
  - image: registry.redhat.io/openshift4/my-image
    name: my-container
    ports:
    - name: metrics
      containerPort: 9100
      protocol: TCP

 

Prometheus Rules

The kubectl get PrometheusRules (Kubernetes) or oc get PrometheusRules (OpenShift) command can be used to list the Prometheus Rules.

~]$ oc get PrometheusRules --namespace openshift-monitoring
NAME                                           AGE
alertmanager-main-rules                        692d
cluster-monitoring-operator-prometheus-rules   692d
kube-state-metrics-rules                       692d
kubernetes-monitoring-rules                    692d
node-exporter-rules                            692d
prometheus-k8s-prometheus-rules                692d
prometheus-k8s-thanos-sidecar-rules            692d
prometheus-operator-rules                      692d
telemetry                                      692d
thanos-querier                                 692d
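In addition to the default rules above, you can create your own PrometheusRule resources. As a sketch (the rule name, namespace, threshold, and duration here are hypothetical choices), a rule that fires when a node's available memory drops below 10% of total for 5 minutes could use the node-exporter node_memory_MemAvailable_bytes and node_memory_MemTotal_bytes metrics, something like this.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-memory-rules            # hypothetical name
  namespace: openshift-monitoring  # assumed namespace for this sketch
spec:
  groups:
    - name: memory
      rules:
        - alert: NodeLowMemory
          # fire when available memory stays below 10% of total for 5 minutes
          expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Node {{ $labels.instance }} is low on memory"
```

Once the rule's condition is met for the duration in the for field, Prometheus marks the alert as firing and Alertmanager routes it to the configured receiver.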

 

The kubectl exec (Kubernetes) or oc exec (OpenShift) and curl commands can be used to issue a GET request from inside your Prometheus pod to the /metrics endpoint of one of the collector pods running on a worker node. In this example, 10.129.4.2 is the IP address of the collector-26tk2 pod.

This will often output a slew of data, too much to read on stdout, so the output is almost always redirected to a file.

oc exec pod/prometheus-k8s-0 --container prometheus --namespace openshift-monitoring -- curl --insecure --request GET --url 'http://10.129.4.2:24231/metrics' | tee --append port_24231_metrics.out

 

Similarly, you may also want to get the metrics on port 2112.

oc exec pod/prometheus-k8s-0 --container prometheus --namespace openshift-monitoring -- curl --insecure --request GET --url 'http://10.129.4.2:2112/metrics' | tee --append port_2112_metrics.out


