
On OpenShift, log data is typically parsed by Fluentd, which collects the log data from the containers and forwards it to Elasticsearch. Optionally, Kibana can be used to visualize the logs.
This is similar to the ELK stack (Elasticsearch, Logstash, Kibana), but with Fluentd in place of Logstash it is the EFK stack (Elasticsearch, Fluentd, Kibana).
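For example, Fluentd typically runs as a DaemonSet so that one collector pod runs on each node. A quick way to confirm the collectors are running (the DaemonSet may be named fluentd or collector, depending on the version of the logging stack):
~]$ oc get daemonsets --namespace openshift-logging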
If you are not familiar with the oc command, refer to OpenShift - Getting Started with the oc command.
Almost always, Elasticsearch is in the openshift-logging project. The following command can be used to return the health of Elasticsearch.
- green = all good (every primary and replica shard is assigned)
- yellow = warning (one or more replica shards are unassigned)
- red = something's pretty hosed up (one or more primary shards are unassigned)
~]$ oc get elasticsearch --namespace openshift-logging
NAME            MANAGEMENT STATE   HEALTH   NODES   DATA NODES   SHARD ALLOCATION   INDEX MANAGEMENT
elasticsearch   Managed            yellow   3       3            all
Or, the oc get pods command can be used to list the Elasticsearch pods in the openshift-logging project.
~]$ oc get pods --namespace openshift-logging
NAME                                            READY   STATUS      RESTARTS   AGE
elasticsearch-cdm-11c6ubcb-1-6888c6fcb9-frlxz   2/2     Running     0          5d5h
elasticsearch-cdm-11c6ubcb-2-6bddf77998-9m28k   2/2     Running     0          5d4h
elasticsearch-cdm-11c6ubcb-3-5955f88f5-mm9c2    2/2     Running     0          5d4h
elasticsearch-im-app-28342275-dkg7m             0/1     Completed   0          4m38s
elasticsearch-im-audit-28342275-nldkt           0/1     Completed   0          4m38s
elasticsearch-im-infra-28342275-b624d           0/1     Completed   0          4m38s
And then the oc exec command can be used to run the health command in one of the elasticsearch-cdm pods.
~]$ oc exec elasticsearch-cdm-11c6ubcb-1-6888c6fcb9-frlxz --container elasticsearch --namespace openshift-logging -- health
Something like this should be returned.
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1700541655 04:40:55  elasticsearch yellow 3          3         1056   529 0    0    2        0             -                  99.8%
Often, the health will be yellow or red when:
- The percentage of disk space being used by Elasticsearch has exceeded the disk watermark threshold (85% by default). Once a node crosses the watermark, shards can no longer be assigned to that node (unassigned shards), which in turn causes the affected indices to be yellow or red.
- cluster.routing.allocation.enable is set to primaries, new_primaries, or none instead of all, which prevents replica shards from being allocated. In the allocation explain output, this surfaces as the enable decider returning NO, often alongside same_shard and "the shard cannot be allocated to the same node on which a copy of the shard already exists" (a quick check follows this list).
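As a quick check, the es_util command can be used to query the cluster settings to see whether cluster.routing.allocation.enable has been overridden (a minimal sketch, reusing one of the elasticsearch-cdm pods listed above; the output only includes settings that have been explicitly set):
~]$ oc exec elasticsearch-cdm-11c6ubcb-1-6888c6fcb9-frlxz --container elasticsearch --namespace openshift-logging -- es_util --query="_cluster/settings?pretty"
If the persistent or transient settings contain "cluster.routing.allocation.enable" : "primaries" (or new_primaries or none), that explains the unassigned replica shards.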
The oc exec command can be used to run the es_util command in one of the elasticsearch-cdm pods. Often, this will return the reason why Elasticsearch is yellow or red.
AVOID TROUBLE
Make sure to wrap --query in single or double quotes
~]$ oc exec elasticsearch-cdm-11c6ubcb-1-6888c6fcb9-frlxz --container elasticsearch --namespace openshift-logging -- es_util --query="_cluster/allocation/explain?pretty"
{
  "index" : "app-005586",
  "shard" : 1,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",
    "at" : "2024-05-07T09:45:04.236Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "zZDAC6tURe6RnWbKOxqhpg",
      "node_name" : "elasticsearch-cdm-nc0yql38-3",
      "transport_address" : "10.130.8.6:9300",
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "enable",
          "decision" : "NO",
          "explanation" : "replica allocations are forbidden due to cluster setting [cluster.routing.allocation.enable=primaries]"
        }
      ]
    },
    {
      "node_id" : "hTauU8PASjCAaJ8f84rAUg",
      "node_name" : "elasticsearch-cdm-nc0yql38-2",
      "transport_address" : "10.131.8.149:9300",
      "node_decision" : "no",
      "weight_ranking" : 2,
      "deciders" : [
        {
          "decider" : "enable",
          "decision" : "NO",
          "explanation" : "replica allocations are forbidden due to cluster setting [cluster.routing.allocation.enable=primaries]"
        },
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[app-005586][1], node[hTauU8PASjCAaJ8f84rAUg], [P], s[STARTED], a[id=M0prYVhcQ7aU0dORpThp5g]]"
        }
      ]
    },
    {
      "node_id" : "ZtWHX-icSYaXS8N_YC-ntg",
      "node_name" : "elasticsearch-cdm-nc0yql38-1",
      "transport_address" : "10.128.10.126:9300",
      "node_decision" : "no",
      "weight_ranking" : 3,
      "deciders" : [
        {
          "decider" : "enable",
          "decision" : "NO",
          "explanation" : "replica allocations are forbidden due to cluster setting [cluster.routing.allocation.enable=primaries]"
        }
      ]
    }
  ]
}
Notice in this example that SHARD ALLOCATION is primaries.
~]$ oc get elasticsearch --namespace openshift-logging
NAME            MANAGEMENT STATE   HEALTH   NODES   DATA NODES   SHARD ALLOCATION   INDEX MANAGEMENT
elasticsearch   Managed            yellow   3       3            primaries
And half of the shards are unassigned.
~]$ oc exec elasticsearch-cdm-11c6ubcb-1-6888c6fcb9-frlxz --container elasticsearch --namespace openshift-logging -- health
Wed May 8 09:15:17 UTC 2024
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1715159717 09:15:17  elasticsearch yellow 3          3         251    251 0    0    251      0             -                  50.0%
The _cat/allocation query can be used to get more details on the unassigned shards.
~]$ oc exec elasticsearch-cdm-11c6ubcb-1-6888c6fcb9-frlxz --container elasticsearch --namespace openshift-logging -- es_util --query="_cat/allocation?v=true"
shards disk.indices disk.used disk.avail disk.total disk.percent host          ip            node
84     8.4gb        15.6gb    167.3gb    183gb      8            10.128.10.126 10.128.10.126 elasticsearch-cdm-nc0yql38-1
83     11.5gb       28gb      154.9gb    183gb      15           10.130.8.6    10.130.8.6    elasticsearch-cdm-nc0yql38-3
84     7.5gb        14.6gb    168.3gb    183gb      8            10.131.8.149  10.131.8.149  elasticsearch-cdm-nc0yql38-2
251                                                                                          UNASSIGNED
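Optionally, the _cat/shards query can be used to list each unassigned shard along with the reason it is unassigned (a sketch; the h= parameter selects which columns to return):
~]$ oc exec elasticsearch-cdm-11c6ubcb-1-6888c6fcb9-frlxz --container elasticsearch --namespace openshift-logging -- es_util --query="_cat/shards?v=true&h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED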
You may want to update the SHARD ALLOCATION to "all".
~]$ oc exec elasticsearch-cdm-11c6ubcb-1-6888c6fcb9-frlxz --container elasticsearch --namespace openshift-logging -- es_util --query="_cluster/settings" --request PUT --data '{ "persistent": { "cluster.routing.allocation.enable" : "all" } }'
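The oc get elasticsearch and health commands from earlier in this article can then be rerun to verify that SHARD ALLOCATION is back to all and that the unassigned shard count is dropping as the replica shards are assigned (it may take a few minutes for the health to return to green):
~]$ oc get elasticsearch --namespace openshift-logging
~]$ oc exec elasticsearch-cdm-11c6ubcb-1-6888c6fcb9-frlxz --container elasticsearch --namespace openshift-logging -- health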