NodeOutOfMemory

Kubernetes Node Out Of Memory

Overview


This runbook describes how to respond when a Kubernetes node runs out of memory, so that the cluster stays stable.

Initial Response


  • An alert is received indicating that a node is out of memory.

  • Acknowledge the alert and assign yourself as the incident owner.

  • Notify the team about the ongoing incident using the primary communication channel.

  • Update the incident status on the incident tracking system.

Detailed Steps


1) Identify Affected Node

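Use the kubectl get nodes command to list all nodes in the cluster with their status, then locate the node named in the alert. The listing does not show memory usage, and a node under memory pressure may still report Ready.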

kubectl get nodes
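If the alert does not name the node, one option is to read each node's MemoryPressure condition directly. The following is a minimal sketch, not part of the original procedure: the function names nodes_under_memory_pressure and list_memory_pressure_nodes are hypothetical, and the wrapper assumes kubectl is already pointed at the affected cluster.

```shell
#!/bin/sh
# Keep only the names of nodes whose MemoryPressure status is "True".
# Input on stdin: one "<node-name><TAB><status>" line per node.
nodes_under_memory_pressure() {
  awk -F '\t' '$2 == "True" { print $1 }'
}

# Hypothetical wrapper: print every node with its MemoryPressure status
# via a jsonpath template, then filter with the function above.
list_memory_pressure_nodes() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}' |
    nodes_under_memory_pressure
}
```

After sourcing the snippet, running list_memory_pressure_nodes prints one node name per line, or nothing if no node currently reports memory pressure.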

2) Check Node Resource Usage

Inspect the affected node to understand what is consuming its memory. The following command shows the node's conditions (including MemoryPressure), its capacity and allocatable memory, and the memory requests and limits of the pods scheduled on it:

kubectl describe node <node-name>
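The describe output is long, so it can help to cut it down to the "Allocated resources" section, which summarizes the memory requests and limits on the node. This is a sketch under assumptions: the function names allocated_resources and node_memory_summary are hypothetical, and the filter assumes the section starts at a line beginning with "Allocated resources:" and ends at the next unindented line.

```shell
#!/bin/sh
# Print only the "Allocated resources" section of `kubectl describe node`
# output read from stdin. The section is assumed to end at the next line
# that starts in the first column (the next top-level section).
allocated_resources() {
  awk '
    /^Allocated resources:/ { in_section = 1; print; next }
    in_section && /^[^[:space:]]/ { in_section = 0 }
    in_section { print }
  '
}

# Hypothetical wrapper: $1 is the node name from step 1.
node_memory_summary() {
  kubectl describe node "$1" | allocated_resources
}
```

For example, node_memory_summary <node-name> prints the requests and limits table for that node. These figures are scheduling reservations rather than live usage, so a node can be out of memory even when requests look moderate.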

3) Check Pods Resource Usage

Identify the pods running on the affected node that might be consuming excessive memory. Use the following command to list all pods on the node:

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name>
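The listing above shows which pods are on the node but not how much memory each one uses. A possible follow-up, sketched here under assumptions, is to combine it with kubectl top pods, which requires the Metrics Server to be installed in the cluster. The function names filter_top_pods_by_list and top_pods_on_node are hypothetical.

```shell
#!/bin/sh
# Read `kubectl top pods --all-namespaces --no-headers` rows on stdin
# (columns: namespace, name, cpu, memory) and keep only the pods listed
# as "namespace/name", one per line, in the file given as $1.
filter_top_pods_by_list() {
  awk 'NR == FNR { on_node[$1] = 1; next }
       (($1 "/" $2) in on_node) { print }' "$1" -
}

# Hypothetical wrapper: $1 is the node name. Prints the usage rows of
# the pods on that node, in the order kubectl sorted them by memory.
top_pods_on_node() {
  node="$1"
  list="$(mktemp)"
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$node" \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\n"}{end}' > "$list"
  kubectl top pods --all-namespaces --no-headers --sort-by=memory |
    filter_top_pods_by_list "$list"
  rm -f "$list"
}
```

Running top_pods_on_node <node-name> should place the heaviest memory consumers on the node at the top, which gives a starting point for deciding which workloads to investigate.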

Escalation


If the issue persists after these steps, or if it is severe, escalate to a senior SRE for additional support and guidance.

Further Information
