When troubleshooting K8s issues, there are three core commands:

- `kubectl describe pod/node <name>`: check the resource's Events to identify the root cause.
- `kubectl logs <pod-name>`: check application logs to resolve program issues.
- `kubectl get <resource-type>`: check the status of resources.
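As a minimal sketch, the three commands chain into a quick triage loop. The function name `triage_pod` and the example pod name `web-0` are hypothetical; substitute your own.

```shell
#!/bin/sh
# Quick pod triage using the three core commands.
triage_pod() {
  pod="$1"
  # 1. Status at a glance: Pending, CrashLoopBackOff, Running 0/1, ...?
  kubectl get pod "$pod" -o wide
  # 2. Events usually name the root cause (scheduling, image pull, probes).
  kubectl describe pod "$pod" | sed -n '/^Events:/,$p'
  # 3. Application logs for in-container failures.
  kubectl logs "$pod" --tail=50
}
# Usage: triage_pod web-0
```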
## Layer 1: Pod Status Codes

| Status | Core Reason | Core Troubleshooting Steps |
|---|---|---|
| `Pending` | Cannot be scheduled: the scheduler cannot find a suitable node. | 1. `kubectl describe pod <name>`, check Events for the specific reason: insufficient CPU/memory; taint/toleration mismatch; affinity/anti-affinity rule mismatch; PVC not bound. |
| `ImagePullBackOff` / `ErrImagePull` | Image pull failed: the kubelet cannot pull the container image from the registry. | 1. `kubectl describe pod <name>`, check Events for the specific reason: incorrect image name or tag (check the YAML); private registry authentication failed (check `imagePullSecrets`); network issue (log in to the node and test with `docker`/`crictl pull`). |
| `CrashLoopBackOff` | Container is crashing repeatedly: the container exits right after starting, and the kubelet keeps restarting it. | 1. `kubectl logs <pod-name> --previous` (logs of the previous crash, extremely important). 2. `kubectl logs <pod-name>` (current logs). 3. Investigate application bugs, configuration errors, or out-of-memory issues based on the logs. |
| `RunContainerError` | Container runtime error: the configuration is correct, but the underlying container runtime (e.g., containerd) cannot start the container. | 1. `kubectl describe pod <name>`; Events will show `RunContainerError`. 2. SSH into the node and use `journalctl -u containerd` (or `docker`) to check the runtime logs for lower-level error messages. |
| `CreateContainerConfigError` | Container configuration error: a resource needed to create the container (e.g., a ConfigMap or Secret) has a problem. | 1. `kubectl describe pod <name>`; Events will clearly state which resource is missing or malformed. |
| `Running` (but Ready is `0/1`) | Readiness probe failed: the Pod is running but not ready to receive traffic. | 1. `kubectl describe pod <name>`; Events will record `Readiness probe failed`. 2. Check the `readinessProbe` configuration (initial delay, timeout), or see whether a downstream service the application depends on is failing. |
| `Terminating` (stuck) | Pod cannot terminate properly: usually a finalizer is blocking deletion, or a volume cannot be unmounted. | 1. `kubectl describe pod <name>`, check Events for storage errors like `FailedDetachVolume`. 2. `kubectl edit pod <name>`, check the `metadata.finalizers` field; a finalizer added by a controller may not have been cleaned up. |
| `Unknown` | Status is unknown: typically the node controller cannot reach the kubelet on the Pod's node. | 1. Almost equivalent to the node being `NotReady`. Immediately check the health of the Pod's host node (see Layer 4). |
| Job `Failed`: `BackoffLimitExceeded` | Job retry limit exceeded: the Pods created by the Job kept failing, and after reaching the retry limit the Job is marked as failed. | 1. `kubectl get pods -l job-name=<job-name>` to find the failed Pods created by the Job. 2. `kubectl logs <failed-pod-name>` to view the logs and identify the root cause of the task's failure. |
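For the most common case, `CrashLoopBackOff`, the steps above can be sketched as one helper. The function name `crashloop_info` is hypothetical; the jsonpath fields are standard pod-status fields.

```shell
#!/bin/sh
# Sketch of a CrashLoopBackOff investigation.
crashloop_info() {
  pod="$1"
  # Restart count and the previous container's exit code (see Layer 2):
  kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].restartCount} {.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
  # Logs from the crashed (previous) container instance:
  kubectl logs "$pod" --previous --tail=100
}
# Usage: crashloop_info <pod-name>
```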
## Layer 2: Container Exit Codes

| Exit Code | Meaning | Core Troubleshooting Steps |
|---|---|---|
| 1 | General application error. | 1. Check application logs: `kubectl logs <pod-name> --previous`. |
| 126 / 127 | Command not executable / command not found. | 1. Check the Dockerfile (`chmod +x`) and the command path in your YAML. |
| 137 | `OOMKilled` (out of memory). | 1. `kubectl describe pod <name>` to confirm `Reason: OOMKilled`. 2. Increase `resources.limits.memory`. |
| 139 | Segmentation fault (SIGSEGV): a code bug. | 1. Notify the developers to debug the code. |
| 143 | Graceful termination (SIGTERM): normal behavior. | 1. Occurs during Pod deletion or updates; no action needed. |
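Codes above 128 follow the Unix convention *exit code = 128 + signal number*, which explains 137 (128 + 9, SIGKILL, what the OOM killer sends) and 143 (128 + 15, SIGTERM). You can verify this locally without a cluster:

```shell
#!/bin/sh
# Exit code = 128 + signal number when a process is killed by a signal.
sh -c 'kill -KILL $$'        # SIGKILL = 9, what OOMKilled delivers
echo "after SIGKILL: $?"     # prints 137 (128 + 9)
sh -c 'kill -TERM $$'        # SIGTERM = 15, graceful termination
echo "after SIGTERM: $?"     # prints 143 (128 + 15)
```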
## Layer 3: Network Status Codes and Errors

| Error/Status | Core Reason | Core Troubleshooting Steps |
|---|---|---|
| Endpoints are empty | The Service selector does not match any Pods. | 1. `kubectl describe svc <name>` to check the `Selector`. 2. `kubectl get pods --show-labels` to compare with the Pods' labels. |
| HTTP 502/503/504 | Ingress gateway error / service unavailable / timeout. | 1. Check Endpoints and Pod health comprehensively (`CrashLoopBackOff`, `0/1 Ready`). 2. For 504: check Pod logs and resource usage (`kubectl top pod`) to determine whether the application is slow to respond. |
| HTTP 499 | Client closed request (a non-standard Nginx status code). In short, the backend took too long to respond and the client gave up. | 1. Check backend response time: use `kubectl logs <ingress-controller-pod>` to identify which endpoint (URL) frequently returns 499, and confirm whether its `request_time` is too long. 2. Check client timeout settings: confirm whether the caller (browser, app, or another microservice) has a very short request timeout. 3. Investigate application performance bottlenecks: look for slow database queries or slow calls to third-party services. |
| Connection refused | The network path is clear, but no process is listening on the target Pod's port. | 1. `kubectl exec -it <pod-name> -- netstat -tulnp` to confirm the application is listening on the correct port. 2. Check the application's startup logs for port-binding errors. |
| Connection timed out | Packets are being dropped in the network, usually due to a NetworkPolicy or firewall issue. | 1. Check NetworkPolicies: `kubectl get networkpolicy -A` to confirm whether a policy is blocking this traffic. 2. Check node security groups or the underlying network firewall. |
| No route to host | Typically an issue with the inter-node network (CNI). | 1. Check that the CNI plugin's Pods (`calico-node`, `flannel-ds`, etc.) are running correctly on all nodes. |
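The "empty Endpoints" check above can be sketched as one helper that puts the selector, the current endpoints, and the pod labels side by side. The function name `endpoints_check` and the default namespace argument are illustrative choices.

```shell
#!/bin/sh
# Sketch: does a Service's selector actually match any Pods?
endpoints_check() {
  svc="$1"; ns="${2:-default}"
  # The selector the Service uses:
  kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.selector}{"\n"}'
  # Pod IPs (if any) currently backing it; empty means no match:
  kubectl get endpoints "$svc" -n "$ns" -o jsonpath='{.subsets[*].addresses[*].ip}{"\n"}'
  # Compare against the labels the Pods actually carry:
  kubectl get pods -n "$ns" --show-labels
}
# Usage: endpoints_check <service-name> [namespace]
```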
## Layer 4: Node Status Codes

| Status | Core Reason | Core Troubleshooting Steps |
|---|---|---|
| `NotReady` | Node lost contact: communication between the kubelet and the API server is interrupted. | 1. SSH into the node and check `kubelet`, `containerd`, `df -h`, and `free -m`, in that order. |
| `SchedulingDisabled` | Scheduling is disabled: the node has been cordoned, so no new Pods will be scheduled on it. | 1. This is an administrative action, not a failure. Use `kubectl uncordon <node-name>` to resume scheduling. |
| `MemoryPressure` | Available memory on the node is too low. | 1. The node may start evicting Pods. Log in to the node and use `top` to find the memory hogs. |
| `DiskPressure` | Disk space on the node is insufficient. | 1. Log in to the node, use `df -h` to locate the full partition, and clean up images, containers, and logs. |
| `PIDPressure` | The node is running out of process IDs. | 1. Log in to the node and check for fork bombs or applications creating too many threads/processes. |
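The node-side checks above can be collected into one sketch to run after SSHing into a suspect node. The function name `node_triage` is hypothetical; each command maps to one of the pressure conditions in the table.

```shell
#!/bin/sh
# Node-side triage sketch: run on the node itself, not through kubectl.
node_triage() {
  systemctl status kubelet --no-pager     # NotReady: is the kubelet running?
  journalctl -u kubelet --no-pager -n 50  # recent kubelet errors
  df -h                                   # DiskPressure: any full partitions?
  free -m                                 # MemoryPressure: available memory
  cat /proc/sys/kernel/pid_max            # PIDPressure: the PID ceiling
}
# Usage: node_triage
```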
## Layer 5: Storage Status Codes

| Status / Event | Core Reason | Core Troubleshooting Steps |
|---|---|---|
| PVC: `Pending` | The PVC cannot bind to a PV. | 1. `kubectl describe pvc <name>`, check Events to see whether it is a PV mismatch or a StorageClass issue. |
| Pod Event: `FailedMount` | Volume mount failed. | 1. `kubectl describe pod <name>`; Events will give detailed reasons, such as NFS permissions or cloud-disk status. |
| Pod Event: `FailedDetachVolume` | Volume detach failed: usually the underlying storage (e.g., a cloud disk) is busy or faulty. | 1. This will leave the Pod stuck in the `Terminating` state. 2. Check the CSI plugin logs or the cloud provider's console for the volume's status. |
| App log: `Read-only file system` | The file system is read-only: the Pod hits an error when writing to a PV. | 1. `kubectl exec -it <pod-name> -- mount` to view mount information and confirm whether the mount option is `ro` (read-only). 2. The storage backend itself may have failed and entered a read-only protective mode. |
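The PVC-side checks above can be sketched as one helper. The function name `pvc_triage` is hypothetical; pass your own PVC name.

```shell
#!/bin/sh
# Storage triage sketch for a Pending PVC.
pvc_triage() {
  pvc="$1"
  # Events explain the bind failure (PV mismatch, StorageClass problem):
  kubectl describe pvc "$pvc" | sed -n '/^Events:/,$p'
  # Is there a matching PV, and what phase (Available/Bound/Released) is it in?
  kubectl get pv
  # Which StorageClasses exist to provision it dynamically?
  kubectl get storageclass
}
# Usage: pvc_triage <pvc-name>
```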