Kubernetes Pod Troubleshooting Guide | Debug CrashLoopBackOff, OOMKilled, Pending

Key Takeaways

Pod crashes, OOM kills, and scheduling failures are the most common Kubernetes pain points. This guide gives you a systematic troubleshooting workflow and kubectl one-liners for every scenario.

Troubleshooting Workflow

When a pod misbehaves, follow this sequence:

# 1. Check pod status
kubectl get pods

# 2. Describe the pod (events, conditions, resource usage)
kubectl describe pod <pod-name>

# 3. Check logs
kubectl logs <pod-name>
kubectl logs <pod-name> --previous  # last crash

# 4. Shell into container (if it's running)
kubectl exec -it <pod-name> -- /bin/sh

# 5. Check events for the namespace
kubectl get events --sort-by='.lastTimestamp'

1. CrashLoopBackOff

The container starts and exits repeatedly. Kubernetes backs off exponentially between restarts.
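
To quantify the crash loop, you can read the restart count straight from pod status (a quick sketch; the jsonpath assumes the crashing container is the first one in the pod):

```shell
# Restart count of the first container (assumes a single-container pod)
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].restartCount}'

# Watch restarts accumulate live in the RESTARTS column
kubectl get pod <pod-name> -w
```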

Diagnose:

kubectl describe pod <pod-name>
# Look at: Last State, Exit Code, Reason

kubectl logs <pod-name> --previous
# Application output from the last crashed container

Common causes and fixes:

Exit Code   Meaning             Fix
1           Application error   Check --previous logs
127         Command not found   Fix command/args in spec
137         OOMKilled           Increase memory limit
143         SIGTERM timeout     Fix graceful shutdown

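
Exit codes above 128 encode a signal: subtract 128 to get the signal number, which is why 137 means SIGKILL (what the OOM killer sends) and 143 means SIGTERM:

```shell
# Exit codes above 128 mean "killed by signal (code - 128)"
echo $(( 137 - 128 ))  # 9  -> SIGKILL
echo $(( 143 - 128 ))  # 15 -> SIGTERM
kill -l 9              # look up a signal name by number
```
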
# Check what command the container is running
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].command}'

# Check environment variables (missing config?)
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].env}'

Typical fix — missing environment variable:

env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: app-secrets
        key: database-url
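
It's worth confirming that the referenced secret and key actually exist — a typo in either produces the same crash loop (names below match the example above):

```shell
# Confirm the secret and key exist; empty output means the key name is wrong,
# a NotFound error means the secret itself is missing
kubectl get secret app-secrets -o jsonpath='{.data.database-url}' | base64 -d
```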

2. OOMKilled (Exit Code 137)

Your container exceeded its memory limit.

Diagnose:

kubectl describe pod <pod-name>
# Look for: OOMKilled, Last State reason

# Check current memory usage
kubectl top pod <pod-name>
kubectl top pod <pod-name> --containers

Fix — increase memory limit:

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"  # increase this
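
If you'd rather not edit YAML by hand, the same change can be applied with kubectl set resources (a sketch; myapp is a placeholder deployment name). Note this triggers a rolling restart:

```shell
# Apply new memory settings directly; "myapp" is a placeholder name
kubectl set resources deployment myapp \
  --requests=memory=256Mi --limits=memory=512Mi
```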

Find memory-hungry pods cluster-wide:

kubectl top pods --all-namespaces --sort-by=memory | head -20

Tips:

  • Set limits at 2× your typical peak usage
  • Use requests for scheduling, limits for enforcement
  • Profile your app before setting limits — guessing leads to OOM or resource waste
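
One way to find your actual peak is to sample usage under load and take the maximum (a rough sketch; requires metrics-server):

```shell
# Sample container memory every 30s during peak load, then take the max
while true; do
  kubectl top pod <pod-name> --containers --no-headers
  sleep 30
done | tee memory-samples.txt
```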

3. ImagePullBackOff / ErrImagePull

Kubernetes can’t pull the container image.

Diagnose:

kubectl describe pod <pod-name>
# Look at Events: Failed to pull image "..."

Common causes:

# 1. Wrong image name or tag
# Fix: correct the image in your Deployment spec
image: myapp:v1.2.3  # verify this tag exists in your registry

# 2. Private registry — missing imagePullSecret
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=user \
  --docker-password=password

# Reference in pod spec:
imagePullSecrets:
  - name: regcred

# 3. Rate limiting (Docker Hub)
# Fix: authenticate or use a mirror
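
For the private-registry case, you can decode the pull secret to verify the server and credentials it actually contains (regcred as created above):

```shell
# Inspect the registry credentials stored in the pull secret
kubectl get secret regcred -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
```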

4. Pending

Pod is stuck waiting to be scheduled.

Diagnose:

kubectl describe pod <pod-name>
# Look at Events: "0/3 nodes are available: ..."

Cause: Insufficient resources

kubectl describe nodes | grep -A5 "Allocated resources"
kubectl top nodes

# Fix: reduce requests or scale the cluster
resources:
  requests:
    cpu: "100m"    # not "1000m" unless you need it
    memory: "128Mi"

Cause: Node selector / affinity mismatch

kubectl get nodes --show-labels
# Verify your nodeSelector labels exist on nodes

# Fix nodeSelector mismatch
nodeSelector:
  kubernetes.io/arch: amd64  # make sure nodes have this label

Cause: Taints with no toleration

kubectl describe node <node-name> | grep Taint

# Add toleration to your pod spec
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"

Cause: PVC not bound

kubectl get pvc
# If STATUS is Pending, the PV doesn't exist or StorageClass is wrong

kubectl describe pvc <pvc-name>
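
If the PVC is stuck Pending, check that a matching StorageClass exists and whether any pre-provisioned PVs are available:

```shell
# The PVC's storageClassName must match one of these
kubectl get storageclass

# Any Available PVs that could satisfy the claim?
kubectl get pv
```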

5. Running But Not Ready

Pod is running but failing readiness probe — traffic isn’t routed to it.

kubectl describe pod <pod-name>
# Look for: Readiness probe failed

Fix — check your probe config:

readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10   # wait before first check
  periodSeconds: 5
  failureThreshold: 3
  timeoutSeconds: 2

Test the probe endpoint manually:

kubectl exec -it <pod-name> -- curl http://localhost:8080/health
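
If the container image ships without curl, you can hit the same endpoint from your machine through a port-forward instead (a sketch; adjust the port to match your probe config):

```shell
# Forward the probe port locally, hit the endpoint, then clean up
kubectl port-forward pod/<pod-name> 8080:8080 &
PF_PID=$!
sleep 2  # give the tunnel a moment to establish
curl -s http://localhost:8080/health
kill "$PF_PID"
```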

6. Network Issues

Pod can’t reach another service:

# Check service exists and has endpoints
kubectl get service <service-name>
kubectl get endpoints <service-name>
# If ENDPOINTS is <none>, no pods match the selector

# Test DNS resolution from inside a pod
kubectl exec -it <pod-name> -- nslookup my-service.default.svc.cluster.local

# Test connectivity
kubectl exec -it <pod-name> -- curl http://my-service:8080/health

# Check NetworkPolicy
kubectl get networkpolicy -n <namespace>
kubectl describe networkpolicy <policy-name>

Service selector mismatch (most common cause of empty endpoints):

# Service selector
kubectl get service my-app -o jsonpath='{.spec.selector}'
# Output: {"app":"my-app"}

# Pod labels
kubectl get pods --show-labels
# Make sure pods have app=my-app label
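
A quick way to confirm the mismatch is the culprit: add the missing label to one pod and watch endpoints appear. The durable fix is aligning .spec.template.metadata.labels in the Deployment:

```shell
# Temporary: label one pod so the Service selector matches it
kubectl label pod <pod-name> app=my-app

# ENDPOINTS should no longer be <none>
kubectl get endpoints my-app
```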

7. Init Container Failures

kubectl describe pod <pod-name>
# Look at Init Containers section

kubectl logs <pod-name> -c <init-container-name>

Common: init container waiting for a database that isn’t ready.

initContainers:
  - name: wait-for-db
    image: busybox
    command: ['sh', '-c', 'until nc -z postgres 5432; do echo waiting; sleep 2; done']

8. Useful kubectl One-Liners

# All pods not Running across all namespaces
kubectl get pods -A --field-selector=status.phase!=Running

# Watch pod restarts
kubectl get pods -w

# Pod resource usage sorted by CPU
kubectl top pods --sort-by=cpu

# Describe all pods matching a label
kubectl describe pods -l app=myapp

# Copy file from pod
kubectl cp <pod-name>:/var/log/app.log ./app.log

# Port forward to local machine
kubectl port-forward pod/<pod-name> 8080:8080

# Run a debug pod with full tools
kubectl run debug --image=nicolaka/netshoot -it --rm -- bash

# Force delete a stuck Terminating pod
kubectl delete pod <pod-name> --grace-period=0 --force

# View resource quotas and limits
kubectl describe resourcequota -n <namespace>
kubectl describe limitrange -n <namespace>

9. Pod Status Quick Reference

Status             Meaning                        First action
Pending            Not scheduled                  describe pod → check Events
Init:0/1           Init container running         logs -c <init-name>
PodInitializing    Init done, main starting       Wait or check image pull
Running            Running but may not be ready   Check readiness probe
CrashLoopBackOff   App crashing                   logs --previous
OOMKilled          Memory limit exceeded          Increase memory limit
Terminating        Being deleted                  Wait; force delete if stuck
ImagePullBackOff   Can’t pull image               Check image name + registry creds
Error              Container exited with error    logs --previous + exit code

Systematic Checklist

□ kubectl get pods          → identify status
□ kubectl describe pod      → read Events section
□ kubectl logs --previous   → app-level error
□ kubectl top pod           → memory/CPU usage
□ kubectl get events        → cluster-level context
□ kubectl exec -- curl      → test endpoints from inside
□ kubectl get endpoints     → verify service routing
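
The checklist above can be wrapped in a small helper script (a sketch; pass the pod name as the first argument, and expect some steps to no-op if metrics-server is absent):

```shell
#!/bin/sh
# triage.sh <pod-name> -- run the checklist top to bottom (sketch)
POD="$1"
kubectl get pod "$POD"
kubectl describe pod "$POD" | grep -A 10 'Events:'
kubectl logs "$POD" --previous --tail=50 2>/dev/null || echo "no previous container"
kubectl top pod "$POD" 2>/dev/null || echo "metrics-server not available"
kubectl get events --sort-by='.lastTimestamp' | tail -20
```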

Most Kubernetes issues fall into five categories: application errors (logs), resource constraints (top/describe), scheduling conflicts (describe node), image problems (events), and network misconfigurations (endpoints/NetworkPolicy). Work through the checklist in order and you’ll resolve the vast majority of issues within minutes.