Real-Time issues in Kubernetes pod:🫛

The kubelet error message below shows that the mario container is in a CrashLoopBackOff state: it keeps crashing shortly after it starts, so Kubernetes waits for a growing back-off delay (10s in this message) before each new restart attempt. Let's take a closer look at potential causes and the steps to troubleshoot them:

Error Message: 239 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"mario\" with CrashLoopBackOff: \"back-off 10s restarting failed container=mario pod=mario-deployment-65b6dcd6b-6w7q9_mario(08517d25-9052-4c74-b911-e51cd868f429)\"" pod="mario/mario-deployment-65b6dcd6b-6w7q9" podUID="08517d25-9052-4c74-b911-e51cd868f429"
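
To make the "back-off 10s" in the message concrete: by default the kubelet delays each restart of a crashing container, starting at 10 seconds and doubling up to a cap of five minutes. The following is a minimal sketch of that schedule, not kubelet code; the function name `backoff_schedule` is made up for illustration.

```shell
# Print the delay (in seconds) before each of the first N restarts,
# assuming the default 10s base delay and 300s cap.
backoff_schedule() {
  attempts=$1
  delay=10
  i=1
  while [ "$i" -le "$attempts" ]; do
    echo "restart $i: wait ${delay}s"
    delay=$((delay * 2))
    if [ "$delay" -gt 300 ]; then delay=300; fi
    i=$((i + 1))
  done
}

backoff_schedule 7
```

A longer delay in the error message therefore just means the container has already crashed several times in a row.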

Steps to Troubleshoot

Check Container Logs

Inspect the logs of the mario container for any output that explains why it is crashing.
```
 kubectl logs mario-deployment-65b6dcd6b-6w7q9 -n mario
```
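
When a container is crash-looping, the current instance may not have written anything yet, so the logs of the previous (crashed) instance are often more useful. Here is a small sketch using the `--previous` flag of `kubectl logs`. The wrapper name `previous_logs` and the `KUBECTL` override variable are assumptions introduced here so the command can be swapped out or dry-run.

```shell
# Show the last lines written by the previous (crashed) instance of a container.
# KUBECTL is a hypothetical override; it defaults to the real kubectl binary.
previous_logs() {
  namespace=$1
  pod=$2
  container=$3
  "${KUBECTL:-kubectl}" logs "$pod" -n "$namespace" -c "$container" --previous --tail=50
}

# Usage: previous_logs mario mario-deployment-65b6dcd6b-6w7q9 mario
```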
Describe the Pod

Get detailed information about the pod and check its events and container state for errors that indicate why it is crashing.
```
 kubectl describe pod mario-deployment-65b6dcd6b-6w7q9 -n mario
```
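
In the describe output, the "Last State" of the container usually carries an exit code. The sketch below maps a few conventional exit codes to likely causes. The mapping follows common Unix conventions (128 plus the signal number) and is not exhaustive; the helper name `explain_exit_code` is an assumption for illustration.

```shell
# Translate a container exit code into a likely cause
# (conventional meanings only, not an exhaustive list).
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; the main process may simply be finishing" ;;
    1)   echo "application error; check the container logs" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found; check command/args and the image" ;;
    137) echo "killed with SIGKILL (128+9); often an out-of-memory kill" ;;
    139) echo "segmentation fault, SIGSEGV (128+11)" ;;
    143) echo "terminated with SIGTERM (128+15)" ;;
    *)   echo "unrecognized exit code $1; check the logs" ;;
  esac
}

# Usage (assumes the pod has a single container, hence index 0):
# code=$(kubectl get pod mario-deployment-65b6dcd6b-6w7q9 -n mario \
#   -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')
# explain_exit_code "$code"
```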
Check Resource Availability

Ensure that the node where the pod is scheduled (minikube-m02) has sufficient resources (CPU, memory, etc.).
```
 kubectl describe node minikube-m02
```
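
The node description is long, so it can help to pull out only the conditions that look unhealthy. This is a sketch under the assumption that conditions are printed one per line as `Type=Status`; the filter name `unhealthy_conditions` is hypothetical.

```shell
# Read "ConditionType=Status" lines on stdin and print the unhealthy ones:
# Ready that is not True, or a pressure/network condition that is True.
unhealthy_conditions() {
  while IFS='=' read -r type status; do
    case "$type" in
      Ready)
        if [ "$status" != "True" ]; then echo "$type=$status"; fi ;;
      *Pressure|NetworkUnavailable)
        if [ "$status" = "True" ]; then echo "$type=$status"; fi ;;
    esac
  done
}

# Usage:
# kubectl get node minikube-m02 \
#   -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#   | unhealthy_conditions
```

No output means the node reports Ready with no memory, disk, or PID pressure.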

Review Pod Spec and Configuration

Check the pod specification for problems such as missing environment variables, an incorrect image reference, or broken volume mounts.

```
# Example pod spec snippet
spec:
  containers:
  - name: mario
    image: avsivaranjan/mario-image
    env:
    - name: ENV_VAR_NAME
      value: "value"
    resources:
      limits:
        memory: "512Mi"
        cpu: "500m"
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config
  volumes:
  - name: config-volume
    configMap:
      name: config-map-name
```
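
To review the spec that is actually deployed rather than a local file, you can read it back from the owning Deployment. The sketch below derives the Deployment name from the pod name, assuming the usual `<deployment>-<replicaset-hash>-<suffix>` naming of Deployment-managed pods; the helper name `deployment_of` is made up for illustration.

```shell
# Derive the Deployment name from a Deployment-managed pod name by dropping
# the ReplicaSet hash and the pod suffix (the last two "-" separated parts).
deployment_of() {
  pod=$1
  echo "${pod%-*-*}"
}

# Usage:
# kubectl get deployment "$(deployment_of mario-deployment-65b6dcd6b-6w7q9)" \
#   -n mario -o yaml
# kubectl get configmap config-map-name -n mario   # does the referenced ConfigMap exist?
```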

Health Checks

Ensure that any health checks (liveness or readiness probes) are correctly configured and are not causing the container to be killed prematurely.

```
livenessProbe:
  httpGet:
    path: /healthz
    port: 80
  initialDelaySeconds: 3
  periodSeconds: 3

readinessProbe:
  httpGet:
    path: /readiness
    port: 80
  initialDelaySeconds: 3
  periodSeconds: 3
```
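
To judge whether a liveness probe is too aggressive, it helps to estimate how long the container has before it is restarted. The following is a rough sketch, assuming the Kubernetes default `failureThreshold` of 3 when none is set and ignoring `timeoutSeconds` and probe jitter; the function name `liveness_budget` is hypothetical.

```shell
# Rough upper bound, in seconds, on how long a container has to start
# answering its liveness probe before the kubelet restarts it.
liveness_budget() {
  initial_delay=$1
  period=$2
  failure_threshold=${3:-3}   # Kubernetes default failureThreshold is 3
  echo $((initial_delay + period * failure_threshold))
}

liveness_budget 3 3   # the probe values used in this article
```

If the application needs longer than this to start serving its health endpoint, raising `initialDelaySeconds` or adding a `startupProbe` is worth considering.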

Compare with Working Node

Since the same image is running successfully on another node, compare the configurations of both nodes to identify any differences.
```
 kubectl get nodes -o yaml
```
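
The full YAML for every node is hard to compare by eye. As a sketch, the wrapper below prints only a few `status.nodeInfo` fields side by side, which is often enough to spot an architecture, kernel, or container runtime mismatch. The name `node_summary` and the `KUBECTL` override are assumptions for illustration.

```shell
# Print one row per node with the fields that most often differ between nodes.
# KUBECTL is a hypothetical override; it defaults to the real kubectl binary.
node_summary() {
  "${KUBECTL:-kubectl}" get nodes -o custom-columns='NAME:.metadata.name,ARCH:.status.nodeInfo.architecture,OS:.status.nodeInfo.osImage,KERNEL:.status.nodeInfo.kernelVersion,RUNTIME:.status.nodeInfo.containerRuntimeVersion'
}

# Usage: node_summary
```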

Example Commands to Use

Get Logs of the Failing Container

```
 kubectl logs mario-deployment-65b6dcd6b-6w7q9 -n mario
```

Describe the Failing Pod

```
 kubectl describe pod mario-deployment-65b6dcd6b-6w7q9 -n mario
```

Check Node Conditions and Resource Availability
```
 kubectl describe node minikube-m02
```

Check for Events in the Namespace

```
 kubectl get events -n mario --sort-by='.metadata.creationTimestamp'
```
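
In a busy namespace it can be easier to narrow the events down to the failing pod. Here is a minimal sketch using a field selector on the involved object; the wrapper name `pod_events` and the `KUBECTL` override are assumptions for illustration.

```shell
# List events for a single pod, oldest first.
# KUBECTL is a hypothetical override; it defaults to the real kubectl binary.
pod_events() {
  namespace=$1
  pod=$2
  "${KUBECTL:-kubectl}" get events -n "$namespace" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
}

# Usage: pod_events mario mario-deployment-65b6dcd6b-6w7q9
```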

Compare Node Configurations
```
 kubectl get nodes -o yaml
```

Example YAML Configuration to Review

Ensure that the pod specification is correct and complete:

```
apiVersion: v1
kind: Pod
metadata:
  name: mario-deployment
  namespace: mario
spec:
  containers:
  - name: mario
    image: avsivaranjan/mario-image
    ports:
    - containerPort: 80
    env:
    - name: ENV_VAR_NAME
      value: "value"
    resources:
      limits:
        memory: "512Mi"
        cpu: "500m"
    # Mount the ConfigMap volume declared under "volumes" below
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 3
    readinessProbe:
      httpGet:
        path: /readiness
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 3
  volumes:
  - name: config-volume
    configMap:
      name: config-map-name
```
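
Before applying a reviewed manifest, a dry run can catch schema and reference errors without changing the cluster. This is a sketch that assumes the YAML above is saved to a file; the file name `mario-pod.yaml`, the wrapper name `validate_manifest`, and the `KUBECTL` override are all hypothetical.

```shell
# Ask the API server to validate a manifest without persisting it.
# KUBECTL is a hypothetical override; it defaults to the real kubectl binary.
validate_manifest() {
  manifest=$1
  "${KUBECTL:-kubectl}" apply --dry-run=server -f "$manifest"
}

# Usage: validate_manifest mario-pod.yaml
```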

By following these steps and using these commands, you should be able to identify the root cause of the CrashLoopBackOff and take corrective actions. Let me know if you need further assistance with any specific findings.

Written by Sivaranjan
