## 1. General Deployment Considerations

- Version Compatibility: Various K8s components have version compatibility issues, so it is crucial to select versions carefully during deployment, especially for components like `containerd`, `kubernetes-dashboard`, and `etcd`.
- Calico Network Plugin: Deployment may require bypassing network restrictions (e.g., via a VPN). If using a domestic mirror, you'll need to modify the `containerd` `config.toml` file. If deployment still fails, you can try manually pulling the images first.
- Helm Domestic Mirrors: Helm charts are just templates; the actual image pulling happens on the cluster nodes. Therefore, you must ensure the `containerd` registry configured on the cluster is accessible.
- Local Management Tools: By installing `helm` and `kubectl` on your personal computer and configuring your environment variables, you can manage remote K8s clusters directly with these commands (see the sketch below).
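For the local-management setup, a minimal sketch (the kubeconfig path is a placeholder; point it at wherever you copied the remote cluster's kubeconfig):

```bash
# Point kubectl/helm at the remote cluster's kubeconfig (path is an assumption)
export KUBECONFIG=$HOME/.kube/remote-cluster.yaml

kubectl get nodes   # verify connectivity to the remote cluster
helm list -A        # helm reuses the same kubeconfig
```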
## 2. Dashboard Showing No Data

If the dashboard shows no data after installation, it could be due to the following reasons:

- Incorrect Namespace Selected: Check if you have selected the correct namespace. The default `default` namespace has no monitoring data.
- Metrics Server Not Installed: You need to install the `metrics-server` component to collect metrics:

  ```bash
  kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
  ```

- Kubelet Certificate Issue: If the Kubelet certificate is not signed by a CA trusted by the Metrics Server, the TLS handshake will fail, preventing metrics collection. To fix this, add the following argument to the `metrics-server` deployment configuration to trust insecure Kubelet certificates:

  ```yaml
  - --kubelet-insecure-tls=true
  ```
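One way to add that flag without editing the manifest by hand is a JSON patch against the deployment (a sketch, assuming `metrics-server` was installed into `kube-system` and is the first container in the pod spec):

```bash
# Append the flag to the metrics-server container's args list
kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls=true"}]'
```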
## 3. K8s Cluster High Availability Process
- Fault Detection: Kubernetes (via the Node Controller or Liveness Probes) detects an issue with a Pod or its host Node.
- Recreation Triggered: The replica controller (Deployment/StatefulSet) notices that the number of running Pods is below the desired count.
- New Pod Scheduled: The Kubernetes scheduler selects a healthy Node to create a new Pod.
- Use of Persistent Storage (if configured):
  - The new Pod requests the same PersistentVolumeClaim (PVC) it was using before.
  - For network storage (like NFS), the new Pod mounts it directly.
  - For block storage (like EBS), Kubernetes and the CSI driver ensure the volume is safely detached from the failed Node and then attached to the new Node before being mounted by the new Pod.
- Service Discovery: The Service automatically routes traffic to this new, healthy Pod instance.
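A minimal sketch tying these steps together (all names, the image, and the PVC are placeholders; sharing one PVC across replicas assumes an RWX-capable backend such as NFS): `replicas` sets the desired count, the `livenessProbe` drives fault detection, and the PVC lets a replacement Pod reattach the same data.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 3                      # desired Pod count watched by the controller
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: app
        image: nginx:1.27          # placeholder image
        livenessProbe:             # container-level fault detection
          httpGet:
            path: /
            port: 80
          periodSeconds: 10
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: demo-app-data # hypothetical pre-existing PVC
```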
## 4. How to Check the Desired Replica Count

For a Deployment:

```bash
kubectl get deployment <deployment-name> -n <namespace> -o wide
```

Observe the `DESIRED` column (newer kubectl versions instead show the desired count as the denominator of the `READY` column, e.g. `3/3`).

For a StatefulSet:

```bash
kubectl get statefulset <statefulset-name> -n <namespace> -o wide
```

For a ReplicaSet:

```bash
kubectl get replicaset <replicaset-name> -n <namespace> -o wide
```
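If you only want the number itself, the desired count can also be read straight from the object's spec (a sketch using `jsonpath`):

```bash
# Prints just the desired replica count, e.g. "3"
kubectl get deployment <deployment-name> -n <namespace> -o jsonpath='{.spec.replicas}'
```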
## 5. Understanding the Pod's READY Column and Replica Count

In the output of `kubectl get pods`, the denominator in the `READY` column (e.g., the second `1` in `1/1`) represents the number of containers defined within a single Pod.

This is a different concept from the controller's desired replica count, which refers to how many instances of that Pod the controller should keep running.

Can the number of ready containers be understood as "orchestration"? Yes, in the sense that a Pod can contain multiple containers that are "orchestrated" together by Kubernetes, scheduled and managed as a single unit.

The controller's desired replica count must be checked with `kubectl get deployment/statefulset`.
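For example, a Pod with a main container plus a log-shipping sidecar (names and images below are placeholders) shows up as `READY 2/2` once both containers are ready, regardless of how many replicas its controller manages:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  containers:
  - name: app                  # main application container
    image: nginx:1.27
  - name: log-shipper          # sidecar container in the same Pod
    image: busybox:1.36
    command: ["sh", "-c", "tail -f /dev/null"]
```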
## 6. About Persistent Storage

If your Pod is not configured with persistent storage (i.e., it uses the Pod's ephemeral storage like `emptyDir`, or the container's own temporary filesystem layer), then when the Pod is deleted and recreated for any reason, all data written by the containers to their ephemeral storage will be lost.
The new Pod instance will start in a completely new, clean state.
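As a sketch of the difference (the PVC name is a placeholder), an `emptyDir` volume lives and dies with the Pod, while a `persistentVolumeClaim` volume points at storage that outlives it:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: storage-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /tmp/scratch    # ephemeral: gone when the Pod is recreated
    - name: data
      mountPath: /data           # persistent: survives Pod recreation
  volumes:
  - name: scratch
    emptyDir: {}
  - name: data
    persistentVolumeClaim:
      claimName: app-data        # hypothetical pre-existing PVC
```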
## 7. Understanding Ports in a Service

The flow of a request is typically as follows: `External Request -> Node (NodeIP:nodePort) -> Service (ClusterIP:port) -> Pod (PodIP:targetPort)`

`targetPort`

- Definition: The port on the backend Pod's container to which the Service forwards traffic.
- Scope: Inside the Pod.
- Value: Can be a number (`8080`) or a port name defined in the Pod spec (`http-api`).

`nodePort`

- Definition: A static port exposed on each Node's IP address when the Service type is `NodePort` or `LoadBalancer`.
- Scope: On the cluster Nodes, reachable from outside the cluster.
- Value: The default range is 30000-32767. It allows access from outside the cluster via `http://<Node-IP>:<nodePort>`.
`port`

- Definition: The port that the Service exposes on its own internal ClusterIP.
- Scope: Inside the cluster.
- Purpose: Other Pods within the cluster can access this service via `http://<ServiceName>.<Namespace>.svc.cluster.local:<port>`.
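A sketch of how the three fields sit together in a single Service manifest (names and port numbers are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-svc
spec:
  type: NodePort
  selector:
    app: demo-app
  ports:
  - port: 80           # ClusterIP port, used inside the cluster
    targetPort: 8080   # container port on the backend Pods
    nodePort: 30080    # static port opened on every Node (30000-32767 range)
```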
## 8. Pitfalls Encountered When Deploying Fluentd

When deploying `bitnami/fluentd`, I encountered the following issues (the working `values.yaml` is in `Fluentd.yaml`):

- Missing CRI Plugin: The official chart was missing the `cri` plugin for the forward input, defaulting to `drop all`, which prevented container logs from being correctly processed and parsed.
- Missing Elasticsearch Plugin: The Aggregator configuration also lacked the Elasticsearch plugin, making it impossible to send logs to ES.
- Protocol Error: The default scheme for connecting to ES was `http`, but ES port 9200 typically serves `https`, so it had to be changed.
- Authentication Failure: Connecting to ES requires specifying a username and password (usually the Kibana login credentials); otherwise it results in a 401 error.
- Namespace Label Issue: I discovered that all logs from the `efk` namespace were being rejected by ES. An AI suggestion pointed to labels containing `.` and `/` as a possible cause. Upon inspection, I found the namespace had an extra `name: efk` label. Strangely, neither deleting the extra label nor using a Ruby script to replace special characters seemed to work at first. However, the issue had resolved itself by the next day (presumably the change just took a while to take effect), confirming that the extra `name` label was indeed the problem.

```
# Incorrect namespace labels
❯ kubectl get ns efk -o yaml
apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: "2025-06-06T06:56:03Z"
  labels:
    kubernetes.io/metadata.name: efk
    name: efk # <-- This line is redundant
...

# Normal namespace labels
❯ kubectl get ns monitoring -o yaml
apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: "2025-05-22T05:45:59Z"
  labels:
    kubernetes.io/metadata.name: monitoring
  name: monitoring
...
```
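For reference, the extra label can be removed with `kubectl label` and a trailing `-` (a sketch; adjust the namespace name as needed):

```bash
kubectl label namespace efk name-   # the trailing '-' removes the 'name' label
```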
## 9. Ingress and MetalLB: Solving External Access

- Ingress operates at Layer 7 (Application Layer) and can implement internal reverse proxying and load balancing. Services are accessed externally by hitting the Ingress Controller's `nodePort`.
- Drawback: `nodePort` must use a high-numbered port (e.g., 30000+), which is inconvenient and inelegant for external access.
- Solution: Install the MetalLB plugin.
  - Purpose: When a Service is of type `LoadBalancer`, MetalLB automatically assigns it an IP from a predefined external address pool, which becomes its `EXTERNAL-IP`.
  - Advantages:
    - Eliminates the need to use high-numbered ports for access; you can use the IP directly.
    - Provides a high-availability solution, avoiding the single point of failure that can occur with `iptables`-based traffic forwarding.
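A minimal sketch of the MetalLB configuration this relies on, assuming MetalLB v0.13+ with CRD-based configuration in L2 mode; the pool name and address range are placeholders for your own network:

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.240-192.168.1.250   # placeholder range on the Nodes' LAN
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - default-pool
```

With this in place, any `type: LoadBalancer` Service should receive an `EXTERNAL-IP` from the pool automatically.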