## 1. General Deployment Considerations

- Version Compatibility: Various K8s components have version compatibility issues, so it is crucial to select versions carefully during deployment, especially for components like `containerd`, `kubernetes-dashboard`, and `etcd`.
- Calico Network Plugin: Deployment may require bypassing network restrictions (e.g., via a VPN). If using a domestic mirror, you'll need to modify the `containerd` `config.toml` file. If deployment still fails, you can try manually pulling the images first.
- Helm Domestic Mirrors: Helm charts are just templates; the actual image pulling happens on the cluster nodes. Therefore, you must ensure the `containerd` registry configured on the cluster is accessible.
- Local Management Tools: By installing `helm` and `kubectl` on your personal computer and configuring your environment variables, you can manage remote K8s clusters directly with these commands (see the sketch below).
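For the local-management setup, a minimal sketch (the kubeconfig path is a placeholder; point it at wherever you copied the remote cluster's kubeconfig):

```bash
# Point kubectl/helm at the remote cluster's kubeconfig (path is an assumption)
export KUBECONFIG=$HOME/.kube/remote-cluster.yaml

kubectl get nodes   # verify connectivity to the remote cluster
helm list -A        # helm reuses the same kubeconfig
```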
## 2. Dashboard Showing No Data

If the dashboard shows no data after installation, it could be due to the following reasons:

- Incorrect Namespace Selected: Check if you have selected the correct namespace. The default `default` namespace has no monitoring data.
- Metrics Server Not Installed: You need to install the `metrics-server` component to collect metrics:

  ```bash
  kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
  ```

- Kubelet Certificate Issue: If the Kubelet certificate is not signed by a CA trusted by the Metrics Server, the TLS handshake will fail, preventing metrics collection. To fix this, add the following argument to the `metrics-server` deployment configuration to trust insecure Kubelet certificates:

  ```yaml
  - --kubelet-insecure-tls=true
  ```
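One way to add that flag without editing the manifest by hand is a JSON patch against the deployment (a sketch, assuming `metrics-server` was installed into `kube-system` and is the first container in the pod spec):

```bash
# Append the flag to the metrics-server container's args list
kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls=true"}]'
```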
## 3. K8s Cluster High Availability Process
- Fault Detection: Kubernetes (via the Node Controller or Liveness Probes) detects an issue with a Pod or its host Node.
- Recreation Triggered: The replica controller (Deployment/StatefulSet) notices that the number of running Pods is below the desired count.
- New Pod Scheduled: The Kubernetes scheduler selects a healthy Node to create a new Pod.
- Use of Persistent Storage (if configured):
  - The new Pod requests the same PersistentVolumeClaim (PVC) it was using before.
  - For network storage (like NFS), the new Pod mounts it directly.
  - For block storage (like EBS), Kubernetes and the CSI driver ensure the volume is safely detached from the failed Node and then attached to the new Node before being mounted by the new Pod.
- Service Discovery: The Service automatically routes traffic to this new, healthy Pod instance.
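A minimal sketch tying these steps together (all names, the image, and the PVC are placeholders; sharing one PVC across replicas assumes an RWX-capable backend such as NFS): `replicas` sets the desired count, the `livenessProbe` drives fault detection, and the PVC lets a replacement Pod reattach the same data.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 3                      # desired Pod count watched by the controller
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: app
        image: nginx:1.27          # placeholder image
        livenessProbe:             # container-level fault detection
          httpGet:
            path: /
            port: 80
          periodSeconds: 10
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: demo-app-data # hypothetical pre-existing PVC
```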
## 4. How to Check the Desired Replica Count

For a Deployment:

```bash
kubectl get deployment <deployment-name> -n <namespace> -o wide
```

Observe the `DESIRED` column (newer kubectl versions instead show the desired count as the denominator of the `READY` column, e.g. `3/3`).

For a StatefulSet:

```bash
kubectl get statefulset <statefulset-name> -n <namespace> -o wide
```

For a ReplicaSet:

```bash
kubectl get replicaset <replicaset-name> -n <namespace> -o wide
```
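If you only want the number itself, the desired count can also be read straight from the object's spec (a sketch using `jsonpath`):

```bash
# Prints just the desired replica count, e.g. "3"
kubectl get deployment <deployment-name> -n <namespace> -o jsonpath='{.spec.replicas}'
```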
## 5. Understanding the Pod's READY Column and Replica Count

In the output of `kubectl get pods`, the denominator in the `READY` column (e.g., the second `1` in `1/1`) represents the number of containers defined within a single Pod.

This is a different concept from the controller's desired replica count, which refers to how many instances of that Pod the controller should keep running.

Can the number of ready containers be understood as "orchestration"? Yes, in the sense that a Pod can contain multiple containers that are "orchestrated" together by Kubernetes, scheduled and managed as a single unit.

The controller's desired replica count must be checked with `kubectl get deployment/statefulset`.
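For example, a Pod with a main container plus a log-shipping sidecar (names and images below are placeholders) shows up as `READY 2/2` once both containers are ready, regardless of how many replicas its controller manages:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  containers:
  - name: app                  # main application container
    image: nginx:1.27
  - name: log-shipper          # sidecar container in the same Pod
    image: busybox:1.36
    command: ["sh", "-c", "tail -f /dev/null"]
```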
## 6. About Persistent Storage

If your Pod is not configured with persistent storage (i.e., it uses the Pod's ephemeral storage like `emptyDir`, or the container's own temporary filesystem layer), then when the Pod is deleted and recreated for any reason, all data written by the containers to their ephemeral storage will be lost.
The new Pod instance will start in a completely new, clean state.
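As a sketch of the difference (the PVC name is a placeholder), an `emptyDir` volume lives and dies with the Pod, while a `persistentVolumeClaim` volume points at storage that outlives it:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: storage-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /tmp/scratch    # ephemeral: gone when the Pod is recreated
    - name: data
      mountPath: /data           # persistent: survives Pod recreation
  volumes:
  - name: scratch
    emptyDir: {}
  - name: data
    persistentVolumeClaim:
      claimName: app-data        # hypothetical pre-existing PVC
```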
## 7. Understanding Ports in a Service

The flow of a request is typically as follows: `External Request -> Node (NodeIP:nodePort) -> Service (ClusterIP:port) -> Pod (PodIP:targetPort)`

`targetPort`

- Definition: The port on the backend Pod's container to which the Service forwards traffic.
- Scope: Inside the Pod.
- Value: Can be a number (`8080`) or a port name defined in the Pod spec (`http-api`).

`nodePort`

- Definition: A static port exposed on each Node's IP address when the Service type is `NodePort` or `LoadBalancer`.
- Scope: On the cluster Nodes, reachable from outside the cluster.
- Value: The default range is 30000-32767. It allows access from outside the cluster via `http://<Node-IP>:<nodePort>`.
`port`

- Definition: The port that the Service exposes on its own internal ClusterIP.
- Scope: Inside the cluster.
- Purpose: Other Pods within the cluster can access this service via `http://<ServiceName>.<Namespace>.svc.cluster.local:<port>`.
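A sketch of how the three fields sit together in a single Service manifest (names and port numbers are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-svc
spec:
  type: NodePort
  selector:
    app: demo-app
  ports:
  - port: 80           # ClusterIP port, used inside the cluster
    targetPort: 8080   # container port on the backend Pods
    nodePort: 30080    # static port opened on every Node (30000-32767 range)
```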
## 8. Pitfalls Encountered When Deploying Fluentd

When deploying `bitnami/fluentd`, I encountered the following issues (the working `values.yaml` is in `Fluentd.yaml`):

- Missing CRI Plugin: The official chart was missing the `cri` plugin for the forward input, defaulting to `drop all`, which prevented container logs from being correctly processed and parsed.
- Missing Elasticsearch Plugin: The Aggregator configuration also lacked the Elasticsearch plugin, making it impossible to send logs to ES.
- Protocol Error: The default scheme for connecting to ES was `http`, but ES port 9200 typically serves `https`, so it had to be changed.
- Authentication Failure: Connecting to ES requires specifying a username and password (usually the Kibana login credentials); otherwise it results in a 401 error.
- Namespace Label Issue: I discovered that all logs from the `efk` namespace were being rejected by ES. An AI suggestion pointed to labels containing `.` and `/` as a possible cause. Upon inspection, I found the namespace had an extra `name: efk` label. Strangely, neither deleting the extra label nor using a Ruby script to replace special characters seemed to work at first. However, the issue had resolved itself by the next day (presumably the change just took a while to take effect), confirming that the extra `name` label was indeed the problem.

```
# Incorrect namespace labels
❯ kubectl get ns efk -o yaml
apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: "2025-06-06T06:56:03Z"
  labels:
    kubernetes.io/metadata.name: efk
    name: efk # <-- This line is redundant
...

# Normal namespace labels
❯ kubectl get ns monitoring -o yaml
apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: "2025-05-22T05:45:59Z"
  labels:
    kubernetes.io/metadata.name: monitoring
  name: monitoring
...
```
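For reference, the extra label can be removed with `kubectl label` and a trailing `-` (a sketch; adjust the namespace name as needed):

```bash
kubectl label namespace efk name-   # the trailing '-' removes the 'name' label
```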
## 9. Ingress and MetalLB: Solving External Access

- Ingress operates at Layer 7 (Application Layer) and can implement internal reverse proxying and load balancing. Services are accessed externally by hitting the Ingress Controller's `nodePort`.
- Drawback: `nodePort` must use a high-numbered port (e.g., 30000+), which is inconvenient and inelegant for external access.
- Solution: Install the MetalLB plugin.
  - Purpose: When a Service is of type `LoadBalancer`, MetalLB automatically assigns it an IP from a predefined external address pool, which becomes its `EXTERNAL-IP`.
  - Advantages:
    - Eliminates the need to use high-numbered ports for access; you can use the IP directly.
    - Provides a high-availability solution, avoiding the single point of failure that can occur with `iptables`-based traffic forwarding.
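A minimal sketch of the MetalLB configuration this relies on, assuming MetalLB v0.13+ with CRD-based configuration in L2 mode; the pool name and address range are placeholders for your own network:

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.240-192.168.1.250   # placeholder range on the Nodes' LAN
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - default-pool
```

With this in place, any `type: LoadBalancer` Service should receive an `EXTERNAL-IP` from the pool automatically.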