Platform engineering · playbook
Kubernetes from scratch on bare metal. Reproducible in an afternoon.
10 sequenced scripts. Cilium eBPF + ArgoCD + External Secrets Operator + HashiCorp Vault + Cloudflared Zero Trust. No managed control plane, no tribal knowledge: just a Git repo a teammate can rebuild the cluster from.
3 control-plane nodes · 10 bootstrap scripts · HA from day one · ~3 h end-to-end bootstrap
Why not just use managed?
Managed Kubernetes (GKE / EKS) is great until the economics of your workload flip: egress costs outweigh compute, a fleet of bare-metal hosts is already paid for, or you have a private-VLAN networking story you want to keep inside the data center.
The real question was never "managed vs self-hosted." It was: can we self-host in a way that's boring in production and reproducible from a Git repo? No manual `kubectl apply`. No secrets pasted into YAML. No single engineer holding the knowledge of how to stand the cluster up.
The bar: a teammate starts from a fresh set of bare-metal hosts, runs 10 scripts in order, and ends up with a cluster identical to production, with the same ingress, the same secrets flow, and the same GitOps pipeline. This playbook is my distillation of that bootstrap.
Stack, with reasoning
kubeadm
Control plane · 3 control-plane nodes behind a load-balanced API endpoint. Classic, debuggable, no vendor lock-in. Worth the extra ceremony over k3s when HA from day one is non-negotiable.
Cilium + Hubble
CNI (eBPF) · `kubeProxyReplacement: strict` replaces kube-proxy and its iptables rules with eBPF service handling (Cilium 1.14 and later take `true` instead of `strict`); Hubble adds network-flow observability at no extra cost. The catch on Hetzner vSwitch: the MTU must be 1350 to fit inside the provider's 1400-byte envelope. That one took a day to debug the first time.
CoreDNS
DNS · A custom ConfigMap forwards internal domains to the private-VLAN resolvers and adds static host entries for services running outside the cluster (Postgres, Redis, etc.). One source of truth for name resolution.
NGINX Ingress Controller
Ingress · Pinned to a dedicated ingress node via a node label. Nothing fancy at Layer 7: Cloudflared Zero Trust sits in front, so the cluster edge stays simple.
Cloudflared
Zero Trust ingress · Deployed in the `infra` namespace, speaking HTTP/2 to the origin. No public load balancer, no exposed IPs: internal services are reachable only through authenticated Cloudflare Access policies, which removes a whole class of attack surface.
ArgoCD
GitOps · Helm-installed, pulling from a deployments repo via an SSH deploy key. Webhook-triggered reconciliation on every push, with one ApplicationSet (AppSet) per environment. After bootstrap, no manual `kubectl apply`, ever.
External Secrets Operator + HashiCorp Vault
Secrets · ESO reads from Vault KV v2 using a scoped AppRole per environment. Vault runs on a dedicated bare-metal host, not in-cluster: one less chicken-and-egg problem at recovery time. Secrets flow: Vault → ESO → Kubernetes Secret → Pod. No secret ever lands in Git.
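As a rough illustration of the ESO side of that flow, here is a minimal Python sketch that builds a `ClusterSecretStore` for one environment and prints it as JSON (JSON is valid YAML, so `kubectl apply -f` accepts it). The mount names (`secret` for KV v2, `approle` for the auth method), the `vault-approle-<env>` Secret holding the AppRole `secret_id`, the `external-secrets` namespace, and the Vault URL are all hypothetical placeholders; check which `apiVersion` your ESO release serves.

```python
import json


def cluster_secret_store(env: str, vault_addr: str, role_id: str) -> dict:
    """Build a ClusterSecretStore that reads Vault KV v2 through one AppRole.

    Hypothetical assumptions: KV v2 is mounted at "secret", AppRole at
    "approle", and the AppRole secret_id is stored in a Kubernetes Secret
    named "vault-approle-<env>" in the "external-secrets" namespace.
    """
    return {
        # Older ESO releases serve v1beta1; newer ones serve external-secrets.io/v1.
        "apiVersion": "external-secrets.io/v1beta1",
        "kind": "ClusterSecretStore",
        "metadata": {"name": f"vault-{env}"},
        "spec": {
            "provider": {
                "vault": {
                    "server": vault_addr,
                    "path": "secret",  # KV mount path
                    "version": "v2",  # KV secrets engine version
                    "auth": {
                        "appRole": {
                            "path": "approle",  # where the AppRole auth method is mounted
                            "roleId": role_id,
                            # The secret_id comes from a Kubernetes Secret, never from Git.
                            "secretRef": {
                                "name": f"vault-approle-{env}",
                                "namespace": "external-secrets",
                                "key": "secret-id",
                            },
                        }
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    store = cluster_secret_store(
        env="prod",
        vault_addr="https://vault.example.internal:8200",  # hypothetical address
        role_id="REPLACE-WITH-ROLE-ID",
    )
    print(json.dumps(store, indent=2))
```

One store per environment keeps each AppRole's blast radius limited to that environment's secrets.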
The 10-script bootstrap
Each script is idempotent, commented, and assumes a fresh Ubuntu 22.04 host. Run them in order; if one fails, fix the cause and re-run it.
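Idempotency mostly comes down to checking the current state before changing it. A minimal sketch of that pattern in Python (the playbook's scripts are shell; this only illustrates the idea): write a config file only when its content differs, and write it atomically so an interrupted run never leaves a half-written file behind.

```python
from pathlib import Path


def ensure_file(path: Path, content: str) -> bool:
    """Make `path` contain exactly `content`. Return True only if something changed."""
    if path.exists() and path.read_text() == content:
        return False  # already in the desired state, so a re-run is a no-op
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(content)
    tmp.replace(path)  # rename over the target; atomic on POSIX within one filesystem
    return True
```

A second call with the same content returns `False` and touches nothing, which is exactly the re-run behavior the scripts aim for.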
1-prerequisites.sh · Host setup
Network, SSH, firewall, apt sources, container runtime (containerd).
2-initialize-kubeadm.sh · Control-plane init
`kubeadm init` with pod + service subnets, external API endpoint on a load-balanced hostname.
3-deploy-cilium.sh · CNI
Helm-installs Cilium with eBPF masquerading, Hubble UI, `kubeProxyReplacement: strict`, and MTU 1350.
3.1-join-worker-nodes.sh · Worker join
Generates join commands for workers. Labels ingress nodes so NGINX schedules there.
4-test-connectivity.sh · Smoke test
Deploys a test pod per node and verifies cross-node + private-VLAN routing before anything else runs.
5-kubernetes-dashboard.sh · Dashboard
Optional: a read-only dashboard behind Cloudflare Access.
6-dns.sh · CoreDNS config
Applies the ConfigMap with internal VLAN forwarders and static hosts.
7-ingress.sh · Ingress
NGINX Ingress Controller Helm chart, labeled node placement, TLS via cert-manager + Cloudflare.
8-cloudflared-tunnel.sh · Zero Trust ingress
Cloudflared Deployment in `infra` namespace, tunnel token injected from Vault via ESO.
9-app-bootstrap.sh · Bootstrap services
Namespaces + ServiceAccounts + image-pull secrets + any bridge workloads your apps need before ArgoCD takes over.
10-argocd.sh · GitOps
Helm-installs ArgoCD, creates projects + AppSets per environment, configures the GitHub webhook.
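For `2-initialize-kubeadm.sh`, the init parameters can live in a kubeadm config file instead of a long flag list. A minimal sketch, assuming hypothetical values for the load-balanced endpoint and the CIDRs: it builds a `ClusterConfiguration`, but first refuses to continue if the pod, service, and private-VLAN ranges overlap, a mistake that is much cheaper to catch before `kubeadm init` than after.

```python
import ipaddress
import json

# Hypothetical values: replace with your own endpoint and ranges.
API_ENDPOINT = "k8s-api.example.internal:6443"  # load-balanced hostname in front of the 3 control-plane nodes
POD_SUBNET = "10.244.0.0/16"
SERVICE_SUBNET = "10.96.0.0/12"
VLAN_SUBNET = "10.0.0.0/24"  # private vSwitch VLAN


def check_no_overlap(*cidrs: str) -> None:
    """Raise ValueError if any two CIDR ranges overlap."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    for i, a in enumerate(nets):
        for b in nets[i + 1:]:
            if a.overlaps(b):
                raise ValueError(f"{a} overlaps {b}")


def kubeadm_cluster_config() -> dict:
    check_no_overlap(POD_SUBNET, SERVICE_SUBNET, VLAN_SUBNET)
    return {
        "apiVersion": "kubeadm.k8s.io/v1beta3",  # kubeadm 1.31+ also accepts v1beta4
        "kind": "ClusterConfiguration",
        "controlPlaneEndpoint": API_ENDPOINT,
        "networking": {"podSubnet": POD_SUBNET, "serviceSubnet": SERVICE_SUBNET},
    }


if __name__ == "__main__":
    print(json.dumps(kubeadm_cluster_config(), indent=2))
```

Saved to a file, it would be passed as `kubeadm init --config <file> --upload-certs`; `--upload-certs` lets the other two control-plane nodes fetch the shared certificates when they join. Because Cilium replaces kube-proxy here, adding `--skip-phases=addon/kube-proxy` keeps kube-proxy from being installed at all. JSON is a subset of YAML, so it should parse; emitting YAML instead works just as well.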
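For `3-deploy-cilium.sh`, the Helm values listed above could be collected in one place. A minimal sketch under stated assumptions: the API-server host is a hypothetical placeholder, and `kubeProxyReplacement` is written as a boolean, which is what Cilium 1.14 and later expect (older releases used the string `strict`). Without kube-proxy, the agent cannot rely on the in-cluster Service to find the API server, which is why `k8sServiceHost` and `k8sServicePort` point at the load-balanced endpoint.

```python
import json


def cilium_values(api_host: str, api_port: int = 6443) -> dict:
    """Helm values for Cilium as described in the playbook (a sketch, not the original file)."""
    return {
        "kubeProxyReplacement": True,  # Cilium >= 1.14; use "strict" on 1.13 and earlier
        # With kube-proxy gone, the agent must reach the API server directly.
        "k8sServiceHost": api_host,
        "k8sServicePort": api_port,
        "MTU": 1350,  # fits inside the 1400-byte Hetzner vSwitch MTU
        "bpf": {"masquerade": True},  # eBPF masquerading instead of iptables
        # SNAT pod traffic to the route's source IP (the node's VLAN address).
        "enableMasqueradeToRouteSource": True,
        "hubble": {"relay": {"enabled": True}, "ui": {"enabled": True}},
    }


if __name__ == "__main__":
    print(json.dumps(cilium_values("k8s-api.example.internal"), indent=2))
```

Saved as a values file, it would be passed with `helm install cilium cilium/cilium --namespace kube-system -f <file>`; Helm reads JSON values files because JSON is valid YAML. Pinning `--version` keeps re-runs reproducible.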
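The core of `4-test-connectivity.sh` is a reachability check. As a hedged sketch of what each test pod could run, the function below tries a TCP connection to every target and returns the ones that failed, so the script can fail fast before anything else is installed. The targets are supplied by the caller; nothing here comes from the original script.

```python
import socket


def unreachable(targets: list[tuple[str, str, int]], timeout: float = 2.0) -> list[str]:
    """Return the names of (name, host, port) targets that refuse or time out on TCP."""
    failed = []
    for name, host, port in targets:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connection established: both the route and the listener work
        except OSError:  # covers refused, unreachable, and timed-out connections
            failed.append(name)
    return failed
```

A call such as `unreachable([("pod-on-worker-2", "10.244.2.15", 8080), ("postgres-on-vlan", "10.0.0.10", 5432)])` (hypothetical pod IP and VLAN address) checks cross-node pod routing and private-VLAN routing in one pass; any non-empty result should stop the bootstrap.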
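`6-dns.sh` applies a custom CoreDNS ConfigMap. A minimal sketch of how such a Corefile could be generated, assuming a hypothetical internal zone, VLAN resolver IPs, and static host entries. The `kubernetes`, `forward`, `hosts`, and `cache` blocks are standard CoreDNS plugins, and the rest mirrors the default kubeadm Corefile. One detail worth knowing: CoreDNS routes each query to the server block with the most specific matching zone, so static host names should sit outside the forwarded internal zone.

```python
import json


def render_corefile(internal_zone: str, vlan_resolvers: list[str], static_hosts: dict[str, str]) -> str:
    """Render a Corefile: forward one internal zone to VLAN resolvers, add static hosts."""
    host_lines = "\n".join(f"        {ip} {name}" for name, ip in sorted(static_hosts.items()))
    resolvers = " ".join(vlan_resolvers)
    return f"""{internal_zone}:53 {{
    errors
    cache 30
    forward . {resolvers}
}}
.:53 {{
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {{
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }}
    hosts {{
{host_lines}
        fallthrough
    }}
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
}}
"""


def coredns_configmap(corefile: str) -> dict:
    # kubeadm's CoreDNS Deployment reads the "Corefile" key of this ConfigMap.
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "coredns", "namespace": "kube-system"},
        "data": {"Corefile": corefile},
    }


if __name__ == "__main__":
    corefile = render_corefile(
        "corp.internal",  # hypothetical zone served by the VLAN resolvers
        ["10.0.0.2", "10.0.0.3"],
        {"postgres.infra.lan": "10.0.0.10", "redis.infra.lan": "10.0.0.11"},
    )
    print(json.dumps(coredns_configmap(corefile), indent=2))
```

Because the Corefile keeps the `reload` plugin, CoreDNS picks up a changed ConfigMap without restarting its pods.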
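For `8-cloudflared-tunnel.sh`, "tunnel token injected from Vault via ESO" can be pictured as two pieces: an `ExternalSecret` that materializes the token as a Kubernetes Secret, and an environment variable in the Cloudflared container that reads it. A minimal sketch; the store name, the Vault key path (`infra/cloudflared`), the property name, and the Secret name are hypothetical. `cloudflared tunnel run` reads its token from the `TUNNEL_TOKEN` environment variable.

```python
import json


def tunnel_token_external_secret(store: str) -> dict:
    """ExternalSecret that syncs the tunnel token from Vault into the infra namespace."""
    return {
        "apiVersion": "external-secrets.io/v1beta1",  # match what your ESO release serves
        "kind": "ExternalSecret",
        "metadata": {"name": "cloudflared-tunnel", "namespace": "infra"},
        "spec": {
            "refreshInterval": "1h",  # how often ESO re-reads Vault
            "secretStoreRef": {"kind": "ClusterSecretStore", "name": store},
            "target": {"name": "cloudflared-tunnel"},  # Kubernetes Secret that ESO creates and owns
            "data": [
                {
                    "secretKey": "token",  # key inside the Kubernetes Secret
                    "remoteRef": {"key": "infra/cloudflared", "property": "tunnel-token"},
                }
            ],
        },
    }


def tunnel_token_env() -> list[dict]:
    """Env fragment for the Cloudflared container in the Deployment."""
    return [
        {
            "name": "TUNNEL_TOKEN",
            "valueFrom": {"secretKeyRef": {"name": "cloudflared-tunnel", "key": "token"}},
        }
    ]


if __name__ == "__main__":
    print(json.dumps(tunnel_token_external_secret("vault-prod"), indent=2))
```

Rotating the token in Vault then reaches the cluster on the next refresh without any change in Git; the Cloudflared pods still need a restart to read the new environment value.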
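Among the `9-app-bootstrap.sh` objects, image-pull secrets have the least obvious format. A minimal sketch of a `kubernetes.io/dockerconfigjson` Secret, equivalent in shape to what `kubectl create secret docker-registry` produces; the registry, names, and credentials are supplied by the caller and any values shown are hypothetical.

```python
import base64
import json


def image_pull_secret(name: str, namespace: str, registry: str, username: str, password: str) -> dict:
    """Build a kubernetes.io/dockerconfigjson Secret for one registry."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    docker_config = {"auths": {registry: {"username": username, "password": password, "auth": auth}}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/dockerconfigjson",
        "metadata": {"name": name, "namespace": namespace},
        # Secret data values are base64-encoded; the payload itself is a Docker config.json.
        "data": {".dockerconfigjson": base64.b64encode(json.dumps(docker_config).encode()).decode()},
    }
```

Since the playbook keeps secrets out of Git, the dumped JSON would be piped straight to `kubectl apply -f -` rather than committed, with credentials read from the environment at run time. Listing the Secret under `imagePullSecrets` on a namespace's default ServiceAccount then covers every pod in that namespace that uses it.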
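For `10-argocd.sh`, "AppSets per environment" maps naturally onto an ApplicationSet with a list generator: one element per environment, each rendered into its own Application. A minimal sketch; the repo URL, the `envs/<env>` directory layout, the `main` branch, and the assumption that each environment has an AppProject of the same name are all hypothetical.

```python
import json


def environment_appset(repo_url: str, envs: list[str]) -> dict:
    """One ApplicationSet that stamps out an Application per environment."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": "deployments", "namespace": "argocd"},
        "spec": {
            # Each element's keys become {{...}} template parameters.
            "generators": [{"list": {"elements": [{"env": e} for e in envs]}}],
            "template": {
                "metadata": {"name": "deployments-{{env}}"},
                "spec": {
                    "project": "{{env}}",  # assumes one AppProject per environment
                    "source": {"repoURL": repo_url, "targetRevision": "main", "path": "envs/{{env}}"},
                    "destination": {"server": "https://kubernetes.default.svc", "namespace": "{{env}}"},
                    # Git is the source of truth: prune removed objects, revert manual drift.
                    "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
                },
            },
        },
    }


if __name__ == "__main__":
    appset = environment_appset("git@github.com:example-org/deployments.git", ["dev", "staging", "prod"])
    print(json.dumps(appset, indent=2))
```

Adding an environment then becomes a one-line change to the generator's element list rather than a new hand-written Application.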
The gotchas (write these down)
Cilium MTU 1350 inside Hetzner vSwitch 1400
Hetzner's vSwitch wraps packets in an extra layer of encapsulation and caps the MTU at 1400. Cilium defaults to the interface MTU (1400), and once nodes join, oversized packets are dropped silently. Set `MTU: 1350` in the Cilium Helm values, and leave a comment in the CNI script explaining why, so the next person doesn't rediscover it at 1 a.m.
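The 50-byte gap between 1400 and 1350 matches the overhead of a VXLAN tunnel, Cilium's default routing mode. That is an assumption: the playbook does not say which tunnel mode it uses. A worked version of the arithmetic, small enough to live next to the comment in the CNI script:

```python
# VXLAN over IPv4 adds, per packet:
#   outer IPv4 header 20 + UDP header 8 + VXLAN header 8 + inner Ethernet header 14 = 50 bytes
VXLAN_OVERHEAD = 20 + 8 + 8 + 14


def pod_mtu(underlay_mtu: int, overhead: int = VXLAN_OVERHEAD) -> int:
    """Largest pod-side IP packet that still fits the underlay after encapsulation."""
    return underlay_mtu - overhead


HETZNER_VSWITCH_MTU = 1400
assert pod_mtu(HETZNER_VSWITCH_MTU) == 1350  # the value set in the Cilium Helm values
```

On a standard 1500-byte link the same formula gives the familiar VXLAN figure of 1450.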
enableMasqueradeToRouteSource: true
Without this, pod traffic exits with the pod IP (which the provider's vSwitch has no route back for). Setting it to `true` makes Cilium masquerade to the node's VLAN IP. Once you know, obvious. Before you know, baffling.
ArgoCD webhook requires GitHub auth
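The `argocd` namespace needs the webhook secret (the `webhook.github.secret` key in `argocd-secret`) plus the SSH deploy key for the deployments repo. If the secret is wrong, ArgoCD rejects the webhook and falls back to its default 3-minute polling: fine for dev, painful for prod.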
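"Wrong secret" here means a signature mismatch. GitHub signs every webhook payload with HMAC-SHA256 using the shared secret and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`; the receiver recomputes it with its own copy of the secret and drops the event if the two differ. A minimal Python sketch of that check, handy for confirming locally that the secret on GitHub and the one in `argocd-secret` agree (any secret or payload you feed it is your own):

```python
import hashlib
import hmac


def github_signature(secret: str, body: bytes) -> str:
    """Compute the X-Hub-Signature-256 value GitHub would send for this payload."""
    return "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def signature_matches(secret: str, body: bytes, header: str) -> bool:
    # Constant-time comparison, so timing does not leak how much of the digest matched.
    return hmac.compare_digest(github_signature(secret, body), header)
```

If the two secrets differ, every signature differs too, and reconciliation silently degrades to polling.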
ESO ClusterSecretStore requires Vault AppRole bootstrap
Before ESO can read secrets, Vault needs the per-environment AppRoles configured, each bound to a policy scoped to that environment's secrets. Automate this with an Ansible playbook or a bootstrap script; don't do it by hand the first time, let alone the second.
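A minimal sketch of what that bootstrap could automate, in Python rather than Ansible: render a read-only policy for one environment's KV v2 path, then the Vault CLI commands that create the matching AppRole and fetch its credentials. The mount names (`secret`, `approle`), the `eso-<env>` naming, the `<env>/` path layout, and the token TTLs are hypothetical. The script only prints the commands, and it assumes the AppRole auth method was enabled once with `vault auth enable approle` (run once, since it fails if the mount already exists). KV v2 policies must target `secret/data/...` and `secret/metadata/...`, not the bare `secret/...` path the `vault kv` CLI shows.

```python
import shlex


def kv2_read_policy(env: str, mount: str = "secret") -> str:
    """Vault policy allowing read-only access to one environment's KV v2 secrets."""
    return f'''path "{mount}/data/{env}/*" {{
  capabilities = ["read"]
}}

path "{mount}/metadata/{env}/*" {{
  capabilities = ["read", "list"]
}}
'''


def approle_commands(env: str) -> list[list[str]]:
    """Vault CLI calls that create the policy and AppRole, then fetch its credentials."""
    name = f"eso-{env}"
    return [
        # Policy and role writes overwrite in place, so re-running them is safe.
        ["vault", "policy", "write", name, f"{name}.hcl"],
        ["vault", "write", f"auth/approle/role/{name}",
         f"token_policies={name}", "token_ttl=1h", "token_max_ttl=4h"],
        ["vault", "read", "-field=role_id", f"auth/approle/role/{name}/role-id"],
        # Each secret-id write issues a new secret_id; store it in the cluster, not in Git.
        ["vault", "write", "-f", "-field=secret_id", f"auth/approle/role/{name}/secret-id"],
    ]


if __name__ == "__main__":
    for env in ["dev", "prod"]:
        print(f"# eso-{env}.hcl")
        print(kv2_read_policy(env))
        for cmd in approle_commands(env):
            print(shlex.join(cmd))
```

The `role_id` feeds the `ClusterSecretStore`, and the `secret_id` goes into the Kubernetes Secret that the store references.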
Need this for your team?
I do Kubernetes bootstrap consulting: managed-to-bare-metal migrations, GitOps rollouts with ArgoCD + ESO + Vault, Cilium eBPF adoption, and Cloudflared Zero Trust ingress. Remote, paid engagements, typically 2–6 weeks per project. Based in Spain (CET) until October 2026, then Mexico (CST).
Email [email protected]
Stateful-migration playbook
The other hard-infra story: an Elasticsearch major-version migration on a live platform, zero regression, with a shape-compare safety net and a per-service SPEC → Build → QA workflow.