Kyverno fleet rollout — policy-as-code on every cluster
Sep 18, 2025 — initial fleet-wide rollout. Apr–May 2026 — migration to ArgoCD-managed Kyverno + policy expansion. Kyverno is the policy engine that catches “no PDB on this Deployment” and the engine that generates VPAs.
The brief
Section titled “The brief”Most platforms ship kubectl apply policy as an afterthought: someone writes a checklist; engineers forget; production has services with no PodDisruptionBudgets, no resource requests, no liveness probes. The right answer is to encode the policy and let the cluster enforce it.
I rolled Kyverno fleet-wide on Sep 18, 2025 — announcement to #rnd-devops, every EKS cluster. By May 2026 the policy set had grown to include VPA generation, CRD drift suppression, and migrated from Terraform-managed Helm onto ArgoCD ApplicationSets.
What it enforces
Section titled “What it enforces”Auto-generated PodDisruptionBudgets
Section titled “Auto-generated PodDisruptionBudgets”apiVersion: kyverno.io/v1kind: ClusterPolicymetadata: name: generate-pdbspec: generateExisting: true rules: - name: generate-pdb-for-deployment match: any: - resources: kinds: [Deployment] preconditions: all: - key: "{{ request.object.spec.replicas || `1` }}" operator: GreaterThan value: 1 generate: apiVersion: policy/v1 kind: PodDisruptionBudget name: "{{ request.object.metadata.name }}-pdb" namespace: "{{ request.object.metadata.namespace }}" synchronize: true data: spec: minAvailable: 1 selector: matchLabels: "{{ request.object.spec.selector.matchLabels }}"Every multi-replica Deployment that ships without a PDB gets one auto-generated. Eliminated the “we forgot to add a PDB” class of incident across the fleet.
Generated observation VPAs — the Kyverno × VPA tie-in
Section titled “Generated observation VPAs — the Kyverno × VPA tie-in”This is the more interesting policy. Every Deployment and StatefulSet across the fleet gets an auto-generated observation-mode VPA so we have right-sizing data without having to author per-workload VPAs.
The design call was to split the rule by kind rather than have one rule mutate both Deployment and StatefulSet:
rules: - name: generate-vpa-for-deployment match: { any: [{ resources: { kinds: [Deployment] } }] } generate: { apiVersion: autoscaling.k8s.io/v1, kind: VerticalPodAutoscaler, ... }
- name: generate-vpa-for-statefulset # <-- separate rule match: { any: [{ resources: { kinds: [StatefulSet] } }] } generate: { apiVersion: autoscaling.k8s.io/v1, kind: VerticalPodAutoscaler, ... }Why two rules instead of one mutating the kind selector? Cleaner, fewer side effects when the controller reconciles, easier to disable per-kind if we hit a regression. The merged-rule version worked but it was harder to reason about during the upgrade cycle.
CRD drift suppression
Section titled “CRD drift suppression”Kyverno’s chart renders labels: {} / annotations: {} on its CRDs; Kubernetes normalises them to null server-side; ArgoCD then shows a permanent OutOfSync for the CRD. Fix: broaden ignoreDifferences in the Kyverno ApplicationSet for apiextensions.k8s.io/CustomResourceDefinition at /metadata/labels and /metadata/annotations. This is the kind of operational detail that’s two lines of YAML and saves you from chasing phantom drift for a week.
How it’s deployed
Section titled “How it’s deployed”After the initial Sep 2025 fleet-wide rollout (via Terraform Helm), Kyverno was migrated to an ArgoCD ApplicationSet in April–May 2026. The ApplicationSet fans Kyverno to every workload cluster from a single Helm chart in Git; new clusters get Kyverno automatically.
Webhook HA — 2 replicas in prod for resilience against pod restarts during cert rotations. (The cert-manager / webhook race is its own gotcha; pin the cert-manager Application sync wave so the cert is renewed before the webhook reload.)
Operational notes
Section titled “Operational notes”- PDB on Kyverno itself. Kyverno’s admission and cleanup controllers each ship with 1 replica + a PDB
minAvailable=1. On a managed node group with no headroom, an EKS upgrade can’t evict them. Scale both to 2 replicas before upgrade pipelines run. (Full gotcha walk-through in the EKS upgrade playbook.) - Policy reports vs admission webhook. I run policies in
auditmode for the first week per fleet, then promote toenforceonce the policy reports show no surprises. - The generated-VPA pattern is the link. This is where Kyverno earns its keep beyond “validation engine” — it becomes the resource generation engine for the policy domain. Every workload gets a sidecar policy artifact (PDB, VPA) without any author-side opt-in.
Related
Section titled “Related”- “Automating Pod Disruption Budgets with Kyverno” — my Zencity Engineering article (Dec 2, 2025) on the auto-PDB ClusterPolicy and why it matters when Karpenter consolidates nodes
- EKS major-version upgrade playbook — the Kyverno PDB gotcha during upgrades
- GitOps engine — Terraform → ArgoCD — the ApplicationSet pattern Kyverno migrated onto
- Kyverno docs