Ahmad Asmar · Portfolio

A platform,
by hand.

Staff DevOps Platform Lead · CKA · Open to opportunities · GMT+3

§ I. Field notes + + +

Six+ years scaling cloud-native infrastructure across AWS, GCP, and Azure. The last three of those as senior platform owner for a SaaS-scale AWS-native platform: grown from a hybrid Azure+AWS estate into a 20-account, 4-region, 20-EKS-cluster system serving 25+ microservices and 20+ government customers across the US and UK. Sole DevOps engineer for the first 18 months, then primary IC for the next 18. The Azure-to-AWS migration close-out shipped with zero customer-visible downtime; the last fleet-wide EKS upgrade landed in two working days.

Before that: Freightos (2021–2023) — GCP, Kubernetes, GitOps with ArgoCD, monitoring; PDF Solutions (2020–2021) — cloud infrastructure-as-code; plus two years of IT-support roots at Partners for Sustainable Development and Palestine Telecommunications. Open-source maintainer of a Terraform module powering self-service GitLab runner fleets (3,767 downloads on the Terraform Registry, 2 years of maintenance) and a Claude Code skills marketplace for DevOps workflows (158 ★ · 32 forks).

3,767

Terraform Registry downloads

ahmedasmar/gitlab-docker-autoscaler-runner — v0.6.8, GPL-3.0

158 ★

devops-claude-skills

32 forks. Claude Code skills marketplace for DevOps

20 / 4

EKS clusters · AWS regions

us-east-1 · eu-west-1 · ca-central-1 · eu-west-2

37%

faster TF pipelines

8:52 → 5:36 fleet-wide via cache-key fix + Modern Terraform CI/CD

2 days

EKS 1.33 → 1.34 fleet upgrade

~10 production clusters, single-engineer led

Zero customer-visible downtime · Azure → AWS

18-month migration close-out, Dec 2024

§ II. Featured work + + +

№ 01

Terraform GitLab Runner module — scale-to-zero on AWS

Self-authored Terraform module powering self-service GitLab runner fleets: 100% Spot, attribute-based instance selection, Fleeting plugin for the Docker Autoscaler executor, multi-arch. The module runs a production CI fleet at SaaS scale, and 3,767 downloads suggest other organisations had the same problem.

3,767 downloads · Terraform Registry · v0.6.8 · GPL-3.0

→

№ 02

DevOps Claude Skills — a marketplace, not a one-off

Curated Claude Code skills for DevOps workflows: ArgoCD cluster onboarding via Pod Identity + AssumeRole, AWS SSO auth recovery, Terraform → ArgoCD migration, k8s triage, FinOps. Real community traction in 7 months.

158 ★ · 32 forks · Claude Code · 10+ skills

→

№ 03

Cross-account TargetGroupBinding — no NLB hop

ALB in one AWS account, EKS workloads in another — the textbook answer puts an NLB in each spoke account. With AWS LBC v3, you can register pod IPs directly into a cross-account target group. Validated on a pilot cluster, then rolled to all six staging clusters in a single day.

AWS LBC v3 · EKS Pod Identity · Istio Ambient

→

№ 04

GitOps engine — Terraform → ArgoCD across a fleet

3-month epic standing up dedicated devops clusters, the ApplicationSet pattern for multi-cluster fan-out, and cluster onboarding via Pod Identity + cross-account AssumeRole — no bearer tokens. Authored ~70% of the underlying GitOps repo.

299 commits in 3 months · ArgoCD · ApplicationSets · Server-Side Diff

→

№ 05

Kyverno fleet rollout — policy-as-code on every cluster

Auto-PDB ClusterPolicy on every multi-replica Deployment fleet-wide. Observation-VPA generation via a separate-by-kind rule design. CRD drift suppression for ArgoCD. Migrated from Terraform Helm onto an ApplicationSet.

Fleet-wide Sep 2025 · Kyverno · VPA · ArgoCD

→

№ 06

Crossplane — IaC v2 bootstrap

Crossplane v2.5.3 (Upbound) bootstrapped on dedicated devops clusters in Pipeline mode. The first usable Composition — AppWorkloadBucket with S3 versioning + lifecycle + CORS — shipped to prod. Declarative AWS resources reconciled by k8s controllers instead of terraform apply.

v2.5.3 in prod · Crossplane · Upbound · XR / Composition

→

§ III. Practice + + +

Multi-account AWS at scale

20 SSO-managed accounts across staging + prod orgs. Transit Gateway as the only inter-VPC primitive (no peering). IAM Identity Center via JumpCloud SAML. SCPs for guardrails. AWS provider 4 → 6 lifecycle managed in shared modules.

Kubernetes platform

20 EKS clusters on Bottlerocket 1.59 + Karpenter 1.12, with NodePools restricted to generation-5-or-newer instances. 6 major k8s upgrades over 3 years with zero rollbacks. Pod Identity replacing IRSA fleet-wide. AWS LBC v3 with cross-account TargetGroupBinding.

Policy & governance

Kyverno fleet-wide with auto-PDB generation for every workload missing one, ClusterPolicy generating observation VPAs across all deployments + statefulsets, and ignoreDifferences patterns for CRD cosmetic-drift suppression.
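
The auto-PDB rule can be pictured as a pure function from a workload to the PodDisruptionBudget it should receive. This is a minimal Python sketch of that logic, not the Kyverno policy itself: the `-default-pdb` name suffix, the `maxUnavailable: 1` budget, and the "skip single-replica workloads" rule are assumptions made for illustration.

```python
from typing import Optional


def default_pdb_for(deployment: dict, existing_pdbs: list) -> Optional[dict]:
    """Return a PodDisruptionBudget manifest for a workload that lacks one, else None."""
    meta = deployment["metadata"]
    spec = deployment["spec"]
    namespace = meta.get("namespace", "default")
    selector = spec["selector"]

    # Kubernetes defaults spec.replicas to 1 when the field is omitted.
    # Assumption: only multi-replica workloads get a generated PDB.
    if spec.get("replicas", 1) < 2:
        return None

    # Skip workloads already covered by a PDB with the same selector.
    for pdb in existing_pdbs:
        same_ns = pdb["metadata"].get("namespace", "default") == namespace
        if same_ns and pdb["spec"].get("selector") == selector:
            return None

    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {
            "name": f"{meta['name']}-default-pdb",  # hypothetical naming scheme
            "namespace": namespace,
        },
        "spec": {
            "maxUnavailable": 1,  # assumed budget; the text does not state one
            "selector": selector,
        },
    }
```

In a real cluster the same decision would live in a Kyverno generate rule; the function form only makes the "missing one" check explicit.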

IaC v2 — Crossplane

Crossplane v2.5.3 (Upbound) bootstrapped on shared devops clusters in Pipeline mode. AppWorkloadBucket Composition with S3 versioning, lifecycle, and CORS shipped. Webhook scaled to 2 replicas in prod; requests == limits for predictable footprint.

Autoscaling — VPA + Karpenter

VPA live on every staging and prod cluster — admission controller + a Kyverno ClusterPolicy that generates observation VPAs for every Deployment and StatefulSet (via two separate rules to keep the design clean). Karpenter NodePools restricted to gen5+ instances after the 2024 NLB-m3 incident.
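
As a rough illustration of what an observation VPA looks like, the sketch below builds one manifest per workload in Python. It assumes "observation" means recommendation-only, which in the VerticalPodAutoscaler API is `updateMode: "Off"`; the `-observe` name suffix is hypothetical.

```python
def observation_vpa_for(workload: dict) -> dict:
    """Build a recommendation-only VPA manifest targeting a Deployment or StatefulSet."""
    kind = workload["kind"]
    if kind not in ("Deployment", "StatefulSet"):
        raise ValueError(f"unsupported workload kind: {kind}")

    meta = workload["metadata"]
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {
            "name": f"{meta['name']}-observe",  # hypothetical naming scheme
            "namespace": meta.get("namespace", "default"),
        },
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": kind, "name": meta["name"]},
            # "Off" makes the VPA compute recommendations without evicting
            # or resizing pods (assumed meaning of "observation").
            "updatePolicy": {"updateMode": "Off"},
        },
    }
```

The only field that differs between the two workload kinds is `targetRef.kind`, which is why the function accepts either.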

Service mesh — Istio Ambient + Gateway API

12-phase migration plan; shipped Phases 0/1/2.0/2.1/2.2 to all 6 staging clusters in a 2-week sprint (125 commits across 8 repos). Sidecar-less Ambient over per-pod sidecar — no restart-on-upgrade. HTTPRoute auto-generation from monochart.

Modern Terraform CI/CD

Validate → Checkov → plan-in-MR → Infracost → auto-apply on merge. Plan and cost diff posted as MR comments so reviewers see what's about to change in AWS and how much it costs. Cut fleet pipelines from 8:52 to 5:36.
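
For reference, the 37% figure follows directly from the two durations above. A quick check in Python, where the helper name `to_seconds` is just for illustration:

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' duration string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


before = to_seconds("8:52")  # 532 seconds
after = to_seconds("5:36")   # 336 seconds

# Relative reduction in pipeline duration.
reduction = (before - after) / before
print(f"{reduction:.1%}")  # 36.8%, i.e. roughly 37%
```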

Observability

Datadog primary — APM, logs, synthetics, Operator-managed agents on EKS. SKU renegotiation (Ephemeral Infra + APM, Infra vCPU) eliminated the container commit floor. Custom monitors as Terraform.

§ IV. Published + + +

M · 01

Automating Pod Disruption Budgets with Kyverno

How a SaaS platform used Kyverno to auto-generate PDBs for every microservice that lacked one — preventing service downtime when Karpenter consolidates nodes and closing the operational gap of "we forgot to add a PDB."

Zencity Engineering · Dec 2, 2025 · Kyverno · Karpenter

↗

M · 02

Building a Local Dev Platform with Kubernetes, Tilt, and local GitLab pipelines

A local dev platform that mirrors production: k3d + Traefik locally, EKS in the cloud, identical Kustomize bases and overlays across environments. Tilt for rapid feedback; GitLab CI for immutable image promotion from staging to prod.

Medium · Apr 9, 2026 · k3d · Tilt · GitLab CI

↗

§ V. Writing + + +

A · 01

Scale-to-zero GitLab runners on AWS

How a CI workload arrives, an EC2 Spot instance boots, runs the job, and the ASG returns to zero. Attribute-based selection, price-capacity-optimized allocation, deliberate capacity_rebalance = false, Packer AMI builder.

GitLab CI · EC2 Spot · Karpenter-adjacent

→
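
The scale-to-zero behaviour described in this card comes down to a capacity calculation: no queued jobs means no instances. Here is a minimal Python sketch of that idea. The function and its parameter names are hypothetical and are not the module's variables or the autoscaler's actual algorithm.

```python
import math


def desired_instances(pending_jobs: int, capacity_per_instance: int = 1,
                      max_instances: int = 10) -> int:
    """Number of instances needed for the queued jobs, capped at the fleet maximum."""
    if pending_jobs < 0 or capacity_per_instance < 1 or max_instances < 0:
        raise ValueError("invalid capacity parameters")
    # An empty queue yields ceil(0 / n) == 0, so the fleet returns to zero.
    needed = math.ceil(pending_jobs / capacity_per_instance)
    return min(max_instances, needed)
```

For example, `desired_instances(0)` is 0, while three queued jobs at two jobs per instance need two instances.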

A · 02

Cross-account TargetGroupBinding (AWS LBC v3)

The IAM trust, the security-group topology, and the readiness-gate ordering that make the NLB hop go away. Three IAM gotchas account for over 90% of the debugging time; they're catalogued here so the next person doesn't have to find them live.

AWS LBC v3 · IAM · Pod Identity

→

A · 03

Terraform → ArgoCD — migrating an EKS fleet without downtime

Dedicated devops clusters. Self-managing app-of-apps bootstrap. ApplicationSet fan-out. Server-Side Diff at the controller level. The migration sequence that takes an addon off Terraform Helm without restarting workloads.

ArgoCD · ApplicationSets · Server-Side Diff

→

A · 04

EKS major-version upgrade — fleet playbook

Honed across six k8s versions (1.22 → 1.34) on roughly fifteen production clusters. Module bumps first, GitOps versions separate, cluster_version last, one cluster at a time. The Kyverno PDB gotcha. The ArgoCD OOM gotcha. The NLB-vs-gen-3-instance gotcha. The unlock-runbook you need ready.

EKS · Karpenter · Bottlerocket · ArgoCD

→

A · 05

Modern Terraform CI/CD on GitLab

Validate → Checkov → plan → Infracost → auto-apply. The OIDC dance, the resource_group: trick that makes concurrent merges safe, and the cache-key fix that bought a 37% pipeline speedup fleet-wide.

GitLab CI · Checkov · Infracost · OIDC

→

A · 06

EKS Pod Identity — fleet rollout

Replacing IRSA with Pod Identity across 10 production + 5 staging EKS clusters in two days. Why before_compute = true is non-optional. The cross-account variant. Why AWS LBC v3 needs the SA annotation removed.

Pod Identity · IAM · AWS LBC v3

→
