
Scale-to-zero GitLab runners

The Terraform module implements a pattern that’s straightforward in the abstract but has a lot of moving parts in production. This page walks through the design.

┌──────────────────────────────────────────────────────────────────────┐
│ GitLab CI job submitted │
└──────────────────────────────────────────────────────────────────────┘
▼ runner polls GitLab for pending jobs
┌──────────────────────────────────────────────────────────────────────┐
│ GitLab Runner Manager (t4g.small, always-on, ARM64) │
│ - registered with GitLab (auth_token) │
│ - configured with the Docker Autoscaler executor │
│ - uses the Fleeting plugin: provider = aws │
└──────────────────────────────────────────────────────────────────────┘
▼ Fleeting → SetDesiredCapacity(+1) on the ASG
┌──────────────────────────────────────────────────────────────────────┐
│ Worker ASG │
│ min_size = 0, desired_capacity = 0 (literal scale-to-zero) │
│ MixedInstancesPolicy: │
│ - attribute-based selection (vCPU + memory + arch) │
│ - 100% Spot (on_demand_base_capacity = 0) │
│ - spot_allocation_strategy = price-capacity-optimized │
│ - capacity_rebalance = false │
│ - protect_from_scale_in = true │
└──────────────────────────────────────────────────────────────────────┘
▼ EC2 spot instance launches with custom AMI (Packer)
┌──────────────────────────────────────────────────────────────────────┐
│ Worker EC2 (lifetime: ~minutes per job, 2 jobs / instance default) │
│ - Docker pre-installed in the AMI │
│ - Fleeting plugin SSHes in & runs the GitLab job in Docker │
│ - On job end: instance terminated by Fleeting (or scale-in) │
└──────────────────────────────────────────────────────────────────────┘
▼ Job artifacts + cache pushed to S3
┌──────────────────────────────────────────────────────────────────────┐
│ S3 cache bucket (isolated per runner, 30-day lifecycle) │
└──────────────────────────────────────────────────────────────────────┘
▼ ASG scales back to zero
[done]

The manager is the only piece that’s always-on: it has to keep polling GitLab for pending jobs. t4g.small is the cheapest viable instance: Graviton ARM64, burstable, 2 GiB memory. Operating cost: about $8 / month.

docker-machine is deprecated. Fleeting is GitLab’s replacement: a plugin-based autoscaling layer, used here with its AWS plugin. Critically, it talks to the ASG via the AWS API (SetDesiredCapacity) instead of provisioning EC2 directly, meaning the ASG owns the launch template, the Spot strategy, the IAM, the user-data.
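Because Fleeting only drives the ASG through the AWS API, the manager's IAM permissions can stay narrow. The following is a minimal sketch of such a policy, not the module's actual code: the resource names and the `worker_asg_arn` variable are hypothetical, and the action list is an assumption based on what the AWS Fleeting plugin is documented to need, so check it against the plugin version you run.

```hcl
# Hypothetical input: ARN of the worker Auto Scaling group.
variable "worker_asg_arn" {
  type        = string
  description = "ARN of the worker ASG that the runner manager scales."
}

data "aws_iam_policy_document" "fleeting" {
  # Raise capacity and remove specific idle instances, on this ASG only.
  statement {
    actions = [
      "autoscaling:SetDesiredCapacity",
      "autoscaling:TerminateInstanceInAutoScalingGroup",
    ]
    resources = [var.worker_asg_arn]
  }

  # Describe calls cannot be restricted to a single resource.
  statement {
    actions = [
      "autoscaling:DescribeAutoScalingGroups",
      "ec2:DescribeInstances",
    ]
    resources = ["*"]
  }

  # Push a short-lived SSH key to a worker via EC2 Instance Connect.
  # Assumption: scope this down to the worker instances in real use.
  statement {
    actions   = ["ec2-instance-connect:SendSSHPublicKey"]
    resources = ["*"]
  }
}

resource "aws_iam_policy" "fleeting" {
  name   = "gitlab-runner-fleeting" # illustrative name
  policy = data.aws_iam_policy_document.fleeting.json
}
```

Nothing in this policy allows launching EC2 instances directly, which matches the division of labour described above: the manager adjusts capacity, and the ASG decides what to launch.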

on_demand_base_capacity = 0
on_demand_percentage_above_base_capacity = 0

For CI workloads, an interrupted job retries on a fresh runner. The economics massively favour Spot — 70–90% off On-Demand at fleet scale. Production-critical services are different; CI is the textbook case for 100% Spot.

Three Spot allocation strategies are worth comparing here:

| Strategy | When |
| --- | --- |
| lowest-price | Cost-only optimisation. Interrupts more. Wrong for CI. |
| capacity-optimized | Picks deepest pool. Costs more. Useful for long-running batch. |
| price-capacity-optimized | Balances both. AWS’s current recommendation. Right default for CI. |

capacity_rebalance = false — opposite of what you might think

When this is true, AWS proactively replaces instances that have a rising interruption forecast. For long-running services that’s helpful. For CI it’s harmful — it surfaces as “instance disappeared mid-build” failures. Better to let the job finish and let actual Spot interruption notices handle next-job placement.

With protect_from_scale_in = true, the ASG can’t terminate a runner on its own while a job is running; instances are removed only when Fleeting asks for it. Without this setting, you’d see cool-down-triggered ASG scale-in killing in-flight jobs.

vcpu_count_min = 2
vcpu_count_max = 4
memory_mib_min = 8192
memory_mib_max = 16384
allowed_instance_types = ["c*", "m*", "r*"]
burstable_performance = "included"
local_storage_types = ["ssd"]
instance_generations = ["current"]

Attribute-based instance selection is critical for production Spot stability. A fixed m5.large ASG hits UnfulfillableCapacity the moment that single type is scarce in the AZ. With attribute-based selection, AWS picks from any current-gen c/m/r-family type matching the spec, which is a much deeper pool.

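T-series is included via burstable_performance = "included" because for short CI jobs the burst credits absorb the CPU cost. For long-running jobs you’d exclude it. Note that allowed_instance_types acts as an allow-list, so it needs a t* pattern as well for T-series types to actually be eligible.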
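To show how the settings discussed so far fit together, here is a sketch of the kind of `aws_autoscaling_group` a module like this might declare. It is an illustration under assumptions, not the module's source: the resource names, the `subnet_ids` and `worker_ami_id` variables, and the `max_size` ceiling are all hypothetical.

```hcl
variable "subnet_ids" {
  type = list(string)
}

variable "worker_ami_id" {
  type = string
}

resource "aws_launch_template" "worker" {
  name_prefix = "gitlab-runner-worker-"
  image_id    = var.worker_ami_id # the Packer-built AMI
}

resource "aws_autoscaling_group" "workers" {
  name_prefix           = "gitlab-runner-workers-"
  min_size              = 0
  max_size              = 10 # assumed ceiling
  desired_capacity      = 0
  vpc_zone_identifier   = var.subnet_ids
  capacity_rebalance    = false
  protect_from_scale_in = true

  mixed_instances_policy {
    instances_distribution {
      # 100% Spot: no On-Demand base, no On-Demand share above it.
      on_demand_base_capacity                  = 0
      on_demand_percentage_above_base_capacity = 0
      spot_allocation_strategy                 = "price-capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.worker.id
        version            = "$Latest"
      }

      # Attribute-based selection instead of a fixed instance type.
      override {
        instance_requirements {
          vcpu_count {
            min = 2
            max = 4
          }
          memory_mib {
            min = 8192
            max = 16384
          }
          allowed_instance_types = ["c*", "m*", "r*"]
          burstable_performance  = "included"
          local_storage_types    = ["ssd"]
          instance_generations   = ["current"]
        }
      }
    }
  }

  lifecycle {
    # Fleeting changes desired capacity at runtime; keep Terraform from
    # resetting it to 0 on the next apply.
    ignore_changes = [desired_capacity]
  }
}
```

The `ignore_changes` entry is a design choice worth keeping in any variant of this: without it, every `terraform apply` during a busy period would try to scale the fleet back to zero.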

GitLab Runner’s capacity_per_instance sets how many jobs a single worker runs at once; max_use_count separately caps how many jobs it serves in total before being terminated. Setting capacity_per_instance to 2 amortises boot cost meaningfully, without paying the latency cost of N≥3, where a single stuck job holds up every other slot on the instance.
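On the manager, these settings live in the runner's config.toml. A sketch of how a Terraform module could render that file is shown below. The variable names and all numeric values other than `capacity_per_instance = 2` are assumptions for illustration; the TOML keys follow the Docker Autoscaler executor's documented layout, where `capacity_per_instance` is the number of jobs one instance runs concurrently and `max_use_count` caps how many times an instance is used before removal.

```hcl
variable "runner_token" {
  type      = string
  sensitive = true
}

variable "worker_asg_name" {
  type = string
}

locals {
  # Rendered config.toml for the runner manager (illustrative values).
  runner_config_toml = <<-EOT
    concurrent = 20

    [[runners]]
      name     = "scale-to-zero"
      url      = "https://gitlab.com"
      token    = "${var.runner_token}"
      executor = "docker-autoscaler"

      [runners.docker]
        image = "alpine:latest"

      [runners.autoscaler]
        # Older runner versions reference the binary name
        # "fleeting-plugin-aws" instead.
        plugin                = "aws"
        capacity_per_instance = 2
        max_use_count         = 10
        max_instances         = 10

        [runners.autoscaler.plugin_config]
          name = "${var.worker_asg_name}"

        [runners.autoscaler.connector_config]
          # Assumed login user; depends on the AMI's base OS.
          username = "ubuntu"

        # Keep no idle workers, so the ASG returns to zero.
        [[runners.autoscaler.policy]]
          idle_count = 0
          idle_time  = "5m0s"
  EOT
}
```

An `idle_count` of 0 is what makes the fleet scale to zero; raising it trades a small standing cost for faster job pickup.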

enable_s3_cache = true
s3_cache_expiration_days = 30

Each runner pool gets its own bucket, and the lifecycle policy bounds storage cost. This is critical for Node.js / Python jobs, where rebuilding node_modules / .venv is the bulk of the work.
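The two inputs above map naturally onto a bucket plus a lifecycle rule. This is a minimal sketch assuming the module creates one bucket per runner pool; the resource names and the `runner_name` variable are hypothetical.

```hcl
variable "runner_name" {
  type = string
}

variable "s3_cache_expiration_days" {
  type    = number
  default = 30
}

resource "aws_s3_bucket" "cache" {
  bucket_prefix = "${var.runner_name}-cache-"
  force_destroy = true # cache contents are disposable
}

# Keep the cache private.
resource "aws_s3_bucket_public_access_block" "cache" {
  bucket                  = aws_s3_bucket.cache.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_lifecycle_configuration" "cache" {
  bucket = aws_s3_bucket.cache.id

  rule {
    id     = "expire-cache"
    status = "Enabled"

    filter {} # apply to every object in the bucket

    expiration {
      days = var.s3_cache_expiration_days
    }
  }
}
```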

docker-ami.pkr.hcl produces a base AMI with Docker pre-installed and the cgroup, seccomp, and containerd configuration runners need. Without it, every cold-start pays a Docker-install tax (~30–60s). With it, cold-start is just kernel + Docker daemon.
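For readers who have not seen a Packer template, the sketch below shows the general shape such a file could take. It is not the contents of docker-ami.pkr.hcl: the base image (Ubuntu 22.04 arm64), region, build instance type, and AMI name are all assumptions, and it only installs Docker, leaving out the cgroup, seccomp, and containerd tuning mentioned above.

```hcl
packer {
  required_plugins {
    amazon = {
      version = ">= 1.2.0"
      source  = "github.com/hashicorp/amazon"
    }
  }
}

locals {
  # Timestamp suffix so each build gets a unique AMI name.
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}

source "amazon-ebs" "docker" {
  ami_name      = "gitlab-runner-docker-${local.timestamp}"
  instance_type = "t4g.small" # assumed build instance
  region        = "eu-west-1" # assumed region
  ssh_username  = "ubuntu"

  source_ami_filter {
    filters = {
      name                = "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-arm64-server-*"
      root-device-type    = "ebs"
      virtualization-type = "hvm"
    }
    owners      = ["099720109477"] # Canonical
    most_recent = true
  }
}

build {
  sources = ["source.amazon-ebs.docker"]

  # Bake Docker into the image so workers skip the install at boot.
  provisioner "shell" {
    inline = [
      "cloud-init status --wait",
      "sudo apt-get update",
      "sudo apt-get install -y docker.io",
      "sudo systemctl enable docker",
    ]
  }
}
```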

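A fleet running 200 CI jobs/day at ~5 min/job accumulates about 17 hours of job time per day, spread over roughly 100 instance launches with 2-job reuse. Even taking 100 instance-hours/day as an upper bound that leaves ample room for boot and idle time, a c6g.large Spot at $0.025/hr comes to **$2.50/day in worker compute** (about $75/month), plus the t4g.small manager ($8/month). Compare that to a fixed 4× c6g.large On-Demand always-on fleet ($240/month): roughly 3× cheaper at that upper bound, and more than 10× cheaper if instances bill for little beyond job time, for the same throughput and with better instance availability via attribute-based selection.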
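The comparison can be recomputed from its stated inputs. The sketch below does so in plain Terraform locals (no providers needed); the names are illustrative, and the 30-day month is an assumption. It brackets the worker cost between two cases: billing for pure job time only (200 jobs × 5 min is about 16.7 hours/day) and the 100 instance-hours/day figure treated as a ceiling.

```hcl
locals {
  jobs_per_day      = 200
  minutes_per_job   = 5
  spot_price_per_hr = 0.025 # c6g.large Spot, USD
  manager_monthly   = 8     # t4g.small manager, USD
  fixed_monthly     = 240   # 4x c6g.large On-Demand, USD
  days_per_month    = 30    # assumption

  # Pure job time in hours per day: 200 * 5 / 60.
  job_hours_per_day = local.jobs_per_day * local.minutes_per_job / 60

  # Lower bound: workers bill for job time only.
  monthly_low = (
    local.job_hours_per_day * local.spot_price_per_hr * local.days_per_month
    + local.manager_monthly
  )

  # Upper bound: 100 instance-hours/day including boot and idle time.
  monthly_high = (
    100 * local.spot_price_per_hr * local.days_per_month
    + local.manager_monthly
  )
}

output "savings_factor" {
  value = {
    best_case  = local.fixed_monthly / local.monthly_low
    worst_case = local.fixed_monthly / local.monthly_high
  }
}
```

With these inputs the monthly total lands between about $20.50 and $83, so the saving over the fixed fleet is somewhere between roughly 3× and 12×, depending on how much boot and idle time each instance accrues.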