GitHub Repository

Kubernetes operator for managing SLOs with error budget tracking

32 stars · Go

SLOK – SLO composition with traffic-weighted service chains in K8s

by lep_qq·Mar 4, 2026·1 point·0 comments

Post Description

I've been building a Kubernetes operator for SLO management (SLOK) and just shipped a feature I haven't seen elsewhere: WEIGHTED_ROUTES composition.

The problem

Most SLO tools treat a user journey as a simple chain: if any service fails, the whole thing fails. But real traffic doesn't work that way. In a checkout flow, 90% of users might skip the coupon service entirely. If the coupon service has 99.5% availability, does that really pull your checkout SLO down to 99.5%? No — because most of your users never touch it.

The model

WEIGHTED_ROUTES lets you describe which percentage of traffic flows through which service chain. Each chain is an implicit AND (all services in the chain must succeed). The composed error rate is:

e_total = 1 - Σ( weight_i × Π(1 - e_j) )

For the checkout example (90% skip coupon, 10% use it):

e_total = 1 - ( 0.9 × (1 - e_base) × (1 - e_payments)
              + 0.1 × (1 - e_base) × (1 - e_coupon) × (1 - e_payments) )

SLOK translates this formula directly into Prometheus recording rules wired into the standard multi-window burn rate pipeline.
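To make the formula above concrete, here is a minimal Go sketch that evaluates it for the checkout example. It is not SLOK's implementation (the post says the operator emits Prometheus recording rules instead); the `Route` type and `ComposedErrorRate` function are hypothetical names, and the error rates for `base` and `payments` (0.1% each) are assumed for illustration. Only the 99.5% coupon availability comes from the post.

```go
package main

import "fmt"

// Route is a hypothetical type for one traffic path: the share of requests
// that take it and the services those requests depend on.
type Route struct {
	Name   string
	Weight float64  // fraction of total traffic, 0..1
	Chain  []string // services that must all succeed (implicit AND)
}

// ComposedErrorRate evaluates e_total = 1 - Σ( weight_i × Π(1 - e_j) ).
// errRates maps a service name to its error rate. A chain entry with no
// known error rate is reported as an error rather than silently ignored.
func ComposedErrorRate(routes []Route, errRates map[string]float64) (float64, error) {
	success := 0.0
	for _, r := range routes {
		p := r.Weight
		for _, svc := range r.Chain {
			e, ok := errRates[svc]
			if !ok {
				return 0, fmt.Errorf("route %q: no error rate for %q", r.Name, svc)
			}
			p *= 1 - e // every service in the chain must succeed
		}
		success += p
	}
	return 1 - success, nil
}

func main() {
	// Assumed error rates: only coupon (99.5% availability) is from the post.
	errRates := map[string]float64{"base": 0.001, "payments": 0.001, "coupon": 0.005}

	weighted := []Route{
		{Name: "no-coupon", Weight: 0.9, Chain: []string{"base", "payments"}},
		{Name: "with-coupon", Weight: 0.1, Chain: []string{"base", "coupon", "payments"}},
	}
	// The "simple chain" view: all traffic passes through every service.
	naive := []Route{
		{Name: "all", Weight: 1, Chain: []string{"base", "coupon", "payments"}},
	}

	w, _ := ComposedErrorRate(weighted, errRates)
	n, _ := ComposedErrorRate(naive, errRates)
	fmt.Printf("weighted routes: %.4f%% errors\n", w*100)
	fmt.Printf("simple chain:    %.4f%% errors\n", n*100)
}
```

With these assumed inputs the weighted model gives roughly 0.25% errors against roughly 0.70% for the simple chain, which is the gap the post describes: the coupon service only counts against the 10% of requests that actually use it.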

The YAML

kind: SLOComposition
spec:
  target: 99.9
  window: 30d
  objectives:
    - name: base
      ref: { name: checkout-base-slo }
    - name: payments
      ref: { name: payments-slo }
    - name: coupon
      ref: { name: coupon-slo }
  composition:
    type: WEIGHTED_ROUTES
    params:
      routes:
        - name: no-coupon
          weight: 0.9
          chain: [base, payments]
        - name: with-coupon
          weight: 0.1
          chain: [base, coupon, payments]
  alerting:
    burnRateAlerts:
      enabled: true

The operator generates the PrometheusRules automatically. You get burn rate alerts on the composed SLO, not just on individual services.
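The composition formula only yields a meaningful error rate when the route weights sum to 1 (any missing share of traffic would be counted as failed) and every chain entry names a declared objective. The post does not say whether SLOK enforces this, so the following Go sketch is only an illustration of the checks a spec like the one above would need to pass; `Route` and `ValidateRoutes` are hypothetical names, not part of SLOK's API.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// Route mirrors one entry of composition.params.routes (hypothetical type).
type Route struct {
	Name   string
	Weight float64
	Chain  []string
}

// ValidateRoutes checks that every route has a weight in (0, 1], a non-empty
// chain that only references declared objectives, and that all weights
// together sum to 1 within a small floating-point tolerance.
func ValidateRoutes(objectives []string, routes []Route) error {
	if len(routes) == 0 {
		return errors.New("at least one route is required")
	}
	known := make(map[string]bool, len(objectives))
	for _, o := range objectives {
		known[o] = true
	}
	sum := 0.0
	for _, r := range routes {
		if r.Weight <= 0 || r.Weight > 1 {
			return fmt.Errorf("route %q: weight %g is outside (0, 1]", r.Name, r.Weight)
		}
		if len(r.Chain) == 0 {
			return fmt.Errorf("route %q: chain is empty", r.Name)
		}
		for _, name := range r.Chain {
			if !known[name] {
				return fmt.Errorf("route %q: unknown objective %q", r.Name, name)
			}
		}
		sum += r.Weight
	}
	if math.Abs(sum-1) > 1e-9 {
		return fmt.Errorf("route weights sum to %g, want 1", sum)
	}
	return nil
}

func main() {
	objectives := []string{"base", "payments", "coupon"}
	routes := []Route{
		{Name: "no-coupon", Weight: 0.9, Chain: []string{"base", "payments"}},
		{Name: "with-coupon", Weight: 0.1, Chain: []string{"base", "coupon", "payments"}},
	}
	if err := ValidateRoutes(objectives, routes); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("routes are valid")
}
```

In an operator, checks like these would typically live in an admission webhook or in the reconcile loop's status conditions; which of the two (if either) SLOK uses is not stated in the post.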

Other things SLOK does

- AND_MIN composition (worst-case across services)
- Built-in SLI templates for http-availability, http-latency, kubernetes-apiserver
- Automatic error budget tracking exposed in .status
- Event correlation: when a burn rate spike is detected, SLOK creates an SLOCorrelation resource listing recent Deployments, ConfigMap changes, and cluster events that may have caused it, with an optional LLM-enhanced summary (Llama 3.3 70B via Groq)

WEIGHTED_ROUTES is alpha. Feedback on the API shape is welcome.

Repo: https://github.com/federicolepera/slok
