SLO basics
Topic: Monitoring basics
Summary
Define Service Level Objectives (SLOs) as explicit targets for availability or latency, for example 99.9 percent uptime or p99 latency under 500 ms. Use them to drive alerting and capacity planning whenever you need to formalize reliability targets.
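To make an availability target concrete, it helps to convert it into the downtime it permits. The sketch below is only an illustration of that arithmetic: the function name `allowed_downtime_minutes` is hypothetical, and it assumes the whole budget is spent as full outages rather than partial failures.

```python
def allowed_downtime_minutes(target_percent: float, window_days: int = 30) -> float:
    """Minutes of full downtime the window can absorb while still meeting the target.

    Assumption: every minute is either fully up or fully down.
    """
    if not 0 < target_percent < 100:
        raise ValueError("target_percent must be between 0 and 100 (exclusive)")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - target_percent / 100)


# 30 days is 43,200 minutes; 0.1 percent of that is about 43.2 minutes.
print(round(allowed_downtime_minutes(99.9), 1))
```

The same function shows how quickly the allowance shrinks as the target gains a nine, which is useful context when negotiating the target.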
Intent: How-to
Quick answer
- Choose an indicator (availability or latency) and set a target, such as 99.9 percent availability or p99 latency under 500 ms. Measure over a fixed window, typically 30 days.
- Alert on error budget burn rate, that is, when the SLO is at risk. Do not alert on every SLO breach; use budget-based alerting.
- Review the SLO with product and engineering. Adjust targets and the error budget policy as needed. Document both in the runbook.
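Burn rate is the ratio between the observed error rate and the error rate the SLO allows. A minimal sketch of that calculation follows; the function names are hypothetical, and `days_until_exhausted` assumes the budget starts full and the burn rate stays constant.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """How fast the error budget is being spent.

    1.0 means the budget lasts exactly one window; 10.0 means it is gone
    in a tenth of the window.
    """
    budget_fraction = 1.0 - slo_target  # e.g. 0.001 for a 99.9 percent target
    if budget_fraction <= 0:
        raise ValueError("slo_target must be below 1.0 to leave any budget")
    return error_rate / budget_fraction


def days_until_exhausted(rate: float, window_days: float = 30.0) -> float:
    """Days until a full budget is spent if the burn rate holds (assumption)."""
    if rate <= 0:
        return float("inf")
    return window_days / rate


rate = burn_rate(error_rate=0.01, slo_target=0.999)  # 1 percent of requests failing
print(round(rate, 1), round(days_until_exhausted(rate), 1))
```

Alerting on this ratio rather than on individual failures is what keeps a brief blip from paging anyone while a sustained problem still does.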
Prerequisites
Steps
- Choose indicator and target
  Pick availability or latency as the indicator. Set the target and the window. Get agreement from stakeholders.
- Measure and budget
  Implement measurement of the indicator. Compute the error budget from the target. Define burn-rate alerting.
- Review
  Review the SLO and budget consumption. Adjust targets or capacity. Document the outcome.
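One way to keep the agreed indicator, target, and window in one place is a small record that validates itself and can print a line for the runbook. This is a sketch under assumptions: the `SLO` class, its field names, and `runbook_line` are all hypothetical, and a latency SLO such as "p99 under 500 ms" is expressed here as "99 percent of requests complete under 500 ms".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SLO:
    name: str
    indicator: str                         # "availability" or "latency"
    target: float                          # fraction of good requests, e.g. 0.999
    window_days: int = 30
    threshold_ms: Optional[float] = None   # latency SLOs only

    def __post_init__(self) -> None:
        if self.indicator not in ("availability", "latency"):
            raise ValueError("indicator must be 'availability' or 'latency'")
        if not 0 < self.target < 1:
            raise ValueError("target must be a fraction between 0 and 1")
        if self.window_days <= 0:
            raise ValueError("window_days must be positive")
        if self.indicator == "latency" and self.threshold_ms is None:
            raise ValueError("latency SLOs need threshold_ms")

    @property
    def error_budget(self) -> float:
        """Fraction of requests allowed to be bad within the window."""
        return 1.0 - self.target

    def runbook_line(self) -> str:
        """One-line summary suitable for pasting into a runbook."""
        if self.indicator == "availability":
            what = "succeed"
        else:
            what = f"complete under {self.threshold_ms:g} ms"
        return (f"{self.name}: {self.target * 100:g}% of requests {what} "
                f"over {self.window_days} days")


slo = SLO("checkout-latency", "latency", 0.99, threshold_ms=500)
print(slo.runbook_line())
```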
Summary
Define SLO (availability or latency); measure and track error budget; alert on burn rate; review regularly.
Steps
Step 1: Choose indicator and target
Pick availability or latency as the indicator, set the target and window, and get stakeholder agreement.
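Both indicators reduce to simple calculations over request data. The sketch below shows one possible way to compute them; the function names are hypothetical, the percentile uses the nearest-rank method (other definitions interpolate and give slightly different values), and treating a window with no traffic as fully available is an assumption.

```python
import math
from typing import Sequence


def availability(good: int, total: int) -> float:
    """Fraction of requests that succeeded."""
    if total == 0:
        return 1.0  # assumption: no traffic counts as meeting the target
    return good / total


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct percent at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


def latency_slo_met(samples_ms: Sequence[float], pct: float = 99, limit_ms: float = 500) -> bool:
    """True when the chosen percentile is under the limit, e.g. p99 under 500 ms."""
    return percentile(samples_ms, pct) < limit_ms


# 100 requests: 98 fast ones, one at 480 ms, one slow outlier at 900 ms.
latencies_ms = [120] * 98 + [480, 900]
print(availability(good=9990, total=10000) >= 0.999)
print(percentile(latencies_ms, 99), latency_slo_met(latencies_ms))
```

In this example the single 900 ms outlier does not break a p99 target, which is why the choice of percentile is part of the stakeholder agreement.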
Step 2: Measure and budget
Implement measurement, compute the error budget, and set up burn-rate alerts.
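The budget and the alert condition can be sketched from request counts and error rates. The names below are hypothetical. The two-window check and the default threshold of 14.4 are assumptions borrowed from a commonly cited pairing of a 1-hour and a 5-minute window: at a burn rate of 14.4, one hour consumes 2 percent of a 30-day budget (14.4 / 720 hours). Tune both to your own policy.

```python
def budget_remaining(good: int, total: int, slo_target: float) -> float:
    """Share of the error budget left: 1.0 untouched, 0.0 spent, negative overspent."""
    if total == 0:
        return 1.0  # assumption: no traffic spends no budget
    allowed_bad = (1.0 - slo_target) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


def should_page(long_window_error_rate: float,
                short_window_error_rate: float,
                slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only when both windows burn faster than the threshold.

    The long window shows the burn is significant; the short window shows
    it is still happening, so the alert clears soon after recovery.
    """
    budget = 1.0 - slo_target
    long_burn = long_window_error_rate / budget
    short_burn = short_window_error_rate / budget
    return long_burn >= threshold and short_burn >= threshold


# 500 failures out of 1,000,000 requests against a 99.9 percent target: half the budget left.
print(round(budget_remaining(good=999_500, total=1_000_000, slo_target=0.999), 2))
print(should_page(0.02, 0.02, 0.999))    # burning in both windows
print(should_page(0.02, 0.0005, 0.999))  # already recovered in the short window
```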
Step 3: Review
Review the SLO and budget consumption, adjust as needed, and document the changes.
Verification
- The SLO is tracked, alerts fire on budget burn, and the runbook is updated.
Troubleshooting
- Always in budget: the target may be too loose. Tighten it or add a latency SLO.
- Always over budget: improve reliability or relax the target.
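The two cases above can be spotted from a history of budget consumption per window. The sketch below is one hypothetical heuristic, not an established rule: `diagnose` and the `loose_below` cutoff of 0.25 are assumptions chosen for illustration.

```python
from typing import Sequence


def diagnose(consumed: Sequence[float], loose_below: float = 0.25) -> str:
    """Classify a history of per-window budget consumption.

    Each value is the share of the error budget spent in one window
    (1.0 means the whole budget was spent, above 1.0 means overspent).
    """
    if not consumed:
        raise ValueError("need at least one window of history")
    if all(c > 1.0 for c in consumed):
        return "always over: improve reliability or relax the target"
    if all(c < loose_below for c in consumed):
        return "always in budget: target may be loose"
    return "no action suggested"


print(diagnose([0.05, 0.10, 0.02]))
print(diagnose([1.4, 2.1, 1.2]))
print(diagnose([0.6, 1.1, 0.3]))
```

A few windows of history are needed before either verdict means much, so treat the result as a prompt for the review rather than a decision.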