SRE Playbook Cheat Sheet

Last Updated: November 21, 2025

SLI/SLO Primer

Concept       Goal
SLI           Measured signal (e.g. latency, error rate)
SLO           Target for the SLI over a time window, monitored with burn-rate alerts
Error budget  Allowed unreliability (1 - SLO target), spent on launches and incidents
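
The relationship between these three concepts comes down to a little arithmetic. The sketch below is illustrative only: the function names and the sample numbers (a 99.9% SLO, 1,000,000 requests, 500 failures) are assumptions made for the example, not values from this cheat sheet.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of events allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error rate (the SLI) divided by the error budget.

    1.0 means the budget is being spent exactly on pace for the SLO window;
    above 1.0 means it will run out before the window ends.
    """
    if total_events <= 0:
        return 0.0  # no traffic, nothing burned
    observed_error_rate = bad_events / total_events
    return observed_error_rate / error_budget(slo_target)


def budget_remaining(bad_events: int, total_events: int, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    if total_events <= 0:
        return 1.0
    allowed_bad = error_budget(slo_target) * total_events
    return 1.0 - bad_events / allowed_bad


# Hypothetical sample numbers: 99.9% SLO, 1,000,000 requests, 500 failures.
print(round(burn_rate(500, 1_000_000, 0.999), 3))         # 0.5
print(round(budget_remaining(500, 1_000_000, 0.999), 3))  # 0.5
```

A burn rate below 1.0 leaves budget available for launches; a sustained rate above 1.0 is the usual trigger for slowing releases or paging.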

Incident Steps

Declare incident: Notify stakeholders and update the status page.
Create severity channels: Use a dedicated chat and triage doc.
Contain blast radius: Roll back or reroute as needed.
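
The steps above can be tracked as a small timestamped checklist, which also yields a timeline for the postmortem. This is a minimal sketch: the class names are hypothetical, and the strict step ordering it enforces is an assumption of the example rather than something this cheat sheet requires.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Step:
    name: str
    action: str
    done_at: Optional[datetime] = None  # set when the step is completed


def default_steps() -> List[Step]:
    # Step names and actions mirror the list in this section.
    return [
        Step("Declare incident", "Notify stakeholders and update status page"),
        Step("Create severity channels", "Use dedicated chat and triage doc"),
        Step("Contain blast radius", "Roll back or reroute as needed"),
    ]


@dataclass
class IncidentChecklist:
    steps: List[Step] = field(default_factory=default_steps)

    def next_step(self) -> Optional[Step]:
        """Return the first step not yet completed, or None if all are done."""
        return next((s for s in self.steps if s.done_at is None), None)

    def complete(self, name: str) -> None:
        """Mark the next pending step done; reject out-of-order completion."""
        step = self.next_step()
        if step is None:
            raise ValueError("all steps are already complete")
        if step.name != name:
            raise ValueError(f"expected {step.name!r} next, got {name!r}")
        step.done_at = datetime.now(timezone.utc)

    def timeline(self) -> List[str]:
        """Completed steps with UTC timestamps, in order."""
        return [f"{s.done_at.isoformat()} {s.name}" for s in self.steps if s.done_at]


checklist = IncidentChecklist()
checklist.complete("Declare incident")
print(checklist.next_step().name)  # Create severity channels
```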

Postmortem Signal

Document the timeline, impact, and prevention actions; share the write-up with the affected teams and track follow-ups to completion.

💡 Pro Tip: Tie every new feature to an SLO impact and rehearse runbooks quarterly.