All DevOps & Cloud Guides

AKS Backup Patterns (Batch 8)

Backup etcd state and persistent volumes via Velero or Azure Backup. Quick reference guide with examples and best practices. Updated November 2025.

Alert Flood Mitigation

Silence noisy alerts, throttle them, and adjust routing. Quick reference guide with examples and best practices. Updated November 2025.

Alert Flood Mitigation (Batch 8)

Use silences, grouping, and rate limiting to tame flood. Quick reference guide with examples and best practices. Updated November 2025.

Ansible

IT automation and configuration management. Use ansible-vault to secure sensitive variables. Quick reference guide with examples and best practices. Updated November 2025.

Ansible Playbooks

Define playbooks with hosts, tasks, handlers, and roles. Quick reference guide with examples and best practices. Updated November 2025.

Argo CD Sync Waves

Use sync waves to control deployment order and gates during GitOps syncs. Quick reference guide with examples and best practices. Updated November 2025.

ArgoCD

GitOps continuous delivery for Kubernetes. ArgoCD continuously monitors Git for changes. Quick reference guide with examples and best practices. Updated November 2025.

Automation Ops Checklist

Document automation workflows, approval steps, and fallbacks. Quick reference guide with examples and best practices. Updated November 2025.

AWS Cost Operations

Monitor budgets, schedule reports, and optimize spend. Quick reference guide with examples and best practices. Updated November 2025.

AWS Cost Optimization

Analyze spend, rightsizing opportunities, and savings plans monthly. Quick reference guide with examples and best practices. Updated November 2025.

AWS EKS Bottlerocket

Use Bottlerocket AMIs for immutable node pools with tuned kubelets. Quick reference guide with examples and best practices. Updated November 2025.

AWS EKS Encryption 1

Encrypt secrets and etcd. Quick reference guide with examples and best practices. Updated November 2025.

AWS EKS Encryption 11

Encrypt secrets and etcd. Quick reference guide with examples and best practices. Updated November 2025.

AWS EKS Encryption 21

Encrypt secrets and etcd. Quick reference guide with examples and best practices. Updated November 2025.

AWS EKS Encryption 31

Encrypt secrets and etcd. Quick reference guide with examples and best practices. Updated November 2025.

AWS EKS Encryption 41

Encrypt secrets and etcd. Quick reference guide with examples and best practices. Updated November 2025.

AWS GuardDuty Response

Classify GuardDuty findings, correlate context, and trigger remediation. Quick reference guide with examples and best practices. Updated November 2025.

AWS IAM Access Advisor

Use Access Advisor data to remove stale IAM actions from roles. Quick reference guide with examples and best practices. Updated November 2025.

AWS IAM Policies

Design IAM policies and test with the simulator. Quick reference guide with examples and best practices. Updated November 2025.

AWS Lambda Health Probes

Send regular synthetic requests to Lambda entry points to detect slowdowns. Quick reference guide with examples and best practices. Updated November 2025.

AWS Lambda Powertools

Leverage Powertools decorators for structured logs, telemetry, and idempotency. Quick reference guide with examples and best practices. Updated November 2025.

AWS Monitoring

CloudWatch alarms, X-Ray tracing, and dashboards. Quick reference guide with examples and best practices. Updated November 2025.

AWS Resilience Patterns

Design resilient AWS workloads with retries, multi-AZ, and fallback. Quick reference guide with examples and best practices. Updated November 2025.

AWS S3 Policy Review (Batch 8)

Check public access block settings and fine-tune bucket policies. Quick reference guide with examples and best practices. Updated November 2025.

AWS Security Hub Audit

Quick diagnostics for AWS Security Hub, IAM, CloudTrail, and GuardDuty. Quick reference guide with examples and best practices. Updated November 2025.

AWS Services

Amazon Web Services essentials. Navigate AWS cloud services efficiently. Quick reference guide with examples and best practices. Updated November 2025.

AWS SNS+SQS Anti-Patterns

Avoid infinite retries, missing DLQs, and unbounded queue growth. Quick reference guide with examples and best practices. Updated November 2025.

Azure Container Instances

Run container workloads without Kubernetes by relying on ACI. Quick reference guide with examples and best practices. Updated November 2025.

Azure Cost Alerts (Batch 8)

Send budget alerts when spend approaches limits. Quick reference guide with examples and best practices. Updated November 2025.

Azure DevOps Pipeline Cheats

Design multi-stage pipelines with reusable templates and approvals. Quick reference guide with examples and best practices. Updated November 2025.

Azure Function Proxies

Add proxies for path rewrites, auth, and caching in front of Functions. Quick reference guide with examples and best practices. Updated November 2025.

Azure Functions

HTTP, timer, and Cosmos DB triggers plus Durable Functions. Quick reference guide with examples and best practices. Updated November 2025.

Azure Logging

Collect diagnostics and logs in Azure. Quick reference guide with examples and best practices. Updated November 2025.

Azure Monitor Log Analytics (Batch 8)

Run Kusto queries and alerts within Log Analytics. Quick reference guide with examples and best practices. Updated November 2025.

Azure Monitor-Based Alerts

Build metric alerts with dynamic thresholds and action groups. Quick reference guide with examples and best practices. Updated November 2025.

Azure Policy Baselines

Define policy sets that cover identity, network, storage, and cost guardrails. Quick reference guide with examples and best practices. Updated November 2025.

Azure Private Endpoints

Expose Azure services via private endpoints tied to VNets. Quick reference guide with examples and best practices. Updated November 2025.

Azure Spot VM Operations

Use Azure Spot VMs while handling eviction notices and fallback hosts. Quick reference guide with examples and best practices. Updated November 2025.

Azure Workload Identity 10

Managed identity best practices. Quick reference guide with examples and best practices. Updated November 2025.

Azure Workload Identity 20

Managed identity best practices. Quick reference guide with examples and best practices. Updated November 2025.

Azure Workload Identity 30

Managed identity best practices. Quick reference guide with examples and best practices. Updated November 2025.

Azure Workload Identity 40

Managed identity best practices. Quick reference guide with examples and best practices. Updated November 2025.

Azure Workload Identity 50

Managed identity best practices. Quick reference guide with examples and best practices. Updated November 2025.

Blue/Green CI/CD (Batch 8)

Use stacks for blue/green deployments and swap load balancers once ready. Quick reference guide with examples and best practices. Updated November 2025.

Chaos Engineering Controls

Introduce controlled faults to prove resilience while minimizing blast radius. Quick reference guide with examples and best practices. Updated November 2025.

CI/CD Best Practices

Continuous Integration and Deployment. Quick reference guide with examples and best practices. Updated November 2025.

CI/CD Branch Gating

Require passing builds, approvals, and gating policies before merges. Quick reference guide with examples and best practices. Updated November 2025.

CI/CD Canary Workflows

Orchestrate canary deployments using traffic splits and health gates. Quick reference guide with examples and best practices. Updated November 2025.

CI/CD Canary Workflows (Batch 8)

Deploy small percentages of traffic to new builds using feature flags and health gates. Quick reference guide with examples and best practices. Updated November 2025.

CI/CD Security

Gate pipelines with scans, secrets hygiene, and approvals. Quick reference guide with examples and best practices. Updated November 2025.

CI/CD Signing & Notarization

Sign releases and optionally notarize containers/binaries. Quick reference guide with examples and best practices. Updated November 2025.

CircleCI

Cloud-based CI/CD platform. Use CircleCI orbs to simplify configuration. Quick reference guide with examples and best practices. Updated November 2025.

Cloud Basics

AWS, GCP, and Azure fundamentals. Quick reference guide with examples and best practices. Updated November 2025.

Cloudflare Workers

Deploy edge functions, persist data, and configure routes. Quick reference guide with examples and best practices. Updated November 2025.

Consul Service Mesh

Use Consul intentions, ACLs, and proxies to secure service-to-service traffic. Quick reference guide with examples and best practices. Updated November 2025.

Data Version Control

Use DVC/git-lfs to version data and experiments. Quick reference guide with examples and best practices. Updated November 2025.

Datadog

Monitoring and observability platform. Datadog provides a unified view across infrastructure and apps. Quick reference guide with examples and best practices. Updated November 2025.

DigitalOcean

DigitalOcean droplets, spaces, app platform, and management. Quick reference guide with examples and best practices. Updated November 2025.

Docker

Container management and Docker commands. Quick reference guide with examples and best practices. Updated November 2025.

Docker BuildKit Optimizations

Use BuildKit features like cache imports, secret mounts, and parallel build stages to shrink build time. Quick reference guide with examples and best practices. Updated November 2025.

Docker Compose Health

Add health probes to compose services and restart policies. Quick reference guide with examples and best practices. Updated November 2025.

Docker Compose Secrets (Batch 8)

Reference secrets objects rather than inline env vars. Quick reference guide with examples and best practices. Updated November 2025.

Docker Healthcheck Definitions

Define `HEALTHCHECK` commands and intervals for each service. Quick reference guide with examples and best practices. Updated November 2025.

Docker Image Layer Caching

Structure Dockerfiles so stable layers are reused across builds. Quick reference guide with examples and best practices. Updated November 2025.

Docker Nexus Proxy

Proxy Docker Hub via Nexus to cache layers and limit outbound costs. Quick reference guide with examples and best practices. Updated November 2025.

Docker Non-Root Best Practices

Switch to non-root users and drop capabilities during image build. Quick reference guide with examples and best practices. Updated November 2025.

Docker Practices

Slim images, multistage builds, and secure runtime habits. Quick reference guide with examples and best practices. Updated November 2025.

Dockerfile Security (Batch 8)

Use multi-stage builds, drop root, and scan images. Quick reference guide with examples and best practices. Updated November 2025.

EKS Fargate Spot

Run bursty or fault-tolerant pods on Fargate Spot to reduce spend. Quick reference guide with examples and best practices. Updated November 2025.

EKS IAM Roles for Service Accounts (Batch 8)

Bind IAM roles to service accounts using IRSA. Quick reference guide with examples and best practices. Updated November 2025.

GCP DNS Peering (Batch 8)

Peer Cloud DNS zones into VPCs for custom domain resolution. Quick reference guide with examples and best practices. Updated November 2025.

GCP IAM Conditions

Limit access by enforcing conditions on service accounts or users. Quick reference guide with examples and best practices. Updated November 2025.

GCP Private Service Connect

Expose services privately via Private Service Connect endpoints. Quick reference guide with examples and best practices. Updated November 2025.

GCP Secret Manager Lifecycle

Version secrets, rotate them automatically, and track IAM bindings. Quick reference guide with examples and best practices. Updated November 2025.

GCP Service Mesh Primer

Configure Anthos Service Mesh traffic policies and telemetry. Quick reference guide with examples and best practices. Updated November 2025.

Git

Git version control cheat sheet with essential commands for branching, merging, committing, and collaboration. Quick reference guide with examples and best practices. Updated November 2025.

Git Branching Strategies

Choose models (GitHub flow, GitLab flow, trunk-based) and keep merges predictable. Quick reference guide with examples and best practices. Updated November 2025.

Git CLI Tips

Speed up CLI workflows with aliases, rerere, and stash. Quick reference guide with examples and best practices. Updated November 2025.

Git Commit Message Standards

Use conventional prefixes, scopes, and tidy bodies for automation. Quick reference guide with examples and best practices. Updated November 2025.

Git LFS Workflows

Keep large binaries in Git LFS and track quota usage. Quick reference guide with examples and best practices. Updated November 2025.

Git Secret Lint

Scan staged files with `git-secrets` or `detect-secrets` before commits. Quick reference guide with examples and best practices. Updated November 2025.

Git Workflows (Trunk, Feature, Shape)

Compare trunk-based, feature, and shape-up workflows. Quick reference guide with examples and best practices. Updated November 2025.

GitHub Actions

CI/CD automation for GitHub. Use actions/cache to speed up workflow runs. Quick reference guide with examples and best practices. Updated November 2025.

GitHub Actions Advanced

Advanced GitHub Actions: workflows, matrix builds, and reusable actions. Quick reference guide with examples and best practices. Updated November 2025.

GitHub Actions Matrix Builds

Define matrices to test across environments efficiently. Quick reference guide with examples and best practices. Updated November 2025.

GitHub Actions Security

Secure your GitHub Actions pipelines with scans, secrets, and policies. Quick reference guide with examples and best practices. Updated November 2025.

GitLab CI/CD

Define YAML jobs, caching, and rules. Quick reference guide with examples and best practices. Updated November 2025.

GitOps

GitOps principles and tools: ArgoCD and FluxCD. Quick reference guide with examples and best practices. Updated November 2025.

GitOps Observability

Monitor GitOps controllers for drift, errors, and delivery latency. Quick reference guide with examples and best practices. Updated November 2025.

GitOps Workflows

Declarative delivery, automation, and drift detection. Quick reference guide with examples and best practices. Updated November 2025.

GKE Resource Quota Monitoring

Monitor `ResourceQuota` usage per namespace so teams avoid hot nodes. Quick reference guide with examples and best practices. Updated November 2025.

Google Cloud Platform

GCP services overview. Use BigQuery for large-scale data analytics. Quick reference guide with examples and best practices. Updated November 2025.

Grafana

Data visualization and monitoring. Use template variables to create reusable dashboards. Quick reference guide with examples and best practices. Updated November 2025.

Grafana Dashboard Shorthand

Quickly assemble dashboards with templates, panel links, and alert rules. Quick reference guide with examples and best practices. Updated November 2025.

Grafana Panel Library

Share JSON panels as a library and version them in code. Quick reference guide with examples and best practices. Updated November 2025.

Grafana Snapshot Share

Generate snapshots to share dashboards without login credentials. Quick reference guide with examples and best practices. Updated November 2025.

Grafana Template Variables

Use template variables for environment, region, and service filters. Quick reference guide with examples and best practices. Updated November 2025.

GuardDuty Threat Hunter (Batch 8)

Correlate findings with AWS Config and CloudTrail for quick hunting. Quick reference guide with examples and best practices. Updated November 2025.

HashiCorp Consul

Service mesh and discovery. Consul provides both service discovery and mesh. Quick reference guide with examples and best practices. Updated November 2025.

HashiCorp Vault

Secrets and encryption management. Use dynamic secrets instead of static credentials. Quick reference guide with examples and best practices. Updated November 2025.

Helm

Kubernetes package manager. Use helm template to preview manifests before install. Quick reference guide with examples and best practices. Updated November 2025.

Helm Chart Structure

Organize templates, values, and hooks for reusable Helm packages. Quick reference guide with examples and best practices. Updated November 2025.

Helm Chart Templating

Structure values, templates, and helpers for maintainable charts. Quick reference guide with examples and best practices. Updated November 2025.

Helm Linting

Run `helm lint` and schema checks to catch template issues. Quick reference guide with examples and best practices. Updated November 2025.

Istio

Service mesh platform. Istio adds observability, security, and control to microservices. Quick reference guide with examples and best practices. Updated November 2025.

Istio Ingress Gateway

Expose services through Istio Gateways with TLS policies and routing rules. Quick reference guide with examples and best practices. Updated November 2025.

Jenkins

Open-source automation server. Use declarative pipelines for simpler syntax. Quick reference guide with examples and best practices. Updated November 2025.

Kubernetes

K8s commands and concepts. Quick reference guide with examples and best practices. Updated November 2025.

Kubernetes Cost Optimization

Node pools, schedule tuning, and spot workloads. Quick reference guide with examples and best practices. Updated November 2025.

Kubernetes GitOps Checklist

Capture repo layout, bootstrap repo credentials, and validate sync targets. Quick reference guide with examples and best practices. Updated November 2025.

Kubernetes Liveness Patterns (Batch 8)

Use probe endpoints to restart stuck containers. Quick reference guide with examples and best practices. Updated November 2025.

Kubernetes Network Policies

Define ingress and egress rules to limit pod communication. Quick reference guide with examples and best practices. Updated November 2025.

Kubernetes Operator Patterns

Structure Kubernetes operators with clear CRDs, leader election, and retries. Quick reference guide with examples and best practices. Updated November 2025.

Kubernetes Operators

Design CRDs, controllers, and reconciliation loops for reliable automation. Quick reference guide with examples and best practices. Updated November 2025.

Kubernetes Pod Troubleshooting

Triage pod restarts, CrashLoops, and silent failures. Quick reference guide with examples and best practices. Updated November 2025.

Kubernetes RuntimeClass

Define RuntimeClass to select alternate container runtimes (gVisor, Kata). Quick reference guide with examples and best practices. Updated November 2025.

Kubernetes Security

Network policies, RBAC, and supply-chain protections. Quick reference guide with examples and best practices. Updated November 2025.

Kubernetes StorageClasses

Tune provisioners, reclaim policies, and binding modes per workload. Quick reference guide with examples and best practices. Updated November 2025.

Kubernetes Telemetry

Collect metrics, traces, and logs centrally. Quick reference guide with examples and best practices. Updated November 2025.

Lambda Async Invocation Tips

Handle async events with DLQs and idempotent functions. Quick reference guide with examples and best practices. Updated November 2025.

Lambda@Edge

Associate functions with CloudFront events and monitor latency. Quick reference guide with examples and best practices. Updated November 2025.

Linux Audit Rules

Use auditctl to watch sensitive files and commands. Quick reference guide with examples and best practices. Updated November 2025.

Linux Commands

Essential Linux terminal commands. Quick reference guide with examples and best practices. Updated November 2025.

Linux iptables Tables (Batch 8)

Understand the filter, nat, and raw tables. Quick reference guide with examples and best practices. Updated November 2025.

Linux OOM Score Tuning

Lower OOM scores for critical daemons so the killer prefers others. Quick reference guide with examples and best practices. Updated November 2025.

Linux Systemd Timers

Use timers instead of cron for predictable unit execution. Quick reference guide with examples and best practices. Updated November 2025.

Microsoft Azure

Azure cloud services essentials. Use Azure Cost Management to monitor spending. Quick reference guide with examples and best practices. Updated November 2025.

Monitoring Drilldowns

Connect high-level panels to deeper dashboards or APIs for troubleshooting. Quick reference guide with examples and best practices. Updated November 2025.

Multi-Cloud IAM Mapping

Document equivalent roles, service accounts, and policies for each cloud. Quick reference guide with examples and best practices. Updated November 2025.

Nginx

Nginx web server configuration. Quick reference guide with examples and best practices. Updated November 2025.

Observability as Code

Store observability configs under version control. Quick reference guide with examples and best practices. Updated November 2025.

Observability Dogfooding

Teams should consume their own metrics, traces, and dashboards before release. Quick reference guide with examples and best practices. Updated November 2025.

OpenTelemetry Instrumentation

Capture consistent telemetry for services and propagate context across boundaries. Quick reference guide with examples and best practices. Updated November 2025.

Prometheus

Open-source monitoring and alerting. Use Grafana for visualizing Prometheus metrics. Quick reference guide with examples and best practices. Updated November 2025.

Prometheus Alert Rates

Document alert noise sources and mute via rate limiting or silences. Quick reference guide with examples and best practices. Updated November 2025.

Prometheus Alerting Best Practices

Add runbooks, use routes, and silence noise to prevent fatigue. Quick reference guide with examples and best practices. Updated November 2025.

Prometheus Blackbox Prober

Probe HTTP, TCP, and ICMP endpoints from your observability stack. Quick reference guide with examples and best practices. Updated November 2025.

Prometheus Query Best Practices

Avoid costly queries while keeping dashboards accurate. Quick reference guide with examples and best practices. Updated November 2025.

Pulumi

Infrastructure as Code with programming languages. Pulumi uses real programming languages, not DSLs. Quick reference guide with examples and best practices. Updated November 2025.

Serverless Cost Controls

Prevent runaway bills via concurrency, schedules, and budgets. Quick reference guide with examples and best practices. Updated November 2025.

SRE Incident Response Playbook

Runbooks for status pages, incident channels, and after-action reviews. Quick reference guide with examples and best practices. Updated November 2025.

SRE On-call

Define rotations, alert behavior, and runbook links. Quick reference guide with examples and best practices. Updated November 2025.

Systemd Services

Craft service and timer units, enable/disable, and inspect logs. Quick reference guide with examples and best practices. Updated November 2025.

Terraform

Infrastructure as Code tool. Use terraform workspaces to manage multiple environments. Quick reference guide with examples and best practices. Updated November 2025.

Terraform Cloud

Remote workspaces, policy sets, and secure runs in Terraform Cloud. Quick reference guide with examples and best practices. Updated November 2025.

Terraform Drift Detection

Catch resource drift before it surprises production. Quick reference guide with examples and best practices. Updated November 2025.

Terraform Module Registry (Batch 8)

Host modules on Terraform Registry or private registries with versioning. Quick reference guide with examples and best practices. Updated November 2025.

Terraform Modules

Module structure, inputs/outputs, and versioning. Quick reference guide with examples and best practices. Updated November 2025.

Terraform Provider Security 18

Limit provider privileges. Quick reference guide with examples and best practices. Updated November 2025.

Terraform Provider Security 28

Limit provider privileges. Quick reference guide with examples and best practices. Updated November 2025.

Terraform Provider Security 38

Limit provider privileges. Quick reference guide with examples and best practices. Updated November 2025.

Terraform Provider Security 48

Limit provider privileges. Quick reference guide with examples and best practices. Updated November 2025.

Terraform Provider Security 8

Limit provider privileges. Quick reference guide with examples and best practices. Updated November 2025.

Terraform Secure Backends

Configure remote backends with encryption and locking. Quick reference guide with examples and best practices. Updated November 2025.

Terraform Security Scanner

Run scanners, enforce policies, and catch drift. Quick reference guide with examples and best practices. Updated November 2025.

Terraform Testing

Run fmt/validate/plan, policy tests, and module unit tests. Quick reference guide with examples and best practices. Updated November 2025.

Terraform Workspace Strategy

Use named workspaces for staging, prod, and experiments. Quick reference guide with examples and best practices. Updated November 2025.

Travel Digital Nomad Kit

Pack chargers, backups, and routines for nomadic productivity. Quick reference guide with examples and best practices. Updated November 2025.

Vault Dynamic Secrets

Issue temporary credentials for databases, cloud APIs, or SSH. Quick reference guide with examples and best practices. Updated November 2025.

Vault Dynamic SSH

Issue short-lived SSH certs via Vault's SSH secrets engine. Quick reference guide with examples and best practices. Updated November 2025.

Vault KV Versioning

Use KV v2 to store versioned secrets and recover past values. Quick reference guide with examples and best practices. Updated November 2025.

Vault Secrets Engines

Enable engines, generate dynamic credentials, and renew leases. Quick reference guide with examples and best practices. Updated November 2025.

Vault Secrets Rotation

Rotate secrets and revoke leases via Vault cron jobs. Quick reference guide with examples and best practices. Updated November 2025.