SRE & Monitoring
Site reliability engineering, monitoring, alerting, and incident management
Overview
This page covers site reliability engineering (SRE) for AssetHandler: the monitoring stack, SLIs and SLOs, and on-call practices.
Repository
- GitHub: assethandler-sre
Monitoring Stack
| Component | Technology |
|---|---|
| Metrics | Prometheus |
| Visualization | Grafana |
| Logging | Azure Log Analytics |
| Tracing | OpenTelemetry |
| Alerting | Prometheus Alertmanager |
SLIs & SLOs
- Availability targets
- Latency targets
- Error rate thresholds
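
The three bullet points above can each be expressed as a ratio of "good" events to total events, compared against a target. The following is a minimal sketch of that arithmetic, not AssetHandler's actual tooling: the `Slo` class, the function names, and every number used with them (the 99.9% target, the 300 ms threshold) are hypothetical placeholders, since this page does not state the real targets.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Slo:
    """A service level objective. All values used with it here are hypothetical."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of events must be good


def sli(good: int, total: int) -> float:
    """Fraction of good events; an empty window counts as fully compliant."""
    return 1.0 if total == 0 else good / total


def latency_good_count(durations_ms: Iterable[float], threshold_ms: float) -> int:
    """Count requests that finished within the latency threshold."""
    return sum(1 for d in durations_ms if d <= threshold_ms)


def error_budget_remaining(slo: Slo, good: int, total: int) -> float:
    """Share of the error budget still unspent.

    1.0 means untouched, 0.0 means exhausted, negative means overspent.
    """
    allowed_bad = (1.0 - slo.target) * total
    actual_bad = total - good
    if allowed_bad == 0:
        # A 100% target (or an empty window) leaves no budget to spend.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


# Hypothetical example: 99.9% availability target over one million requests.
availability = Slo(name="availability", target=0.999)
remaining = error_budget_remaining(availability, good=999_500, total=1_000_000)
is_met = sli(999_500, 1_000_000) >= availability.target
```

In this sketch an error rate threshold is the complement of the availability SLI (`1 - sli(...)`), and a latency target reuses `sli` with `latency_good_count` supplying the number of good events. In a Prometheus-based setup these ratios would more likely be computed from recorded metrics than in application code.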
On-Call
- Incident response procedures
- Runbooks
- Escalation paths
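
An escalation path is often modeled as an ordered list of contacts, each paged once an alert has gone unacknowledged for a set time. The sketch below illustrates that idea only: the contact names, the delays, and the `EscalationStep` and `contacts_to_page` names are all hypothetical and do not describe AssetHandler's actual on-call rotation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class EscalationStep:
    contact: str        # hypothetical role name, not a real rotation
    after_minutes: int  # minutes unacknowledged before this contact is paged


# Hypothetical policy for illustration only.
EXAMPLE_POLICY = [
    EscalationStep("primary-on-call", 0),
    EscalationStep("secondary-on-call", 15),
    EscalationStep("engineering-manager", 30),
]


def contacts_to_page(policy: Sequence[EscalationStep],
                     minutes_unacknowledged: int) -> List[str]:
    """Return every contact whose delay has elapsed, in policy order."""
    return [
        step.contact
        for step in policy
        if step.after_minutes <= minutes_unacknowledged
    ]
```

For example, `contacts_to_page(EXAMPLE_POLICY, 20)` returns the primary and secondary contacts, because only their delays (0 and 15 minutes) have elapsed.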