Maintenance SOPs
Standard operating procedures for routine maintenance of Burdenoff products.
Maintenance Schedule
Daily Tasks
- Monitor system health
- Review error logs
- Check alert status
- Respond to incidents
- Review metrics
Weekly Tasks
- Update dependencies
- Review security advisories
- Clean up old data
- Database maintenance
- Team sync meeting
Monthly Tasks
- Security patches
- Performance review
- Cost optimization
- Backup verification
- Documentation updates
Quarterly Tasks
- Major version updates
- Infrastructure review
- DR testing
- Security audit
- Team retrospective
Dependency Management
Node.js Dependencies
Check for Updates
# Check outdated packages
npm outdated
# Check for security vulnerabilities
npm audit
# View vulnerability details
npm audit --json
Update Dependencies
# Update to latest minor/patch versions
npm update
# Update to latest major versions (carefully!)
npm install [package]@latest
# Fix vulnerabilities
npm audit fix
# Fix vulnerabilities, allowing updates that may introduce breaking changes
npm audit fix --force
Process
- Check for updates weekly
- Review changelog for breaking changes
- Update in development branch
- Run all tests
- Test manually
- Deploy to alpha
- Monitor for issues
- Deploy to production
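As a rough aid for the "review changelog for breaking changes" step, the sketch below classifies the jump between an installed version and the latest one, so that major bumps can be singled out for manual review. The function name `bump_type` is hypothetical, and it assumes plain `MAJOR.MINOR.PATCH` version strings such as those in the Current and Latest columns of `npm outdated`.

```shell
# Classify the jump between two semver strings as major, minor, patch, or none.
# Runs in a subshell (parentheses body) so its variables do not leak.
bump_type() (
  # Strip pre-release/build suffixes such as -beta.1 or +build5.
  cur=${1%%[-+]*}
  new=${2%%[-+]*}
  # Split MAJOR.MINOR.PATCH on dots using parameter expansion.
  cur_major=${cur%%.*}; cur_rest=${cur#*.}; cur_minor=${cur_rest%%.*}
  new_major=${new%%.*}; new_rest=${new#*.}; new_minor=${new_rest%%.*}
  if [ "$cur_major" != "$new_major" ]; then
    echo major
  elif [ "$cur_minor" != "$new_minor" ]; then
    echo minor
  elif [ "$cur" != "$new" ]; then
    echo patch
  else
    echo none
  fi
)
```

For example, `bump_type 4.18.2 5.0.0` reports a major bump, which is the case where the changelog review and manual testing steps matter most.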
Python Dependencies
Check for Updates
# Check outdated packages
poetry show --outdated
# Check for security vulnerabilities
poetry run safety check
poetry run pip-audit
Update Dependencies
# Update all dependencies
poetry update
# Update specific package
poetry update [package]
# Update to latest version
poetry add [package]@latest
Process
- Review updates weekly
- Check for breaking changes
- Update pyproject.toml
- Run poetry update
- Run all tests
- Deploy and monitor
Database Maintenance
PostgreSQL Maintenance
Vacuum Database
# Connect to database
kubectl exec -it postgres-0 -- psql -U postgres
# Vacuum all tables
VACUUM ANALYZE;
# Vacuum specific table
VACUUM ANALYZE users;
# Full vacuum (locks tables)
VACUUM FULL;
Reindex Database
# Reindex database
REINDEX DATABASE mydatabase;
# Reindex table
REINDEX TABLE users;
# Reindex index
REINDEX INDEX users_email_idx;
Check Database Size
# Database size
SELECT pg_size_pretty(pg_database_size('mydatabase'));
# Table sizes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
Analyze Query Performance
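# Enable pg_stat_statements (the module must also be listed in shared_preload_libraries)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
# View slow queries (column names for PostgreSQL 13+; older versions use total_time, mean_time, max_time)
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;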
Redis Maintenance
Check Memory Usage
kubectl exec -it redis-0 -- redis-cli INFO memory
# Key metrics:
# used_memory
# used_memory_peak
# maxmemory
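To turn the key metrics above into a single number, the following sketch reads `INFO memory` output on stdin and prints `used_memory` as a percentage of `maxmemory`. The function name `redis_memory_pct` is hypothetical; the sketch assumes the usual `field:value` layout of `INFO` output, including its carriage-return line endings.

```shell
# Read `redis-cli INFO memory` output on stdin and print used_memory as a
# percentage of maxmemory. Prints "unlimited" when maxmemory is 0 or absent.
redis_memory_pct() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max + 0 == 0) print "unlimited"
      else printf "%.1f\n", used * 100 / max
    }'
}
```

It could be fed with `kubectl exec redis-0 -- redis-cli INFO memory | redis_memory_pct`, without `-t` so that no TTY is allocated inside the pipeline.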
Clear Cache
# Clear all keys (use carefully!)
kubectl exec -it redis-0 -- redis-cli FLUSHALL
# Clear specific database
kubectl exec -it redis-0 -- redis-cli -n 0 FLUSHDB
# Delete keys matching a pattern (no -it: a TTY must not be allocated inside a pipeline)
kubectl exec redis-0 -- redis-cli --scan --pattern "user:*" | \
xargs -r kubectl exec redis-0 -- redis-cli DEL
Log Management
Log Rotation
# Docker log rotation (docker-compose.yml)
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
Log Cleanup
# Clean up old logs
find /var/log -name "*.log" -mtime +30 -delete
# Delete completed pods (their logs are removed with them)
kubectl delete pod --field-selector=status.phase==Succeeded -n [namespace]
Log Analysis
# Count error lines (kubectl logs on a deployment reads from a single pod)
kubectl logs deployment/[product]-backend -n [namespace] | \
grep -c ERROR
# Find specific errors
kubectl logs deployment/[product]-backend -n [namespace] | \
grep "database connection"
# Export logs
kubectl logs deployment/[product]-backend -n [namespace] > logs.txt
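The commands above count or filter matching lines; an actual error rate also needs the total number of lines. A minimal sketch, assuming that every error line contains the literal string `ERROR` (the function name `error_rate` is hypothetical):

```shell
# Read log lines on stdin and print the percentage of lines containing ERROR.
error_rate() {
  awk '
    /ERROR/ { errors++ }
    END {
      if (NR == 0) print "0.00"
      else printf "%.2f\n", errors * 100 / NR
    }'
}
```

It can be applied to an exported file, for example `error_rate < logs.txt`, or placed at the end of a `kubectl logs` pipeline.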
Storage Management
Disk Usage
Check Disk Usage
# Node disk usage
kubectl get nodes -o custom-columns=NAME:.metadata.name,DISK:.status.allocatable.ephemeral-storage
# Pod CPU and memory usage per container (kubectl top does not report disk usage)
kubectl top pods -n [namespace] --containers
# PVC usage
kubectl get pvc -n [namespace]
kubectl describe pvc [pvc-name] -n [namespace]
Clean Up Storage
# Remove all unused Docker images, stopped containers, unused networks, and unused volumes (destructive)
docker system prune -a --volumes
# Remove old builds
rm -rf node_modules/.cache
rm -rf .next
rm -rf dist
# Clean up ACR
az acr repository list --name burdenoff --output table
az acr repository delete --name burdenoff --image [image:tag]
Backup Management
Database Backups
# Manual backup (no -it: a TTY would alter the dump written to the pipe)
kubectl exec postgres-0 -- pg_dump -U postgres mydatabase | \
gzip > backup-$(date +%Y%m%d).sql.gz
# Upload to Azure Blob
az storage blob upload \
--account-name burdenoffbackups \
--container-name db-backups \
--name backup-$(date +%Y%m%d).sql.gz \
--file backup-$(date +%Y%m%d).sql.gz
Verify Backups
# List backups
az storage blob list \
--account-name burdenoffbackups \
--container-name db-backups \
--output table
# Test restore (in test environment); backups are gzip-compressed, so decompress first
gunzip -c backup-[YYYYMMDD].sql.gz | \
kubectl exec -i postgres-0 -- psql -U postgres testdb
# Verify data
kubectl exec -it postgres-0 -- psql -U postgres testdb -c "\dt"
Backup Retention
- Daily backups: Keep 7 days
- Weekly backups: Keep 4 weeks
- Monthly backups: Keep 12 months
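The retention rules above can be sketched as a small decision function for a cleanup script. The function name `retention_decision` and the tier labels are hypothetical, and 12 months is approximated as 365 days, which is an assumption rather than part of the policy.

```shell
# Print "keep" or "delete" for a backup of the given tier and age in days.
# Limits: daily = 7 days, weekly = 4 weeks (28 days), monthly = 12 months (~365 days, assumed).
retention_decision() (
  tier=$1
  age_days=$2
  case $tier in
    daily)   limit=7 ;;
    weekly)  limit=28 ;;
    monthly) limit=365 ;;
    *) echo "unknown tier: $tier" >&2; exit 2 ;;
  esac
  if [ "$age_days" -le "$limit" ]; then
    echo keep
  else
    echo delete
  fi
)
```

For instance, `retention_decision daily 8` prints `delete`, while `retention_decision weekly 28` prints `keep`.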
Security Maintenance
Certificate Management
Check Certificate Expiration
# Check certificate
echo | openssl s_client -servername [domain] -connect [domain]:443 2>/dev/null | \
openssl x509 -noout -dates
# Check all certificates
kubectl get certificates -A
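To act on the dates printed by the `openssl` check above, the sketch below converts the `notAfter` line into the number of whole days remaining. The function name `cert_days_left` is hypothetical, and the sketch assumes GNU `date`, whose `-d` option parses the openssl date format; BSD/macOS `date` would need a different invocation.

```shell
# Read `openssl x509 -noout -dates` output on stdin and print whole days until notAfter.
# Optional first argument: "now" as epoch seconds (defaults to the current time).
# Assumes GNU date (for the -d option).
cert_days_left() (
  now=${1:-$(date +%s)}
  not_after=$(sed -n 's/^notAfter=//p')
  [ -n "$not_after" ] || exit 1
  expiry=$(date -u -d "$not_after" +%s) || exit 1
  echo $(( (expiry - now) / 86400 ))
)
```

Appending `| cert_days_left` to the single-certificate check gives a number that can be compared against an alert threshold of your choosing.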
Renew Certificates
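# cert-manager renews automatically; a renewal can also be triggered manually.
# With the cmctl CLI installed:
cmctl renew [cert-name] -n [namespace]
# Without cmctl: delete the Certificate, but only if it is recreated automatically
# (for example from Ingress annotations or your deployment pipeline)
kubectl delete certificate [cert-name] -n [namespace]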
# Check renewal status
kubectl describe certificate [cert-name] -n [namespace]
Secret Rotation
Rotate Database Password
# 1. Generate new password
NEW_PASSWORD=$(openssl rand -base64 32)
# 2. Update database
kubectl exec -it postgres-0 -- psql -U postgres -c \
"ALTER USER postgres PASSWORD '$NEW_PASSWORD';"
# 3. Update Kubernetes secret
kubectl create secret generic db-credentials \
--from-literal=password="$NEW_PASSWORD" \
--dry-run=client -o yaml | \
kubectl apply -f -
# 4. Restart pods
kubectl rollout restart deployment/[product]-backend
Rotate JWT Secret
# 1. Generate new secret
NEW_SECRET=$(openssl rand -hex 32)
# 2. Update secret
kubectl create secret generic jwt-secret \
--from-literal=secret=$NEW_SECRET \
--dry-run=client -o yaml | \
kubectl apply -f -
# 3. Restart pods
kubectl rollout restart deployment/[product]-backend
# 4. Invalidate old tokens (if needed)
# Users will need to re-login
Security Scans
Container Security
# Scan Docker image
trivy image [image:tag]
# Scan for high/critical vulnerabilities only
trivy image --severity HIGH,CRITICAL [image:tag]
Dependency Security
# Node.js
npm audit
npm audit fix
# Python
poetry run safety check
poetry run pip-audit
Performance Optimization
Database Optimization
Analyze Queries
# Log every statement that runs longer than 1000 ms
kubectl exec -it postgres-0 -- psql -U postgres -c \
"ALTER SYSTEM SET log_min_duration_statement = 1000;"
# Reload configuration
kubectl exec -it postgres-0 -- psql -U postgres -c \
"SELECT pg_reload_conf();"
# View slow queries
kubectl logs postgres-0 | grep "duration:"
Optimize Indexes
# Review column statistics for index candidates (pg_stats does not list missing indexes directly)
SELECT
schemaname,
tablename,
attname,
n_distinct,
correlation
FROM pg_stats
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY n_distinct DESC;
# Create index
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
Application Optimization
Cache Optimization
# Check cache hit rate
kubectl exec -it redis-0 -- redis-cli INFO stats | grep hit
# Clear stale cache keys (no -it: a TTY must not be allocated inside a pipeline)
kubectl exec redis-0 -- redis-cli --scan --pattern "cache:old:*" | \
xargs -r kubectl exec redis-0 -- redis-cli DEL
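The `grep hit` check above shows raw counters; the hit rate itself is hits divided by hits plus misses. A minimal sketch that computes it from `INFO stats` output on stdin, using the standard `keyspace_hits` and `keyspace_misses` fields (the function name `redis_hit_rate` is hypothetical):

```shell
# Read `redis-cli INFO stats` output on stdin and print the cache hit rate in percent.
# Prints "n/a" when there have been no lookups yet.
redis_hit_rate() {
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total == 0) print "n/a"
      else printf "%.1f\n", hits * 100 / total
    }'
}
```

A possible invocation is `kubectl exec redis-0 -- redis-cli INFO stats | redis_hit_rate`.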
Resource Optimization
# Check resource usage
kubectl top pods -n [namespace]
kubectl top nodes
# Adjust resource limits if needed
kubectl edit deployment/[product]-backend -n [namespace]
Cost Optimization
Resource Right-Sizing
# Check actual resource usage
kubectl top pods -n [namespace] --containers
# Compare with requested resources
kubectl get pods -n [namespace] -o json | \
jq '.items[] | {name: .metadata.name, requests: .spec.containers[].resources.requests}'
# Adjust if over/under-provisioned
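The comparison between actual usage and requested resources can be sketched as a simple classification. The function name `sizing_verdict` and the 30% and 90% thresholds are illustrative assumptions, not values from this procedure; both inputs are CPU figures in millicores, as shown by `kubectl top` and in the pod spec.

```shell
# Classify a container from its actual CPU usage and its CPU request, both in millicores.
# Thresholds are assumptions: below 30% = over-provisioned, above 90% = under-provisioned.
sizing_verdict() (
  usage_m=$1
  request_m=$2
  if [ "$request_m" -le 0 ]; then
    echo "no-request"
    exit 0
  fi
  pct=$(( usage_m * 100 / request_m ))
  if [ "$pct" -lt 30 ]; then
    echo "over-provisioned ($pct%)"
  elif [ "$pct" -gt 90 ]; then
    echo "under-provisioned ($pct%)"
  else
    echo "ok ($pct%)"
  fi
)
```

For example, a container using 50m against a 500m request is reported as over-provisioned, which makes it a candidate for a lower request.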
Clean Up Unused Resources
# Delete old ReplicaSets
kubectl delete replicaset --field-selector=status.replicas=0 -n [namespace]
# Delete completed jobs
kubectl delete job --field-selector=status.successful=1 -n [namespace]
# Delete failed pods (this includes evicted pods)
kubectl delete pod --field-selector=status.phase==Failed -n [namespace]
Monitoring Maintenance
Update Dashboards
- Review metrics
- Add new panels
- Remove obsolete metrics
- Update thresholds
Update Alerts
- Review alert rules
- Adjust thresholds
- Add new alerts
- Remove obsolete alerts
Review Incidents
- Analyze trends
- Update runbooks
- Improve monitoring
- Share learnings
Documentation Maintenance
Update Documentation
- Review for accuracy
- Add new features
- Remove deprecated info
- Update examples
- Fix broken links
Changelog
- Document changes
- Version releases
- Migration guides
- Breaking changes