Maintenance SOPs

Standard operating procedures for routine maintenance of Burdenoff products.

Maintenance Schedule

Daily Tasks

  • Monitor system health
  • Review error logs
  • Check alert status
  • Respond to incidents
  • Review metrics

Weekly Tasks

  • Update dependencies
  • Review security advisories
  • Clean up old data
  • Database maintenance
  • Team sync meeting

Monthly Tasks

  • Security patches
  • Performance review
  • Cost optimization
  • Backup verification
  • Documentation updates

Quarterly Tasks

  • Major version updates
  • Infrastructure review
  • DR testing
  • Security audit
  • Team retrospective

Dependency Management

Node.js Dependencies

Check for Updates

# Check outdated packages
npm outdated

# Check for security vulnerabilities
npm audit

# View vulnerability details
npm audit --json

Update Dependencies

# Update to latest minor/patch versions
npm update

# Update to latest major versions (carefully!)
npm install [package]@latest

# Fix vulnerabilities
npm audit fix

# Fix vulnerabilities, allowing breaking (semver-major) upgrades
npm audit fix --force

Process

  1. Check for updates weekly
  2. Review changelog for breaking changes
  3. Update in development branch
  4. Run all tests
  5. Test manually
  6. Deploy to alpha
  7. Monitor for issues
  8. Deploy to production
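
Step 2 matters most for major-version bumps. As a minimal sketch, assuming packages follow plain MAJOR.MINOR.PATCH semantic versioning, the helper below classifies an update so that major bumps can be flagged for changelog review. The function name classify_update is hypothetical, and pre-release or build suffixes are not handled.

```shell
# classify_update CURRENT LATEST
# Prints "major", "minor", "patch", or "none" for two MAJOR.MINOR.PATCH
# version strings (assumption: no pre-release or build suffixes).
classify_update() {
  awk -v cur="$1" -v new="$2" 'BEGIN {
    split(cur, c, ".")
    split(new, n, ".")
    if (n[1] + 0 != c[1] + 0)      print "major"
    else if (n[2] + 0 != c[2] + 0) print "minor"
    else if (n[3] + 0 != c[3] + 0) print "patch"
    else                           print "none"
  }'
}
```

For example, classify_update 4.17.1 5.0.0 prints "major", which would call for reading the changelog before updating.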

Python Dependencies

Check for Updates

# Check outdated packages
poetry show --outdated

# Check for security vulnerabilities
poetry run safety check
poetry run pip-audit

Update Dependencies

# Update all dependencies
poetry update

# Update specific package
poetry update [package]

# Update to latest version
poetry add [package]@latest

Process

  1. Review updates weekly
  2. Check for breaking changes
  3. Update pyproject.toml
  4. Run poetry update
  5. Run all tests
  6. Deploy and monitor

Database Maintenance

PostgreSQL Maintenance

Vacuum Database

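# Connect to the database (opens an interactive psql session)
kubectl exec -it postgres-0 -- psql -U postgres

-- Vacuum and analyze all tables in the current database
VACUUM ANALYZE;

-- Vacuum and analyze a specific table
VACUUM ANALYZE users;

-- Full vacuum: rewrites each table under an ACCESS EXCLUSIVE lock,
-- blocking reads and writes until it finishes
VACUUM FULL;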

Reindex Database

-- Reindex database
REINDEX DATABASE mydatabase;

-- Reindex table
REINDEX TABLE users;

-- Reindex index
REINDEX INDEX users_email_idx;

Check Database Size

-- Database size
SELECT pg_size_pretty(pg_database_size('mydatabase'));

-- Ten largest tables (total size includes indexes and TOAST data)
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) AS total_size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass) DESC
LIMIT 10;

Analyze Query Performance

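-- Enable pg_stat_statements (the module must also be listed in
-- shared_preload_libraries, which requires a server restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- View the slowest queries by mean execution time.
-- Column names below are for PostgreSQL 13 and later; on 12 and earlier
-- use total_time, mean_time, and max_time.
SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;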

Redis Maintenance

Check Memory Usage

kubectl exec -it redis-0 -- redis-cli INFO memory

# Key metrics:
# used_memory
# used_memory_peak
# maxmemory
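
To turn these metrics into a single number, the sketch below reads INFO memory output and reports used_memory as a percentage of maxmemory. The function name redis_memory_percent is hypothetical; it assumes the standard "key:value" INFO format with CRLF line endings.

```shell
# redis_memory_percent: reads `redis-cli INFO memory` output on stdin and
# prints used_memory as a percentage of maxmemory (one decimal place).
# Prints "unlimited" when maxmemory is 0 or absent (no limit configured).
redis_memory_percent() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max + 0 == 0) print "unlimited"
      else printf "%.1f\n", used * 100 / max
    }'
}
```

Usage: kubectl exec redis-0 -- redis-cli INFO memory | redis_memory_percent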

Clear Cache

# Clear all keys (use carefully!)
kubectl exec -it redis-0 -- redis-cli FLUSHALL

# Clear specific database
kubectl exec -it redis-0 -- redis-cli -n 0 FLUSHDB

# Delete keys matching a pattern
# (no -t: a TTY adds carriage returns to piped output; -r skips DEL when nothing matches)
kubectl exec redis-0 -- redis-cli --scan --pattern "user:*" | \
xargs -r kubectl exec redis-0 -- redis-cli DEL

Log Management

Log Rotation

# Docker log rotation (docker-compose.yml, under a service definition)
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"

Log Cleanup

# Clean up old logs
find /var/log -name "*.log" -mtime +30 -delete

# Delete completed pods (their logs are removed with them)
kubectl delete pod --field-selector=status.phase==Succeeded -n [namespace]
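
Because find -delete is irreversible, it can help to preview the matches first. The sketch below separates listing from deleting; the function names list_old_logs and purge_old_logs are hypothetical.

```shell
# list_old_logs DIR DAYS
# Prints the *.log files under DIR last modified more than DAYS days ago,
# without deleting anything.
list_old_logs() {
  find "$1" -type f -name '*.log' -mtime +"$2" -print
}

# purge_old_logs DIR DAYS
# Deletes the same set of files that list_old_logs would print.
purge_old_logs() {
  find "$1" -type f -name '*.log' -mtime +"$2" -delete
}
```

Running list_old_logs /var/log 30 and reviewing the output before purge_old_logs /var/log 30 keeps the 30-day cutoff from the command above while adding a dry-run step.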

Log Analysis

# Count error lines (kubectl logs on a Deployment reads from one of its pods)
kubectl logs deployment/[product]-backend -n [namespace] | \
grep -c ERROR

# Find specific errors
kubectl logs deployment/[product]-backend -n [namespace] | \
grep "database connection"

# Export logs
kubectl logs deployment/[product]-backend -n [namespace] > logs.txt
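
A raw count of ERROR lines says little without the total log volume. As a rough sketch, the helper below reports the share of lines containing "ERROR"; the function name error_rate is hypothetical, and it assumes the log level appears as the literal string ERROR.

```shell
# error_rate: reads log lines on stdin and prints the percentage of lines
# that contain the string "ERROR" (two decimal places).
# Prints 0.00 for empty input.
error_rate() {
  awk '
    /ERROR/ { errors++ }
    END {
      if (NR == 0) print "0.00"
      else printf "%.2f\n", errors * 100 / NR
    }'
}
```

Usage: kubectl logs deployment/[product]-backend -n [namespace] | error_rate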

Storage Management

Disk Usage

Check Disk Usage

# Node disk usage
kubectl get nodes -o custom-columns=NAME:.metadata.name,DISK:.status.allocatable.ephemeral-storage

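# Pod CPU and memory usage per container (kubectl top does not report disk usage)
kubectl top pods -n [namespace] --containers

# PVC capacity and binding status
kubectl get pvc -n [namespace]
kubectl describe pvc [pvc-name] -n [namespace]

# Actual usage of mounted volumes, from inside the pod (requires df in the image)
kubectl exec [pod-name] -n [namespace] -- df -h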

Clean Up Storage

# Remove all unused images, stopped containers, unused networks,
# build cache, and unused volumes (destructive)
docker system prune -a --volumes

# Remove old builds
rm -rf node_modules/.cache
rm -rf .next
rm -rf dist

# Clean up ACR
az acr repository list --name burdenoff --output table
az acr repository delete --name burdenoff --image [image:tag]

Backup Management

Database Backups

# Manual backup (no -t: a TTY would corrupt the piped dump with carriage returns)
BACKUP_FILE="backup-$(date +%Y%m%d).sql.gz"
kubectl exec postgres-0 -- pg_dump -U postgres mydatabase | \
gzip > "$BACKUP_FILE"

# Upload to Azure Blob
az storage blob upload \
--account-name burdenoffbackups \
--container-name db-backups \
--name "$BACKUP_FILE" \
--file "$BACKUP_FILE"
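
A failed pg_dump can still leave behind an empty or truncated archive, so a local sanity check before uploading is worth considering. This is a minimal sketch; the function name check_backup_file is hypothetical, and it only validates the gzip container, not the SQL inside it.

```shell
# check_backup_file FILE
# Succeeds only if FILE exists, is non-empty, and passes gzip's
# built-in integrity test.
check_backup_file() {
  [ -s "$1" ] && gzip -t "$1" 2>/dev/null
}
```

For example, check_backup_file backup-[YYYYMMDD].sql.gz || echo "backup is corrupt" could gate the upload step.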

Verify Backups

# List backups
az storage blob list \
--account-name burdenoffbackups \
--container-name db-backups \
--output table

# Test restore (in test environment); the backups are gzip-compressed
gunzip -c backup-[YYYYMMDD].sql.gz | \
kubectl exec -i postgres-0 -- psql -U postgres testdb

# Verify data
kubectl exec -it postgres-0 -- psql -U postgres testdb -c "\dt"

Backup Retention

  • Daily backups: Keep 7 days
  • Weekly backups: Keep 4 weeks
  • Monthly backups: Keep 12 months
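
The retention tiers can be expressed as a single keep-or-delete decision per backup date. The sketch below is one possible reading of the policy, not a definitive one: it assumes weekly backups are the ones taken on Sundays, monthly backups are the ones taken on the first of the month, and 12 months is approximated as 365 days. The function name should_keep is hypothetical, and GNU date is required.

```shell
# should_keep BACKUP_DATE TODAY   (both YYYY-MM-DD; requires GNU date)
# Exit status 0 = keep, 1 = eligible for deletion, 2 = unparsable date.
should_keep() {
  backup_s=$(date -u -d "$1" +%s) || return 2
  today_s=$(date -u -d "$2" +%s) || return 2
  age_days=$(( (today_s - backup_s) / 86400 ))
  dow=$(date -u -d "$1" +%u)   # 1 = Monday ... 7 = Sunday
  dom=$(date -u -d "$1" +%d)   # day of month, zero-padded

  # Daily tier: every backup from the last 7 days.
  [ "$age_days" -lt 7 ] && return 0
  # Weekly tier (assumption: Sunday backups), kept for 4 weeks.
  [ "$dow" -eq 7 ] && [ "$age_days" -lt 28 ] && return 0
  # Monthly tier (assumption: first-of-month backups), kept about 12 months.
  [ "$dom" = "01" ] && [ "$age_days" -le 365 ] && return 0
  return 1
}
```

With today set to 2024-06-15, should_keep 2024-06-02 2024-06-15 succeeds (a Sunday within 4 weeks), while should_keep 2024-06-03 2024-06-15 fails (a Monday older than 7 days).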

Security Maintenance

Certificate Management

Check Certificate Expiration

# Check certificate
echo | openssl s_client -servername [domain] -connect [domain]:443 2>/dev/null | \
openssl x509 -noout -dates

# Check all certificates
kubectl get certificates -A
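
For alerting, the notAfter date is more useful as a number of days remaining. The sketch below converts it; the function name days_until_expiry is hypothetical, GNU date is assumed, and the optional second argument exists only to make the calculation reproducible.

```shell
# days_until_expiry NOT_AFTER [NOW_EPOCH]
# NOT_AFTER is the date string from openssl's "notAfter=" line,
# e.g. "Jan 31 00:00:00 2030 GMT". Prints whole days remaining
# (negative if the certificate has already expired).
days_until_expiry() {
  expiry_s=$(date -u -d "$1" +%s) || return 1
  now_s=${2:-$(date -u +%s)}
  echo $(( (expiry_s - now_s) / 86400 ))
}

# Example wiring (not executed here):
#   not_after=$(echo | openssl s_client -servername [domain] -connect [domain]:443 2>/dev/null | \
#     openssl x509 -noout -enddate | cut -d= -f2)
#   days_until_expiry "$not_after"
```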

Renew Certificates

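# cert-manager renews automatically; to force a renewal, use cmctl.
# (Deleting the Certificate resource removes it, and it is only reissued
# if something else recreates the resource.)
cmctl renew [cert-name] -n [namespace]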

# Check renewal status
kubectl describe certificate [cert-name] -n [namespace]

Secret Rotation

Rotate Database Password

# 1. Generate new password
NEW_PASSWORD=$(openssl rand -base64 32)

# 2. Update database
kubectl exec -it postgres-0 -- psql -U postgres -c \
"ALTER USER postgres PASSWORD '$NEW_PASSWORD';"

# 3. Update Kubernetes secret
kubectl create secret generic db-credentials \
--from-literal=password="$NEW_PASSWORD" \
--dry-run=client -o yaml | \
kubectl apply -f -

# 4. Restart pods
kubectl rollout restart deployment/[product]-backend

Rotate JWT Secret

# 1. Generate new secret
NEW_SECRET=$(openssl rand -hex 32)

# 2. Update secret
kubectl create secret generic jwt-secret \
--from-literal=secret="$NEW_SECRET" \
--dry-run=client -o yaml | \
kubectl apply -f -

# 3. Restart pods
kubectl rollout restart deployment/[product]-backend

# 4. Invalidate old tokens (if needed)
# Users will need to re-login

Security Scans

Container Security

# Scan Docker image
trivy image [image:tag]

# Scan for high/critical vulnerabilities only
trivy image --severity HIGH,CRITICAL [image:tag]

Dependency Security

# Node.js
npm audit
npm audit fix

# Python
poetry run safety check
poetry run pip-audit

Performance Optimization

Database Optimization

Analyze Queries

# Log every statement that runs longer than 1000 ms
kubectl exec -it postgres-0 -- psql -U postgres -c \
"ALTER SYSTEM SET log_min_duration_statement = 1000;"

# Reload configuration
kubectl exec -it postgres-0 -- psql -U postgres -c \
"SELECT pg_reload_conf();"

# View slow queries
kubectl logs postgres-0 | grep "duration:"

Optimize Indexes

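-- Find tables that may be missing indexes: those read mostly by
-- sequential scans, ordered by rows read that way
SELECT
  schemaname,
  relname,
  seq_scan,
  seq_tup_read,
  idx_scan,
  n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > COALESCE(idx_scan, 0)
ORDER BY seq_tup_read DESC
LIMIT 10;

-- Create the index without blocking writes
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);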

Application Optimization

Cache Optimization

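# Check cache hit rate (keyspace_hits and keyspace_misses)
kubectl exec redis-0 -- redis-cli INFO stats | grep -E 'keyspace_(hits|misses)'

# Clear stale cache keys
# (no -t: a TTY adds carriage returns to piped output; -r skips DEL when nothing matches)
kubectl exec redis-0 -- redis-cli --scan --pattern "cache:old:*" | \
xargs -r kubectl exec redis-0 -- redis-cli DEL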
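
Redis reports hits and misses as separate counters, so the hit rate has to be derived. As a minimal sketch, the helper below computes hits / (hits + misses) from INFO stats output; the function name cache_hit_rate is hypothetical.

```shell
# cache_hit_rate: reads `redis-cli INFO stats` output on stdin and prints
# keyspace_hits as a percentage of all lookups (one decimal place).
# Prints "n/a" when there have been no lookups yet.
cache_hit_rate() {
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total == 0) print "n/a"
      else printf "%.1f\n", hits * 100 / total
    }'
}
```

Usage: kubectl exec redis-0 -- redis-cli INFO stats | cache_hit_rate. Both counters accumulate from server start (or the last CONFIG RESETSTAT), so the result is a long-run average rather than a current rate.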

Resource Optimization

# Check resource usage
kubectl top pods -n [namespace]
kubectl top nodes

# Adjust resource limits if needed
kubectl edit deployment/[product]-backend -n [namespace]

Cost Optimization

Resource Right-Sizing

# Check actual resource usage
kubectl top pods -n [namespace] --containers

# Compare with requested resources
kubectl get pods -n [namespace] -o json | \
jq '.items[] | {name: .metadata.name, requests: .spec.containers[].resources.requests}'

# Adjust if over/under-provisioned
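
Comparing usage with requests is easier once both are in the same unit. The sketch below normalizes Kubernetes CPU quantities to millicores and expresses usage as a percentage of the request. The function names to_millicores and cpu_utilization are hypothetical, and only plain core counts ("2", "0.5") and the "m" suffix are handled.

```shell
# to_millicores QUANTITY
# Converts a CPU quantity to millicores: "250m" -> 250, "2" -> 2000.
to_millicores() {
  case $1 in
    *m) echo "${1%m}" ;;
    *)  awk -v v="$1" 'BEGIN { printf "%.0f\n", v * 1000 }' ;;
  esac
}

# cpu_utilization USAGE REQUEST
# Prints usage as an integer percentage of the request.
# Fails if the request is zero.
cpu_utilization() {
  usage=$(to_millicores "$1")
  request=$(to_millicores "$2")
  [ "$request" -gt 0 ] || return 1
  echo $(( usage * 100 / request ))
}
```

For example, cpu_utilization 50m 500m prints 10, which suggests the container requests far more CPU than it uses; values well above 100 point the other way.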

Clean Up Unused Resources

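# Delete old ReplicaSets (this removes the Deployment's rollback history;
# spec.revisionHistoryLimit is the declarative alternative)
kubectl delete replicaset --field-selector=status.replicas=0 -n [namespace]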

# Delete completed jobs
kubectl delete job --field-selector=status.successful=1 -n [namespace]

# Delete failed pods (this includes evicted pods)
kubectl delete pod --field-selector=status.phase==Failed -n [namespace]

Monitoring Maintenance

Update Dashboards

  • Review metrics
  • Add new panels
  • Remove obsolete metrics
  • Update thresholds

Update Alerts

  • Review alert rules
  • Adjust thresholds
  • Add new alerts
  • Remove obsolete alerts

Review Incidents

  • Analyze trends
  • Update runbooks
  • Improve monitoring
  • Share learnings

Documentation Maintenance

Update Documentation

  • Review for accuracy
  • Add new features
  • Remove deprecated info
  • Update examples
  • Fix broken links

Changelog

  • Document changes
  • Version releases
  • Migration guides
  • Breaking changes

Next Steps