Maintenance SOPs

Standard operating procedures for routine maintenance of Burdenoff products.

Maintenance Schedule

Daily Tasks

Monitor system health
Review error logs
Check alert status
Respond to incidents
Review metrics

Weekly Tasks

Update dependencies
Review security advisories
Clean up old data
Database maintenance
Team sync meeting

Monthly Tasks

Security patches
Performance review
Cost optimization
Backup verification
Documentation updates

Quarterly Tasks

Major version updates
Infrastructure review
DR testing
Security audit
Team retrospective

Dependency Management

Node.js Dependencies

Check for Updates

# Check outdated packages
npm outdated

# Check for security vulnerabilities
npm audit

# View vulnerability details
npm audit --json

Update Dependencies

# Update to latest minor/patch versions
npm update

# Update to latest major versions (carefully!)
npm install [package]@latest

# Fix vulnerabilities
npm audit fix

# Fix vulnerabilities (breaking changes)
npm audit fix --force

Process

Check for updates weekly
Review changelog for breaking changes
Update in development branch
Run all tests
Test manually
Deploy to alpha
Monitor for issues
Deploy to production
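
The first few steps of this process can be sketched as a single gate that must pass before anything is deployed to alpha. This is only an illustration: the function name node_update_gate and the TEST_CMD override are hypothetical, and the "high" audit threshold is an assumed policy rather than a documented Burdenoff rule.

```shell
# Hypothetical pre-deploy gate for Node.js dependency updates.
# Usage (from the project root, on the development branch): node_update_gate
node_update_gate() {
  # npm outdated may exit non-zero when packages are outdated; that is
  # informational here, so it must not abort the gate.
  npm outdated || true

  # Fail if any vulnerability of severity "high" or above remains.
  npm audit --audit-level=high || { echo "audit failed" >&2; return 1; }

  # Run the test suite; TEST_CMD is an assumed override, default "npm test".
  ${TEST_CMD:-npm test} || { echo "tests failed" >&2; return 2; }

  echo "ready for alpha"
}
```

Manual testing, the alpha deploy, and monitoring still follow as separate steps; the gate only covers what can be checked mechanically.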

Python Dependencies

Check for Updates

# Check outdated packages
poetry show --outdated

# Check for security vulnerabilities
poetry run safety check
poetry run pip-audit

Update Dependencies

# Update all dependencies
poetry update

# Update specific package
poetry update [package]

# Update to latest version
poetry add [package]@latest

Process

Review updates weekly
Check for breaking changes
Update pyproject.toml
Run poetry update
Run all tests
Deploy and monitor
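
The "check for breaking changes" step can be triaged by flagging updates that cross a major version, since those are the ones most likely to need a changelog review. The sketch below assumes versions look like X.Y.Z; the helper name is_major_bump is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) when the major version changes.
# Usage: is_major_bump <current_version> <new_version>
is_major_bump() {
  # Strip everything from the first dot onward, leaving the major component.
  local old_major="${1%%.*}" new_major="${2%%.*}"
  [ "$old_major" != "$new_major" ]
}

# Example:
#   is_major_bump 1.4.2 2.0.0 && echo "review changelog before updating"
```

For 0.x packages a minor bump can also be breaking, so this check is a first filter rather than a replacement for reading the changelog.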

Database Maintenance

PostgreSQL Maintenance

Vacuum Database

# Connect to database
kubectl exec -it postgres-0 -- psql -U postgres

-- Vacuum all tables
VACUUM ANALYZE;

-- Vacuum specific table
VACUUM ANALYZE users;

-- Full vacuum (rewrites each table under an exclusive lock; run in a maintenance window)
VACUUM FULL;
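
To decide which tables are worth a targeted vacuum, one option is to rank them by dead tuples using the standard pg_stat_user_tables view. The helper below is a sketch: the name top_dead_tuples and the default limit of 10 are hypothetical, and it assumes the postgres-0 pod and postgres user used in the commands above.

```shell
# Hypothetical helper: list the tables with the most dead tuples.
# Usage: top_dead_tuples [limit]
top_dead_tuples() {
  local limit="${1:-10}"
  # -A (unaligned) and -t (tuples only) keep the output easy to parse.
  kubectl exec postgres-0 -- psql -U postgres -At -F ' ' -c \
    "SELECT relname, n_dead_tup, n_live_tup
       FROM pg_stat_user_tables
      ORDER BY n_dead_tup DESC
      LIMIT ${limit};"
}
```

Tables at the top of this list with a high dead-to-live ratio are the usual candidates for VACUUM ANALYZE on a specific table.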

Reindex Database

-- Reindex database (must be the database you are connected to)
REINDEX DATABASE mydatabase;

-- Reindex table
REINDEX TABLE users;

-- Reindex index
REINDEX INDEX users_email_idx;

Check Database Size

-- Database size
SELECT pg_size_pretty(pg_database_size('mydatabase'));

-- Ten largest tables (total size includes indexes and TOAST data)
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;

Analyze Query Performance

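-- Enable pg_stat_statements (the module must also be listed in shared_preload_libraries)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- View slow queries
-- (PostgreSQL 13+ column names; on 12 and earlier use total_time, mean_time, max_time)
SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;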

Redis Maintenance

Check Memory Usage

kubectl exec -it redis-0 -- redis-cli INFO memory

# Key metrics:
# used_memory
# used_memory_peak
# maxmemory
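
The key metrics are easier to act on as a single ratio. As a sketch, the helper below reads INFO memory output on stdin and prints used_memory as a percentage of maxmemory; the name redis_memory_pct is hypothetical.

```shell
# Hypothetical helper: used_memory as a percentage of maxmemory.
# Usage: kubectl exec redis-0 -- redis-cli INFO memory | redis_memory_pct
redis_memory_pct() {
  # INFO lines end in CRLF, so strip carriage returns before parsing.
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max > 0) printf "%.1f\n", used * 100 / max
      else print "unlimited"   # maxmemory 0 means no limit is configured
    }'
}
```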

Clear Cache

# Clear all keys (use carefully!)
kubectl exec -it redis-0 -- redis-cli FLUSHALL

# Clear specific database
kubectl exec -it redis-0 -- redis-cli -n 0 FLUSHDB

# Delete keys matching a pattern (no TTY, so the key list pipes cleanly;
# xargs -r skips DEL when nothing matches)
kubectl exec redis-0 -- redis-cli --scan --pattern "user:*" | \
  xargs -r kubectl exec redis-0 -- redis-cli DEL

Log Management

Log Rotation

# Docker log rotation (docker-compose.yml)
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"

Log Cleanup

# Clean up old logs
find /var/log -name "*.log" -mtime +30 -delete

# Delete completed pods (their logs are removed with them)
kubectl delete pod --field-selector=status.phase==Succeeded -n [namespace]

Log Analysis

# Count error lines
kubectl logs deployment/[product]-backend -n [namespace] | \
  grep -c ERROR

# Find specific errors
kubectl logs deployment/[product]-backend -n [namespace] | \
  grep "database connection"

# Export logs
kubectl logs deployment/[product]-backend -n [namespace] > logs.txt
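
A raw count of ERROR lines depends on log volume, so a rate is often more useful for comparing days or releases. The sketch below prints the share of log lines containing ERROR; the helper name error_rate_pct is hypothetical, and it assumes one log entry per line.

```shell
# Hypothetical helper: percentage of stdin lines that contain ERROR.
# Usage: kubectl logs deployment/[product]-backend -n [namespace] | error_rate_pct
error_rate_pct() {
  awk '
    /ERROR/ { errors++ }
    END {
      if (NR > 0) printf "%.2f\n", errors * 100 / NR
      else print "0.00"   # empty input: report zero instead of dividing by zero
    }'
}
```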

Storage Management

Disk Usage

Check Disk Usage

# Node disk usage
kubectl get nodes -o custom-columns=NAME:.metadata.name,DISK:.status.allocatable.ephemeral-storage

# Pod CPU and memory usage (kubectl top does not report disk usage)
kubectl top pods -n [namespace] --containers

# PVC usage
kubectl get pvc -n [namespace]
kubectl describe pvc [pvc-name] -n [namespace]

Clean Up Storage

# Remove unused Docker images, stopped containers, unused networks, and unused volumes
docker system prune -a --volumes

# Remove old builds
rm -rf node_modules/.cache
rm -rf .next
rm -rf dist

# Clean up ACR
az acr repository list --name burdenoff --output table
az acr repository delete --name burdenoff --image [image:tag]

Backup Management

Database Backups

# Manual backup (no -t: a TTY would add carriage returns and corrupt the dump)
kubectl exec postgres-0 -- pg_dump -U postgres mydatabase | \
  gzip > backup-$(date +%Y%m%d).sql.gz

# Upload to Azure Blob
az storage blob upload \
  --account-name burdenoffbackups \
  --container-name db-backups \
  --name backup-$(date +%Y%m%d).sql.gz \
  --file backup-$(date +%Y%m%d).sql.gz

Verify Backups

# List backups
az storage blob list \
  --account-name burdenoffbackups \
  --container-name db-backups \
  --output table

# Test restore (in test environment); the backup is gzip-compressed
gunzip -c backup-[date].sql.gz | \
  kubectl exec -i postgres-0 -- psql -U postgres testdb

# Verify data
kubectl exec -it postgres-0 -- psql -U postgres testdb -c "\dt"
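
Before spending time on a test restore, a cheap local check can catch truncated or empty backup files. The sketch below assumes plain-format pg_dump output compressed with gzip, as produced by the manual backup command; the helper name verify_backup_file is hypothetical.

```shell
# Hypothetical helper: succeed only if the file is a non-empty, intact gzip
# archive whose contents look like a plain-format pg_dump.
# Usage: verify_backup_file backup-[date].sql.gz
verify_backup_file() {
  local file="$1"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  gzip -t "$file" 2>/dev/null || { echo "corrupt gzip: $file" >&2; return 1; }
  # Plain-format dumps start with a "-- PostgreSQL database dump" header.
  gunzip -c "$file" | head -n 5 | grep -q "PostgreSQL database dump" \
    || { echo "not a pg_dump file: $file" >&2; return 1; }
}
```

This only shows that the file is readable and looks like a dump; a restore into the test environment remains the real verification.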

Backup Retention

Daily backups: Keep 7 days
Weekly backups: Keep 4 weeks
Monthly backups: Keep 12 months
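
The retention rules can be encoded as a small predicate that a cleanup job calls for each backup. This is a sketch: the helper name backup_expired is hypothetical, and 12 months is approximated as 365 days.

```shell
# Hypothetical helper: succeed (exit 0) when a backup is past its retention window.
# Usage: backup_expired <daily|weekly|monthly> <age_in_days>
backup_expired() {
  local tier="$1" age_days="$2" keep_days
  case "$tier" in
    daily)   keep_days=7   ;;  # keep 7 days
    weekly)  keep_days=28  ;;  # keep 4 weeks
    monthly) keep_days=365 ;;  # keep 12 months (approximated as 365 days)
    *) echo "unknown tier: $tier" >&2; return 2 ;;
  esac
  [ "$age_days" -gt "$keep_days" ]
}

# Example:
#   backup_expired daily 9 && echo "safe to delete"
```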

Security Maintenance

Certificate Management

Check Certificate Expiration

# Check certificate
echo | openssl s_client -servername [domain] -connect [domain]:443 2>/dev/null | \
  openssl x509 -noout -dates

# Check all certificates
kubectl get certificates -A
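
Reading the dates by eye does not scale to many domains. As a sketch, the check can be turned into a pass/fail test with the -checkend option of openssl x509; the helper name check_cert_expiry and the 30-day default are assumptions.

```shell
# Hypothetical helper: read a PEM certificate on stdin and fail if it
# expires within the given number of days (default 30).
# Usage: echo | openssl s_client -servername [domain] -connect [domain]:443 \
#          2>/dev/null | check_cert_expiry 30
check_cert_expiry() {
  local days="${1:-30}"
  # -checkend N exits non-zero when the certificate expires within N seconds.
  if openssl x509 -noout -checkend "$((days * 86400))" >/dev/null 2>&1; then
    echo "ok"
  else
    echo "expiring"
    return 1
  fi
}
```

An unreadable certificate is also reported as "expiring", so a failure means the domain needs a look either way.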

Renew Certificates

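# cert-manager renews automatically; to force renewal, delete the Certificate so it is recreated
# (if the cmctl CLI is installed, "cmctl renew [cert-name] -n [namespace]" renews without deleting it)
kubectl delete certificate [cert-name] -n [namespace]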

# Check renewal status
kubectl describe certificate [cert-name] -n [namespace]

Secret Rotation

Rotate Database Password

# 1. Generate new password
NEW_PASSWORD=$(openssl rand -base64 32)

# 2. Update database
kubectl exec -it postgres-0 -- psql -U postgres -c \
  "ALTER USER postgres PASSWORD '$NEW_PASSWORD';"

# 3. Update Kubernetes secret
kubectl create secret generic db-credentials \
  --from-literal=password="$NEW_PASSWORD" \
  --dry-run=client -o yaml | \
  kubectl apply -f -

# 4. Restart pods
kubectl rollout restart deployment/[product]-backend
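
The restart command returns as soon as the rollout is triggered, so it does not confirm that the new pods started with the rotated credential. One way to close that gap is to wait on the rollout status. The helper name restart_and_wait and the 180s default timeout below are assumptions.

```shell
# Hypothetical helper: restart a deployment and block until the rollout
# completes or the timeout is reached.
# Usage: restart_and_wait <deployment> <namespace> [timeout]
restart_and_wait() {
  local deployment="$1" namespace="$2" timeout="${3:-180s}"
  kubectl rollout restart "deployment/${deployment}" -n "$namespace" || return 1
  # Exits non-zero if the new pods are not ready within the timeout.
  kubectl rollout status "deployment/${deployment}" -n "$namespace" --timeout="$timeout"
}
```

A failed wait right after a rotation usually means the pods could not authenticate, which is the signal to roll the secret back.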

Rotate JWT Secret

# 1. Generate new secret
NEW_SECRET=$(openssl rand -hex 32)

# 2. Update secret
kubectl create secret generic jwt-secret \
  --from-literal=secret="$NEW_SECRET" \
  --dry-run=client -o yaml | \
  kubectl apply -f -

# 3. Restart pods
kubectl rollout restart deployment/[product]-backend

# 4. Invalidate old tokens (if needed)
# Users will need to re-login

Security Scans

Container Security

# Scan Docker image
trivy image [image:tag]

# Scan for high/critical vulnerabilities only
trivy image --severity HIGH,CRITICAL [image:tag]

Dependency Security

# Node.js
npm audit
npm audit fix

# Python
poetry run safety check
poetry run pip-audit

Performance Optimization

Database Optimization

Analyze Queries

# Log every statement that runs longer than 1000 ms
kubectl exec -it postgres-0 -- psql -U postgres -c \
  "ALTER SYSTEM SET log_min_duration_statement = 1000;"

# Reload configuration
kubectl exec -it postgres-0 -- psql -U postgres -c \
  "SELECT pg_reload_conf();"

# View slow queries
kubectl logs postgres-0 | grep "duration:"

Optimize Indexes

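-- Inspect column statistics for index candidates
-- (a negative n_distinct is a fraction of the row count; -1 means every value is unique)
SELECT
  schemaname,
  tablename,
  attname,
  n_distinct,
  correlation
FROM pg_stats
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY n_distinct DESC;

-- Create index (CONCURRENTLY avoids blocking writes; it cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);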

Application Optimization

Cache Optimization

# Check cache hit rate (hits and misses are both needed to compute it)
kubectl exec redis-0 -- redis-cli INFO stats | grep -E 'keyspace_(hits|misses)'

# Clear stale cache keys (xargs -r skips DEL when nothing matches)
kubectl exec redis-0 -- redis-cli --scan --pattern "cache:old:*" | \
  xargs -r kubectl exec redis-0 -- redis-cli DEL
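
The hit rate itself is hits divided by hits plus misses. As a sketch, the helper below computes it from INFO stats output on stdin; the name redis_hit_rate_pct is hypothetical.

```shell
# Hypothetical helper: keyspace hit rate as a percentage.
# Usage: kubectl exec redis-0 -- redis-cli INFO stats | redis_hit_rate_pct
redis_hit_rate_pct() {
  # INFO lines end in CRLF, so strip carriage returns before parsing.
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total > 0) printf "%.1f\n", hits * 100 / total
      else print "n/a"   # no lookups recorded yet
    }'
}
```

The counters are cumulative since the last restart or CONFIG RESETSTAT, so the result is a long-run average rather than a current reading.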

Resource Optimization

# Check resource usage
kubectl top pods -n [namespace]
kubectl top nodes

# Adjust resource limits if needed
kubectl edit deployment/[product]-backend -n [namespace]

Cost Optimization

Resource Right-Sizing

# Check actual resource usage
kubectl top pods -n [namespace] --containers

# Compare with requested resources
kubectl get pods -n [namespace] -o json | \
  jq '.items[] | {name: .metadata.name, requests: .spec.containers[].resources.requests}'

# Adjust if over/under-provisioned
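
What counts as over- or under-provisioned needs a threshold. The sketch below classifies a container from its observed CPU usage and its CPU request, both in millicores. The helper name classify_cpu_request and the 50% and 90% cut-offs are illustrative assumptions, not a documented policy.

```shell
# Hypothetical helper: compare observed CPU usage with the CPU request.
# Usage: classify_cpu_request <usage_millicores> <request_millicores>
classify_cpu_request() {
  local usage_m="$1" request_m="$2" pct
  # A container without a CPU request cannot be compared.
  [ "$request_m" -gt 0 ] || { echo "no-request"; return 0; }
  pct=$(( usage_m * 100 / request_m ))
  if   [ "$pct" -lt 50 ]; then echo "over-provisioned"
  elif [ "$pct" -gt 90 ]; then echo "under-provisioned"
  else echo "ok"
  fi
}

# Example:
#   classify_cpu_request 100 500
```

A single kubectl top sample is a snapshot, so the usage value is better taken from a peak or percentile over a representative period.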

Clean Up Unused Resources

# Delete old ReplicaSets
kubectl delete replicaset --field-selector=status.replicas=0 -n [namespace]

# Delete completed jobs
kubectl delete job --field-selector=status.successful=1 -n [namespace]

# Delete failed pods (this includes evicted pods)
kubectl delete pod --field-selector=status.phase==Failed -n [namespace]

Monitoring Maintenance

Update Dashboards

Review metrics
Add new panels
Remove obsolete metrics
Update thresholds

Update Alerts

Review alert rules
Adjust thresholds
Add new alerts
Remove obsolete alerts

Review Incidents

Analyze trends
Update runbooks
Improve monitoring
Share learnings

Documentation Maintenance

Update Documentation

Review for accuracy
Add new features
Remove deprecated info
Update examples
Fix broken links

Changelog

Document changes
Version releases
Migration guides
Breaking changes
