Infrastructure Guide
Infrastructure as Code and cloud infrastructure documentation.
Infrastructure Stack
Cloud Provider
- Primary: Microsoft Azure
- Regions: As needed for latency
- Services: AKS, ACR, Key Vault, Log Analytics
Infrastructure as Code
- Tool: Terraform
- State: Remote (Azure Storage)
- Modules: Reusable across products
Azure Kubernetes Service (AKS)
Cluster Configuration
- Node pools: Multiple (system, user)
- Auto-scaling: Enabled
- Network: Azure CNI
- Security: RBAC, Network Policies
Resource Organization
- Namespaces: Per product/environment
- Resource Quotas: Enforced
- Network Policies: Isolation between products
Container Registry (ACR)
Image Management
- Registry: Azure Container Registry
- Naming:
[product]-[service]:[tag] - Retention: 90 days for old images
- Scanning: Enabled for vulnerabilities
Image Tags
latest: Latest build[git-sha]: Specific commit[branch]: Latest for branch
Terraform Structure
cloudops/
├── modules/
│ ├── aks/
│ ├── acr/
│ ├── networking/
│ └── monitoring/
├── environments/
│ ├── alpha/
│ └── production/
├── main.tf
├── variables.tf
└── outputs.tf
Usage
cd [product]-cloudops
terraform init
terraform plan
terraform apply
Helm Charts
Chart Structure
helm/
├── Chart.yaml
├── values.yaml
├── values-alpha.yaml
├── values-production.yaml
└── templates/
├── deployment.yaml
├── service.yaml
├── ingress.yaml
├── hpa.yaml
└── serviceaccount.yaml
Deployment
helm install [product] ./helm \
-f helm/values-alpha.yaml \
--namespace [product]-alpha
Network Architecture
Ingress
- Controller: NGINX Ingress
- TLS: Let's Encrypt certificates
- Rate Limiting: Enabled
DNS
- Provider: Cloudflare
- Records: Automated via Terraform
- Backup: DNS records backed up at
/Users/vignesh/official/algoshred/products/backups/dns
Security
Secrets Management
- Azure Key Vault: Production secrets
- Kubernetes Secrets: Non-sensitive config
- GitHub Secrets: CI/CD credentials
Network Security
- Network Policies: Pod-to-pod isolation
- Firewalls: Azure Firewall
- DDoS Protection: Azure DDoS Standard
Monitoring & Logging
Azure Log Analytics
- Centralized logging
- Query language (KQL)
- Alerts and dashboards
Prometheus
- Metrics collection
- Custom metrics
- Alerting rules
Disaster Recovery
Backups
- Database: Daily automated backups
- Retention: 30 days
- Testing: Monthly restore tests
High Availability
- Multi-AZ: Enabled
- Auto-healing: Kubernetes liveness probes
- Failover: Automatic
Cost Optimization
Strategies
- Right-sizing resources
- Auto-scaling
- Spot instances for non-critical workloads
- Resource tagging for cost allocation
Monitoring
- Azure Cost Management
- Budget alerts
- Usage reports