CLAUDE.md — DevOps / Infrastructure Project
Project Basics
This is an infrastructure-as-code project using Terraform with AWS (adjust for your provider). All infrastructure changes must go through code — no manual console changes. Use HCL for Terraform configs, YAML for CI/CD pipelines, and shell scripts for automation.
Terraform Conventions
- One directory per environment: `environments/{dev,staging,prod}/`
- Shared modules in `modules/`: reusable, parameterized, versioned
- State stored remotely (S3 + DynamoDB lock, or Terraform Cloud)
- Variables in `variables.tf`, outputs in `outputs.tf`, providers in `providers.tf`
- Run `terraform fmt` before every commit
- Tag all resources with `Environment`, `Project`, and `ManagedBy=terraform`
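The directory conventions can be checked mechanically. Below is a minimal sketch. It assumes the repository root contains `environments/` and `modules/` as described, and that each environment root (not each module) keeps its own `variables.tf`, `outputs.tf`, and `providers.tf`. `check_tf_layout` is a hypothetical helper. Formatting is left to `terraform fmt -check -recursive`.

```shell
#!/bin/sh
# Hypothetical layout check for the Terraform conventions above.
# Assumes environments/{dev,staging,prod}/ sits under the given repository
# root and that each environment root keeps variables, outputs and providers
# in their own files. Prints one line per problem; returns 1 if any found.
check_tf_layout() {
  root=${1:-.}
  status=0
  for env_name in dev staging prod; do
    dir="$root/environments/$env_name"
    if [ ! -d "$dir" ]; then
      echo "missing environment directory: $dir"
      status=1
      continue
    fi
    for f in variables.tf outputs.tf providers.tf; do
      if [ ! -f "$dir/$f" ]; then
        echo "missing file: $dir/$f"
        status=1
      fi
    done
  done
  if [ ! -d "$root/modules" ]; then
    echo "missing shared modules directory: $root/modules"
    status=1
  fi
  return "$status"
}
```

Calling `check_tf_layout .` from a pre-commit hook prints one line per missing file or directory. It returns non-zero if anything is missing.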
Safety Rules
- NEVER run `terraform apply` on production without explicit user approval
- NEVER run `terraform destroy` without triple-confirming the target environment
- Always run `terraform plan` first and review the output before applying
- Use `-target` sparingly: it applies only part of the configuration and can leave state out of sync with the code
- Lock state files during operations; never bypass locks (e.g. with `-lock=false`)
- Keep production and non-production state files completely separate
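One way to make the first two rules mechanical is a small wrapper that scripts call before invoking Terraform. This is a sketch under stated assumptions. The function `tf_guard`, the environment name `prod`, and the `TF_PROD_APPROVED` variable are hypothetical conventions, not Terraform features. Triple confirmation is implemented here as typing the environment name three times.

```shell
#!/bin/sh
# Hypothetical guard for the Safety Rules above; these conventions are not
# Terraform features. Usage: tf_guard ENV ACTION (ACTION: plan|apply|destroy).
# Returns 0 if the action may proceed, 1 otherwise.
tf_guard() {
  env_name=$1
  action=$2
  case $action in
    plan)
      return 0
      ;;
    apply)
      # Production applies need explicit approval, signalled here by the
      # hypothetical TF_PROD_APPROVED=yes variable.
      if [ "$env_name" = "prod" ] && [ "${TF_PROD_APPROVED:-}" != "yes" ]; then
        echo "refusing: apply on prod requires TF_PROD_APPROVED=yes" >&2
        return 1
      fi
      return 0
      ;;
    destroy)
      # Triple confirmation: the environment name must be typed three times.
      i=1
      while [ "$i" -le 3 ]; do
        printf 'Type "%s" to confirm destroy (%s/3): ' "$env_name" "$i" >&2
        if ! read -r answer || [ "$answer" != "$env_name" ]; then
          echo "refusing: confirmation $i did not match" >&2
          return 1
        fi
        i=$((i + 1))
      done
      return 0
      ;;
    *)
      echo "refusing: unknown action '$action'" >&2
      return 1
      ;;
  esac
}
```

A deploy script could then run `tf_guard "$ENV" apply && terraform -chdir="environments/$ENV" apply`. The guard only protects commands that go through the script. It does not replace access controls on the state backend.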
Docker
- Multi-stage builds: build stage with full toolchain, runtime with minimal image
- Pin base image versions: `node:20.11-alpine`, not `node:latest`
- Non-root user in production containers
- Health checks in every Dockerfile: `HEALTHCHECK CMD ...`
- `.dockerignore` must exclude `.git`, `node_modules`, `.env`, `*.md`
- Scan images for vulnerabilities: `docker scout`, `trivy`, or equivalent
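The first four rules can be checked with plain text matching before any image is built. This is a sketch with hypothetical function names (`check_dockerfile`, `check_dockerignore`). It inspects only the final build stage for `USER` and `HEALTHCHECK`, and it accepts digest-pinned images. It does not expand `ARG` variables or follow line continuations.

```shell
#!/bin/sh
# Hypothetical text-level checks for the Docker rules above. Simple line
# matching, not a Dockerfile parser: ARG substitution and line continuations
# are not handled.

# check_dockerfile FILE: reports unpinned base images, a root (or missing)
# USER in the final stage, and a missing HEALTHCHECK in the final stage.
check_dockerfile() {
  awk -v file="$1" '
    function fail(msg) { print file ": " msg; bad = 1 }
    toupper($1) == "FROM" {
      i = 2
      while (i <= NF && substr($i, 1, 2) == "--") i++   # skip --platform=...
      img = $i
      # Earlier stage names, scratch, and digest-pinned images are fine.
      if (!(img in stages) && img != "scratch" && img !~ /@sha256:/) {
        n = split(img, parts, "/")
        if (index(parts[n], ":") == 0 || parts[n] ~ /:latest$/)
          fail("base image not pinned: " img)
      }
      if (toupper($(i + 1)) == "AS") stages[$(i + 2)] = 1
      user = ""; healthcheck = 0     # USER/HEALTHCHECK are judged per stage
    }
    toupper($1) == "USER" { user = $2 }
    toupper($1) == "HEALTHCHECK" && toupper($2) != "NONE" { healthcheck = 1 }
    END {
      if (user == "" || user ~ /^(root|0)(:|$)/) fail("final stage runs as root")
      if (!healthcheck) fail("final stage has no HEALTHCHECK")
      exit bad
    }
  ' "$1"
}

# check_dockerignore FILE: each required entry must appear as its own line.
check_dockerignore() {
  file=$1
  [ -f "$file" ] || { echo "$file: not found"; return 1; }
  status=0
  for pattern in .git node_modules .env '*.md'; do
    if ! grep -qxF -e "$pattern" "$file"; then
      echo "$file: missing entry $pattern"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_dockerfile Dockerfile && check_dockerignore .dockerignore` before `docker build` catches the easy mistakes early. Vulnerability scanning with `docker scout` or `trivy` still has to run against the built image.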
CI/CD Pipelines
- All pipelines defined in code (GitHub Actions, GitLab CI, etc.)
- Pipeline stages: lint → test → build → security scan → deploy
- Secrets in CI/CD secrets manager — never in pipeline files
- Deploy to staging automatically, production requires manual approval
- Rollback plan documented for every deployment step
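In GitHub Actions or GitLab CI, the stage order and the production approval are normally declared in the pipeline YAML. Examples are a GitHub environment with required reviewers, or a GitLab job with `when: manual`. The shell sketch below only illustrates the control flow those declarations encode:

- Stages run in the fixed order, and the first failure stops the run.
- Staging deploys automatically.
- Production deploys only when an approval flag is set.

The `stage_*` functions and the `DEPLOY_APPROVED` variable are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical sketch of the stage order and approval gate described above.
# Each stage_* function is a placeholder for the real commands of that stage.
stage_lint()   { echo "lint"; }
stage_test()   { echo "test"; }
stage_build()  { echo "build"; }
stage_scan()   { echo "security scan"; }
stage_deploy() { echo "deploy to $1"; }

# run_pipeline ENV: lint -> test -> build -> security scan -> deploy.
# Stops at the first failing stage (returns 1). Deploying to prod also needs
# DEPLOY_APPROVED=yes, standing in for a manual approval (returns 2 if absent).
run_pipeline() {
  target=$1
  for stage in lint test build scan; do
    if ! "stage_$stage"; then
      echo "pipeline stopped: stage '$stage' failed" >&2
      return 1
    fi
  done
  if [ "$target" = "prod" ] && [ "${DEPLOY_APPROVED:-}" != "yes" ]; then
    echo "waiting for manual approval before deploying to prod" >&2
    return 2
  fi
  stage_deploy "$target"
}
```

The two return codes are deliberately distinct: 2 for a deploy awaiting approval, 1 for a failed stage. This lets a caller tell a blocked deploy apart from a broken build.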
Secrets Management
- Never hardcode secrets in any file — use secret managers (AWS Secrets Manager, Vault, etc.)
- Inject runtime secrets as environment variables, populated from the secret manager
- `.env` files only for local development; never committed
- Rotate secrets on a schedule, and immediately on suspected compromise
- Audit secret access logs periodically
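A dedicated secret scanner in CI is the right tool for catching hardcoded secrets, but a rough pre-commit check can be sketched in shell. The sketch assumes two example patterns: the `AKIA` prefix used by AWS access key IDs, and PEM private-key headers. It also checks that `.env` is listed in `.gitignore`. The function names are hypothetical, and the pattern list is deliberately incomplete.

```shell
#!/bin/sh
# Hypothetical pre-commit heuristics for the secrets rules above. The
# patterns are illustrative, not exhaustive.

# Example patterns: AWS access key IDs (AKIA + 16 uppercase letters/digits)
# and PEM private-key headers.
SECRET_PATTERN='AKIA[0-9A-Z]{16}|-----BEGIN ([A-Z]+ )?PRIVATE KEY-----'

# scan_for_secrets FILE...: prints FILE:LINE for each suspicious line (never
# the matched text itself) and returns 1 if anything was found.
scan_for_secrets() {
  found=0
  for file in "$@"; do
    lines=$(grep -nE -e "$SECRET_PATTERN" "$file" | cut -d: -f1)
    for n in $lines; do
      echo "$file:$n: possible secret"
      found=1
    done
  done
  return "$found"
}

# env_is_ignored [GITIGNORE]: succeeds if .env appears as its own line.
# Exact-line matching only; glob entries such as *.env are not recognized.
env_is_ignored() {
  grep -qxF -e '.env' "${1:-.gitignore}"
}
```

In a pre-commit hook, `scan_for_secrets $(git diff --cached --name-only --diff-filter=ACM)` checks the staged files. Filenames containing spaces would need different handling. `git check-ignore -q .env` is a more accurate alternative to `env_is_ignored` because it honors every ignore rule, not just exact lines.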
Monitoring & Alerting
- Every service must have: health check endpoint, structured logging, basic metrics
- Alert on: error rate spikes, latency P99 degradation, resource utilization > 80%
- Dashboard for every service in Grafana/CloudWatch/Datadog
- Runbooks linked from alerts — every alert should have a documented response
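The alert conditions above are normally written in the monitoring tool's own rule language. To make the thresholds concrete, here is a hedged shell illustration:

- `p99` computes a nearest-rank 99th percentile from latency samples on stdin.
- `check_alerts` compares a current P99 against a baseline and checks a utilization reading.

Both function names are hypothetical. The 1.5x degradation factor is an assumption for illustration; only the 80% utilization threshold comes from the rules above.

```shell
#!/bin/sh
# Hypothetical illustration of two alert conditions from the rules above.

# p99: reads one latency sample per line on stdin and prints the
# nearest-rank 99th percentile, i.e. the sample at rank ceil(0.99 * N).
p99() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((99 * NR + 99) / 100)   # ceil(0.99 * NR) with integer math
      print v[rank]
    }'
}

# check_alerts CURRENT_P99 BASELINE_P99 UTILIZATION_PERCENT
# Prints each alert that should fire; returns 1 if any did.
check_alerts() {
  awk -v cur="$1" -v base="$2" -v util="$3" 'BEGIN {
    c = cur + 0; b = base + 0; u = util + 0
    fired = 0
    # Assumed degradation rule: current P99 above 1.5x the baseline P99.
    if (c > 1.5 * b) { print "ALERT: P99 latency " c " exceeds 1.5x baseline " b; fired = 1 }
    # Utilization threshold taken from the rules above.
    if (u > 80) { print "ALERT: resource utilization " u "% is above 80%"; fired = 1 }
    exit fired
  }'
}
```

In production these conditions belong in the monitoring tool's alert rules. The shell version only makes the arithmetic explicit, and each real alert should still link to its runbook.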
Networking
- Private subnets for application workloads, public subnets only for load balancers
- Security groups: principle of least privilege; no `0.0.0.0/0` ingress except HTTPS on the ALB
- Use Parameter Store or service discovery; never hardcode IPs
- TLS on every external endpoint: terminate at the load balancer, and keep internal traffic on the private network
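The ingress rule can also be checked automatically. This sketch assumes the rules have already been exported to a simple whitespace-separated text format, `SECURITY_GROUP CIDR PROTOCOL FROM_PORT TO_PORT`. That format is hypothetical; in practice it would be derived from `terraform show -json` output or the AWS CLI. Any rule open to `0.0.0.0/0` (or its IPv6 equivalent `::/0`) is flagged unless it is TCP 443 on the load balancer's security group. `check_ingress` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical check for the security-group rule above. Input (stdin) is one
# ingress rule per line in an assumed format:
#   SECURITY_GROUP CIDR PROTOCOL FROM_PORT TO_PORT
# Blank lines and lines starting with '#' are ignored.

# check_ingress ALB_GROUP: flags world-open ingress (0.0.0.0/0 or ::/0)
# unless it is TCP 443 on the load balancer's security group.
check_ingress() {
  awk -v alb="$1" '
    /^#/ || NF == 0 { next }
    $2 == "0.0.0.0/0" || $2 == "::/0" {
      if (!($1 == alb && tolower($3) == "tcp" && $4 == 443 && $5 == 443)) {
        print "world-open ingress not allowed: " $0
        bad = 1
      }
    }
    END { exit bad }
  '
}
```

For example, run `check_ingress alb-sg < ingress-rules.txt`, where `alb-sg` and the file name are placeholders.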
Verification
- `terraform fmt -check` passes
- `terraform validate` passes
- `terraform plan` shows only expected changes
- `tflint` or `checkov` passes (if configured)
- Docker images build successfully
- Pipeline runs pass on the branch
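The Terraform part of the checklist can run as one script. This is a sketch under several assumptions:

- It runs from a single initialized Terraform root such as `environments/dev`.
- `tflint` and `checkov` are optional, and whichever is not installed is skipped.
- Docker builds and pipeline runs are left to CI.

The `TERRAFORM` variable (defaulting to `terraform`) and the `run_check` and `verify` helpers are hypothetical. `terraform plan -detailed-exitcode` exits 0 for no changes, 2 when changes are present, and 1 on error. Deciding whether the changes are expected is still a human step.

```shell
#!/bin/sh
# Hypothetical runner for the verification checklist above. It runs every
# check instead of stopping at the first failure, then reports a summary.
# TERRAFORM defaults to the real CLI but can point at a wrapper.
TERRAFORM=${TERRAFORM:-terraform}

# run_check NAME COMMAND...: runs COMMAND and prints PASS or FAIL for NAME.
run_check() {
  name=$1
  shift
  if "$@"; then
    echo "PASS  $name"
  else
    echo "FAIL  $name"
    failures=$((failures + 1))
  fi
}

# verify: run from an initialized Terraform root such as environments/dev.
verify() {
  failures=0
  run_check "terraform fmt -check" "$TERRAFORM" fmt -check -recursive
  run_check "terraform validate" "$TERRAFORM" validate

  # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
  # Changes are not a failure; the saved plan still needs a human review.
  plan_rc=0
  "$TERRAFORM" plan -input=false -detailed-exitcode -out=tfplan || plan_rc=$?
  case $plan_rc in
    0) echo "PASS  terraform plan (no changes)" ;;
    2) echo "PASS  terraform plan (changes present: review tfplan)" ;;
    *) echo "FAIL  terraform plan"; failures=$((failures + 1)) ;;
  esac

  # Optional linters: skipped when the tool is not installed.
  if command -v tflint >/dev/null 2>&1; then
    run_check "tflint" tflint
  else
    echo "SKIP  tflint (not installed)"
  fi
  if command -v checkov >/dev/null 2>&1; then
    run_check "checkov" checkov -d .
  else
    echo "SKIP  checkov (not installed)"
  fi

  echo "$failures check(s) failed"
  [ "$failures" -eq 0 ]
}
```

Run `terraform init` first, because `validate` and `plan` need an initialized working directory. Then run, for example, `cd environments/dev && verify`.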
Git & GitHub
- Do not apply infrastructure changes without explicit permission
- Do not perform destructive operations (destroy, force-unlock) without asking first
- Infrastructure PRs require at least one review before merge
- All Terraform changes must include plan output in PR description
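For the last rule, a small helper can turn saved plan output into a PR description section. This sketch assumes the plan was saved with `terraform plan -out=tfplan` and rendered with `terraform show -no-color tfplan`. The function name `plan_to_pr_body` and the section layout are hypothetical. The helper pulls out Terraform's one-line summary (`Plan: N to add, ...` or `No changes.`) and wraps the full text in a collapsible block.

```shell
#!/bin/sh
# Hypothetical helper: reads rendered plan text on stdin and prints a
# Markdown section for a PR description.
plan_to_pr_body() {
  plan=$(cat)
  # First summary line Terraform prints, if any.
  summary=$(printf '%s\n' "$plan" | grep -E '^(Plan: |No changes\.)' | head -n 1)
  [ -n "$summary" ] || summary="(no plan summary found; check the plan output)"
  printf '### Terraform plan\n\n'
  printf '**Summary:** %s\n\n' "$summary"
  printf '<details><summary>Full plan output</summary>\n\n'
  printf '```\n%s\n```\n\n' "$plan"
  printf '</details>\n'
}
```

For example, run `terraform show -no-color tfplan | plan_to_pr_body > pr-body.md`, then `gh pr create --body-file pr-body.md`.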