Elevate requirements to organizational/architectural policy

- Security: no IAM in service repos, no custom auth, no direct external calls
- Architecture: no cross-cloud SDKs, no cross-service DB access, no hardcoded tenant/env config
- DevOps: Foxtrot-compatible Helm (no custom ingress), no infra provisioning in service repos, no pinned infra versions
- Cost: resource tagging, no unbounded allocation, no per-tenant infra
- Updated checker and demo to match
- These are NOT static code analysis checks; they catch organizational policy violations that SonarQube/Checkstyle miss
Max Mayfield
2026-03-07 07:41:27 +00:00
parent a7728c6266
commit e323c45cb0
10 changed files with 265 additions and 198 deletions

@@ -1,36 +1,44 @@
# DevOps Requirements
Phase: deployment
Enforcement: informational (graduating to blocking Q3 2026)
Enforcement: informational
## OPS-001: Health Check Endpoint
## OPS-001: Foxtrot-Compatible Helm Chart
Every deployable service MUST expose a `/health` or `/actuator/health` endpoint that returns 200 when healthy.
Every deployable service MUST include a Helm chart that honors the Foxtrot deployment contract.
**Rule:** New services must include a health check. Existing services adding deployment config must verify health endpoint exists.
**Rule:** The Helm chart must:
- Use the standard Foxtrot base chart as a dependency (or implement its interface)
- Expose `values.yaml` with the required Foxtrot parameters (replicas, resources, env, configMap references)
- Support the standard lifecycle hooks (pre-deploy validation, health check, rollback trigger)
- Not define its own ingress/networking — Foxtrot manages routing
**Test:** Check that service has a health endpoint registered (grep for health route registration).
**Test:** Validate Helm chart structure: check for Foxtrot base chart dependency, required values keys, no ingress resource definitions.
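
A minimal sketch of how the OPS-001 structure test might look, assuming plain text checks are acceptable in place of a full YAML parser. The base chart name `foxtrot-base`, the assumption that the required parameters are top-level keys in `values.yaml`, and all function names are hypothetical; the actual checker may work differently.

```python
import re

# Keys listed in OPS-001; treating them as top-level values.yaml keys is an assumption.
REQUIRED_VALUES_KEYS = ("replicas", "resources", "env", "configMap")
# Hypothetical name for the Foxtrot base chart; the real name is not given in the requirement.
BASE_CHART_NAME = "foxtrot-base"


def missing_values_keys(values_yaml: str) -> list[str]:
    """Return the required keys that do not appear as top-level keys."""
    top_level = set(re.findall(r"^([A-Za-z_][\w-]*):", values_yaml, flags=re.MULTILINE))
    return [key for key in REQUIRED_VALUES_KEYS if key not in top_level]


def declares_base_chart(chart_yaml: str) -> bool:
    """True if Chart.yaml has an indented or list-item `name:` entry for the base chart."""
    pattern = rf"^[ \t]*(?:-[ \t]+|[ \t]+)name:[ \t]*[\"']?{re.escape(BASE_CHART_NAME)}[\"']?[ \t]*$"
    return re.search(pattern, chart_yaml, flags=re.MULTILINE) is not None


def defines_ingress(template_text: str) -> bool:
    """True if a rendered or raw template declares a top-level `kind: Ingress`."""
    return re.search(r"^kind:[ \t]*Ingress[ \t]*$", template_text, flags=re.MULTILINE) is not None
```

A chart would pass this sketch when `missing_values_keys` returns an empty list, `declares_base_chart` is true for `Chart.yaml`, and `defines_ingress` is false for every file under `templates/`. Charts that implement the Foxtrot interface without the dependency would need a separate check.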
## OPS-002: Structured Logging
## OPS-002: No Infrastructure Provisioning in Service Repos
All log statements MUST use structured logging (JSON format) with at minimum: timestamp, level, service name, correlation ID.
Service repositories MUST NOT provision infrastructure (databases, queues, storage, networking). Infrastructure is managed through the dedicated infrastructure repos.
**Rule:** No `System.out.println`, `console.log` for production logging. Use the logging framework with structured output.
**Rule:** No Terraform, CloudFormation, or Pulumi resource definitions in service repos. Services declare their infrastructure dependencies in a manifest; the platform provisions them.
**Test:** Grep for `System.out.print`, `System.err.print`, raw `console.log` in non-test source files.
**Test:** Scan for `*.tf`, `*.template.yaml` (CFN), `Pulumi.*` files in service repos.
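
The OPS-002 file scan could be sketched roughly as follows. The three glob patterns come from the test description above; the function names and the choice to return repo-relative paths are assumptions made for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Filename patterns named in the OPS-002 test: Terraform, CloudFormation, Pulumi.
IAC_PATTERNS = ("*.tf", "*.template.yaml", "Pulumi.*")


def is_iac_file(name: str) -> bool:
    """True if a bare filename matches one of the infrastructure-as-code patterns."""
    return any(fnmatchcase(name, pattern) for pattern in IAC_PATTERNS)


def find_iac_files(repo_root: str) -> list[str]:
    """Return sorted repo-relative paths of files that look like IaC definitions."""
    root = Path(repo_root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and is_iac_file(path.name)
    )
```

Any non-empty result from `find_iac_files` would be reported as an OPS-002 finding. Matching on filenames alone is a deliberate simplification: it will not catch resource definitions embedded in files with other names.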
## OPS-003: Resource Limits
## OPS-003: Standard Observability Contract
All Kubernetes deployment manifests MUST specify CPU and memory requests and limits.
Every service MUST expose metrics, health, and readiness endpoints in the standard format.
**Rule:** No unbounded resource consumption. Every container spec must have `resources.requests` and `resources.limits`.
**Rule:**
- `/health` or `/actuator/health` — returns 200 when healthy
- `/ready` or `/actuator/ready` — returns 200 when ready to accept traffic
- Prometheus metrics endpoint at `/metrics` or `/actuator/prometheus`
- Structured JSON logging with correlation ID propagation
**Test:** Parse YAML deployment files, verify `resources` block present with both `requests` and `limits`.
**Test:** Check for health/ready endpoint registration in code. Verify logging config outputs JSON format.
## OPS-004: Rollback Safety
## OPS-004: No Pinned Infrastructure Versions
Database migrations MUST be backward-compatible with the previous service version (N-1 compatibility).
Service Helm charts MUST NOT pin specific infrastructure versions (database versions, queue versions, runtime versions).
**Rule:** No column renames or drops without a multi-step migration. Additive changes only in a single release.
**Rule:** Infrastructure version management is handled by the platform team. Services declare compatibility ranges, not exact versions. No `image: postgres:14.2` in service charts.
**Test:** Scan migration SQL for `DROP COLUMN`, `RENAME COLUMN`, `ALTER TYPE` — flag for manual review.
**Test:** Scan Helm values and templates for hardcoded infrastructure image tags.
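
One possible shape for the OPS-004 scan is sketched below. The list of infrastructure image names is an assumption (the requirement only gives `postgres:14.2` as an example), as are the function name and the line-based matching. Templated references such as `{{ .Values.image }}` are intentionally not flagged.

```python
import re

# Assumed set of infrastructure images; the requirement names only postgres explicitly.
INFRA_IMAGES = {"postgres", "mysql", "redis", "rabbitmq", "kafka", "mongo"}

# Matches lines like `image: postgres:14.2` or `- image: "docker.io/library/redis:7.0"`.
IMAGE_LINE = re.compile(
    r"""^\s*(?:-\s*)?image:\s*["']?(?P<name>[\w./-]+):(?P<tag>[\w.-]+)["']?\s*$"""
)


def find_pinned_infra_images(text: str) -> list[tuple[int, str]]:
    """Return (line number, image reference) for each hardcoded infrastructure image tag."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = IMAGE_LINE.match(line)
        if match and match.group("name").rsplit("/", 1)[-1] in INFRA_IMAGES:
            hits.append((lineno, f"{match.group('name')}:{match.group('tag')}"))
    return hits
```

Running this over each file in the chart's values and `templates/` directory would yield the lines to flag. Registries that include a port in the host name are not handled by this pattern and would need a more careful parser.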