Deployment Order
The full sequence of what gets installed, in what order, and how each step is version-controlled. Everything after Phase 0 lives in Git — OpenTofu modules, Ansible playbooks, OCP manifests, ACM Policies, ArgoCD ApplicationSets. All versioned, all auditable, all re-runnable.
Principle: GitOps for everything
| Layer | Source of truth | Applied by |
|---|---|---|
| VM provisioning | brac-poc-infrastructure repo (OpenTofu) | Terrakube |
| VM configuration | brac-poc-ansible repo (Ansible playbooks) | AWX |
| OpenShift cluster config + operators + workloads | openshift-platform-gitops repo | OpenShift GitOps (ArgoCD) via RHACM pull-mode |
| ACM policies (compliance baseline) | openshift-platform-gitops/policies/ | ACM Policy controller |
| Observability config, service-mesh rules, alerting | openshift-platform-gitops/components/ | ArgoCD ApplicationSet |
Exception: Phase 0 bootstrap, the minimum required to get OpenTofu + Ansible + Git running. Once Phase 0 is done, every subsequent change goes through an MR (merge request).
Phased deployment sequence
```mermaid
flowchart TB
    P0["Phase 0 — Manual bootstrap<br/>(only manual phase)"]:::phase0
    P1["Phase 1 — IaC substrate<br/>OpenSSL root CA, Git repos, Vault, MinIO"]:::substrate
    P2["Phase 2 — Automation plane<br/>AWX + Terrakube via their own VMs"]:::automation
    P3["Phase 3 — Platform tools tier<br/>GitLab, Jenkins, Nexus"]:::platform
    P4["Phase 4 — Identity + workflow tier<br/>Keycloak, WSO2, Temporal, n8n"]:::apps
    P5["Phase 5 — Observability tier<br/>Splunk, SigNoz, ClickHouse, Redis"]:::obs
    P6["Phase 6 — OpenShift hub clusters<br/>hub-dc SNO → RHACM → hub-dr"]:::ocp
    P7["Phase 7 — OpenShift spoke clusters<br/>spoke-dc → spoke-dr via ACM/ZTP"]:::ocp
    P8["Phase 8 — OCP operators + workloads<br/>via GitOps ApplicationSets"]:::workload
    P9["Phase 9 — Demo apps<br/>OTel Demo + Bookinfo + custom<br/>traffic generator"]:::demo
    P10["Phase 10 — DR drill rehearsals"]:::dr
    P0 --> P1 --> P2 --> P3 --> P4 --> P5 --> P6 --> P7 --> P8 --> P9 --> P10
    classDef phase0 fill:#424242,stroke:#212121,color:#fff
    classDef substrate fill:#1b5e20,stroke:#0d3812,color:#fff
    classDef automation fill:#004d40,stroke:#00251a,color:#fff
    classDef platform fill:#01579b,stroke:#002f6c,color:#fff
    classDef apps fill:#4a148c,stroke:#12005e,color:#fff
    classDef obs fill:#006064,stroke:#003333,color:#fff
    classDef ocp fill:#b71c1c,stroke:#7f0000,color:#fff
    classDef workload fill:#6a1b9a,stroke:#38006b,color:#fff
    classDef demo fill:#bf360c,stroke:#870000,color:#fff
    classDef dr fill:#ef6c00,stroke:#b53d00,color:#fff
```
Phase 0 — Manual bootstrap (one-time, ~2 hours)
Only phase where we run commands directly. After this, everything is Git-driven.
| Step | On | What | Why |
|---|---|---|---|
| 0.1 | Mac | Generate OpenSSL root CA (4096-bit RSA, 10-year) | Trust anchor for Vault intermediate CA |
| 0.2 | Mac | Create 2 empty Git repos: brac-poc-infrastructure, brac-poc-ansible on the existing staxv GitLab (temporarily) | Source of truth starting point |
| 0.3 | Mac | Provision brac-poc-ops-runner-vm1-dc via kubectl apply of a KubeVirt VM CR (one-time hand-written manifest) | We need a jump host before anything else |
| 0.4 | ops-runner | Install OpenTofu, Ansible, oc, kubectl, helm, vault CLI, mc, openssl, gh, git | Tool belt |
| 0.5 | ops-runner | Import OpenSSL root CA material (transferred from Mac) | CA lives here now, per your direction |
| 0.6 | ops-runner | Clone the 2 Git repos, scaffold OpenTofu module skeletons | Ready for Phase 1 |
Total manual commands: ~30-40 lines of shell/YAML. Captured in Phase0-Bootstrap.md (runbook).
Phase 1 — IaC substrate (~3 hours)
Everything from here on: edit Git → MR → CI → merge → tool applies.
| Step | Git repo | Target | Purpose |
|---|---|---|---|
| 1.1 | brac-poc-infrastructure | ops-runner-vm1-dr | DR twin of ops-runner; applied with the tofu CLI because Terrakube is not available yet |
| 1.2 | brac-poc-infrastructure | vault-vm1/2/3-dc + vault-vm1/2/3-dr | 6 Vault VMs |
| 1.3 | brac-poc-ansible | Vault cluster init | Ansible role vault: install binary, configure Raft, initialise, generate 5 unseal shares (hand out to custodians), unseal, enable PKI secrets engines |
| 1.4 | brac-poc-ansible | Vault intermediate CA | Generate CSR from Vault → sign with OpenSSL root CA on ops-runner → upload signed cert to Vault |
| 1.5 | brac-poc-infrastructure | minio-vm1/2/3-dc/dr | 6 MinIO VMs |
| 1.6 | brac-poc-ansible | MinIO cluster | Install MinIO, configure distributed setup, enable site-replication DC ↔ DR |
| 1.7 | brac-poc-ansible | Cross-cutting buckets | Create buckets: vault-snapshots, acm-hub-backup, velero, gitlab-backup, jenkins-backup, clickhouse-archive, nexus-blobs, splunk-frozen |
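Why this order: Vault comes up before MinIO even though Vault's Raft snapshot backups are written to MinIO, where the snapshot cron job only starts once the buckets exist in step 1.7. The chicken-and-egg is broken by accepting a gap of roughly 30 minutes between the two, covered by a manual snapshot.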
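The VM shorthand used in these tables (for example vault-vm1/2/3-dc or minio-vm1/2/3-dc/dr) can be expanded mechanically into individual VM names. The Python sketch below shows one reading of that convention. The function name expand_vm_shorthand and the regular expression are illustrative assumptions, not something defined in the repos.

```python
import re

# Assumed shape of the shorthand: <service>-vm<i>[/<j>...]-<site>[/<site>...]
_SHORTHAND = re.compile(
    r"^(?P<service>.+)-vm(?P<indexes>\d+(?:/\d+)*)-(?P<sites>[a-z]+(?:/[a-z]+)*)$"
)


def expand_vm_shorthand(shorthand: str) -> list[str]:
    """Expand e.g. 'minio-vm1/2/3-dc/dr' into one name per VM."""
    match = _SHORTHAND.match(shorthand.strip())
    if match is None:
        raise ValueError(f"not a VM shorthand: {shorthand!r}")
    service = match["service"]
    indexes = match["indexes"].split("/")
    sites = match["sites"].split("/")
    # Site-major order: every DC VM first, then every DR VM.
    return [f"{service}-vm{i}-{site}" for site in sites for i in indexes]


if __name__ == "__main__":
    for name in expand_vm_shorthand("minio-vm1/2/3-dc/dr"):
        print(name)
```

Under this reading, steps 1.2 and 1.5 each expand to six names, which matches the "6 Vault VMs" and "6 MinIO VMs" counts in the table.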
Phase 2 — Automation plane (~2 hours)
| Step | Action | Why |
|---|---|---|
| 2.1 | Provision awx-vm1-dc/dr + awx-pg-vm1-dc/dr via OpenTofu CLI | AWX + its PG |
| 2.2 | Install AWX (Ansible role from brac-poc-ansible) | GUI for Ansible runs |
| 2.3 | Provision terrakube-vm1-dc/dr + terrakube-pg-vm1-dc/dr | Terrakube + its PG |
| 2.4 | Install Terrakube (Ansible role) | GUI for OpenTofu runs |
| 2.5 | Configure AWX: connect to brac-poc-ansible repo, create inventory for DC+DR VMs, create job templates per role | AWX ready to drive playbooks |
| 2.6 | Configure Terrakube: connect to brac-poc-infrastructure repo, create workspaces dc, dr, state stored in MinIO | Terrakube ready to drive VM provisioning |
Switch over: once step 2.6 is complete, no more CLI tofu apply on ops-runner. Every future infra change goes through the Terrakube UI, and every config change goes through the AWX UI. oc apply is reserved for the OCP GitOps bootstrap only.
Phase 3 — Platform tools tier (~4 hours)
In parallel (Terrakube workspace runs concurrently):
| Step | VMs | What |
|---|---|---|
| 3.1 | gitlab-vm1-dc/dr + gitlab-pg-vm1-dc/dr | GitLab CE + its PG |
| 3.2 | jenkins-vm1-dc/dr | Jenkins LTS (filesystem state) |
| 3.3 | nexus-vm1-dc/dr | Nexus OSS, S3 blob store pointed at MinIO |
| 3.4 | Git repos migration | Move brac-poc-infrastructure + brac-poc-ansible + create openshift-platform-gitops on our own GitLab CE (no longer on staxv GitLab) |
After 3.4, we're fully self-hosted.
Phase 4 — Identity + workflow tier (~4 hours)
| Step | VMs | What |
|---|---|---|
| 4.1 | keycloak-vm on hub clusters via GitOps | Keycloak deployed on hubs (per Decision #017), not on VMs, via the hub-platform ApplicationSet later. Keycloak is the only exception to the VM-tier rule. |
| 4.2 | wso2-is-vm1/2-dc/dr + wso2-is-pg-vm1-dc/dr | WSO2 IS cluster + PG, federated with Keycloak (see IDENTITY-STRATEGY.md) |
| 4.3 | wso2-apim-<profile>-vm1-dc/dr (5 profiles × 2 sites = 10 VMs) + shared wso2-apim-pg-vm1-dc/dr | Distributed APIM |
| 4.4 | temporal-vm1-dc/dr + PG | Workflow engine |
| 4.5 | n8n-vm1-dc/dr + PG | No-code automation |
Keycloak deployment order: the hubs must exist first (Phase 6) → Keycloak is installed via ArgoCD → only then can WSO2 IS federate with it. In practice, steps 4.2 onwards therefore run after Phases 6-7, and the runbook reflects this reordering.
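The reordering described above is a dependency-ordering problem, and a topological sort makes it explicit. The Python sketch below uses the standard-library graphlib module. The node names and the edges (including placing Phase 5 directly after Phase 3) are assumptions read off the phase notes, not an authoritative dependency list from the runbook.

```python
from graphlib import TopologicalSorter

# Each key depends on every step in its value set.
# Assumed edges; adjust them to match the runbook.
DEPENDS_ON = {
    "phase-1": {"phase-0"},
    "phase-2": {"phase-1"},
    "phase-3": {"phase-2"},
    "phase-5": {"phase-3"},
    "phase-6": {"phase-5"},
    "phase-4.1-keycloak": {"phase-6"},  # Keycloak runs on the hubs
    "phase-7": {"phase-6"},
    "phase-4.2+": {"phase-4.1-keycloak", "phase-7"},  # WSO2 IS federates last
}


def deployment_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raises CycleError on a cycle."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step in deployment_order(DEPENDS_ON):
        print(step)
```

Several orders can be valid at once, but every one of them places phase-6 before phase-4.1-keycloak and phase-4.1-keycloak before phase-4.2+, which is the constraint this section describes.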
Phase 5 — Observability + cache tier (~3 hours)
| Step | VMs |
|---|---|
| 5.1 | redis-vm1/2/3-dc/dr (Redis + Sentinel combined mode) |
| 5.2 | clickhouse-vm1-dc/dr + ClickHouse Keeper sidecar |
| 5.3 | signoz-vm1-dc/dr (stateless UI → ClickHouse) |
| 5.4 | splunk-vm1-dc/dr (Free edition, 500 MB/day) |
Phase 6 — OpenShift hub clusters (~6 hours)
| Step | Cluster | Action |
|---|---|---|
| 6.1 | hub-dc (SNO) | Assisted Installer: 1-node SNO install, FIPS on, Tang encryption (Tang server from hub-dc itself or ops-runner for bootstrap), static networking |
| 6.2 | hub-dc | Day-2: apply install manifests via POST /v2/clusters/{id}/manifests (see OCP-COMPLIANCE-CONSIDERATIONS.md) |
| 6.3 | hub-dc | Manual oc apply: OpenShift GitOps Subscription (with 6.4, one of only two oc apply commands after Phase 0) |
| 6.4 | hub-dc | Manual oc apply: root ArgoCD Application pointing at openshift-platform-gitops/bootstrap/ |
| 6.5 | hub-dc | ArgoCD auto-syncs: RHACM + MCE + ACS Central + Compliance + COO + RHBK Keycloak + Logging + Loki + Tempo + OTel + External Secrets + cert-manager |
| 6.6 | hub-dr | Provision hub-dr via RHACM + Assisted Installer (as a ManagedCluster) |
| 6.7 | hub-dr | Klusterlet pulls its own config from hub-dc's ArgoCD via pull-mode ApplicationSet |
Only Steps 6.3 + 6.4 use oc apply. Everything else is Git.
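Step 6.2 posts install manifests to the Assisted Installer endpoint POST /v2/clusters/{id}/manifests. The Python sketch below only builds the request path and JSON body and does not send anything. The field names folder, file_name and content (base64-encoded) reflect my understanding of the Assisted Installer v2 API and should be checked against OCP-COMPLIANCE-CONSIDERATIONS.md and the API reference. The cluster ID, file name and manifest text are hypothetical placeholders.

```python
import base64
import json


def build_manifest_request(
    cluster_id: str,
    file_name: str,
    manifest_yaml: str,
    folder: str = "manifests",  # assumed default folder name
) -> tuple[str, bytes]:
    """Return (path, JSON body) for uploading one install manifest."""
    path = f"/v2/clusters/{cluster_id}/manifests"
    body = {
        "folder": folder,
        "file_name": file_name,
        # The manifest content travels base64-encoded inside the JSON body.
        "content": base64.b64encode(manifest_yaml.encode("utf-8")).decode("ascii"),
    }
    return path, json.dumps(body).encode("utf-8")


if __name__ == "__main__":
    path, body = build_manifest_request(
        "example-cluster-id",        # hypothetical cluster ID
        "50-example.yaml",           # hypothetical file name
        "# placeholder manifest\n",  # real content comes from Git
    )
    print(path)
    print(body.decode("utf-8"))
```

Keeping the payload builder separate from the HTTP call means the manifest files can stay in Git and be reviewed through an MR, with the upload as a thin wrapper around them.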
Phase 7 — OpenShift spoke clusters (~6 hours)
| Step | Cluster | Action |
|---|---|---|
| 7.1 | spoke-dc | ACM-provisioned via ZTP (zero-touch provisioning) or Assisted Installer registered to hub-dc |
| 7.2 | spoke-dc | RHACM klusterlet auto-installed |
| 7.3 | spoke-dc | GitOps add-on OR manual OpenShift GitOps Subscription (per pull-mode setup) |
| 7.4 | spoke-dc | ArgoCD pull-mode receives ApplicationSets targeting spoke role |
| 7.5 | spoke-dr | Same sequence as 7.1-7.4 |
Phase 8 — OCP operators + workloads (via GitOps, ~3 hours to settle)
All triggered automatically by merges to openshift-platform-gitops/main:
| Layer | ApplicationSet | Deploys |
|---|---|---|
| All clusters | all-clusters-baseline | Cert-manager Issuers, OperatorHub source config, PSA defaults, sysctl, auditd |
| Hub only | hub-platform | RHACM MultiClusterHub, ACS Central, RHBK Keycloak realm brac-poc, COO, Logging, Loki, Tempo |
| Spoke only | spoke-platform | Compliance Operator ScanSettingBindings, ACS SecuredCluster, External Secrets Operator, AMQ Streams (Kafka) operator install |
| Spoke workloads | spoke-workloads | Kafka cluster CR, schema registry, sample-app namespaces, OTel collector, service-mesh if we enable it |
| ACM policies | (Policy + PlacementBinding) | Audit profile, allowed registries, default-deny NetworkPolicy, file integrity rules |
Phase 9 — Demo applications (~2 hours)
Demo workloads deployed on spoke-dc (and replicated to spoke-dr via ApplicationSet).
| Demo | Deploys | Purpose |
|---|---|---|
| OpenTelemetry Demo (Astronomy Shop) | Helm chart opentelemetry-demo from open-telemetry repo, committed as manifests in GitOps | Multi-language microservices (Java, Go, .NET, Python, Ruby, JS, etc.) with built-in traffic generator. Traces + metrics + logs visible in SigNoz + Tempo + Loki. |
| Bookinfo (Istio sample) | bookinfo.yaml from istio/samples, committed as kustomize base | 4-service canonical demo (productpage, details, reviews, ratings) for canary/routing/service-mesh demos |
| Custom traffic generator (brac-poc-demo-app) | Small Go service we build (see next doc-needed) | Banking-flavored workflow: simulated loan-approval requests, payment-settlement flows, realistic log output, Prometheus metrics, OTel traces |
Details in OBSERVABILITY-DEMOS.md.
Phase 10 — DR drill rehearsals (~2 days)
See DR-DRILL-PLAYBOOK.md. Execute drills 1 through 7. Record video of full-site drill for demo day.
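Adding up the per-phase estimates from the headings above gives a rough lower bound on hands-on time. The Python sketch below is plain arithmetic over those headline figures. It assumes the phases run strictly one after another, so it ignores the parallel Terrakube runs in Phase 3 and treats Phase 8's "to settle" time as elapsed time.

```python
# Hours taken from the phase headings (Phases 0-9).
PHASE_HOURS = {0: 2, 1: 3, 2: 2, 3: 4, 4: 4, 5: 3, 6: 6, 7: 6, 8: 3, 9: 2}
# Phase 10 is quoted in days, so it is kept separate.
PHASE_10_DAYS = 2


def total_hours(phase_hours: dict[int, int]) -> int:
    """Sum the serial estimates for the given phases."""
    return sum(phase_hours.values())


if __name__ == "__main__":
    print(f"Phases 0-9: ~{total_hours(PHASE_HOURS)} hours")  # ~35 hours
    print(f"Phase 10: ~{PHASE_10_DAYS} days of DR drills on top")
```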
Master checklist
- Phase 0 bootstrap scripts committed
- Phase 0 runbook: every manual command documented
- Phase 1-5 OpenTofu modules committed to brac-poc-infrastructure
- Phase 1-5 Ansible roles committed to brac-poc-ansible
- Phase 6-9 manifests committed to openshift-platform-gitops
- Every version pinned (per OPERATOR-CATALOG.md + VM-TIER-ARCHITECTURE.md)
- GitLab CI enforces: tofu-fmt, tofu-validate, tflint, ansible-lint, yamllint, kubeconform, gitleaks
- Terrakube workspaces + AWX job templates pointed at the repos
- Nothing deployed without a corresponding Git commit
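The CI item in the checklist can be made checkable by recording which of the seven checks each repo is expected to run. The Python sketch below is one way to do that. The assignment of checks to repos is an assumption based on what each repo contains, and missing_checks is a hypothetical helper, not an existing pipeline job.

```python
# The seven checks named in the checklist.
ALL_CHECKS = {
    "tofu-fmt", "tofu-validate", "tflint",
    "ansible-lint", "yamllint", "kubeconform", "gitleaks",
}

# Assumed mapping of repo -> required checks; adjust to the real pipelines.
CHECKS_BY_REPO = {
    "brac-poc-infrastructure": {"tofu-fmt", "tofu-validate", "tflint", "gitleaks"},
    "brac-poc-ansible": {"ansible-lint", "yamllint", "gitleaks"},
    "openshift-platform-gitops": {"yamllint", "kubeconform", "gitleaks"},
}


def missing_checks(repo: str, jobs_run: set[str]) -> set[str]:
    """Return the required checks that a pipeline run did not execute."""
    return CHECKS_BY_REPO[repo] - jobs_run


if __name__ == "__main__":
    gaps = missing_checks("brac-poc-ansible", {"ansible-lint", "gitleaks"})
    print(sorted(gaps))  # ['yamllint']
```

With this mapping, every check in the checklist is assigned to at least one repo, and gitleaks runs on all three.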
Single rule
If it's not in Git, it didn't happen. oc apply outside Phase 0 / Phase 6.3-6.4 = policy violation. Break-glass exceptions logged as GitLab issues with resolution within 24h.
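The rule above is simple enough to encode as a predicate, for instance in an audit script that reviews who ran what and when. The Python sketch below assumes steps are identified by the "phase.step" strings used in this document. The function name oc_apply_allowed is hypothetical.

```python
# Steps outside Phase 0 where a manual oc apply is permitted.
ALLOWED_STEPS = {"6.3", "6.4"}


def oc_apply_allowed(step: str) -> bool:
    """Return True if a manual oc apply at this step follows the rule."""
    phase = step.split(".", 1)[0]
    # Phase 0 is the manual bootstrap; 6.3 and 6.4 bootstrap GitOps.
    return phase == "0" or step in ALLOWED_STEPS


if __name__ == "__main__":
    for step in ("0.3", "6.3", "6.5", "8.1"):
        verdict = "ok" if oc_apply_allowed(step) else "policy violation"
        print(f"{step}: {verdict}")
```

Anything the predicate rejects would need a break-glass GitLab issue, per the rule above.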
Created: 2026-04-24 · Owner: Project Lead + DevOps Lead · Status: ready for execution