
RHACM Hub-Spoke Architecture

End-to-end architecture for the 4-cluster DC/DR deployment of the BRAC Bank OpenShift POC, using Red Hat Advanced Cluster Management (RHACM) + OpenShift GitOps for multi-cluster orchestration.


The four clusters

```mermaid
flowchart TB
    subgraph DC["🟢 DC site — ACTIVE"]
        direction LR
        HUB_DC["<b>hub-dc</b><br/>SNO · 64c/64G/1TB<br/>api.hub-dc.opp.brac-poc.comptech-lab.com"]
        SPOKE_DC["<b>spoke-dc</b><br/>compact 3-node · 192c/192G/3TB<br/>api.spoke-dc.opp.brac-poc.comptech-lab.com"]
    end

    subgraph DR["🟠 DR site — HOT STANDBY"]
        direction LR
        HUB_DR["<b>hub-dr</b><br/>SNO · 64c/64G/1TB<br/>api.hub-dr.opp.brac-poc.comptech-lab.com"]
        SPOKE_DR["<b>spoke-dr</b><br/>compact 3-node · 192c/192G/3TB<br/>api.spoke-dr.opp.brac-poc.comptech-lab.com"]
    end

    HUB_DC -.->|ACM cluster-backup<br/>etcd state sync| HUB_DR
    SPOKE_DC -.->|workload sync<br/>(ArgoCD ApplicationSet<br/>targeting both spokes)| SPOKE_DR

    HUB_DC -->|<b>manages</b>:<br/>ACM ManagedCluster<br/>Policy · ArgoCD| SPOKE_DC
    HUB_DR -.->|<b>manages (standby)</b>| SPOKE_DR

    classDef active fill:#2e7d32,stroke:#1b5e20,color:#fff
    classDef standby fill:#ef6c00,stroke:#e65100,color:#fff
    class HUB_DC,SPOKE_DC active
    class HUB_DR,SPOKE_DR standby
```

Cluster roles

| Cluster | Topology | Role | What runs here |
|---|---|---|---|
| hub-dc | SNO (single node, 64c/64GB/1TB) | Primary management | RHACM multicluster hub, ACS Central, OpenShift GitOps (ArgoCD), Compliance Operator, Cluster Observability Operator, Keycloak |
| hub-dr | SNO (same specs as hub-dc) | DR management hot standby | Same operators as hub-dc; ACM cluster-backup restores config on failover |
| spoke-dc | Compact 3-node (3 × 64c/64GB/1TB → 192c/192GB/3TB) | Primary workload cluster | All 9 POC components (Kafka, Redis, OTel/SigNoz, WSO2, middleware, JBoss, Trivy, Nexus, sample apps) |
| spoke-dr | Compact 3-node (same specs as spoke-dc) | DR workload hot standby | Same workloads as spoke-dc; ArgoCD ApplicationSets target both spokes |

Total hardware: 512 vCPU + 512 GB RAM + 8 TB storage across 4 clusters. This fits comfortably within the 768 vCPU / 2.5 TB RAM / 23 TB storage available on dl385-2 + gold-1 + gold-2 (67% of CPU, 20% of RAM, 35% of storage).


Responsibilities: hub vs spoke

| Concern | Hub (hub-dc / hub-dr) | Spoke (spoke-dc / spoke-dr) |
|---|---|---|
| Cluster management | RHACM registers, monitors, policy-enforces spokes | Reports health/status to hub |
| Identity provider | Keycloak hosted here (OIDC for all 4 clusters) | Consumes OIDC from hub Keycloak |
| Security posture | ACS Central — aggregates findings from all managed clusters | ACS SecuredCluster agent reports to hub |
| Compliance scans | Scans the hub itself; aggregates results from spokes | Compliance Operator + PCI-DSS/CIS scans run here too |
| Observability | Cluster Observability Operator — collects cross-cluster metrics | Ships telemetry to hub observability |
| GitOps control plane | OpenShift GitOps (ArgoCD) source of truth | ArgoCD agent (pull mode) or hub-push target |
| Workload hosting | ❌ Not the place for POC workloads | ✅ All 9 POC components run here |

Rule: workloads live on spokes. Hubs are for the management fabric only.
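One way to encode that rule is an ACM Placement that only ever resolves to the spokes, which ArgoCD then uses as its destination set. A minimal sketch, assuming a `brac-poc-clusters` ManagedClusterSet and a `purpose: workload` label on the spokes (both names are illustrative, not yet defined in the POC):

```yaml
# Hypothetical Placement selecting only the workload (spoke) clusters.
# ManagedClusterSet name and label are assumed conventions.
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: workload-spokes
  namespace: openshift-gitops
spec:
  clusterSets:
    - brac-poc-clusters        # assumed ManagedClusterSet containing all 4 clusters
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchLabels:
            purpose: workload  # label set on spoke-dc / spoke-dr only
```

Because the hubs never carry the label, no workload Application can be scheduled onto them even by accident.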


Domain plan

| Zone / pattern | Scope | Served by |
|---|---|---|
| brac-poc.comptech-lab.com | Root project domain | Cloudflare (delegated to PowerDNS for subzones) |
| opp.brac-poc.comptech-lab.com | OpenShift umbrella zone | PowerDNS internal + delegated on Cloudflare |
| `api.<cluster-name>.opp.brac-poc.comptech-lab.com` | Per-cluster Kubernetes API | HAProxy TCP-LB → cluster API VIP |
| `*.routes.<cluster-name>.opp.brac-poc.comptech-lab.com` | OCP default ingress routes (`*.apps.*` is replaced by `*.routes.*` per our convention) | HAProxy TCP-LB → cluster ingress VIP |
| `*.apps.brac-poc.comptech-lab.com` | Platform tools OUTSIDE the OCP clusters (docs site, Keycloak in front, monitoring consoles, etc.) | rke2-ingress-nginx on hosting platform (current convention, not OCP) |

Concrete records (partial โ€” full list to be drafted in a dedicated DNS + certs plan):

```
api.hub-dc.opp.brac-poc.comptech-lab.com      → HAProxy → hub-dc API (:6443)
api.hub-dr.opp.brac-poc.comptech-lab.com      → HAProxy → hub-dr API (:6443)
api.spoke-dc.opp.brac-poc.comptech-lab.com    → HAProxy → spoke-dc API VIP
api.spoke-dr.opp.brac-poc.comptech-lab.com    → HAProxy → spoke-dr API VIP

*.routes.hub-dc.opp.brac-poc.comptech-lab.com    → HAProxy → hub-dc ingress router
*.routes.hub-dr.opp.brac-poc.comptech-lab.com    → HAProxy → hub-dr ingress router
*.routes.spoke-dc.opp.brac-poc.comptech-lab.com  → HAProxy → spoke-dc ingress router
*.routes.spoke-dr.opp.brac-poc.comptech-lab.com  → HAProxy → spoke-dr ingress router
```
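On the HAProxy side, each record is a TCP pass-through frontend/backend pair so the cluster terminates its own TLS. A sketch for one API endpoint, with placeholder IPs and names (not from the actual host inventory):

```
# Illustrative HAProxy TCP pass-through for api.spoke-dc (IPs are placeholders).
# mode tcp keeps TLS end-to-end; HAProxy only health-checks and forwards.
frontend spoke_dc_api
    bind *:6443
    mode tcp
    default_backend spoke_dc_api_nodes

backend spoke_dc_api_nodes
    mode tcp
    option tcp-check
    server spoke-dc-vip 10.0.10.5:6443 check
```

The same pattern repeats for each API record and for the `*.routes.*` wildcards on ports 80/443, pointed at the ingress VIPs.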

TLS: wildcard certs from Let's Encrypt via cert-manager using the Cloudflare DNS-01 solver (existing). Each *.routes.<cluster>.opp... wildcard gets its own Certificate resource; all automatically provisioned by a single bootstrap script.
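Each of those Certificate resources follows the same shape; a sketch for one spoke, assuming a ClusterIssuer named `cloudflare-dns01` (the issuer and secret names here are illustrative):

```yaml
# Hypothetical per-cluster wildcard Certificate; issuer/secret names are assumptions.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-routes-spoke-dc
  namespace: openshift-ingress
spec:
  secretName: wildcard-routes-spoke-dc-tls
  issuerRef:
    name: cloudflare-dns01          # ClusterIssuer configured with the Cloudflare DNS-01 solver
    kind: ClusterIssuer
  dnsNames:
    - "*.routes.spoke-dc.opp.brac-poc.comptech-lab.com"
```

The bootstrap script would stamp out four of these, one per `*.routes.<cluster>` wildcard.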


DR failover model (active-passive)

Steady state: hub-dc + spoke-dc are active; hub-dr + spoke-dr are hot standbys synced via ACM's cluster-backup operator + ArgoCD ApplicationSets targeting both spokes simultaneously.

Failover trigger: hub-dc or spoke-dc becomes unreachable.

What flips:

1. HAProxy backends — API and routes DNS/LB entries switch from the -dc endpoints to the -dr endpoints (health-check-driven).
2. ACM's cluster-backup-chart restores critical CRs (Placement, Policy, GitOpsCluster, Keycloak realms) onto hub-dr.
3. Workloads on spoke-dr are already running (hot); clients reach them through the updated DNS/HAProxy.

POC scope: we demonstrate the topology + a scripted "simulate DC failure" drill. Fully automated failover orchestration is Phase 2.
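The hub-dr restore in step 2 of the drill would be driven by a Restore CR from the cluster-backup operator; a sketch (names are illustrative, the field shape follows the operator's CRD):

```yaml
# Hypothetical restore of the latest ACM backup onto hub-dr during the drill.
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Restore
metadata:
  name: restore-acm
  namespace: open-cluster-management-backup
spec:
  cleanupBeforeRestore: CleanRestoredResources
  veleroManagedClustersBackupName: latest   # re-attach managed clusters
  veleroCredentialsBackupName: latest
  veleroResourcesBackupName: latest         # Policies, Placements, GitOpsCluster, etc.
```

Applying this on hub-dr is the scripted portion of the "simulate DC failure" drill; the DNS/HAProxy flip happens independently via health checks.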


GitOps control plane (summary)

All cluster configuration + workload deployment flows through Git. See GITOPS-ARCHITECTURE.md for the full design. In short:

  • Repo: openshift-platform-gitops on the self-hosted GitLab (deployed on spoke-dc as part of Issue #4)
  • Pattern: App-of-Apps + ApplicationSets, with pull-mode propagation from hub ArgoCD to spoke ArgoCD instances
  • Integration: RHACM GitOpsCluster + Placement + ManagedClusterSetBinding wire managed clusters into hub ArgoCD's known-destinations list
  • Policies: ACM Policy enforces baseline config (compliance, security, RBAC); ArgoCD handles application deployment
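Fanning a workload out to both spokes at once combines the pieces above: an ApplicationSet whose generator reads an ACM Placement decision. A sketch, where the Placement name, repo URL, and paths are assumptions rather than finalized values:

```yaml
# Illustrative ApplicationSet targeting every cluster a Placement selects.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: poc-workloads
  namespace: openshift-gitops
spec:
  generators:
    - clusterDecisionResource:
        configMapRef: acm-placement        # ConfigMap mapping the generator to the Placement API
        labelSelector:
          matchLabels:
            cluster.open-cluster-management.io/placement: workload-spokes  # assumed Placement name
        requeueAfterSeconds: 180
  template:
    metadata:
      name: "kafka-{{name}}"               # one Application per selected cluster
    spec:
      project: default
      source:
        repoURL: https://gitlab.example.com/openshift-platform-gitops.git  # placeholder URL
        path: workloads/kafka
        targetRevision: main
      destination:
        server: "{{server}}"
        namespace: kafka
```

Because both spokes satisfy the Placement, spoke-dr stays continuously in sync with spoke-dc, which is what makes the hot-standby model work without a separate replication pipeline.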

Operator catalog (Red Hat only)

Per DECISION #024, only Red Hat operators are installed; no community operators. See OPERATOR-CATALOG.md for the full matrix of each operator + channel + target cluster.

Headline list:

| Operator | Channel (pre-install verification required) | Where |
|---|---|---|
| Advanced Cluster Management for Kubernetes | release-2.13 | hub-dc, hub-dr |
| Red Hat Advanced Cluster Security for Kubernetes (ACS Central) | stable | hub-dc, hub-dr |
| OpenShift GitOps | gitops-1.18 | hub-dc, hub-dr + (pull add-on) spokes |
| Compliance Operator | stable | all 4 clusters |
| Cluster Observability Operator | stable | hub-dc, hub-dr |
| Red Hat Build of Keycloak | stable-v26 | hub-dc, hub-dr |
| (spoke-only) OpenShift Data Foundation | stable-4.21 | spoke-dc, spoke-dr (if needed for workload storage) |

Process: before each operator install, check the Red Hat ecosystem catalog / OperatorHub for the current recommended channel and version; pin the specific CSV for reproducibility.
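Pinning looks like this in the Subscription itself; a sketch for RHACM, where the `startingCSV` value is illustrative and must be replaced with whatever the catalog check returns:

```yaml
# Hypothetical pinned Subscription; verify channel + CSV against the catalog first.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: advanced-cluster-management
  namespace: open-cluster-management
spec:
  channel: release-2.13
  name: advanced-cluster-management
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual                         # hold automatic upgrades until reviewed
  startingCSV: advanced-cluster-management.v2.13.0    # illustrative; pin the verified CSV
```

With `installPlanApproval: Manual`, later channel bumps become explicit Git changes rather than silent in-cluster upgrades.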


Bootstrap order (install sequence)

```mermaid
flowchart LR
    A[1. Provision hub-dc SNO] --> B[2. Day-2 ops:<br/>OIDC, audit profile,<br/>TLS, registries]
    B --> C[3. Install<br/>OpenShift GitOps<br/>on hub-dc]
    C --> D[4. Install RHACM<br/>on hub-dc]
    D --> E[5. Provision hub-dr +<br/>spoke-dc + spoke-dr<br/>via ACM + ZTP]
    E --> F[6. GitOpsCluster +<br/>Placement wire<br/>clusters to ArgoCD]
    F --> G[7. Root ApplicationSet<br/>syncs all operators<br/>+ workloads]
    G --> H[8. ACM Policies enforce<br/>compliance baseline<br/>on all 4 clusters]
```

Only steps 1 and 3 involve manual oc usage (the installer and the initial GitOps subscription); everything from step 4 onward is Git-driven.
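The wiring in step 6 is a single GitOpsCluster CR on the hub that feeds Placement-selected clusters into ArgoCD's destination list; a sketch, where the Placement name is an assumption:

```yaml
# Illustrative GitOpsCluster registering ACM-managed clusters with hub ArgoCD.
apiVersion: apps.open-cluster-management.io/v1beta1
kind: GitOpsCluster
metadata:
  name: argo-acm-clusters
  namespace: openshift-gitops
spec:
  argoServer:
    cluster: local-cluster            # the hub itself, as ACM sees it
    argoNamespace: openshift-gitops   # where hub ArgoCD runs
  placementRef:
    kind: Placement
    apiVersion: cluster.open-cluster-management.io/v1beta1
    name: all-managed-clusters        # assumed Placement selecting all 4 clusters
```

This CR lives in Git like everything else, so re-registering clusters after a hub-dr restore is just a sync, not a manual step.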


References


Created: 2026-04-24 · Status: Ready for execution · Owner: Project Lead