VM-Tier Architecture
All tools and workloads outside OpenShift run as VMs in a DC/DR pattern. This doc is the single source of truth for the VM inventory, naming, sizing, dependencies, and the IaC flow.
Naming convention
<app>[-<profile>]-vm<N>-<dc|dr>
Examples:
- minio-vm1-dc (3-node MinIO, first node, DC site)
- minio-vm1-dr (its DR counterpart)
- vault-vm2-dc (3-node Vault Raft, second node, DC site)
- wso2-apim-gateway-vm1-dc (distributed APIM, gateway profile, DC)
- wso2-apim-publisher-vm1-dc (distributed APIM, publisher profile, DC)
- gitlab-pg-vm1-dc (dedicated PostgreSQL for GitLab, DC)
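The convention above could be checked mechanically. The following is a minimal sketch, not part of the existing tooling: the `parse_vm_name` helper and the `VmName` tuple are hypothetical names. Because a name alone does not say where `<app>` ends and `<profile>` begins (compare `gitlab-pg` with `wso2-apim-gateway`), the sketch captures both as a single prefix.

```python
import re
from typing import NamedTuple, Optional

# Pattern for <app>[-<profile>]-vm<N>-<dc|dr>.
# App and profile are captured together as one prefix (assumption: they
# cannot be told apart from the name alone).
VM_NAME = re.compile(
    r"(?P<prefix>[a-z0-9]+(?:-[a-z0-9]+)*)-vm(?P<index>[1-9]\d*)-(?P<site>dc|dr)"
)


class VmName(NamedTuple):
    prefix: str  # app plus optional profile, e.g. "wso2-apim-gateway"
    index: int   # node number N
    site: str    # "dc" or "dr"


def parse_vm_name(name: str) -> Optional[VmName]:
    """Return the parsed parts of a VM name, or None if it breaks the convention."""
    match = VM_NAME.fullmatch(name)
    if match is None:
        return None
    return VmName(match["prefix"], int(match["index"]), match["site"])


if __name__ == "__main__":
    for candidate in ("minio-vm1-dc", "wso2-apim-gateway-vm1-dc", "minio-1-dc"):
        print(candidate, "->", parse_vm_name(candidate))
```

A check like this could run in CI against inventory files, so that a misnamed VM is caught before provisioning.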
Full VM inventory
| Tier | App | DC VMs | DR VMs | VM spec | Dep PG | Notes |
|---|---|---|---|---|---|---|
| Ops | brac-poc-ops-runner | brac-poc-ops-runner-vm1-dc | brac-poc-ops-runner-vm1-dr | 4c/8G/100G | — | Jump host, dual-homed (br-real + br26), runs OpenTofu + Ansible + AWX + Terrakube locally at first, then split |
| Secrets | Vault (Raft HA) | vault-vm1/2/3-dc | vault-vm1/2/3-dr | 4c/8G/60G | — | Raft integrated storage, 3-of-5 unseal shares |
| Storage | MinIO (distributed) | minio-vm1/2/3-dc | minio-vm1/2/3-dr | 4c/16G/500G | — | Erasure-coded (EC:2); bucket-level replication DC → DR |
| Cache | Redis + Sentinel | redis-vm1/2/3-dc | redis-vm1/2/3-dr | 4c/8G/40G | — | Combined mode (redis-server + sentinel per VM); 1 master + 5 replicas |
| SCM | GitLab CE | gitlab-vm1-dc | gitlab-vm1-dr | 8c/16G/200G | gitlab-pg-vm1-dc/dr | Single node per site + dedicated PG |
| CI | Jenkins LTS | jenkins-vm1-dc | jenkins-vm1-dr | 4c/8G/80G | — | Filesystem state; backup to MinIO |
| Artifacts | Nexus OSS | nexus-vm1-dc | nexus-vm1-dr | 4c/8G/200G | — | Blob store backed by MinIO (S3 mode) |
| API Mgmt | WSO2 APIM Publisher | wso2-apim-publisher-vm1-dc | wso2-apim-publisher-vm1-dr | 4c/8G/40G | shared wso2-apim-pg-vm1-dc/dr | |
| API Mgmt | WSO2 APIM DevPortal | wso2-apim-devportal-vm1-dc | wso2-apim-devportal-vm1-dr | 4c/8G/40G | shared APIM PG | |
| API Mgmt | WSO2 APIM Gateway | wso2-apim-gateway-vm1-dc | wso2-apim-gateway-vm1-dr | 4c/8G/40G | shared APIM PG | |
| API Mgmt | WSO2 APIM Traffic Mgr | wso2-apim-tm-vm1-dc | wso2-apim-tm-vm1-dr | 4c/8G/40G | shared APIM PG | |
| API Mgmt | WSO2 APIM Key Mgr | wso2-apim-km-vm1-dc | wso2-apim-km-vm1-dr | 4c/8G/40G | shared APIM PG | |
| Identity | WSO2 IS (clustered) | wso2-is-vm1/2-dc | wso2-is-vm1/2-dr | 4c/8G/40G | wso2-is-pg-vm1-dc/dr | 2-node active-active per site |
| Workflow | Temporal | temporal-vm1-dc | temporal-vm1-dr | 4c/8G/60G | temporal-pg-vm1-dc/dr | server + UI + workers on same VM |
| Workflow | n8n | n8n-vm1-dc | n8n-vm1-dr | 4c/8G/40G | n8n-pg-vm1-dc/dr | |
| SIEM | Splunk Free | splunk-vm1-dc | splunk-vm1-dr | 4c/8G/200G | — | Free tier (500 MB/day cap), 2-day retention |
| Obs demo | SigNoz (UI + backend) | signoz-vm1-dc | signoz-vm1-dr | 4c/8G/40G | — | Stateless; reads from ClickHouse |
| Obs demo | ClickHouse | clickhouse-vm1-dc | clickhouse-vm1-dr | 8c/16G/400G | — | 1 shard × 2 replicas (DC + DR); ClickHouse Keeper sidecar |
| IaC GUI | AWX (Ansible) | awx-vm1-dc | awx-vm1-dr | 4c/8G/40G | awx-pg-vm1-dc/dr | Can colocate with Terrakube; splitting for clarity |
| IaC GUI | Terrakube (OpenTofu) | terrakube-vm1-dc | terrakube-vm1-dr | 4c/8G/40G | terrakube-pg-vm1-dc/dr | |
| Security (SAST) | SonarQube CE | sonarqube-vm1-dc | sonarqube-vm1-dr | 4c/8G/80G | sonarqube-pg-vm1-dc/dr | GitLab CI quality gate; exposed at sonarqube.apps.brac-poc... |
| Security (vuln scan) | OpenVAS / Greenbone CE | openvas-vm1-dc | openvas-vm1-dr | 4c/8G/100G | — | Daily VM-fleet + network scans (replaces Nessus unless license) |
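The inventory uses a compact shorthand such as `vault-vm1/2/3-dc` and `gitlab-pg-vm1-dc/dr`. As an illustration only, a small helper could expand that shorthand into full VM names, for example when generating an Ansible inventory. The function name `expand_vm_shorthand` is hypothetical, and the grammar it accepts is an assumption inferred from the table cells.

```python
import re
from typing import List

# Shorthand grammar assumed from the inventory table:
#   <prefix>-vm<N>[/<N>...]-<site>[/<site>...]
SHORTHAND = re.compile(
    r"(?P<prefix>[a-z0-9-]+)-vm(?P<indices>\d+(?:/\d+)*)"
    r"-(?P<sites>(?:dc|dr)(?:/(?:dc|dr))*)"
)


def expand_vm_shorthand(cell: str) -> List[str]:
    """Expand e.g. 'vault-vm1/2/3-dc' into one full VM name per node and site."""
    match = SHORTHAND.fullmatch(cell)
    if match is None:
        raise ValueError(f"not a VM shorthand: {cell!r}")
    indices = match["indices"].split("/")
    sites = match["sites"].split("/")
    prefix = match["prefix"]
    # Sites vary slowest, so all DC nodes are listed before DR nodes.
    return [f"{prefix}-vm{i}-{site}" for site in sites for i in indices]


if __name__ == "__main__":
    print(expand_vm_shorthand("vault-vm1/2/3-dc"))
    print(expand_vm_shorthand("gitlab-pg-vm1-dc/dr"))
```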
Summary count
| Category | VMs |
|---|---|
| Ops runner | 2 |
| Vault | 6 |
| MinIO | 6 |
| Redis | 6 |
| GitLab + PG | 4 |
| Jenkins | 2 |
| Nexus | 2 |
| WSO2 APIM (5 profiles × 2 sites) + shared PG | 12 |
| WSO2 IS (2 × 2 sites) + PG | 6 |
| Temporal + PG | 4 |
| n8n + PG | 4 |
| Splunk | 2 |
| SigNoz + ClickHouse | 4 |
| AWX + Terrakube + PGs | 8 |
| SonarQube + PG | 4 |
| OpenVAS | 2 |
| Total VM count | 74 |
| OpenShift nodes on top | 8 (2 SNO + 6 compact spoke) |
| Grand total | 82 VMs |
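The counts in this table can be cross-checked by summing the category rows. The sketch below copies the per-category figures from the table; the dictionary keys are shortened labels chosen for illustration, and it assumes the OpenShift nodes are counted as VMs in the overall figure.

```python
# Per-category VM counts, copied from the summary table (both sites combined).
vm_counts = {
    "ops-runner": 2,
    "vault": 6,
    "minio": 6,
    "redis": 6,
    "gitlab + pg": 4,
    "jenkins": 2,
    "nexus": 2,
    "wso2-apim + shared pg": 12,
    "wso2-is + pg": 6,
    "temporal + pg": 4,
    "n8n + pg": 4,
    "splunk": 2,
    "signoz + clickhouse": 4,
    "awx + terrakube + pgs": 8,
    "sonarqube + pg": 4,
    "openvas": 2,
}

# OpenShift nodes listed in the table: 2 SNO + 6 compact spoke nodes.
OCP_NODES = 2 + 6

vm_total = sum(vm_counts.values())
grand_total = vm_total + OCP_NODES

if __name__ == "__main__":
    print(f"tool VMs: {vm_total}, with OpenShift nodes: {grand_total}")
```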
Network layout
All VMs get dual NICs:
- br-real (59.153.29.96/27): public-ish LAN; only the ops-runner has a real public IP, others have br-real IPs for OpenShift API/routes and HAProxy ingress
- br26 (26.26.0.0/16): private cluster/VM network, used for VM↔VM traffic, DB replication, ACM hub↔spoke
Static IP plan (subset — full table in memory project_vm_network_plan.md):
| VM | br-real | br26 |
|---|---|---|
| brac-poc-ops-runner-vm1-dc | 59.153.29.101 (public) | 26.26.200.5 |
| brac-poc-ops-runner-vm1-dr | 59.153.29.102 | 26.26.200.6 |
| vault-vm1-dc | — | 26.26.200.10 |
| vault-vm2-dc | — | 26.26.200.11 |
| vault-vm3-dc | — | 26.26.200.12 |
| minio-vm1/2/3-dc | — | 26.26.200.20-22 |
| minio-vm1/2/3-dr | — | 26.26.200.23-25 |
| (... ClickHouse, Redis, etc. follow the same 26.26.200.x progression) | | |
| OCP cluster nodes | Per cluster VIP | 26.26.210.x range |
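A static IP plan like this is easy to get subtly wrong, so it may be worth validating before it is fed into OpenTofu. The following sketch uses Python's standard `ipaddress` module to confirm that each address sits inside its bridge's subnet and that no address is assigned twice. The `STATIC_IPS` dictionary holds only the single-address rows shown above, and `check_plan` is a hypothetical helper, not an existing script.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

BR_REAL = ipaddress.ip_network("59.153.29.96/27")
BR26 = ipaddress.ip_network("26.26.0.0/16")

# (br-real address or None, br26 address), taken from the subset table.
STATIC_IPS: Dict[str, Tuple[Optional[str], str]] = {
    "brac-poc-ops-runner-vm1-dc": ("59.153.29.101", "26.26.200.5"),
    "brac-poc-ops-runner-vm1-dr": ("59.153.29.102", "26.26.200.6"),
    "vault-vm1-dc": (None, "26.26.200.10"),
    "vault-vm2-dc": (None, "26.26.200.11"),
    "vault-vm3-dc": (None, "26.26.200.12"),
}


def check_plan(plan: Dict[str, Tuple[Optional[str], str]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the plan is clean."""
    problems: List[str] = []
    owner_of: Dict[str, str] = {}
    for vm, (real, private) in plan.items():
        for addr, net in ((real, BR_REAL), (private, BR26)):
            if addr is None:
                continue  # no address planned on this bridge
            if ipaddress.ip_address(addr) not in net:
                problems.append(f"{vm}: {addr} is outside {net}")
            if addr in owner_of:
                problems.append(f"{vm}: {addr} already used by {owner_of[addr]}")
            owner_of[addr] = vm
    return problems


if __name__ == "__main__":
    print(check_plan(STATIC_IPS) or "plan OK")
```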
Per-VM cloud-init defaults
```yaml
#cloud-config
hostname:
```

Per-VM Ansible roles add:
- ops-runner-vm: OpenTofu, Ansible, AWX CLI, Terrakube CLI, oc, kubectl, helm, openshift-install, vault, mc, openssl, gh, docker
- vault-vm: Vault OSS, systemd service, Raft config, TLS material
- minio-vm: MinIO server, systemd service, erasure-coded data dirs
- ... etc. (roles defined in Ansible inventory, version-pinned)
Tool versions (locked)
Operator versions are taken from OPERATOR-CATALOG.md. VM apps are pinned as follows:
| App | Version |
|---|---|
| Ubuntu (all VMs) | 24.04 Server (Noble) cloud image |
| OpenTofu | 1.8.x (latest stable) |
| Ansible | 11.x (core 2.18) |
| AWX | 25.x (latest stable) |
| Terrakube | 2.x (latest stable) |
| Vault OSS | 1.19.x |
| MinIO | Server RELEASE.2025.xx.xx (latest LTS) |
| Redis OSS | 7.4 LTS |
| GitLab CE | 17.x latest |
| Jenkins LTS | 2.452.x |
| Nexus OSS | 3.76.x |
| WSO2 APIM | 4.4.0 |
| WSO2 IS | 7.1.0 |
| Temporal | 1.27.x |
| n8n | 1.76.x (Community) |
| Splunk | Free 9.3.x |
| SigNoz | 0.56.x |
| ClickHouse | 24.x LTS |
| PostgreSQL (per-app) | 16.x |
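Pins written as `1.8.x` or `16.x` can be enforced with a simple component-wise comparison. This is a minimal sketch under stated assumptions: `matches_pin` is a hypothetical helper, it treats a literal `x` as a wildcard for one dotted component, and it does not attempt to handle the free-form entries in the table (such as the MinIO `RELEASE.2025.xx.xx` tag or annotations like "latest stable").

```python
def matches_pin(version: str, pin: str) -> bool:
    """Return True if `version` satisfies `pin`, where 'x' matches any component.

    Extra trailing components in `version` are allowed, so '24.8.2.3'
    satisfies '24.x'. A version shorter than the pin never matches.
    """
    version_parts = version.split(".")
    pin_parts = pin.split(".")
    if len(version_parts) < len(pin_parts):
        return False
    return all(p == "x" or p == v for p, v in zip(pin_parts, version_parts))


if __name__ == "__main__":
    # Installed versions here are made-up examples, not observed values.
    checks = [("1.8.3", "1.8.x"), ("1.9.0", "1.8.x"), ("4.4.0", "4.4.0")]
    for installed, pin in checks:
        print(f"{installed} vs {pin}: {matches_pin(installed, pin)}")
```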
IaC flow: OpenTofu + Ansible, managed via AWX + Terrakube
```mermaid
flowchart LR
    Eng["Engineer"] -->|"SSH to ops-runner-dc<br/>OR use AWX/Terrakube UI"| Opsrunner["ops-runner-vm1-dc"]
    Opsrunner -->|"kubevirt-manifests<br/>OR libvirt API"| Cluster["staxv KubeVirt<br/>on RKE2"]
    Cluster -->|creates VMs| VMs["Target VMs"]
    Git["GitLab:<br/>brac-poc-infrastructure<br/>(OpenTofu modules)<br/>brac-poc-ansible<br/>(playbooks + roles)"] -.-> Opsrunner
    Git -.-> Terrakube
    Git -.-> AWX
    Terrakube["Terrakube<br/>(OpenTofu UI)"] -->|runs tofu apply| Cluster
    AWX["AWX<br/>(Ansible UI)"] -->|"runs playbooks<br/>against inventory"| VMs
    Vault[("Vault<br/>secrets store")] -.->|"External<br/>Secrets"| VMs
    Vault -.-> AWX
    Vault -.-> Terrakube
```
Two Git repos for IaC
- brac-poc-infrastructure (OpenTofu modules)
  - modules/kubevirt-vm/: generic VM creation module
  - modules/postgresql-vm/: PG VM with initial DB setup
  - envs/dc/: DC environment (one .tf per VM tier)
  - envs/dr/: DR environment
  - Git-driven via Terrakube workspaces
- brac-poc-ansible (configuration playbooks)
  - inventory/dc.yml, inventory/dr.yml
  - roles/vault/, roles/minio/, roles/wso2-apim-gateway/, etc.
  - Executed via AWX Job Templates
PR workflow
```
Engineer opens MR on brac-poc-infrastructure
  → GitLab CI: tofu fmt + tofu validate + tflint
  → Review + approve
  → Merge to main
  → Terrakube webhook → tofu plan on Terrakube UI
  → Human approves plan → tofu apply
  → VMs provisioned

Engineer opens MR on brac-poc-ansible
  → GitLab CI: ansible-lint + yamllint + molecule (where applicable)
  → Review + approve
  → Merge to main
  → AWX webhook → Job Template runs against new VMs
  → Ansible configures + installs tools
```
Every VM lifecycle change is a Git MR. No SSH-as-root-to-configure drift.
AWX + Terrakube bootstrap chicken-and-egg
Both of these tools need VMs themselves. Bootstrap order:
- ops-runner provisioned manually (OpenTofu CLI on Mac → KubeVirt manifests → staxv cluster)
- On ops-runner, install OpenTofu + Ansible CLI
- Using OpenTofu CLI from ops-runner, provision AWX + Terrakube VMs
- Configure AWX + Terrakube via Ansible (ops-runner runs playbooks)
- From now on, all subsequent VM provisioning goes through Terrakube; all config via AWX
Total manual steps: ~3 (ops-runner provision, AWX/Terrakube provision, initial AWX/Terrakube config). After that, everything is self-service via the two UIs.
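The bootstrap order is a dependency chain, and it can be expressed as a small graph so that the ordering is derived rather than remembered. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+); the step labels are paraphrased from the list above and are not identifiers used by any of the tools.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
BOOTSTRAP_DEPS = {
    "provision ops-runner (manual)": set(),
    "install OpenTofu + Ansible CLI on ops-runner": {
        "provision ops-runner (manual)",
    },
    "provision AWX + Terrakube VMs from ops-runner": {
        "install OpenTofu + Ansible CLI on ops-runner",
    },
    "configure AWX + Terrakube via Ansible": {
        "provision AWX + Terrakube VMs from ops-runner",
    },
    "self-service: Terrakube provisions, AWX configures": {
        "configure AWX + Terrakube via Ansible",
    },
}

# static_order() raises graphlib.CycleError if the steps depend on each other
# in a loop, which is exactly the chicken-and-egg situation to avoid.
bootstrap_order = list(TopologicalSorter(BOOTSTRAP_DEPS).static_order())

if __name__ == "__main__":
    for number, step in enumerate(bootstrap_order, start=1):
        print(f"{number}. {step}")
```

If AWX were ever made a prerequisite of the ops-runner itself, the sorter would fail with a cycle error instead of producing an order, which makes the circular dependency visible early.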
How dashboards are reached
All web UIs on VMs are exposed through the staxv HAProxy (existing, at 59.153.29.115) with host-header routing. Example records:
| Hostname | → HAProxy → backend VM:port |
|---|---|
| minio.apps.brac-poc.comptech-lab.com | minio-vm1-dc:9001 (console) |
| redis.apps.brac-poc.comptech-lab.com | Redis Insight on one VM |
| temporal.apps.brac-poc.comptech-lab.com | temporal-vm1-dc:8080 (UI) |
| n8n.apps.brac-poc.comptech-lab.com | n8n-vm1-dc:5678 |
| signoz.apps.brac-poc.comptech-lab.com | signoz-vm1-dc:3301 |
| splunk.apps.brac-poc.comptech-lab.com | splunk-vm1-dc:8000 |
| awx.apps.brac-poc.comptech-lab.com | awx-vm1-dc:443 |
| terrakube.apps.brac-poc.comptech-lab.com | terrakube-vm1-dc:8080 |
| gitlab.apps.brac-poc.comptech-lab.com | gitlab-vm1-dc:443 |
| jenkins.apps.brac-poc.comptech-lab.com | jenkins-vm1-dc:8080 |
| nexus.apps.brac-poc.comptech-lab.com | nexus-vm1-dc:8081 |
| wso2-publisher.apps.brac-poc.comptech-lab.com | wso2-apim-publisher-vm1-dc:9443 |
| wso2-devportal.apps.brac-poc.comptech-lab.com | wso2-apim-devportal-vm1-dc:9443 |
| vault.apps.brac-poc.comptech-lab.com | vault-vm1-dc:8200 |
TLS termination at HAProxy with certs from cert-manager + Cloudflare DNS-01 (existing setup). Host-based routing in HAProxy ConfigMap.
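To make the host-header routing concrete, here is a small model of the lookup HAProxy performs, written in Python purely for illustration. It is not HAProxy configuration and does not reflect the actual ConfigMap. The `route` function is hypothetical, and the Redis entry is left out because the table does not give it a backend address.

```python
from typing import Optional

DOMAIN = "apps.brac-poc.comptech-lab.com"

# Subdomain label -> backend VM:port, copied from the table above.
BACKENDS = {
    "minio": "minio-vm1-dc:9001",
    "temporal": "temporal-vm1-dc:8080",
    "n8n": "n8n-vm1-dc:5678",
    "signoz": "signoz-vm1-dc:3301",
    "splunk": "splunk-vm1-dc:8000",
    "awx": "awx-vm1-dc:443",
    "terrakube": "terrakube-vm1-dc:8080",
    "gitlab": "gitlab-vm1-dc:443",
    "jenkins": "jenkins-vm1-dc:8080",
    "nexus": "nexus-vm1-dc:8081",
    "wso2-publisher": "wso2-apim-publisher-vm1-dc:9443",
    "wso2-devportal": "wso2-apim-devportal-vm1-dc:9443",
    "vault": "vault-vm1-dc:8200",
}


def route(host_header: str) -> Optional[str]:
    """Map an HTTP Host header to a backend, or None if no rule matches."""
    # Host headers are case-insensitive and may carry a ":port" suffix.
    host = host_header.strip().lower().split(":", 1)[0]
    suffix = "." + DOMAIN
    if not host.endswith(suffix):
        return None
    return BACKENDS.get(host[: -len(suffix)])


if __name__ == "__main__":
    print(route("vault.apps.brac-poc.comptech-lab.com"))
    print(route("unknown.apps.brac-poc.comptech-lab.com"))
```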
Capacity
| Tier | vCPU sum | RAM sum |
|---|---|---|
| All 68 VMs @ default | 280 | 600 GB |
| OCP (2 SNO hubs + 2 compact spokes with 32c/64G nodes) | 320 | 384 GB |
| Grand total | 600 | 984 GB |
| Hardware (dl385-2 + gold-1 + gold-2) | 768 logical | 2.5 TB |
| Utilisation | 78% | 39% |
With KubeVirt CPU overcommit (VMs rarely run at 100%), effective utilisation is well under the limit.
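The utilisation figures follow directly from the sums in the table. The short calculation below reproduces them; it assumes 2.5 TB is counted as 2500 GB, which is the reading that yields the stated 39%.

```python
# Requested resources, from the capacity table.
vm_vcpu, vm_ram_gb = 280, 600      # tool VMs at default sizing
ocp_vcpu, ocp_ram_gb = 320, 384    # OpenShift clusters

# Available hardware (dl385-2 + gold-1 + gold-2).
hw_vcpu = 768
hw_ram_gb = 2500  # assumption: 2.5 TB taken as 2500 GB

total_vcpu = vm_vcpu + ocp_vcpu
total_ram_gb = vm_ram_gb + ocp_ram_gb

cpu_pct = 100 * total_vcpu / hw_vcpu
ram_pct = 100 * total_ram_gb / hw_ram_gb

if __name__ == "__main__":
    print(f"requested: {total_vcpu} vCPU, {total_ram_gb} GB")
    print(f"utilisation: {cpu_pct:.0f}% CPU, {ram_pct:.0f}% RAM")
```

These percentages are based on requested vCPUs with no overcommit applied, so they represent the worst case for CPU.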
Created: 2026-04-24 · Owner: Infrastructure Lead · Status: ready for execution