
VM-Tier Architecture

All tools and workloads outside OpenShift run as VMs in a DC/DR pattern. This doc is the single source of truth for the VM inventory, naming, sizing, dependencies, and the IaC flow.


Naming convention

<app>[-<profile>]-vm<N>-<dc|dr>

Examples:

  • minio-vm1-dc (3-node MinIO, first node, DC site)
  • minio-vm1-dr (its DR counterpart)
  • vault-vm2-dc (3-node Vault Raft, second node, DC site)
  • wso2-apim-gateway-vm1-dc (distributed APIM, gateway profile, DC)
  • wso2-apim-publisher-vm1-dc (distributed APIM, publisher profile, DC)
  • gitlab-pg-vm1-dc (dedicated PostgreSQL for GitLab, DC)
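The convention is regular enough to lint in CI. A minimal sketch, assuming a regex of our own devising (the helper name and pattern are not from this doc):

```shell
# Sketch: validate a VM name against <app>[-<profile>]-vm<N>-<dc|dr>.
# The helper name and the exact regex are assumptions, not part of the spec.
vm_name_ok() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]+(-[a-z0-9]+)*-vm[0-9]+-(dc|dr)$'
}

vm_name_ok "wso2-apim-gateway-vm1-dc" && echo "valid"
vm_name_ok "minio-dc-vm1"             || echo "invalid"   # site suffix missing
```

A check like this could run against the OpenTofu-managed inventory so misnamed VMs never reach `tofu apply`.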

Full VM inventory

| Tier | App | DC VMs | DR VMs | VM spec | PG dependency | Notes |
|---|---|---|---|---|---|---|
| Ops | brac-poc-ops-runner | brac-poc-ops-runner-vm1-dc | brac-poc-ops-runner-vm1-dr | 4c/8G/100G | — | Jump host, dual-homed (br-real + br26); runs OpenTofu + Ansible + AWX + Terrakube locally at first, then split |
| Secrets | Vault (Raft HA) | vault-vm1/2/3-dc | vault-vm1/2/3-dr | 4c/8G/60G | — | Raft integrated storage, 3-of-5 unseal key shares |
| Storage | MinIO (distributed) | minio-vm1/2/3-dc | minio-vm1/2/3-dr | 4c/16G/500G | — | Erasure-coded (EC:2); bucket-level replication DC → DR |
| Cache | Redis + Sentinel | redis-vm1/2/3-dc | redis-vm1/2/3-dr | 4c/8G/40G | — | Combined mode (redis-server + sentinel per VM); 1 master + 5 replicas |
| SCM | GitLab CE | gitlab-vm1-dc | gitlab-vm1-dr | 8c/16G/200G | gitlab-pg-vm1-dc/dr | Single node per site + dedicated PG |
| CI | Jenkins LTS | jenkins-vm1-dc | jenkins-vm1-dr | 4c/8G/80G | — | Filesystem state; backup to MinIO |
| Artifacts | Nexus OSS | nexus-vm1-dc | nexus-vm1-dr | 4c/8G/200G | — | Blob store backed by MinIO (S3 mode) |
| API Mgmt | WSO2 APIM Publisher | wso2-apim-publisher-vm1-dc | wso2-apim-publisher-vm1-dr | 4c/8G/40G | shared wso2-apim-pg-vm1-dc/dr | |
| API Mgmt | WSO2 APIM DevPortal | wso2-apim-devportal-vm1-dc | wso2-apim-devportal-vm1-dr | 4c/8G/40G | shared APIM PG | |
| API Mgmt | WSO2 APIM Gateway | wso2-apim-gateway-vm1-dc | wso2-apim-gateway-vm1-dr | 4c/8G/40G | shared APIM PG | |
| API Mgmt | WSO2 APIM Traffic Mgr | wso2-apim-tm-vm1-dc | wso2-apim-tm-vm1-dr | 4c/8G/40G | shared APIM PG | |
| API Mgmt | WSO2 APIM Key Mgr | wso2-apim-km-vm1-dc | wso2-apim-km-vm1-dr | 4c/8G/40G | shared APIM PG | |
| Identity | WSO2 IS (clustered) | wso2-is-vm1/2-dc | wso2-is-vm1/2-dr | 4c/8G/40G | wso2-is-pg-vm1-dc/dr | 2-node active-active per site |
| Workflow | Temporal | temporal-vm1-dc | temporal-vm1-dr | 4c/8G/60G | temporal-pg-vm1-dc/dr | Server + UI + workers on the same VM |
| Workflow | n8n | n8n-vm1-dc | n8n-vm1-dr | 4c/8G/40G | n8n-pg-vm1-dc/dr | |
| SIEM | Splunk Free | splunk-vm1-dc | splunk-vm1-dr | 4c/8G/200G | — | Free tier (500 MB/day cap), 2-day retention |
| Obs demo | SigNoz (UI + backend) | signoz-vm1-dc | signoz-vm1-dr | 4c/8G/40G | — | Stateless; reads from ClickHouse |
| Obs demo | ClickHouse | clickhouse-vm1-dc | clickhouse-vm1-dr | 8c/16G/400G | — | 1 shard × 2 replicas (DC + DR); ClickHouse Keeper sidecar |
| IaC GUI | AWX (Ansible) | awx-vm1-dc | awx-vm1-dr | 4c/8G/40G | awx-pg-vm1-dc/dr | Can colocate with Terrakube; split for clarity |
| IaC GUI | Terrakube (OpenTofu) | terrakube-vm1-dc | terrakube-vm1-dr | 4c/8G/40G | terrakube-pg-vm1-dc/dr | |
| Security (SAST) | SonarQube CE | sonarqube-vm1-dc | sonarqube-vm1-dr | 4c/8G/80G | sonarqube-pg-vm1-dc/dr | GitLab CI quality gate; exposed at sonarqube.apps.brac-poc... |
| Security (vuln scan) | OpenVAS / Greenbone CE | openvas-vm1-dc | openvas-vm1-dr | 4c/8G/100G | — | Daily VM-fleet + network scans (replaces Nessus unless a license is available) |

Summary count

| Category | VMs |
|---|---|
| Ops runner | 2 |
| Vault | 6 |
| MinIO | 6 |
| Redis | 6 |
| GitLab + PG | 4 |
| Jenkins | 2 |
| Nexus | 2 |
| WSO2 APIM (5 profiles × 2 sites) + shared PG | 12 |
| WSO2 IS (2 × 2 sites) + PG | 6 |
| Temporal + PG | 4 |
| n8n + PG | 4 |
| Splunk | 2 |
| SigNoz + ClickHouse | 4 |
| AWX + Terrakube + PGs | 8 |
| SonarQube + PG | 4 |
| OpenVAS | 2 |
| Total VM count | 74 |
| OpenShift nodes on top | 8 (2 SNO + 6 compact spoke) |
| Grand total | 82 VMs |

Network layout

All VMs get dual NICs:

  • br-real (59.153.29.96/27): public-ish LAN, only the ops-runner has a real public IP; others have br-real IPs for OpenShift API/routes and HAProxy ingress
  • br26 (26.26.0.0/16): private cluster/VM network, used for VM↔VM traffic, DB replication, ACM hub↔spoke
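Per VM, the dual-NIC layout translates into a cloud-init `network-config` (netplan v2 schema) along these lines. This is a sketch for the ops-runner only; the interface names, gateway, and nameserver are assumptions, while the addresses come from the static IP plan:

```yaml
# network-config (netplan v2 schema), hypothetical sketch for
# brac-poc-ops-runner-vm1-dc; NIC names and gateway are assumptions
version: 2
ethernets:
  enp1s0:                          # NIC attached to br-real
    addresses: [59.153.29.101/27]
    routes:
      - to: default
        via: 59.153.29.97          # assumed br-real gateway
    nameservers:
      addresses: [1.1.1.1]         # assumed
  enp2s0:                          # NIC attached to br26
    addresses: [26.26.200.5/16]
```

Most VMs would omit the default route on br-real, since only the ops-runner carries a real public IP.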

Static IP plan (subset — full table in memory project_vm_network_plan.md):

| VM | br-real | br26 |
|---|---|---|
| brac-poc-ops-runner-vm1-dc | 59.153.29.101 (public) | 26.26.200.5 |
| brac-poc-ops-runner-vm1-dr | 59.153.29.102 | 26.26.200.6 |
| vault-vm1-dc | — | 26.26.200.10 |
| vault-vm2-dc | — | 26.26.200.11 |
| vault-vm3-dc | — | 26.26.200.12 |
| minio-vm1/2/3-dc | — | 26.26.200.20–22 |
| minio-vm1/2/3-dr | — | 26.26.200.23–25 |
| (… ClickHouse, Redis, etc. follow the same 26.26.200.x progression) | | |
| OCP cluster nodes | Per-cluster VIP | 26.26.210.x range |

Per-VM cloud-init defaults

```yaml
#cloud-config
hostname:
fqdn: .internal
users:
  - name: ze
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    lock_passwd: false
    ssh_authorized_keys:
      # injected from memory/feedback_vm_ssh_keys.md
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPRZR/OKzy7ZQVAJmAYxwbdS5ad/7hlxGYDcOgbW4IMv zeshaq@gmail.com # Mac
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAqC+FI1uhXDr1s4+xG3mTbuEFNt0cGKUdCxNQaKjI3G ze@ze # dl385-2
      # ops-runner pubkey appended after its creation
ssh_pwauth: true
chpasswd:
  expire: false
  list: |
    ze:ze
package_update: true
package_upgrade: false # keep boot fast; Ansible handles updates later
packages:
  - qemu-guest-agent
  - chrony
  - openssh-server
  - curl
  - jq
  - htop
  - tmux
  - net-tools
  - ca-certificates
runcmd:
  - systemctl enable --now qemu-guest-agent chrony
```

Per-VM Ansible roles add:

  • ops-runner-vm: OpenTofu, Ansible, AWX CLI, Terrakube CLI, oc, kubectl, helm, openshift-install, vault, mc, openssl, gh, ansible, docker
  • vault-vm: Vault OSS, systemd service, Raft config, TLS material
  • minio-vm: MinIO server, systemd service, erasure-coded data dirs
  • ... etc. (roles defined in the Ansible inventory, version-pinned)
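As an illustration, the vault-vm role's task file might look roughly like this. The file layout, variable names, and download URL are assumptions; the real roles live, version-pinned, in brac-poc-ansible:

```yaml
# Hypothetical roles/vault/tasks/main.yml sketch; names and paths are assumptions
- name: Install Vault binary (version pinned via vault_version)
  ansible.builtin.unarchive:
    src: "https://releases.hashicorp.com/vault/{{ vault_version }}/vault_{{ vault_version }}_linux_amd64.zip"
    dest: /usr/local/bin
    remote_src: true

- name: Render Raft storage config
  ansible.builtin.template:
    src: vault.hcl.j2
    dest: /etc/vault.d/vault.hcl
  notify: restart vault

- name: Enable and start the Vault service
  ansible.builtin.systemd:
    name: vault
    enabled: true
    state: started
```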


Tool versions (locked)

All operator versions from OPERATOR-CATALOG.md. VM apps pinned:

| App | Version |
|---|---|
| Ubuntu (all VMs) | 24.04 Server (Noble) cloud image |
| OpenTofu | 1.8.x (latest stable) |
| Ansible | 11.x (core 2.19) |
| AWX | 25.x (latest stable) |
| Terrakube | 2.x (latest stable) |
| Vault OSS | 1.19.x |
| MinIO Server | RELEASE.2025.xx.xx (latest LTS) |
| Redis OSS | 7.4 LTS |
| GitLab CE | 17.x latest |
| Jenkins LTS | 2.452.x |
| Nexus OSS | 3.76.x |
| WSO2 APIM | 4.4.0 |
| WSO2 IS | 7.1.0 |
| Temporal | 1.27.x |
| n8n | 1.76.x (Community) |
| Splunk Free | 9.3.x |
| SigNoz | 0.56.x |
| ClickHouse | 24.x LTS |
| PostgreSQL (per-app) | 16.x |

IaC flow — OpenTofu + Ansible, managed via AWX + Terrakube

```mermaid
flowchart LR
    Eng["Engineer"] -->|"SSH to ops-runner-dc<br/>OR use AWX/Terrakube UI"| Opsrunner["ops-runner-vm1-dc"]
    Opsrunner -->|kubevirt-manifests<br/>OR libvirt API| Cluster["staxv KubeVirt<br/>on RKE2"]
    Cluster -->|creates VMs| VMs["Target VMs"]

    Git["GitLab:<br/>brac-poc-infrastructure<br/>(OpenTofu modules)<br/>brac-poc-ansible<br/>(playbooks + roles)"] -.-> Opsrunner
    Git -.-> Terrakube
    Git -.-> AWX

    Terrakube["Terrakube<br/>(OpenTofu UI)"] -->|runs tofu apply| Cluster
    AWX["AWX<br/>(Ansible UI)"] -->|runs playbooks<br/>against inventory| VMs

    Vault[("Vault<br/>secrets store")] -.->|"External<br/>Secrets"| VMs
    Vault -.-> AWX
    Vault -.-> Terrakube
```

Two Git repos for IaC

  1. brac-poc-infrastructure (OpenTofu modules)
       • modules/kubevirt-vm/ — generic VM creation module
       • modules/postgresql-vm/ — PG VM with initial DB setup
       • envs/dc/ — DC environment (one .tf per VM tier)
       • envs/dr/ — DR environment
       • Git-driven via Terrakube workspaces
  2. brac-poc-ansible (configuration playbooks)
       • inventory/dc.yml, inventory/dr.yml
       • roles/vault/, roles/minio/, roles/wso2-apim-gateway/, etc.
       • Executed via AWX Job Templates
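A tier file in envs/dc/ would then instantiate the generic module, roughly as below. The module's input variable names are assumptions; only the module path, VM name, spec, and br26 address come from this doc:

```hcl
# Hypothetical envs/dc/vault.tf; variable names are assumptions
module "vault_vm1_dc" {
  source = "../../modules/kubevirt-vm"

  name      = "vault-vm1-dc"
  vcpus     = 4
  memory_gb = 8
  disk_gb   = 60

  br_real_ip = null            # Vault nodes have no br-real address
  br26_ip    = "26.26.200.10"  # from the static IP plan
}
```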

PR workflow

```
Engineer opens MR on brac-poc-infrastructure
  → GitLab CI: tofu fmt + tofu validate + tflint
  → Review + approve → Merge to main
  → Terrakube webhook → tofu plan in Terrakube UI
  → Human approves plan → tofu apply → VMs provisioned

Engineer opens MR on brac-poc-ansible
  → GitLab CI: ansible-lint + yamllint + molecule (where applicable)
  → Review + approve → Merge to main
  → AWX webhook → Job Template runs against new VMs
  → Ansible configures + installs tools
```

Every VM lifecycle change is a Git MR. No SSH-as-root-to-configure drift.
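The infrastructure-side CI stage could be sketched as follows. Job names, images, and the extra `tofu init -backend=false` step are assumptions; the doc itself only specifies the three checks:

```yaml
# Hypothetical .gitlab-ci.yml fragment for brac-poc-infrastructure
stages: [validate]

tofu-checks:
  stage: validate
  image:
    name: ghcr.io/opentofu/opentofu:1.8   # image/tag is an assumption
    entrypoint: [""]
  script:
    - tofu fmt -check -recursive
    - tofu init -backend=false
    - tofu validate

tflint:
  stage: validate
  image:
    name: ghcr.io/terraform-linters/tflint
    entrypoint: [""]
  script:
    - tflint --recursive
```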


AWX + Terrakube bootstrap chicken-and-egg

Both of these tools need VMs themselves. Bootstrap order:

  1. ops-runner provisioned manually (OpenTofu CLI on Mac → KubeVirt manifests → staxv cluster)
  2. On ops-runner, install OpenTofu + Ansible CLI
  3. Using OpenTofu CLI from ops-runner, provision AWX + Terrakube VMs
  4. Configure AWX + Terrakube via Ansible (ops-runner runs playbooks)
  5. From now on, all subsequent VM provisioning goes through Terrakube; all config via AWX

Total manual steps: ~3 (ops-runner provision, AWX/Terrakube provision, initial AWX/Terrakube config). After that, everything is self-service via the two UIs.


How dashboards are reached

All web UIs on VMs are exposed through the staxv HAProxy (existing, at 59.153.29.115) with host-header routing. Example records:

| Hostname (via HAProxy) | Backend VM:port |
|---|---|
| minio.apps.brac-poc.comptech-lab.com | minio-vm1-dc:9001 (console) |
| redis.apps.brac-poc.comptech-lab.com | Redis Insight on one VM |
| temporal.apps.brac-poc.comptech-lab.com | temporal-vm1-dc:8080 (UI) |
| n8n.apps.brac-poc.comptech-lab.com | n8n-vm1-dc:5678 |
| signoz.apps.brac-poc.comptech-lab.com | signoz-vm1-dc:3301 |
| splunk.apps.brac-poc.comptech-lab.com | splunk-vm1-dc:8000 |
| awx.apps.brac-poc.comptech-lab.com | awx-vm1-dc:443 |
| terrakube.apps.brac-poc.comptech-lab.com | terrakube-vm1-dc:8080 |
| gitlab.apps.brac-poc.comptech-lab.com | gitlab-vm1-dc:443 |
| jenkins.apps.brac-poc.comptech-lab.com | jenkins-vm1-dc:8080 |
| nexus.apps.brac-poc.comptech-lab.com | nexus-vm1-dc:8081 |
| wso2-publisher.apps.brac-poc.comptech-lab.com | wso2-apim-publisher-vm1-dc:9443 |
| wso2-devportal.apps.brac-poc.comptech-lab.com | wso2-apim-devportal-vm1-dc:9443 |
| vault.apps.brac-poc.comptech-lab.com | vault-vm1-dc:8200 |

TLS termination at HAProxy with certs from cert-manager + Cloudflare DNS-01 (existing setup). Host-based routing in HAProxy ConfigMap.
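In HAProxy terms, the host-header routing amounts to ACLs like the following. Frontend/backend names are invented for the sketch; the hostnames and ports come from the table above:

```haproxy
# Hypothetical haproxy.cfg fragment; section and server names are assumptions
frontend apps_https
    mode http
    bind *:443 ssl crt /etc/haproxy/certs/
    acl host_minio  hdr(host) -i minio.apps.brac-poc.comptech-lab.com
    acl host_vault  hdr(host) -i vault.apps.brac-poc.comptech-lab.com
    use_backend be_minio if host_minio
    use_backend be_vault if host_vault

backend be_minio
    mode http
    server minio1 minio-vm1-dc:9001 check

backend be_vault
    mode http
    server vault1 vault-vm1-dc:8200 check
```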


Capacity

| Tier | vCPU sum | RAM sum |
|---|---|---|
| All 74 VMs @ default spec | 280 | 600 GB |
| OCP (2 SNO hubs + 2 compact spokes, 32c/64G nodes) | 320 | 384 GB |
| Grand total | 600 | 984 GB |
| Hardware (dl385-2 + gold-1 + gold-2) | 768 logical | 2.5 TB |
| Utilisation | 78% CPU | 39% RAM |

With KubeVirt CPU overcommit (VMs rarely run at 100%), effective utilisation is well under the limit.
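The utilisation figures can be rechecked with a one-liner (this assumes the decimal reading 2.5 TB = 2500 GB, which is what reproduces the table's 39%):

```shell
# Sanity-check the utilisation row of the capacity table
awk 'BEGIN {
  printf "CPU: %.0f%%\n", 600 / 768 * 100    # 600 vCPU on 768 logical cores
  printf "RAM: %.0f%%\n", 984 / 2500 * 100   # 984 GB on 2.5 TB (2500 GB)
}'
# prints:
#   CPU: 78%
#   RAM: 39%
```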


Created: 2026-04-24 · Owner: Infrastructure Lead · Status: ready for execution