OpenShift Assisted Installer Deployment Plan

Status: Ready to implement (in an execution session, on feature branch infra/ocp-assisted-installer)
Target: OCP 4.21.9 via full-ISO + static networking + Tang disk encryption
Runs on: dedicated ops VM (brac-ops-runner), invoked from engineer workstation via SSH


Architecture recap

```
brac-ops-runner VM (internal 10.x.x.x)
│
├─ talks to Red Hat Assisted Installer API (api.openshift.com)
│    via offline-token → access-token exchange
│
├─ generates install-config values + NMState YAML per host
├─ downloads discovery ISO, uploads to the hosting platform as bootable image
│
└─ creates OCP VMs (bootstrap + 3 masters + 3 workers) each with:
     stable MAC address
     bridge interface on the internal VM subnet (static IP)
     Tang-client disk encryption
     CDROM = discovery ISO

Tang server (Deployment in openshift-brac ns on the hosting platform)
  → internal LoadBalancer IP
  → serves keys to OCP nodes at boot for LUKS unlock

haproxy-brac (dedicated TCP LB for the OCP cluster)
  → fronts api:6443, api-int:6443, 22623, 443, 80
  → TCP-LBs to masters/workers

PowerDNS brac-poc.comptech-lab.com zone
  → api / api-int / *.apps point to haproxy-brac
```


Script inventory

All scripts live in scripts/ocp/ on feature branch. Designed to run on brac-ops-runner (which has curl, jq, kubectl, openshift-install, oc pre-installed plus any hosting-platform CLIs needed to create VMs).

| #  | Script              | Purpose |
|----|---------------------|---------|
| 01 | token.sh            | Exchange offline token → access token; cache for ~10 min. Sourced by every other script. |
| 02 | tang-deploy.sh      | Deploy Tang server as Deployment + Service with IP from lan-pool-26; extract thumbprint. |
| 03 | cluster-create.sh   | POST /v2/clusters — register cluster definition with disk_encryption, VIPs, pull secret, SSH keys. Writes cluster_id.txt. |
| 04 | infra-env-create.sh | POST /v2/infra-envs — build the static_network_config array (one entry per host with NMState + MAC), image_type: full-iso. Writes infra_env_id.txt. |
| 05 | iso-download.sh     | GET /v2/infra-envs/{id}/downloads/image → save as discovery.iso on the runner; upload as a bootable image to the hosting platform. |
| 06 | vms-create.sh       | Apply VM manifests on the hosting platform for bootstrap + 3 masters + 3 workers. Each manifest pins the MAC, attaches a bridge interface with the static IP for its role, attaches the ISO as CDROM, sizes the root disk, and sets boot order. |
| 07 | hosts-configure.sh  | Poll GET /v2/infra-envs/{id}/hosts until all 7 hosts are registered, then PATCH each host with hostname, role (master/worker/bootstrap), and installation_disk_id. |
| 08 | install.sh          | Wait for cluster status ready, then POST /v2/clusters/{id}/actions/install. Poll cluster + host progress; print a summary every 30 s. |
| 09 | credentials.sh      | Download kubeconfig + kubeadmin-password to ./creds/. Print the console URL and login string. |
| 10 | teardown.sh         | Delete VMs, DataVolumes, and the Tang deployment; DELETE /v2/clusters/{id} + /v2/infra-envs/{id}. Idempotent. |
| —  | deploy-all.sh       | Orchestrator: runs 01 through 09 with timing + pause-on-error. Acceptance test: oc get nodes returns 6 Ready at the end (bootstrap is removed after install). |

Inputs (one config file)

scripts/ocp/config.env (gitignored; not committed — kept on runner):

```bash
# --- cluster ---
export CLUSTER_NAME="brac-poc"
export OCP_VERSION="4.21.9"
export BASE_DOMAIN="comptech-lab.com"

# --- networking (external VIPs via haproxy-brac) ---
export API_VIP="59.153.29.101"
export INGRESS_VIP="59.153.29.101"

# --- disk encryption ---
export ENCRYPTION_MODE="tang"        # or "tpmv2"
export TANG_URL="http://26.26.199.110:7500"
export TANG_THUMBPRINT=""            # populated by tang-deploy.sh

# --- paths ---
export TOKEN_FILE="/Users/ze/Documents/Brac-POC/redhat-offline-token"
export PULL_SECRET="/Users/ze/Documents/Brac-POC/redhat-pull-secret"
export SSH_KEYS_FILE="./ssh-keys.txt"  # concatenated pubkeys (engineer Mac + hosting platform admin + brac-ops-runner)

# --- host plan (referenced by infra-env-create.sh to build NMState + MAC list) ---
# Format: name:mac:ip:role (prefix-len is /16, gateway 26.26.0.1, DNS 26.26.199.100)
export HOSTS=(
  "bootstrap:52:54:00:00:01:10:26.26.200.10:auto-assign"
  "master-1:52:54:00:00:01:11:26.26.200.11:master"
  "master-2:52:54:00:00:01:12:26.26.200.12:master"
  "master-3:52:54:00:00:01:13:26.26.200.13:master"
  "worker-1:52:54:00:00:01:21:26.26.200.21:worker"
  "worker-2:52:54:00:00:01:22:26.26.200.22:worker"
  "worker-3:52:54:00:00:01:23:26.26.200.23:worker"
)
```
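Because the MAC itself contains colons, a naive split of a HOSTS entry on `:` misparses; fields have to be reassembled positionally. A sketch of how infra-env-create.sh could split an entry (the parse_host helper name is illustrative, not one of the numbered scripts):

```shell
#!/usr/bin/env bash
# Hypothetical helper for 04 infra-env-create.sh: split one HOSTS entry.
# A 9-field entry is name(1) + mac(6) + ip(1) + role(1), because the MAC
# contributes six colon-separated fields.
parse_host() {
  local -a f
  IFS=':' read -ra f <<< "$1"
  HOST_NAME="${f[0]}"
  HOST_MAC="$(IFS=:; echo "${f[*]:1:6}")"   # rejoin fields 1-6 with ':'
  HOST_IP="${f[7]}"
  HOST_ROLE="${f[8]}"
}

parse_host "master-1:52:54:00:00:01:11:26.26.200.11:master"
echo "$HOST_NAME / $HOST_MAC / $HOST_IP / $HOST_ROLE"
# → master-1 / 52:54:00:00:01:11 / 26.26.200.11 / master
```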


Dependencies (must exist before deploy-all.sh runs)

  1. brac-ops-runner VM created, SSH reachable via ProxyJump
  2. Tools on runner: curl, jq, kubectl, virtctl, openshift-install (optional fallback), oc
  3. Kubeconfig for the hosting platform copied to brac-ops-runner:~/.kube/config (so the runner can create VM manifests)
  4. PowerDNS zone brac-poc.comptech-lab.com created with records:
     • api.brac-poc A 59.153.29.101
     • api-int.brac-poc A 59.153.29.101
     • *.apps.brac-poc A 59.153.29.101
     • Per-host A records: ocp-master-1.brac-poc A 26.26.200.11, etc.
  5. Cloudflare NS delegation: brac-poc.comptech-lab.com NS → PowerDNS public IP (or we put records directly in Cloudflare; see DNS architecture memory)
  6. haproxy-brac deployed and health-checking the VMs' IPs (backends will be empty/DOWN until VMs boot — acceptable)
  7. VM network attachments created on the hosting platform: one for the internal br26 bridge (static IPAM) and one for the public br-real bridge (for haproxy-brac's dual-homing)

Key API decisions + gotchas

Authentication

  • client_id=cloud-services (matches azp in our offline JWT).
  • Token endpoint: https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token.
  • Access token expires ~15 min — cache with expiry file check, refresh before each long phase.
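A minimal sketch of what 01 token.sh does, assuming bash with curl and jq on the runner (the function and cache-file names are illustrative; the endpoint, client_id, and grant type are from the bullets above):

```shell
#!/usr/bin/env bash
# Illustrative cache check: reuse the token if its cache file is younger
# than 10 minutes, safely inside the ~15 min expiry.
token_fresh() {
  local cache="$1" now mtime
  [ -f "$cache" ] || return 1
  now="$(date +%s)"
  mtime="$(stat -c %Y "$cache" 2>/dev/null || stat -f %m "$cache")"  # GNU, then BSD stat
  [ $(( now - mtime )) -lt 600 ]
}

# Exchange the offline (refresh) token for an access token, cached on disk.
get_access_token() {
  local cache="${TOKEN_CACHE:-$HOME/.cache/assisted-access-token}"
  if ! token_fresh "$cache"; then
    curl -s https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
      -d grant_type=refresh_token \
      -d client_id=cloud-services \
      -d refresh_token="$(cat "$TOKEN_FILE")" \
    | jq -r .access_token > "$cache"
  fi
  cat "$cache"
}
```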

Image type: full-iso

  • ~1 GB per infra-env; downloaded once to the runner, then uploaded as a DataVolume for all 7 VMs.
  • Trade-off vs minimal-iso: full ISO boots offline; minimal fetches RHCOS from internet at boot. Full avoids network-at-first-boot failure modes.

user_managed_networking vs. VIPs

  • With user_managed_networking: false, Assisted expects the VIPs to live in the same subnet as the nodes. Our nodes are on 26.26.200.0/24 but the VIP is 59.153.29.101 on the external subnet.
  • Likely needs user_managed_networking: true (pure external LB model — Assisted skips in-cluster VIP management and relies on our HAProxy). Confirm during execution by trying false first; if cluster validation fails on VIP subnet mismatch, flip to true.
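If the flip turns out to be needed, it is a one-field PATCH against the cluster resource. A sketch (patch_umn_true is an illustrative name; ACCESS_TOKEN comes from 01 token.sh and the cluster id from cluster_id.txt):

```shell
#!/usr/bin/env bash
# Assisted Installer REST API base, as used throughout the plan.
API="https://api.openshift.com/api/assisted-install"

# Sketch: switch the cluster to the pure external-LB model if validation
# fails on the VIP subnet mismatch.
patch_umn_true() {
  local cluster_id="$1"
  curl -s -X PATCH "${API}/v2/clusters/${cluster_id}" \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"user_managed_networking": true}'
}
# usage (in 03 cluster-create.sh error handling):
#   patch_umn_true "$(cat cluster_id.txt)"
```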

Disk encryption (Tang)

  • Tang server's thumbprint is fetched AFTER Tang is deployed (Tang generates keys on first start). tang-deploy.sh waits for readiness, curls http://tang:7500/adv, computes thumbprint via jose or step-cli.
  • tang_servers in the API is a JSON-string of an array (double-encoded), not an array: "[{\"url\":\"...\",\"thumbprint\":\"...\"}]".
  • enable_on: all encrypts master + worker root disks.
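The double-encoding can be produced with jq's tojson. A sketch using the config.env values (placeholder thumbprint; the real one comes from tang-deploy.sh):

```shell
#!/usr/bin/env bash
# Sketch: build the disk_encryption stanza for POST /v2/clusters.
# tang_servers must be a JSON *string* containing an array, so the array
# is serialized with tojson before being embedded in the object.
TANG_URL="${TANG_URL:-http://26.26.199.110:7500}"
TANG_THUMBPRINT="${TANG_THUMBPRINT:-PLACEHOLDER}"

tang_servers="$(jq -cn --arg url "$TANG_URL" --arg tp "$TANG_THUMBPRINT" \
  '[{url: $url, thumbprint: $tp}] | tojson')"

disk_encryption="$(jq -cn --argjson ts "$tang_servers" \
  '{enable_on: "all", mode: "tang", tang_servers: $ts}')"

echo "$disk_encryption"
# → {"enable_on":"all","mode":"tang","tang_servers":"[{\"url\":...}]"}
```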

NMState static config

  • static_network_config is an array, one entry per host; MAC in mac_interface_map must match exactly the NIC MAC the host will have on first boot.
  • network_yaml is the NMState YAML as a string (with \n escape or multi-line literal in JSON via jq).
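A sketch of one entry, assuming jq ≥ 1.6 for --rawfile (values mirror master-1 from the host plan; the eth0 NIC name is an assumption and must match the VM template):

```shell
#!/usr/bin/env bash
# Sketch: one static_network_config entry for master-1. --rawfile reads the
# NMState YAML verbatim and jq escapes the newlines into a JSON string.
cat > /tmp/master-1.nmstate.yaml <<'EOF'
interfaces:
  - name: eth0
    type: ethernet
    state: up
    ipv4:
      enabled: true
      dhcp: false
      address:
        - ip: 26.26.200.11
          prefix-length: 16
routes:
  config:
    - destination: 0.0.0.0/0
      next-hop-address: 26.26.0.1
      next-hop-interface: eth0
dns-resolver:
  config:
    server:
      - 26.26.199.100
EOF

entry="$(jq -n --rawfile yaml /tmp/master-1.nmstate.yaml '{
  network_yaml: $yaml,
  mac_interface_map: [{mac_address: "52:54:00:00:01:11", logical_nic_name: "eth0"}]
}')"
```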

Host registration & disk

  • After ISO boots, discovery agent POSTs to Assisted Installer and hosts show up in GET /v2/infra-envs/{id}/hosts.
  • inventory.disks[*] lists available disks; use the first NVMe/virtio-block (>100GiB) as installation_disk_id.
  • Set role and hostname via PATCH /v2/infra-envs/{id}/hosts/{host_id} before install.
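Disk selection can be sketched as a jq filter. This version filters by size only; the NVMe/virtio preference would add a drive_type or path test (field names here are assumed from the host inventory schema — verify against a live host record):

```shell
#!/usr/bin/env bash
# Sketch: choose installation_disk_id from a host record. The inventory
# field is itself a JSON string inside the host object, hence the fromjson.
# Prints the first disk larger than 100 GiB, or "null" if none qualifies.
pick_install_disk() {
  jq -r --argjson min "$((100 * 1024 * 1024 * 1024))" \
    '.inventory | fromjson
     | [.disks[] | select(.size_bytes > $min)][0].id'
}
# usage (in 07 hosts-configure.sh, per host):
#   disk_id="$(echo "$host_json" | pick_install_disk)"
```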

Install validation gates

  • Cluster transitions insufficient → ready only when all hosts pass validations (network, DNS, disk size, API reachability). The validations_info field lists each failing check — the script should print these when the cluster is stuck.
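A sketch of that printing step (validations_info is itself a JSON string, keyed by category, each category holding an array of {id, status, message} checks):

```shell
#!/usr/bin/env bash
# Sketch: read a cluster record on stdin and print one line per failing
# validation, as "category/check-id: message".
print_failures() {
  jq -r '.validations_info | fromjson | to_entries[]
         | .key as $cat | .value[]
         | select(.status == "failure")
         | "\($cat)/\(.id): \(.message)"'
}
# usage (in 08 install.sh, when the cluster is stuck):
#   curl -s "$API/v2/clusters/$CLUSTER_ID" \
#     -H "Authorization: Bearer $ACCESS_TOKEN" | print_failures
```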

Execution order for the execution session

  1. Precondition check (first step of deploy-all.sh):
     • Tang deployed + reachable at $TANG_URL/adv
     • PowerDNS resolves api.brac-poc.comptech-lab.com and *.apps.brac-poc to 59.153.29.101
     • HAProxy reachable at 59.153.29.101:6443 (TCP connect succeeds; backends may be DOWN)
     • Hosting platform healthy; storage provisioner has ≥500 GiB free
     • NADs br26-static and br-real-brac exist in openshift-brac
  2. Run scripts 01–09 in order; 10 (teardown.sh) only on failure or deliberate reset.
  3. Acceptance: oc --kubeconfig ./creds/kubeconfig get nodes shows 6 nodes Ready (3 masters + 3 workers; bootstrap already gone).
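The acceptance gate can be sketched as below (count_ready is an illustrative helper; matching the STATUS column exactly means a NotReady or Ready,SchedulingDisabled node does not count):

```shell
#!/usr/bin/env bash
# Sketch: count nodes whose STATUS column (field 2 of `oc get nodes
# --no-headers`) is exactly "Ready".
count_ready() { awk '$2 == "Ready"' | wc -l; }

# usage as the last step of deploy-all.sh:
#   ready="$(oc --kubeconfig ./creds/kubeconfig get nodes --no-headers | count_ready)"
#   [ "$ready" -eq 6 ] || { echo "FAIL: expected 6 Ready nodes, got $ready" >&2; exit 1; }
```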

What this plan intentionally does NOT include

  • OCP operators/workloads (Compliance, ACS, OTel, WSO2, etc.) — those belong to Phase 2 issues, deployed after cluster is up.
  • ODF storage on OCP — separate issue; runs on workers after install.
  • CI/CD integration (GitLab/Jenkins HA) — separate issue; not dependent on OCP install.

Referenced memory

  • memory/reference_assisted_installer_api.md — REST endpoint details, payload shapes
  • memory/project_vm_network_plan.md — MAC+IP+VM table
  • memory/project_dns_lb_architecture.md — DNS + HAProxy roles
  • memory/feedback_vm_ssh_keys.md — pubkeys to inject into every VM

Next action: open an execution session on branch infra/ocp-assisted-installer, read this plan + referenced memory, implement scripts 01–10. Estimated: 4–6 hours for a working install including the Tang server.