OpenShift Assisted Installer Deployment Plan

Status: Ready to implement (in an execution session, on feature branch infra/ocp-assisted-installer)
Target: OCP 4.21.9 via full-ISO + static networking + Tang disk encryption
Runs on: dedicated ops VM (brac-ops-runner), invoked from engineer workstation via SSH


Architecture recap

```
brac-ops-runner VM (internal 10.x.x.x)
│
├─ talks to Red Hat Assisted Installer API (api.openshift.com)
│    via offline-token → access-token exchange
│
├─ generates install-config values + NMState YAML per host
├─ downloads discovery ISO, uploads to the hosting platform as bootable image
│
└─ creates OCP VMs (bootstrap + 3 masters + 3 workers) each with:
     stable MAC address
     bridge interface on the internal VM subnet (static IP)
     Tang-client disk encryption
     CDROM = discovery ISO

Tang server (Deployment in openshift-brac ns on the hosting platform)
  → internal LoadBalancer IP
  → serves keys to OCP nodes at boot for LUKS unlock

haproxy-brac (dedicated TCP LB for the OCP cluster)
  → fronts api:6443, api-int:6443, 22623, 443, 80
  → TCP-LBs to masters/workers

PowerDNS brac-poc.comptech-lab.com zone
  → api / api-int / *.apps point to haproxy-brac
```


Script inventory

All scripts live in scripts/ocp/ on feature branch. Designed to run on brac-ops-runner (which has curl, jq, kubectl, openshift-install, oc pre-installed plus any hosting-platform CLIs needed to create VMs).

| #  | Script              | Purpose |
|----|---------------------|---------|
| 01 | token.sh            | Exchange offline token → access token; cache for ~10 min. Sourced by every other script. |
| 02 | tang-deploy.sh      | Deploy Tang server as Deployment + Service with IP from lan-pool-26; extract thumbprint. |
| 03 | cluster-create.sh   | POST /v2/clusters — register cluster definition with disk_encryption, VIPs, pull secret, SSH keys. Writes cluster_id.txt. |
| 04 | infra-env-create.sh | POST /v2/infra-envs — build the static_network_config array (one entry per host with NMState + MAC), image_type: full-iso. Writes infra_env_id.txt. |
| 05 | iso-download.sh     | GET /v2/infra-envs/{id}/downloads/image → save as discovery.iso on the runner; upload as a bootable image to the hosting platform. |
| 06 | vms-create.sh       | Apply VM manifests on the hosting platform for bootstrap + 3 masters + 3 workers. Each manifest pins the MAC, attaches a bridge interface with the static IP for its role, attaches the ISO as CDROM, sizes the root disk, and sets boot order. |
| 07 | hosts-configure.sh  | Poll GET /v2/infra-envs/{id}/hosts until all 7 hosts are registered, then PATCH each host with hostname, role (master/worker/bootstrap), and installation_disk_id. |
| 08 | install.sh          | Wait for cluster status ready, then POST /v2/clusters/{id}/actions/install. Poll cluster + host progress; print a summary every 30 s. |
| 09 | credentials.sh      | Download kubeconfig + kubeadmin-password to ./creds/. Print the console URL and login string. |
| 10 | teardown.sh         | Delete VMs, DataVolumes, and the Tang deployment; DELETE /v2/clusters/{id} + /v2/infra-envs/{id}. Idempotent. |
| —  | deploy-all.sh       | Orchestrator: runs 01 through 09 with timing + pause-on-error. Acceptance test: oc get nodes returns 6 Ready at the end (bootstrap is removed after install). |

Inputs (one config file)

scripts/ocp/config.env (gitignored; not committed — kept on runner):

```bash
# --- cluster ---
export CLUSTER_NAME="brac-poc"
export OCP_VERSION="4.21.9"
export BASE_DOMAIN="comptech-lab.com"

# --- networking (external VIPs via haproxy-brac) ---
export API_VIP="59.153.29.101"
export INGRESS_VIP="59.153.29.101"

# --- disk encryption ---
export ENCRYPTION_MODE="tang"        # or "tpmv2"
export TANG_URL="http://26.26.199.110:7500"
export TANG_THUMBPRINT=""            # populated by tang-deploy.sh

# --- paths ---
export TOKEN_FILE="/Users/ze/Documents/Brac-POC/redhat-offline-token"
export PULL_SECRET="/Users/ze/Documents/Brac-POC/redhat-pull-secret"
export SSH_KEYS_FILE="./ssh-keys.txt"  # concatenated pubkeys (engineer Mac + hosting platform admin + brac-ops-runner)

# --- host plan (referenced by infra-env-create.sh to build NMState + MAC list) ---
# Format: name:mac:ip:role (prefix-len is /16, gateway 26.26.0.1, DNS 26.26.199.100)
export HOSTS=(
  "bootstrap:52:54:00:00:01:10:26.26.200.10:auto-assign"
  "master-1:52:54:00:00:01:11:26.26.200.11:master"
  "master-2:52:54:00:00:01:12:26.26.200.12:master"
  "master-3:52:54:00:00:01:13:26.26.200.13:master"
  "worker-1:52:54:00:00:01:21:26.26.200.21:worker"
  "worker-2:52:54:00:00:01:22:26.26.200.22:worker"
  "worker-3:52:54:00:00:01:23:26.26.200.23:worker"
)
```
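Because the MAC itself contains colons, a naive split of a HOSTS entry on `:` misparses; fields have to be reassembled positionally. A sketch of how infra-env-create.sh could split an entry (the parse_host helper name is illustrative, not one of the numbered scripts):

```shell
#!/usr/bin/env bash
# Hypothetical helper for 04 infra-env-create.sh: split one HOSTS entry.
# A 9-field entry is name(1) + mac(6) + ip(1) + role(1), because the MAC
# contributes six colon-separated fields.
parse_host() {
  local -a f
  IFS=':' read -ra f <<< "$1"
  HOST_NAME="${f[0]}"
  HOST_MAC="$(IFS=:; echo "${f[*]:1:6}")"   # rejoin fields 1-6 with ':'
  HOST_IP="${f[7]}"
  HOST_ROLE="${f[8]}"
}

parse_host "master-1:52:54:00:00:01:11:26.26.200.11:master"
echo "$HOST_NAME / $HOST_MAC / $HOST_IP / $HOST_ROLE"
# → master-1 / 52:54:00:00:01:11 / 26.26.200.11 / master
```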


Dependencies (must exist before deploy-all.sh runs)

  1. brac-ops-runner VM created, SSH reachable via ProxyJump
  2. Tools on runner: curl, jq, kubectl, virtctl, openshift-install (optional fallback), oc
  3. Kubeconfig for the hosting platform copied to brac-ops-runner:~/.kube/config (so the runner can create VM manifests)
  4. PowerDNS zone brac-poc.comptech-lab.com created with records:
     • api.brac-poc A 59.153.29.101
     • api-int.brac-poc A 59.153.29.101
     • *.apps.brac-poc A 59.153.29.101
     • Per-host A records: ocp-master-1.brac-poc A 26.26.200.11, etc.
  5. Cloudflare NS delegation: brac-poc.comptech-lab.com NS → PowerDNS public IP (or we put records directly in Cloudflare; see DNS architecture memory)
  6. haproxy-brac deployed and health-checking the VMs' IPs (backends will be empty/DOWN until VMs boot — acceptable)
  7. VM network attachments created on the hosting platform: one for the internal br26 bridge (static IPAM) and one for the public br-real bridge (for haproxy-brac's dual-homing)

Key API decisions + gotchas

Authentication

  • client_id=cloud-services (matches azp in our offline JWT).
  • Token endpoint: https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token.
  • Access token expires ~15 min — cache with expiry file check, refresh before each long phase.
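A minimal sketch of what 01 token.sh does, assuming bash with curl and jq on the runner (the function and cache-file names are illustrative; the endpoint, client_id, and grant type are from the bullets above):

```shell
#!/usr/bin/env bash
# Illustrative cache check: reuse the token if its cache file is younger
# than 10 minutes, safely inside the ~15 min expiry.
token_fresh() {
  local cache="$1" now mtime
  [ -f "$cache" ] || return 1
  now="$(date +%s)"
  mtime="$(stat -c %Y "$cache" 2>/dev/null || stat -f %m "$cache")"  # GNU, then BSD stat
  [ $(( now - mtime )) -lt 600 ]
}

# Exchange the offline (refresh) token for an access token, cached on disk.
get_access_token() {
  local cache="${TOKEN_CACHE:-$HOME/.cache/assisted-access-token}"
  if ! token_fresh "$cache"; then
    curl -s https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
      -d grant_type=refresh_token \
      -d client_id=cloud-services \
      -d refresh_token="$(cat "$TOKEN_FILE")" \
    | jq -r .access_token > "$cache"
  fi
  cat "$cache"
}
```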

Image type: full-iso

  • ~1 GB per infra-env; downloaded once to the runner, then uploaded as a DataVolume for all 7 VMs.
  • Trade-off vs minimal-iso: full ISO boots offline; minimal fetches RHCOS from internet at boot. Full avoids network-at-first-boot failure modes.

user_managed_networking vs. VIPs

  • With user_managed_networking: false, Assisted expects the VIPs to live in the same subnet as the nodes. Our nodes are on 26.26.200.0/24 but the VIP is 59.153.29.101 on the external subnet.
  • Likely needs user_managed_networking: true (pure external LB model — Assisted skips in-cluster VIP management and relies on our HAProxy). Confirm during execution by trying false first; if cluster validation fails on VIP subnet mismatch, flip to true.
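If the flip turns out to be needed, it is a one-field PATCH against the cluster resource. A sketch (patch_umn_true is an illustrative name; ACCESS_TOKEN comes from 01 token.sh and the cluster id from cluster_id.txt):

```shell
#!/usr/bin/env bash
# Assisted Installer REST API base, as used throughout the plan.
API="https://api.openshift.com/api/assisted-install"

# Sketch: switch the cluster to the pure external-LB model if validation
# fails on the VIP subnet mismatch.
patch_umn_true() {
  local cluster_id="$1"
  curl -s -X PATCH "${API}/v2/clusters/${cluster_id}" \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"user_managed_networking": true}'
}
# usage (in 03 cluster-create.sh error handling):
#   patch_umn_true "$(cat cluster_id.txt)"
```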

Disk encryption (Tang)

  • Tang server's thumbprint is fetched AFTER Tang is deployed (Tang generates keys on first start). tang-deploy.sh waits for readiness, curls http://tang:7500/adv, computes thumbprint via jose or step-cli.
  • tang_servers in the API is a JSON-string of an array (double-encoded), not an array: "[{\"url\":\"...\",\"thumbprint\":\"...\"}]".
  • enable_on: all encrypts master + worker root disks.
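The double-encoding can be produced with jq's tojson. A sketch using the config.env values (placeholder thumbprint; the real one comes from tang-deploy.sh):

```shell
#!/usr/bin/env bash
# Sketch: build the disk_encryption stanza for POST /v2/clusters.
# tang_servers must be a JSON *string* containing an array, so the array
# is serialized with tojson before being embedded in the object.
TANG_URL="${TANG_URL:-http://26.26.199.110:7500}"
TANG_THUMBPRINT="${TANG_THUMBPRINT:-PLACEHOLDER}"

tang_servers="$(jq -cn --arg url "$TANG_URL" --arg tp "$TANG_THUMBPRINT" \
  '[{url: $url, thumbprint: $tp}] | tojson')"

disk_encryption="$(jq -cn --argjson ts "$tang_servers" \
  '{enable_on: "all", mode: "tang", tang_servers: $ts}')"

echo "$disk_encryption"
# → {"enable_on":"all","mode":"tang","tang_servers":"[{\"url\":...}]"}
```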

NMState static config

  • static_network_config is an array, one entry per host; MAC in mac_interface_map must match exactly the NIC MAC the host will have on first boot.
  • network_yaml is the NMState YAML as a string (with \n escape or multi-line literal in JSON via jq).
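A sketch of one entry, assuming jq ≥ 1.6 for --rawfile (values mirror master-1 from the host plan; the eth0 NIC name is an assumption and must match the VM template):

```shell
#!/usr/bin/env bash
# Sketch: one static_network_config entry for master-1. --rawfile reads the
# NMState YAML verbatim and jq escapes the newlines into a JSON string.
cat > /tmp/master-1.nmstate.yaml <<'EOF'
interfaces:
  - name: eth0
    type: ethernet
    state: up
    ipv4:
      enabled: true
      dhcp: false
      address:
        - ip: 26.26.200.11
          prefix-length: 16
routes:
  config:
    - destination: 0.0.0.0/0
      next-hop-address: 26.26.0.1
      next-hop-interface: eth0
dns-resolver:
  config:
    server:
      - 26.26.199.100
EOF

entry="$(jq -n --rawfile yaml /tmp/master-1.nmstate.yaml '{
  network_yaml: $yaml,
  mac_interface_map: [{mac_address: "52:54:00:00:01:11", logical_nic_name: "eth0"}]
}')"
```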

Host registration & disk

  • After ISO boots, discovery agent POSTs to Assisted Installer and hosts show up in GET /v2/infra-envs/{id}/hosts.
  • inventory.disks[*] lists available disks; use the first NVMe/virtio-block (>100GiB) as installation_disk_id.
  • Set role and hostname via PATCH /v2/infra-envs/{id}/hosts/{host_id} before install.
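Disk selection can be sketched as a jq filter. This version filters by size only; the NVMe/virtio preference would add a drive_type or path test (field names here are assumed from the host inventory schema — verify against a live host record):

```shell
#!/usr/bin/env bash
# Sketch: choose installation_disk_id from a host record. The inventory
# field is itself a JSON string inside the host object, hence the fromjson.
# Prints the first disk larger than 100 GiB, or "null" if none qualifies.
pick_install_disk() {
  jq -r --argjson min "$((100 * 1024 * 1024 * 1024))" \
    '.inventory | fromjson
     | [.disks[] | select(.size_bytes > $min)][0].id'
}
# usage (in 07 hosts-configure.sh, per host):
#   disk_id="$(echo "$host_json" | pick_install_disk)"
```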

Install validation gates

  • Cluster transitions insufficient → ready only when all hosts pass validations (network, DNS, disk size, API reachability). The validations_info field lists each failing check — the script should print these when the cluster is stuck.
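A sketch of that printing step (validations_info is itself a JSON string, keyed by category, each category holding an array of {id, status, message} checks):

```shell
#!/usr/bin/env bash
# Sketch: read a cluster record on stdin and print one line per failing
# validation, as "category/check-id: message".
print_failures() {
  jq -r '.validations_info | fromjson | to_entries[]
         | .key as $cat | .value[]
         | select(.status == "failure")
         | "\($cat)/\(.id): \(.message)"'
}
# usage (in 08 install.sh, when the cluster is stuck):
#   curl -s "$API/v2/clusters/$CLUSTER_ID" \
#     -H "Authorization: Bearer $ACCESS_TOKEN" | print_failures
```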

Execution order for the execution session

  1. Precondition check (first step of deploy-all.sh):
     • Tang deployed + reachable at $TANG_URL/adv
     • PowerDNS resolves api.brac-poc.comptech-lab.com and *.apps.brac-poc to 59.153.29.101
     • HAProxy reachable at 59.153.29.101:6443 (TCP connect succeeds; backends may be DOWN)
     • Hosting platform healthy; storage provisioner has ≥500 GiB free
     • NADs br26-static and br-real-brac exist in openshift-brac
  2. Run scripts 01–09 in order; 10 (teardown.sh) only on failure or deliberate reset.
  3. Acceptance: oc --kubeconfig ./creds/kubeconfig get nodes shows 6 nodes Ready (3 masters + 3 workers; bootstrap already gone).
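The acceptance gate can be sketched as below (count_ready is an illustrative helper; matching the STATUS column exactly means a NotReady or Ready,SchedulingDisabled node does not count):

```shell
#!/usr/bin/env bash
# Sketch: count nodes whose STATUS column (field 2 of `oc get nodes
# --no-headers`) is exactly "Ready".
count_ready() { awk '$2 == "Ready"' | wc -l; }

# usage as the last step of deploy-all.sh:
#   ready="$(oc --kubeconfig ./creds/kubeconfig get nodes --no-headers | count_ready)"
#   [ "$ready" -eq 6 ] || { echo "FAIL: expected 6 Ready nodes, got $ready" >&2; exit 1; }
```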

What this plan intentionally does NOT include

  • OCP operators/workloads (Compliance, ACS, OTel, WSO2, etc.) — those belong to Phase 2 issues, deployed after cluster is up.
  • ODF storage on OCP — separate issue; runs on workers after install.
  • CI/CD integration (GitLab/Jenkins HA) — separate issue; not dependent on OCP install.

Referenced memory

  • memory/reference_assisted_installer_api.md — REST endpoint details, payload shapes
  • memory/project_vm_network_plan.md — MAC+IP+VM table
  • memory/project_dns_lb_architecture.md — DNS + HAProxy roles
  • memory/feedback_vm_ssh_keys.md — pubkeys to inject into every VM

Next action: open an execution session on branch infra/ocp-assisted-installer, read this plan + referenced memory, implement scripts 01–10. Estimated: 4–6 hours for a working install including the Tang server.