Decision Log¶
Records all architectural decisions, scope choices, and trade-offs made during the POC, serving as the audit trail for BRAC Bank.
Why This Matters¶
BRAC Bank will ask:

- "Why OpenTelemetry instead of Datadog?"
- "Why Kafka instead of direct push?"
- "Why did you skip JBoss?"
- "Why this infrastructure topology?"
Answer: "It's in the Decision Log. Here's the reasoning, alternatives considered, and who approved it."
Decision Log Template¶
Below is the long-form template, shown as a worked example (Decision #001 — OpenTelemetry vs proprietary APM). For quick decisions, use the condensed table format shown in the actual decision entries further down.
Long-form template (worked example)
Decision #001 — OpenTelemetry vs Proprietary APM
| Field | Value |
|---|---|
| Date | 2026-04-24 |
| Category | Observability architecture |
| Owner | Platform Lead |
| Approver | Project Lead |
| Status | ✅ Approved |
Context
- Need comprehensive observability (logs, metrics, traces)
- BRAC has no vendor preference stated
- 6-day aggressive timeline, must avoid vendor lock-in
Decision: use OpenTelemetry (OTel) with SigNoz backend.
Alternatives considered
| # | Option | Pros | Cons | Why rejected |
|---|---|---|---|---|
| 1 | Datadog APM | Enterprise-grade, fully managed | Vendor lock-in, expensive licensing | Contradicts BRAC neutrality preference |
| 2 | Elastic Stack (ELK) | Open source, good for logs | Weak tracing, fragmented | Inferior unified experience |
| 3 | Prometheus + Loki + Jaeger | All open source, CNCF projects | Three separate systems, complex integration | Worse UX than SigNoz |
Chosen-option rationale
- ✅ Vendor-neutral (CNCF-backed)
- ✅ Single platform for logs, metrics, traces (unified UX)
- ✅ Scalable with Kafka buffering
- ✅ Cost-effective (self-managed)
- ✅ Future flexibility (swap backends if needed)
Trade-offs
- Cost: operational overhead — we manage OTel, SigNoz, ClickHouse ourselves
- Cost: learning curve — team must learn OTel SDK
- Gain: flexibility — not locked to any vendor
Impact
- On Phase 2 critical path (5h for full stack)
- Affects sample-app instrumentation
- Influences testing strategy (load test required)
Related ADR: ADR 0001 — OpenTelemetry vs proprietary APM
Approved by: Project Lead ✅ · Security Lead ✅ · BRAC Bank contact (pre-approval) ✅
Notes
- If SigNoz proves too slow, fall back to Grafana Loki (logs only)
- Cold-archive to object storage to control cost
- Retention policy: 2-day hot tier, then auto-archive to cold
Pre-POC Strategic Decisions¶
DECISION #001: OpenTelemetry vs Proprietary APM¶
Status: ✅ APPROVED | Owner: Platform Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | OpenTelemetry + SigNoz |
| Why | Vendor neutrality, unified UX, cost-effective |
| Alternative | Datadog (rejected: vendor lock-in) |
| Trade-off | Operational overhead vs flexibility |
| Approval | Project Lead, Security Lead |
DECISION #002: Terraform for Infrastructure¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Terraform + public modules |
| Why | Automation, repeatability, auditability |
| Alternative | Manual provisioning (rejected: timeline) |
| Trade-off | Module updates may break vs scalability |
| Approval | Project Lead |
DECISION #003: Single 3-Node Cluster vs HA K8s Master¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Single etcd, single master (cost/time efficient) |
| Why | 6-day POC, not production; a single point of failure is acceptable |
| Alternative | HA K8s master (adds 1-2 days) |
| Trade-off | No K8s HA, but single master is standard for POC |
| Limitation | If master fails, cluster down. Acceptable risk. |
| Approval | Infrastructure Lead, Project Lead |
DECISION #004: Kafka KRaft vs ZooKeeper¶
Status: ✅ APPROVED | Owner: Platform Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Kafka KRaft (no ZooKeeper) |
| Why | Simpler, modern, fewer dependencies |
| Alternative | ZooKeeper (rejected: extra complexity) |
| Trade-off | KRaft newer but stable (GA in 3.3+) |
| Approval | Platform Lead |
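For reference, a minimal single-node KRaft bring-up looks like this (assumes a stock Kafka 3.3+ tarball layout; paths and config file names vary by packaging):

```shell
# KRaft replaces ZooKeeper: the storage directory must be formatted
# with a cluster ID before first start.
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/kraft/server.properties

# Combined broker+controller mode is enough for the POC buffer tier
bin/kafka-server-start.sh config/kraft/server.properties
```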
DECISION #005: Skip Load Testing (Out of Scope)¶
Status: ✅ APPROVED | Owner: Project Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | No load testing / performance benchmarks |
| Why | 6-day POC, load testing adds 2+ days |
| Alternative | Include load testing (rejected: timeline) |
| Trade-off | Don't know performance limits |
| Noted for Phase 2 | Load testing should be Phase 2 item |
| Approval | Project Lead, BRAC Bank (implied) |
DECISION #006: GitLab HA vs Jenkins Only¶
Status: ✅ APPROVED | Owner: DevOps Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Both GitLab HA and Jenkins HA |
| Why | BRAC has both, showcase both capabilities |
| Alternative | GitLab only (rejected: incomplete demo) |
| Trade-off | Takes longer (2x CI/CD work) |
| Approval | Project Lead |
DECISION #007: Scope Lock Date¶
Status: ✅ APPROVED | Owner: Project Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Scope lock: NOW (2026-04-24) |
| Why | 6-day aggressive timeline, no buffer for scope creep |
| Rule | New features → Phase 2 / Future Work |
| Process | All changes logged, estimated impact, BRAC approval |
| Escalation | Project Lead decides if change proceeds |
| Approval | Project Lead |
DECISION #008: Demo Format (Live vs Recorded)¶
Status: ⏳ TBD | Owner: Project Lead | Date: TBD
| Aspect | Details |
|---|---|
| Options | A) Live demo · B) Recorded demo |
| Preferred | TBD (awaiting BRAC feedback) |
| Live Pros | Interactive, can answer questions, credible |
| Live Cons | Risk if component breaks |
| Recorded Pros | Safe, rehearsed, high quality |
| Recorded Cons | Less interactive |
| Decision Timeline | Day 5, 4pm (1 day before demo) |
| Approval | BRAC Bank + Project Lead |
DECISION #009: JBoss Domain Mode vs Standalone¶
Status: ✅ APPROVED | Owner: Middleware Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Domain mode (as per BRAC requirements) |
| Why | BRAC specifically requested domain mode |
| Alternative | Standalone (simpler, if domain mode too hard) |
| Contingency | If domain mode stalls Day 6, switch to standalone |
| Fallback | Skip JBoss entirely if time insufficient |
| Approval | Project Lead |
DECISION #010: ODF Storage vs External Storage¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | ODF (OpenShift Data Foundation) - local |
| Why | Included with OpenShift, no external dependencies |
| Alternative | AWS EBS / GCP Persistent Disks (adds complexity) |
| Trade-off | Local storage lost if cluster destroyed |
| Acceptable because | POC is short-lived, cluster destroyed after |
| Approval | Infrastructure Lead |
Phase 1 Decisions¶
DECISION #011: Disk Encryption — Tang (not TPMv2)¶
Status: ✅ APPROVED | Owner: Project Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Tang (network-bound disk encryption) |
| Server | Single Tang Deployment in openshift-brac ns, image quay.io/latchset/tang:latest, LB IP 26.26.199.110 from lan-pool-26, port 7500 |
| Scope | enable_on: all (bootstrap + 3 masters + 3 workers) |
| Alternative | TPMv2 (rejected: per-VM vTPM/EFI/SecureBoot setup; harder to demo) |
| Trade-off | Tang must be up before OCP boot (acceptable — it runs on the already-up hosting platform); no key-persistence across Tang pod restarts, but POC is throwaway |
| Demo value | Network-bound key escrow story maps well to bank compliance requirements |
| Approval | Project Lead |
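Since the cluster is installed via the Assisted Installer (Decision #014), the Tang binding is expressed in the cluster payload. A sketch of the relevant fragment, to be verified against the current API spec (the thumbprint comes from running `tang-show-keys` on the Tang host):

```json
{
  "disk_encryption": {
    "enable_on": "all",
    "mode": "tang",
    "tang_servers": "[{\"url\":\"http://26.26.199.110:7500\",\"thumbprint\":\"<tang-thumbprint>\"}]"
  }
}
```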
DECISION #012: VM Networking — flat L2 bridge (not overlay/pod-network)¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | OCP VMs on a dedicated internal L2 bridge with static IP addressing (classic flat-L2 topology that OCP is designed for) |
| Alternative | Overlay / pod-network attachment for VMs (rejected: IP instability with masquerade binding + nested VXLAN MTU complexity) |
| Trade-off | Loses some of the hosting platform's pod-level observability for these VMs; wins OCP-standard flat L2 networking with stable IPs |
| Approval | Project Lead |
DECISION #013: Dedicated haproxy-brac (not reusing the hosting platform's edge LB)¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | New haproxy-brac Deployment in openshift-brac ns, TCP mode, dual-homed br26 (26.26.200.50) + br-real (59.153.29.101) |
| Alternative | Extend the hosting platform's existing edge LB (rejected: it's HTTP mode only; can't TCP-passthrough OCP's 6443/22623 mTLS ports) |
| Trade-off | Duplicates pattern but cleanly scoped + deletable at POC end |
| Approval | Project Lead |
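A sketch of the TCP-mode passthrough config (the 22623 machine-config port follows the same pattern; the master IPs below are hypothetical):

```
frontend ocp_api
    bind 26.26.200.50:6443
    bind 59.153.29.101:6443
    mode tcp
    default_backend ocp_api_masters

backend ocp_api_masters
    mode tcp
    balance roundrobin
    # mTLS passes through untouched; haproxy never terminates it
    server master0 26.26.200.10:6443 check
    server master1 26.26.200.11:6443 check
    server master2 26.26.200.12:6443 check
```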
DECISION #014: Assisted Installer — full-ISO + static networking¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Red Hat Assisted Installer REST API, image_type: full-iso, NMState per-host static networking, MACs pinned in VM specs |
| Alternative | Minimal-ISO (rejected: requires RHCOS fetch at first boot — extra failure mode), UPI agent-install (rejected: more infra to stand up than Assisted) |
| Trade-off | ~1 GB ISO per infra-env vs minimal's ~100 MB; full-ISO boots offline-friendly |
| Approval | Project Lead |
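A hedged sketch of the infra-env registration call (`$TOKEN`, `$CLUSTER_ID`, the MAC, and the NMState body are placeholders; verify field shapes against the current Assisted Installer API spec):

```shell
curl -s -X POST "https://api.openshift.com/api/assisted-install/v2/infra-envs" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{
    "name": "brac-poc-infra-env",
    "cluster_id": "'"$CLUSTER_ID"'",
    "image_type": "full-iso",
    "static_network_config": [
      { "network_yaml": "<per-host NMState YAML>",
        "mac_interface_map": [
          {"mac_address": "52:54:00:aa:bb:01", "logical_nic_name": "eth0"}
        ] }
    ]
  }'
```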
DECISION #015: FIPS 140-3 mode enabled at install¶
Status: ✅ APPROVED | Owner: Project Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | fips: true in POST /v2/clusters payload |
| Rationale | PCI-DSS 3.5/3.6/4.2.1 mandates validated crypto; FIPS cannot be enabled post-install |
| Blocker risk | Third-party images using non-FIPS crypto will fail — verify each POC component's FIPS compatibility before adoption |
| Approval | Project Lead |
DECISION #016: API Server audit profile = WriteRequestBodies¶
Status: ✅ APPROVED | Owner: Security Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | APIServer.spec.audit.profile: WriteRequestBodies (applied at install via custom manifest) |
| Rationale | PCI-DSS 10.2/10.3 — capture who-wrote-what for audit trail. WriteRequestBodies is the sweet spot: mutations logged with bodies, reads metadata-only (avoids log explosion) |
| Alternative | AllRequestBodies (rejected for POC: excessive log volume; useful in high-compliance production only) |
| Approval | Security Lead |
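The profile is applied with a standard APIServer cluster config manifest:

```yaml
apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  name: cluster
spec:
  audit:
    profile: WriteRequestBodies   # mutations logged with bodies, reads metadata-only
```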
DECISION #017: Identity Provider = Keycloak¶
Status: ✅ APPROVED | Owner: Security Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Keycloak as OIDC provider; OCP OAuth CR configured to trust it |
| Deployment target | On the hosting platform (not inside OCP — Keycloak must survive OCP rebuilds and could front multiple clusters) |
| Namespace | keycloak-brac on the hosting platform |
| Database | Dedicated PostgreSQL cluster keycloak-pg (mirrors the PowerDNS pattern) |
| Public URL | https://keycloak.brac-poc.comptech-lab.com — cert-manager via Cloudflare DNS-01, exposed via the hosting platform's ingress controller |
| Realm | brac-poc with client openshift-brac; admin user with MFA |
| Alternative | Google OIDC (rejected: external dependency; less BRAC-demo-friendly), htpasswd (rejected: no MFA, not PCI-8.3 compliant), LDAP (rejected: needs LDAP server beyond Keycloak) |
| Trade-off | Adds a deployment task to prereqs; gains: standalone IdP, can broker to BRAC's AD/LDAP later, demo-friendly UI |
| Approval | Security Lead |
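The OCP side is a standard OAuth cluster CR pointing at the realm; a sketch (the client-secret Secret name is an assumption):

```yaml
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: keycloak
    mappingMethod: claim
    type: OpenID
    openID:
      clientID: openshift-brac
      clientSecret:
        name: keycloak-client-secret   # hypothetical Secret in openshift-config
      issuer: https://keycloak.brac-poc.comptech-lab.com/realms/brac-poc
      claims:
        preferredUsername: [preferred_username]
        email: [email]
        name: [name]
```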
DECISION #018: Allowed container registries (strict list)¶
Status: ✅ APPROVED | Owner: Security Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Image.spec.registrySources.allowedRegistries: [quay.io, registry.redhat.io, registry.connect.redhat.com, ghcr.io, <nexus-internal>] |
| Rationale | PCI-DSS 6.3.2 — only approved image sources. Each registry serves a purpose: Quay (OCP/operators), Red Hat registries (official + partner), GHCR (user-built custom), Nexus (internal mirror/cache) |
| Explicitly blocked | docker.io (no image provenance verification), ghcr.io/<unknown-orgs> (Nexus is preferred mirror for external images) |
| Approval | Security Lead |
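Applied via the cluster-scoped Image config; a sketch (the Nexus entry stays a placeholder until the internal hostname is fixed, and note that any registry the cluster itself pulls from must remain on the list):

```yaml
apiVersion: config.openshift.io/v1
kind: Image
metadata:
  name: cluster
spec:
  registrySources:
    allowedRegistries:
    - quay.io
    - registry.redhat.io
    - registry.connect.redhat.com
    - ghcr.io
    - <nexus-internal>   # placeholder from this decision entry
```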
DECISION #026: VM tier for all non-OCP tools with DC/DR pattern¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | All tools outside OpenShift deployed as VMs in a DC/DR active-passive pattern. Each tool's VMs named <app>[-<profile>]-vm<N>-<dc\|dr> |
| VMs in scope | Vault, MinIO, Redis, GitLab CE, Jenkins, Nexus, WSO2 APIM (distributed), WSO2 IS, Temporal, n8n, Splunk Free, SigNoz, ClickHouse, AWX, Terrakube, plus per-app PostgreSQL and the ops-runner |
| Default username | ze (not ubuntu) — SSH pubkeys injected via cloud-init (Mac + dl385-2 + ops-runner) |
| OS | Ubuntu 24.04 Server (Noble) cloud image — same for all VMs |
| Inventory | docs/VM-TIER-ARCHITECTURE.md — 68 VMs + 8 OCP nodes = 76 total |
| Approval | Infrastructure Lead, Project Lead |
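The `ze` user and key injection can be sketched as cloud-init user-data (keys truncated; the exact key list per VM comes from the inventory):

```yaml
#cloud-config
users:
  - name: ze
    groups: [sudo]
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... mac
      - ssh-ed25519 AAAA... dl385-2
      - ssh-ed25519 AAAA... ops-runner
```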
DECISION #027: WSO2 APIM uses distributed (enterprise) topology, not all-in-one¶
Status: ✅ APPROVED | Owner: Integration Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Distributed deployment with 5 profiles: Publisher, DevPortal, Gateway, Traffic Manager, Key Manager — one VM per profile per site → 10 VMs + shared PG per site |
| Rationale | Mirrors real bank production pattern; lets BRAC evaluate each tier independently; demonstrates HA principles per component |
| Shared DB | Single PG per site (DC + DR) shared across profiles — streaming replication DC → DR |
| Version | WSO2 APIM 4.4.0, WSO2 IS 7.1.0 |
| Approval | Integration Lead, Project Lead |
DECISION #028: MinIO as universal object storage target¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | All backup + object-storage needs use the MinIO cluster — Splunk indexed data, Vault Raft snapshots, Nexus blob store, ACM cluster-backup, GitLab backup, Jenkins/Nexus/ClickHouse archives, SigNoz cold retention |
| Topology | 3-node distributed MinIO per site (DC + DR) with native site-replication |
| Version | Latest RELEASE.2025.xx.xx LTS |
| Trade-off | Single tool for all storage needs — simple. Production alternative: separate tiers for hot/cold/archive; out of scope for POC. |
| Approval | Infrastructure Lead, Project Lead |
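Site replication between the DC and DR clusters is a two-command setup with `mc`; a sketch with placeholder endpoints and credentials:

```shell
mc alias set dc https://minio-dc.example.internal ACCESS_KEY SECRET_KEY
mc alias set dr https://minio-dr.example.internal ACCESS_KEY SECRET_KEY
mc admin replicate add dc dr    # enables bi-directional site replication
mc admin replicate info dc      # verify both sites show as healthy peers
```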
DECISION #029: OpenTofu (not Terraform), AWX + Terrakube for IaC GUIs¶
Status: ✅ APPROVED | Owner: DevOps Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | OpenTofu 1.8.x (MPL 2.0 OSS fork of Terraform) for infrastructure provisioning; Ansible 11.x for configuration |
| GUI | AWX (Red Hat's upstream of AAP) for Ansible; Terrakube (OSS TFE-alternative) for OpenTofu — both self-hosted on VMs |
| Two Git repos | brac-poc-infrastructure (OpenTofu modules) + brac-poc-ansible (playbooks) on GitLab CE |
| Workflow | MR → CI lint → merge → Terrakube/AWX webhook → runs on VMs; no direct SSH-root-to-configure drift |
| Bootstrap | ops-runner has CLI OpenTofu + Ansible; used to create AWX/Terrakube VMs; after that, UIs take over |
| Rejected alternatives | HashiCorp Terraform (BSL license concerns for a bank); Rundeck (less specialized per tool) |
| Approval | DevOps Lead, Project Lead |
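The MR lint step can be sketched as a minimal `.gitlab-ci.yml` job (the OpenTofu image tag is an assumption):

```yaml
stages: [lint]

tofu-lint:
  stage: lint
  image: ghcr.io/opentofu/opentofu:1.8
  script:
    - tofu fmt -check -recursive   # formatting gate
    - tofu init -backend=false     # no state access needed for validation
    - tofu validate
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
```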
DECISION #030: DC/DR strategy — per-component documented RPO/RTO + quarterly drills¶
Status: ✅ APPROVED | Owner: Infrastructure + Platform Leads | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Active-passive DR for all components; RPO/RTO targets documented per tier in docs/DC-DR-STRATEGY.md |
| Replication mechanisms | Vault OSS Raft snapshot → MinIO every 15 min · MinIO native site-replication · PostgreSQL streaming replication · Redis Sentinel + cross-site replica · ACM cluster-backup-chart (15-min schedule) · ArgoCD AppSet targeting both spokes |
| Convergence point | MinIO cluster-pair acts as the universal backup target for every component |
| Drills | 7 drills catalogued in docs/DR-DRILL-PLAYBOOK.md (MinIO, Vault, PG, Redis, hub, spoke, full-site integrated) |
| Demo posture | Pre-record the full-site drill on Day 4-5; show recording on demo day (full live drill too risky on Day 6) |
| Approval | Infrastructure Lead, Platform Lead, Project Lead |
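The Vault-to-MinIO leg can be sketched as a small cron-driven script (addresses, alias, and bucket names are placeholders; `dc` is the `mc` alias from the MinIO pattern):

```shell
#!/bin/sh
# /usr/local/bin/vault-snapshot.sh
# crontab: */15 * * * * /usr/local/bin/vault-snapshot.sh
export VAULT_ADDR=https://vault-vm1-dc.example.internal:8200
SNAP="/tmp/vault-$(date +%Y%m%d%H%M).snap"
vault operator raft snapshot save "$SNAP"
mc cp "$SNAP" dc/vault-snapshots/ && rm -f "$SNAP"
```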
DECISION #031: Observability — dual stack (Red Hat native + SigNoz + Splunk Free)¶
Status: ✅ APPROVED | Owner: Platform Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Red Hat native (on OCP) | Red Hat build of OpenTelemetry 0.144.0-2 (instrumentation) + Tempo 0.20.0-3 (traces) + Loki 6.5.0 (logs) + Cluster Observability Operator 1.4.0 (cross-cluster metrics on hubs) + Red Hat OpenShift Logging 6.5.0 |
| SigNoz + ClickHouse (on VMs) | Original POC requirement retained; SigNoz UI + ClickHouse backend run on VMs with signoz-vm1-dc/dr + clickhouse-vm1-dc/dr; demonstrates community APM stack alongside Red Hat supported |
| Splunk Free (on VMs) | PCI-DSS + client requirement for log forwarding; 500 MB/day cap; 2-day retention; OCP Logging + OTel exports forward to Splunk HEC |
| Rationale | Shows BRAC both the Red Hat-supported path (vendor-backed) and the popular community tooling; Splunk satisfies compliance; SigNoz demos flexibility |
| Approval | Platform Lead, Security Lead |
DECISION #021: Domain strategy — per-cluster subdomain on opp.brac-poc.comptech-lab.com¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Root | brac-poc.comptech-lab.com (project umbrella) |
| OpenShift umbrella | opp.brac-poc.comptech-lab.com |
| Per-cluster API | api.<cluster-name>.opp.brac-poc.comptech-lab.com — e.g. api.spoke-dc.opp.brac-poc.comptech-lab.com |
| Per-cluster routes | *.routes.<cluster-name>.opp.brac-poc.comptech-lab.com — per OCP-spec, replaces the default *.apps. wildcard |
| Platform tools (outside OCP) | *.apps.brac-poc.comptech-lab.com — docs site, monitoring consoles, tooling |
| Rationale | Per-cluster prefix keeps DR failover explicit (operator decides when to flip) and avoids ambiguity in logs/certs |
| TLS | Wildcard Let's Encrypt via cert-manager + Cloudflare DNS-01 solver, auto-issued per cluster |
| Approval | Infrastructure Lead, Project Lead |
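A sketch of the cert-manager objects for one spoke (Secret names are assumptions):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-cloudflare
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
    - dns01:
        cloudflare:
          apiTokenSecretRef:
            name: cloudflare-api-token
            key: api-token
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: spoke-dc-wildcard
  namespace: openshift-ingress
spec:
  secretName: spoke-dc-wildcard-tls
  issuerRef:
    name: letsencrypt-cloudflare
    kind: ClusterIssuer
  dnsNames:
  - "*.routes.spoke-dc.opp.brac-poc.comptech-lab.com"
```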
DECISION #022: Four-cluster topology (2 hubs + 2 spokes, DC/DR)¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | hub-dc + hub-dr (SNO management clusters) + spoke-dc + spoke-dr (compact 3-node workload clusters) |
| Pattern | DC/DR active-passive; DR clusters are hot standbys synced via ACM cluster-backup + ArgoCD ApplicationSets |
| Hub specs | 1 node × 64c/64GB/1TB each (SNO) |
| Spoke specs | 3 nodes × 64c/64GB/1TB each → 192c/192GB/3TB per cluster (compact 3-node) |
| Total | 512 vCPU + 512 GB + 8 TB across 4 clusters |
| Alternative (rejected) | Single multi-node cluster — loses DR story, doesn't exercise RHACM hub-spoke patterns BRAC wants to evaluate |
| Alternative (rejected) | All 4 SNO — too small for workload realism on spokes |
| Approval | Infrastructure Lead, Project Lead |
DECISION #023: Management fabric = RHACM (hub) → managed clusters (spokes + DR hub)¶
Status: ✅ APPROVED | Owner: Platform Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Red Hat Advanced Cluster Management for Kubernetes, release-2.13 on hub-dc (primary) and hub-dr (DR hot-standby) |
| Hub hosts | RHACM, ACS Central, OpenShift GitOps, Compliance Operator, Cluster Observability Operator, RHBK Keycloak |
| Spokes run | All POC workloads; they are ACM-managed clusters |
| Failover mode | Hub-dr is itself registered as managed-cluster of hub-dc; on hub-dc loss, hub-dr's MultiClusterHub takes over via ACM cluster-backup |
| Approval | Platform Lead, Project Lead |
DECISION #024: Red Hat operators only (no community operators, ever)¶
Status: ✅ APPROVED | Owner: Security Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | All operators sourced from redhat-operators CatalogSource (primary) or certified-operators (partner-certified, acceptable). Community + redhat-marketplace catalogs disabled cluster-wide via OperatorHub CR |
| Verification | Before installing any operator: confirm channel + CSV version via oc get packagemanifest <name>; pin the exact CSV in the Subscription manifest for reproducibility |
| Implications | Uses RHBK (Red Hat Build of Keycloak) not community Keycloak; Red Hat AMQ Streams not community Strimzi; OpenShift GitOps not community Argo CD |
| Detailed matrix | See docs/OPERATOR-CATALOG.md for the full operator inventory with channels |
| Rationale | Support contract + supply-chain trust for a bank audit; no operator source that isn't directly Red Hat-maintained or Red Hat-certified |
| Approval | Security Lead, Project Lead |
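Two manifests implement this: the OperatorHub CR disabling the unwanted catalogs, and a Subscription showing the CSV-pinning pattern (the RHACM CSV below is illustrative; confirm via `oc get packagemanifest` as described):

```yaml
apiVersion: config.openshift.io/v1
kind: OperatorHub
metadata:
  name: cluster
spec:
  sources:
  - name: community-operators
    disabled: true
  - name: redhat-marketplace
    disabled: true
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: advanced-cluster-management
  namespace: open-cluster-management
spec:
  channel: release-2.13
  name: advanced-cluster-management
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
  startingCSV: advanced-cluster-management.v2.13.0  # example pin, verify first
```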
DECISION #025: GitOps-only operational model¶
Status: ✅ APPROVED | Owner: DevOps Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | All cluster configuration + workload deployment flows through Git → OpenShift GitOps (ArgoCD) → clusters |
| Pattern | App-of-Apps + ApplicationSets + pull-mode via RHACM ManifestWork propagation (per Red Hat RHACM 2.13 best practices, verified 2026-04-24) |
| Repo | openshift-platform-gitops on the BRAC POC self-hosted GitLab (GitLab HA deployed on spoke-dc, Issue #4). Mirrored to comptech-lab/openshift-platform-gitops on GitHub for review |
| Compliance | ACM Policy framework for governance (compliance baseline, allowed-registries, PSA, audit profile); ArgoCD for CD — both used together per Red Hat guidance |
| oc usage | Reserved for: (a) initial GitOps operator Subscription on each hub · (b) root Application apply that points at the Git repo · (c) break-glass emergencies. All other changes must go through a Git MR |
| Alternative (rejected) | Hybrid of oc apply and ArgoCD — rejected: breaks audit trail and creates drift |
| Detailed design | See docs/GITOPS-ARCHITECTURE.md |
| Approval | DevOps Lead, Project Lead |
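The entry point on each hub is a single root Application; a sketch (repo host and path are assumptions):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: openshift-gitops
spec:
  project: default
  source:
    repoURL: https://gitlab.brac-poc.comptech-lab.com/platform/openshift-platform-gitops.git
    targetRevision: main
    path: apps/            # App-of-Apps: this path holds child Applications
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```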
DECISION #020: Docs site — MkDocs Material on Cloudflare Pages¶
Status: ✅ APPROVED | Owner: Project Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | MkDocs Material built from the docs/ folder, hosted on Cloudflare Pages, custom domain docs.brac-poc.comptech-lab.com |
| Alternative — Log4Brains | Rejected: too narrow (ADRs only), covers ~5% of what we need to share |
| Alternative — self-host on hosting platform | Rejected for POC: requires ArgoCD or webhook plumbing for auto-deploy; adds 2+ hrs setup; Cloudflare Pages auto-deploys on git push for free |
| Alternative — GitHub Pages | Rejected: requires GitHub Pro/Team plan for private repos |
| Build | pip install -r requirements.txt && mkdocs build — Python 3.12, ~0.8s build, ~4.5 MB output (24 pages) |
| Auto-update | Every push to main → webhook → Cloudflare build → live in ~30s |
| Cost | $0 (Cloudflare Pages free tier: 500 builds/mo; we'll use ~30-50) |
| Access | Public URL; can be Cloudflare-Access-protected later if BRAC prefers gated |
| Approval | Project Lead |
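A minimal `mkdocs.yml` consistent with this setup (nav entries are illustrative):

```yaml
site_name: BRAC POC Documentation
site_url: https://docs.brac-poc.comptech-lab.com
theme:
  name: material
nav:
  - Home: index.md
  - Decision Log: decision-log.md
```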
DECISION #019: Compliance Operator Scans (PCI-DSS-4 + OCP4-CIS)¶
Status: ✅ APPROVED | Owner: Security Lead
| Aspect | Details |
|---|---|
| Chosen | Two ScanSettingBinding CRs running: pci-dss-4 profile + ocp4-cis profile, both on a weekly schedule (initial) + on-demand ad-hoc |
| Rationale | PCI-DSS-4 covers bank-specific controls; OCP4-CIS covers general platform hardening. ~60-70% overlap, but each catches what the other misses. Running both = broadest evidence for BRAC |
| Artifact | Compliance reports (HTML+XML) exported to MinIO for BRAC demo. Remediation CRs auto-generated for FAILs. |
| Approval | Security Lead |
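A sketch of the binding (profile names follow the Compliance Operator's ocp4 bundle; confirm with `oc get profiles.compliance -n openshift-compliance`):

```yaml
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: pci-dss-and-cis
  namespace: openshift-compliance
profiles:
- apiGroup: compliance.openshift.io/v1alpha1
  kind: Profile
  name: ocp4-pci-dss-4-0   # assumed profile name, verify in your bundle
- apiGroup: compliance.openshift.io/v1alpha1
  kind: Profile
  name: ocp4-cis
settingsRef:
  apiGroup: compliance.openshift.io/v1alpha1
  kind: ScanSetting
  name: default            # weekly schedule lives on the ScanSetting
```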
Phase 2 Decisions¶
(Updated as made)
Phase 3 Decisions¶
(Updated as made)
Scope Changes Log¶
Any changes to the original scope are logged here.
Scope-change template
Scope Change #NNN — short title
| Field | Value |
|---|---|
| Date | when requested |
| Requested by | BRAC or internal |
| Request | what they asked for |
Impact analysis
- Effort: X hours
- Timeline: slips to date · OR delays component
- Risk: what could go wrong
Options
- A) Add it now → timeline slips X hours
- B) Add it, cut [component] → timeline stays the same
- C) Defer to Phase 2 → timeline stays the same
Decision: Option [A / B / C]
Approved by: Project Lead · BRAC Bank (if applicable)
Reason: why this option was chosen
Trade-Off Registry¶
Every major decision has trade-offs. Document them here.
| Decision | Benefit | Cost | Acceptable? |
|---|---|---|---|
| OTel instead of Datadog | Vendor neutrality | Operational overhead | ✅ YES |
| Terraform for IaC | Automation, repeatability | Learning curve, updates | ✅ YES |
| Single etcd master | Simpler setup | No K8s HA (acceptable for POC) | ✅ YES |
| KRaft instead of ZooKeeper | Simpler | Newer technology | ✅ YES |
| No load testing | Saves 2 days | Unknown performance | ✅ YES (Phase 2) |
| Both GitLab + Jenkins | Showcases both | More work | ✅ YES |
| 6-day timeline | Aggressive, demo ready | High risk, no buffer | ✅ YES (risk mitigated) |
| Domain mode JBoss | BRAC requirement | Complex learning curve | ✅ YES (fallback available) |
Questions BRAC Might Ask (And Answers)¶
Q: Why not use [competing technology]?¶
Answer: Check Decision Log #[X]. We evaluated it; the entry explains why we chose the other option.
Q: Why are you doing it this way?¶
Answer: Check ADR [X] in docs/adr/. Explains context, alternatives, and rationale.
Q: Why did you skip [component]?¶
Answer: Check Scope Changes Log or Decision #[X]. Here's the impact analysis and why.
Q: What if [component] fails?¶
Answer: Check Risk Register and Assumptions & Constraints. Here's our mitigation strategy and fallback plan.
Q: Why these design choices?¶
Answer: Check Architecture.md. Every component is justified. ADRs explain the why.
How to Use This Log¶
During POC:

- Record every significant decision
- Before approving, check: What are the trade-offs? Is it documented?
- When BRAC asks questions, reference this log

At POC End:

- Review all decisions
- Document lessons learned
- Use as input to Phase 2 planning

For BRAC Bank:

- Share this log with the final report
- Shows thoughtful decision-making
- Demonstrates risk awareness
Decision Log Created: 2026-04-24
Status: Active (decisions being added)
Owner: Project Lead
Updated: As decisions are made