Decision Log¶
Records all architectural decisions, scope choices, and trade-offs made during the POC, serving as the audit trail for BRAC Bank.
Why This Matters¶
BRAC Bank will ask:

- "Why OpenTelemetry instead of Datadog?"
- "Why Kafka instead of direct push?"
- "Why did you skip JBoss?"
- "Why this infrastructure topology?"
Answer: "It's in the Decision Log. Here's the reasoning, alternatives considered, and who approved it."
Decision Log Template¶
Below is the long-form template, shown as a worked example (Decision #001 — OpenTelemetry vs proprietary APM). For quick decisions, use the condensed table format shown in the actual decision entries further down.
Long-form template (worked example)
Decision #001 — OpenTelemetry vs Proprietary APM
| Field | Value |
|---|---|
| Date | 2026-04-24 |
| Category | Observability architecture |
| Owner | Platform Lead |
| Approver | Project Lead |
| Status | ✅ Approved |
Context
- Need comprehensive observability (logs, metrics, traces)
- BRAC has no vendor preference stated
- 6-day aggressive timeline, must avoid vendor lock-in
Decision: use OpenTelemetry (OTel) with SigNoz backend.
Alternatives considered
| # | Option | Pros | Cons | Why rejected |
|---|---|---|---|---|
| 1 | Datadog APM | Enterprise-grade, fully managed | Vendor lock-in, expensive licensing | Contradicts BRAC neutrality preference |
| 2 | Elastic Stack (ELK) | Open source, good for logs | Weak tracing, fragmented | Inferior unified experience |
| 3 | Prometheus + Loki + Jaeger | All open source, CNCF projects | Three separate systems, complex integration | Worse UX than SigNoz |
Chosen-option rationale
- ✅ Vendor-neutral (CNCF-backed)
- ✅ Single platform for logs, metrics, traces (unified UX)
- ✅ Scalable with Kafka buffering
- ✅ Cost-effective (self-managed)
- ✅ Future flexibility (swap backends if needed)
Trade-offs
- Cost: operational overhead — we manage OTel, SigNoz, ClickHouse ourselves
- Cost: learning curve — team must learn OTel SDK
- Gain: flexibility — not locked to any vendor
Impact
- On Phase 2 critical path (5h for full stack)
- Affects sample-app instrumentation
- Influences testing strategy (load test required)
Related ADR: ADR 0001 — OpenTelemetry vs proprietary APM
Approved by: Project Lead ✅ · Security Lead ✅ · BRAC Bank contact (pre-approval) ✅
Notes
- If SigNoz proves too slow, fall back to Grafana Loki (logs only)
- Cold-archive to object storage to control cost
- Retention policy: 2-day hot tier, then auto-archive to cold
Pre-POC Strategic Decisions¶
DECISION #001: OpenTelemetry vs Proprietary APM¶
Status: ✅ APPROVED | Owner: Platform Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | OpenTelemetry + SigNoz |
| Why | Vendor neutrality, unified UX, cost-effective |
| Alternative | Datadog (rejected: vendor lock-in) |
| Trade-off | Operational overhead vs flexibility |
| Approval | Project Lead, Security Lead |
DECISION #002: Terraform for Infrastructure¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Terraform + public modules |
| Why | Automation, repeatability, auditability |
| Alternative | Manual provisioning (rejected: timeline) |
| Trade-off | Module updates may break vs scalability |
| Approval | Project Lead |
DECISION #003: Single 3-Node Cluster vs HA K8s Master¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Single etcd, single master (cost/time efficient) |
| Why | 6-day POC, not production; a single point of failure is acceptable |
| Alternative | HA K8s master (adds 1-2 days) |
| Trade-off | No K8s HA, but single master is standard for POC |
| Limitation | If master fails, cluster down. Acceptable risk. |
| Approval | Infrastructure Lead, Project Lead |
DECISION #004: Kafka KRaft vs ZooKeeper¶
Status: ✅ APPROVED | Owner: Platform Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Kafka KRaft (no ZooKeeper) |
| Why | Simpler, modern, fewer dependencies |
| Alternative | ZooKeeper (rejected: extra complexity) |
| Trade-off | KRaft newer but stable (GA in 3.3+) |
| Approval | Platform Lead |
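For reference, a minimal single-node KRaft bring-up looks like this (assumes a stock Kafka 3.3+ tarball layout; paths and config file names vary by packaging):

```shell
# KRaft replaces ZooKeeper: the storage directory must be formatted
# with a cluster ID before first start.
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/kraft/server.properties

# Combined broker+controller mode is enough for the POC buffer tier
bin/kafka-server-start.sh config/kraft/server.properties
```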
DECISION #005: Skip Load Testing (Out of Scope)¶
Status: ✅ APPROVED | Owner: Project Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | No load testing / performance benchmarks |
| Why | 6-day POC, load testing adds 2+ days |
| Alternative | Include load testing (rejected: timeline) |
| Trade-off | Don't know performance limits |
| Noted for Phase 2 | Load testing should be Phase 2 item |
| Approval | Project Lead, BRAC Bank (implied) |
DECISION #006: GitLab HA vs Jenkins Only¶
Status: ✅ APPROVED | Owner: DevOps Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Both GitLab HA and Jenkins HA |
| Why | BRAC has both, showcase both capabilities |
| Alternative | GitLab only (rejected: incomplete demo) |
| Trade-off | Takes longer (2x CI/CD work) |
| Approval | Project Lead |
DECISION #007: Scope Lock Date¶
Status: ✅ APPROVED | Owner: Project Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Scope lock: NOW (2026-04-24) |
| Why | 6-day aggressive timeline, no buffer for scope creep |
| Rule | New features → Phase 2 / Future Work |
| Process | All changes logged, estimated impact, BRAC approval |
| Escalation | Project Lead decides if change proceeds |
| Approval | Project Lead |
DECISION #008: Demo Format (Live vs Recorded)¶
Status: ⏳ TBD | Owner: Project Lead | Date: TBD
| Aspect | Details |
|---|---|
| Options | A) Live demo · B) Recorded demo |
| Preferred | TBD (awaiting BRAC feedback) |
| Live Pros | Interactive, can answer questions, credible |
| Live Cons | Risk if component breaks |
| Recorded Pros | Safe, rehearsed, high quality |
| Recorded Cons | Less interactive |
| Decision Timeline | Day 5, 4pm (1 day before demo) |
| Approval | BRAC Bank + Project Lead |
DECISION #009: JBoss Domain Mode vs Standalone¶
Status: ✅ APPROVED | Owner: Middleware Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Domain mode (as per BRAC requirements) |
| Why | BRAC specifically requested domain mode |
| Alternative | Standalone (simpler, if domain mode too hard) |
| Contingency | If domain mode stalls Day 6, switch to standalone |
| Fallback | Skip JBoss entirely if time insufficient |
| Approval | Project Lead |
DECISION #010: ODF Storage vs External Storage¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | ODF (OpenShift Data Foundation) - local |
| Why | Included with OpenShift, no external dependencies |
| Alternative | AWS EBS / GCP Persistent Disks (adds complexity) |
| Trade-off | Local storage lost if cluster destroyed |
| Acceptable because | POC is short-lived, cluster destroyed after |
| Approval | Infrastructure Lead |
Phase 1 Decisions¶
DECISION #011: Disk Encryption — Tang (not TPMv2)¶
Status: ✅ APPROVED | Owner: Project Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Tang (network-bound disk encryption) |
| Server | Single Tang Deployment in openshift-brac ns, image quay.io/latchset/tang:latest, LB IP 26.26.199.110 from lan-pool-26, port 7500 |
| Scope | enable_on: all (bootstrap + 3 masters + 3 workers) |
| Alternative | TPMv2 (rejected: per-VM vTPM/EFI/SecureBoot setup; harder to demo) |
| Trade-off | Tang must be up before OCP boot (acceptable — it runs on the already-up hosting platform); no key-persistence across Tang pod restarts, but POC is throwaway |
| Demo value | Network-bound key escrow story maps well to bank compliance requirements |
| Approval | Project Lead |
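Since the cluster is installed via the Assisted Installer (Decision #014), the Tang binding is expressed in the cluster payload. A sketch of the relevant fragment, to be verified against the current API spec (the thumbprint comes from running `tang-show-keys` on the Tang host):

```json
{
  "disk_encryption": {
    "enable_on": "all",
    "mode": "tang",
    "tang_servers": "[{\"url\":\"http://26.26.199.110:7500\",\"thumbprint\":\"<tang-thumbprint>\"}]"
  }
}
```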
DECISION #012: VM Networking — flat L2 bridge (not overlay/pod-network)¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | OCP VMs on a dedicated internal L2 bridge with static IP addressing (classic flat-L2 topology that OCP is designed for) |
| Alternative | Overlay / pod-network attachment for VMs (rejected: IP instability with masquerade binding + nested VXLAN MTU complexity) |
| Trade-off | Loses some of the hosting platform's pod-level observability for these VMs; wins OCP-standard flat L2 networking with stable IPs |
| Approval | Project Lead |
DECISION #013: Dedicated haproxy-brac (not reusing the hosting platform's edge LB)¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | New haproxy-brac Deployment in openshift-brac ns, TCP mode, dual-homed br26 (26.26.200.50) + br-real (59.153.29.101) |
| Alternative | Extend the hosting platform's existing edge LB (rejected: it's HTTP mode only; can't TCP-passthrough OCP's 6443/22623 mTLS ports) |
| Trade-off | Duplicates pattern but cleanly scoped + deletable at POC end |
| Approval | Project Lead |
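A sketch of the TCP-mode passthrough config (the 22623 machine-config port follows the same pattern; the master IPs below are hypothetical):

```
frontend ocp_api
    bind 26.26.200.50:6443
    bind 59.153.29.101:6443
    mode tcp
    default_backend ocp_api_masters

backend ocp_api_masters
    mode tcp
    balance roundrobin
    # mTLS passes through untouched; haproxy never terminates it
    server master0 26.26.200.10:6443 check
    server master1 26.26.200.11:6443 check
    server master2 26.26.200.12:6443 check
```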
DECISION #014: Assisted Installer — full-ISO + static networking¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Red Hat Assisted Installer REST API, image_type: full-iso, NMState per-host static networking, MACs pinned in VM specs |
| Alternative | Minimal-ISO (rejected: requires RHCOS fetch at first boot — extra failure mode), UPI agent-install (rejected: more infra to stand up than Assisted) |
| Trade-off | ~1 GB ISO per infra-env vs minimal's ~100 MB; full-ISO boots offline-friendly |
| Approval | Project Lead |
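A hedged sketch of the infra-env registration call (`$TOKEN`, `$CLUSTER_ID`, the MAC, and the NMState body are placeholders; verify field shapes against the current Assisted Installer API spec):

```shell
curl -s -X POST "https://api.openshift.com/api/assisted-install/v2/infra-envs" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{
    "name": "brac-poc-infra-env",
    "cluster_id": "'"$CLUSTER_ID"'",
    "image_type": "full-iso",
    "static_network_config": [
      { "network_yaml": "<per-host NMState YAML>",
        "mac_interface_map": [
          {"mac_address": "52:54:00:aa:bb:01", "logical_nic_name": "eth0"}
        ] }
    ]
  }'
```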
DECISION #015: FIPS 140-3 mode enabled at install¶
Status: ✅ APPROVED | Owner: Project Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | fips: true in POST /v2/clusters payload |
| Rationale | PCI-DSS 3.5/3.6/4.2.1 mandates validated crypto; FIPS cannot be enabled post-install |
| Blocker risk | Third-party images using non-FIPS crypto will fail — verify each POC component's FIPS compatibility before adoption |
| Approval | Project Lead |
DECISION #016: API Server audit profile = WriteRequestBodies¶
Status: ✅ APPROVED | Owner: Security Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | APIServer.spec.audit.profile: WriteRequestBodies (applied at install via custom manifest) |
| Rationale | PCI-DSS 10.2/10.3 — capture who-wrote-what for audit trail. WriteRequestBodies is the sweet spot: mutations logged with bodies, reads metadata-only (avoids log explosion) |
| Alternative | AllRequestBodies (rejected for POC: excessive log volume; useful in high-compliance production only) |
| Approval | Security Lead |
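The profile is applied with a standard APIServer cluster config manifest:

```yaml
apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  name: cluster
spec:
  audit:
    profile: WriteRequestBodies   # mutations logged with bodies, reads metadata-only
```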
DECISION #017: Identity Provider = Keycloak¶
Status: ✅ APPROVED | Owner: Security Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Keycloak as OIDC provider; OCP OAuth CR configured to trust it |
| Deployment target | On the hosting platform (not inside OCP — Keycloak must survive OCP rebuilds and could front multiple clusters) |
| Namespace | keycloak-brac on the hosting platform |
| Database | Dedicated PostgreSQL cluster keycloak-pg (mirrors the PowerDNS pattern) |
| Public URL | https://keycloak.brac-poc.comptech-lab.com — cert-manager via Cloudflare DNS-01, exposed via the hosting platform's ingress controller |
| Realm | brac-poc with client openshift-brac; admin user with MFA |
| Alternative | Google OIDC (rejected: external dependency; less BRAC-demo-friendly), htpasswd (rejected: no MFA, not PCI-8.3 compliant), LDAP (rejected: needs LDAP server beyond Keycloak) |
| Trade-off | Adds a deployment task to prereqs; gains: standalone IdP, can broker to BRAC's AD/LDAP later, demo-friendly UI |
| Approval | Security Lead |
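The OCP side is a standard OAuth cluster CR pointing at the realm; a sketch (the client-secret Secret name is an assumption):

```yaml
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: keycloak
    mappingMethod: claim
    type: OpenID
    openID:
      clientID: openshift-brac
      clientSecret:
        name: keycloak-client-secret   # hypothetical Secret in openshift-config
      issuer: https://keycloak.brac-poc.comptech-lab.com/realms/brac-poc
      claims:
        preferredUsername: [preferred_username]
        email: [email]
        name: [name]
```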
DECISION #018: Allowed container registries (strict list)¶
Status: ✅ APPROVED | Owner: Security Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Image.spec.registrySources.allowedRegistries: [quay.io, registry.redhat.io, registry.connect.redhat.com, ghcr.io, <nexus-internal>] |
| Rationale | PCI-DSS 6.3.2 — only approved image sources. Each registry serves a purpose: Quay (OCP/operators), Red Hat registries (official + partner), GHCR (user-built custom), Nexus (internal mirror/cache) |
| Explicitly blocked | docker.io (no image provenance verification), ghcr.io/<unknown-orgs> (Nexus is preferred mirror for external images) |
| Approval | Security Lead |
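Applied via the cluster-scoped Image config; a sketch (the Nexus entry stays a placeholder until the internal hostname is fixed, and note that any registry the cluster itself pulls from must remain on the list):

```yaml
apiVersion: config.openshift.io/v1
kind: Image
metadata:
  name: cluster
spec:
  registrySources:
    allowedRegistries:
    - quay.io
    - registry.redhat.io
    - registry.connect.redhat.com
    - ghcr.io
    - <nexus-internal>   # placeholder from this decision entry
```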
DECISION #026: VM tier for all non-OCP tools with DC/DR pattern¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | All tools outside OpenShift deployed as VMs in a DC/DR active-passive pattern. Each tool's VMs named <app>[-<profile>]-vm<N>-<dc\|dr> |
| VMs in scope | Vault, MinIO, Redis, GitLab CE, Jenkins, Nexus, WSO2 APIM (distributed), WSO2 IS, Temporal, n8n, Splunk Free, SigNoz, ClickHouse, AWX, Terrakube, plus per-app PostgreSQL and the ops-runner |
| Default username | ze (not ubuntu) — SSH pubkeys injected via cloud-init (Mac + dl385-2 + ops-runner) |
| OS | Ubuntu 24.04 Server (Noble) cloud image — same for all VMs |
| Inventory | docs/VM-TIER-ARCHITECTURE.md — 68 VMs + 8 OCP nodes = 76 total |
| Approval | Infrastructure Lead, Project Lead |
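The `ze` user and key injection can be sketched as cloud-init user-data (keys truncated; the exact key list per VM comes from the inventory):

```yaml
#cloud-config
users:
  - name: ze
    groups: [sudo]
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... mac
      - ssh-ed25519 AAAA... dl385-2
      - ssh-ed25519 AAAA... ops-runner
```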
DECISION #027: WSO2 APIM uses distributed (enterprise) topology, not all-in-one¶
Status: ✅ APPROVED | Owner: Integration Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Distributed deployment with 5 profiles: Publisher, DevPortal, Gateway, Traffic Manager, Key Manager — one VM per profile per site → 10 VMs + shared PG per site |
| Rationale | Mirrors real bank production pattern; lets BRAC evaluate each tier independently; demonstrates HA principles per component |
| Shared DB | Single PG per site (DC + DR) shared across profiles — streaming replication DC → DR |
| Version | WSO2 APIM 4.4.0, WSO2 IS 7.1.0 |
| Approval | Integration Lead, Project Lead |
DECISION #028: MinIO as universal object storage target¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | All backup + object-storage needs use the MinIO cluster — Splunk indexed data, Vault Raft snapshots, Nexus blob store, ACM cluster-backup, GitLab backup, Jenkins/Nexus/ClickHouse archives, SigNoz cold retention |
| Topology | 3-node distributed MinIO per site (DC + DR) with native site-replication |
| Version | Latest RELEASE.2025.xx.xx LTS |
| Trade-off | Single tool for all storage needs — simple. Production alternative: separate tiers for hot/cold/archive; out of scope for POC. |
| Approval | Infrastructure Lead, Project Lead |
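Site replication between the DC and DR clusters is a two-command setup with `mc`; a sketch with placeholder endpoints and credentials:

```shell
mc alias set dc https://minio-dc.example.internal ACCESS_KEY SECRET_KEY
mc alias set dr https://minio-dr.example.internal ACCESS_KEY SECRET_KEY
mc admin replicate add dc dr    # enables bi-directional site replication
mc admin replicate info dc      # verify both sites show as healthy peers
```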
DECISION #029: OpenTofu (not Terraform), AWX + Terrakube for IaC GUIs¶
Status: ✅ APPROVED | Owner: DevOps Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | OpenTofu 1.8.x (MPL 2.0 OSS fork of Terraform) for infrastructure provisioning; Ansible 11.x for configuration |
| GUI | AWX (Red Hat's upstream of AAP) for Ansible; Terrakube (OSS TFE-alternative) for OpenTofu — both self-hosted on VMs |
| Two Git repos | brac-poc-infrastructure (OpenTofu modules) + brac-poc-ansible (playbooks) on GitLab CE |
| Workflow | MR → CI lint → merge → Terrakube/AWX webhook → runs on VMs; no direct SSH-root-to-configure drift |
| Bootstrap | ops-runner has CLI OpenTofu + Ansible; used to create AWX/Terrakube VMs; after that, UIs take over |
| Rejected alternatives | HashiCorp Terraform (BSL license concerns for a bank); Rundeck (less specialized per tool) |
| Approval | DevOps Lead, Project Lead |
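The MR lint step can be sketched as a minimal `.gitlab-ci.yml` job (the OpenTofu image tag is an assumption):

```yaml
stages: [lint]

tofu-lint:
  stage: lint
  image: ghcr.io/opentofu/opentofu:1.8
  script:
    - tofu fmt -check -recursive   # formatting gate
    - tofu init -backend=false     # no state access needed for validation
    - tofu validate
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
```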
DECISION #030: DC/DR strategy — per-component documented RPO/RTO + quarterly drills¶
Status: ✅ APPROVED | Owner: Infrastructure + Platform Leads | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Active-passive DR for all components; RPO/RTO targets documented per tier in docs/DC-DR-STRATEGY.md |
| Replication mechanisms | Vault OSS Raft snapshot → MinIO every 15 min · MinIO native site-replication · PostgreSQL streaming replication · Redis Sentinel + cross-site replica · ACM cluster-backup-chart (15-min schedule) · ArgoCD AppSet targeting both spokes |
| Convergence point | MinIO cluster-pair acts as the universal backup target for every component |
| Drills | 7 drills catalogued in docs/DR-DRILL-PLAYBOOK.md (MinIO, Vault, PG, Redis, hub, spoke, full-site integrated) |
| Demo posture | Pre-record the full-site drill on Day 4-5; show recording on demo day (full live drill too risky on Day 6) |
| Approval | Infrastructure Lead, Platform Lead, Project Lead |
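The Vault-to-MinIO leg can be sketched as a small cron-driven script (addresses, alias, and bucket names are placeholders; `dc` is the `mc` alias from the MinIO pattern):

```shell
#!/bin/sh
# /usr/local/bin/vault-snapshot.sh
# crontab: */15 * * * * /usr/local/bin/vault-snapshot.sh
export VAULT_ADDR=https://vault-vm1-dc.example.internal:8200
SNAP="/tmp/vault-$(date +%Y%m%d%H%M).snap"
vault operator raft snapshot save "$SNAP"
mc cp "$SNAP" dc/vault-snapshots/ && rm -f "$SNAP"
```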
DECISION #031: Observability — dual stack (Red Hat native + SigNoz + Splunk Free)¶
Status: ✅ APPROVED | Owner: Platform Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Red Hat native (on OCP) | Red Hat build of OpenTelemetry 0.144.0-2 (instrumentation) + Tempo 0.20.0-3 (traces) + Loki 6.5.0 (logs) + Cluster Observability Operator 1.4.0 (cross-cluster metrics on hubs) + Red Hat OpenShift Logging 6.5.0 |
| SigNoz + ClickHouse (on VMs) | Original POC requirement retained; SigNoz UI + ClickHouse backend run on VMs with signoz-vm1-dc/dr + clickhouse-vm1-dc/dr; demonstrates community APM stack alongside Red Hat supported |
| Splunk Free (on VMs) | PCI-DSS + client requirement for log forwarding; 500 MB/day cap; 2-day retention; OCP Logging + OTel exports forward to Splunk HEC |
| Rationale | Shows BRAC both the Red Hat-supported path (vendor-backed) and the popular community tooling; Splunk satisfies compliance; SigNoz demos flexibility |
| Approval | Platform Lead, Security Lead |
DECISION #021: Domain strategy — per-cluster subdomain on opp.brac-poc.comptech-lab.com¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Root | brac-poc.comptech-lab.com (project umbrella) |
| OpenShift umbrella | opp.brac-poc.comptech-lab.com |
| Per-cluster API | api.<cluster-name>.opp.brac-poc.comptech-lab.com — e.g. api.spoke-dc.opp.brac-poc.comptech-lab.com |
| Per-cluster routes | *.routes.<cluster-name>.opp.brac-poc.comptech-lab.com — per OCP-spec, replaces the default *.apps. wildcard |
| Platform tools (outside OCP) | *.apps.brac-poc.comptech-lab.com — docs site, monitoring consoles, tooling |
| Rationale | Per-cluster prefix keeps DR failover explicit (operator decides when to flip) and avoids ambiguity in logs/certs |
| TLS | Wildcard Let's Encrypt via cert-manager + Cloudflare DNS-01 solver, auto-issued per cluster |
| Approval | Infrastructure Lead, Project Lead |
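A sketch of the cert-manager objects for one spoke (Secret names are assumptions):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-cloudflare
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
    - dns01:
        cloudflare:
          apiTokenSecretRef:
            name: cloudflare-api-token
            key: api-token
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: spoke-dc-wildcard
  namespace: openshift-ingress
spec:
  secretName: spoke-dc-wildcard-tls
  issuerRef:
    name: letsencrypt-cloudflare
    kind: ClusterIssuer
  dnsNames:
  - "*.routes.spoke-dc.opp.brac-poc.comptech-lab.com"
```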
DECISION #022: Four-cluster topology (2 hubs + 2 spokes, DC/DR)¶
Status: ✅ APPROVED | Owner: Infrastructure Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | hub-dc + hub-dr (SNO management clusters) + spoke-dc + spoke-dr (compact 3-node workload clusters) |
| Pattern | DC/DR active-passive; DR clusters are hot standbys synced via ACM cluster-backup + ArgoCD ApplicationSets |
| Hub specs | 1 node × 64c/64GB/1TB each (SNO) |
| Spoke specs | 3 nodes × 64c/64GB/1TB each → 192c/192GB/3TB per cluster (compact 3-node) |
| Total | 512 vCPU + 512 GB + 8 TB across 4 clusters |
| Alternative (rejected) | Single multi-node cluster — loses DR story, doesn't exercise RHACM hub-spoke patterns BRAC wants to evaluate |
| Alternative (rejected) | All 4 SNO — too small for workload realism on spokes |
| Approval | Infrastructure Lead, Project Lead |
DECISION #023: Management fabric = RHACM (hub) → managed clusters (spokes + DR hub)¶
Status: ✅ APPROVED | Owner: Platform Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | Red Hat Advanced Cluster Management for Kubernetes, release-2.13 on hub-dc (primary) and hub-dr (DR hot-standby) |
| Hub hosts | RHACM, ACS Central, OpenShift GitOps, Compliance Operator, Cluster Observability Operator, RHBK Keycloak |
| Spokes run | All POC workloads; they are ACM-managed clusters |
| Failover mode | Hub-dr is itself registered as managed-cluster of hub-dc; on hub-dc loss, hub-dr's MultiClusterHub takes over via ACM cluster-backup |
| Approval | Platform Lead, Project Lead |
DECISION #024: Red Hat operators only (no community operators, ever)¶
Status: ✅ APPROVED | Owner: Security Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | All operators sourced from redhat-operators CatalogSource (primary) or certified-operators (partner-certified, acceptable). Community + redhat-marketplace catalogs disabled cluster-wide via OperatorHub CR |
| Verification | Before installing any operator: confirm channel + CSV version via oc get packagemanifest <name>; pin the exact CSV in the Subscription manifest for reproducibility |
| Implications | Uses RHBK (Red Hat Build of Keycloak) not community Keycloak; Red Hat AMQ Streams not community Strimzi; OpenShift GitOps not community Argo CD |
| Detailed matrix | See docs/OPERATOR-CATALOG.md for the full operator inventory with channels |
| Rationale | Support contract + supply-chain trust for a bank audit; no operator source that isn't directly Red Hat-maintained or Red Hat-certified |
| Approval | Security Lead, Project Lead |
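Two manifests implement this: the OperatorHub CR disabling the unwanted catalogs, and a Subscription showing the CSV-pinning pattern (the RHACM CSV below is illustrative; confirm via `oc get packagemanifest` as described):

```yaml
apiVersion: config.openshift.io/v1
kind: OperatorHub
metadata:
  name: cluster
spec:
  sources:
  - name: community-operators
    disabled: true
  - name: redhat-marketplace
    disabled: true
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: advanced-cluster-management
  namespace: open-cluster-management
spec:
  channel: release-2.13
  name: advanced-cluster-management
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
  startingCSV: advanced-cluster-management.v2.13.0  # example pin, verify first
```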
DECISION #025: GitOps-only operational model¶
Status: ✅ APPROVED | Owner: DevOps Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | All cluster configuration + workload deployment flows through Git → OpenShift GitOps (ArgoCD) → clusters |
| Pattern | App-of-Apps + ApplicationSets + pull-mode via RHACM ManifestWork propagation (per Red Hat RHACM 2.13 best practices, verified 2026-04-24) |
| Repo | openshift-platform-gitops on the BRAC POC self-hosted GitLab (GitLab HA deployed on spoke-dc, Issue #4). Mirrored to comptech-lab/openshift-platform-gitops on GitHub for review |
| Compliance | ACM Policy framework for governance (compliance baseline, allowed-registries, PSA, audit profile); ArgoCD for CD — both used together per Red Hat guidance |
| oc usage | Reserved for: (a) initial GitOps operator Subscription on each hub · (b) root Application apply that points at the Git repo · (c) break-glass emergencies. All other changes must go through a Git MR |
| Alternative (rejected) | Hybrid of oc apply and ArgoCD — rejected: breaks audit trail and creates drift |
| Detailed design | See docs/GITOPS-ARCHITECTURE.md |
| Approval | DevOps Lead, Project Lead |
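The entry point on each hub is a single root Application; a sketch (repo host and path are assumptions):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: openshift-gitops
spec:
  project: default
  source:
    repoURL: https://gitlab.brac-poc.comptech-lab.com/platform/openshift-platform-gitops.git
    targetRevision: main
    path: apps/            # App-of-Apps: this path holds child Applications
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```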
DECISION #020: Docs site — MkDocs Material on Cloudflare Pages¶
Status: ✅ APPROVED | Owner: Project Lead | Date: 2026-04-24
| Aspect | Details |
|---|---|
| Chosen | MkDocs Material built from the docs/ folder, hosted on Cloudflare Pages, custom domain docs.brac-poc.comptech-lab.com |
| Alternative — Log4Brains | Rejected: too narrow (ADRs only), covers ~5% of what we need to share |
| Alternative — self-host on hosting platform | Rejected for POC: requires ArgoCD or webhook plumbing for auto-deploy; adds 2+ hrs setup; Cloudflare Pages auto-deploys on git push for free |
| Alternative — GitHub Pages | Rejected: requires GitHub Pro/Team plan for private repos |
| Build | pip install -r requirements.txt && mkdocs build — Python 3.12, ~0.8s build, ~4.5 MB output (24 pages) |
| Auto-update | Every push to main → webhook → Cloudflare build → live in ~30s |
| Cost | $0 (Cloudflare Pages free tier: 500 builds/mo; we'll use ~30-50) |
| Access | Public URL; can be Cloudflare-Access-protected later if BRAC prefers gated |
| Approval | Project Lead |
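A minimal `mkdocs.yml` consistent with this setup (nav entries are illustrative):

```yaml
site_name: BRAC POC Documentation
site_url: https://docs.brac-poc.comptech-lab.com
theme:
  name: material
nav:
  - Home: index.md
  - Decision Log: decision-log.md
```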
DECISION #019: Compliance Operator Scans (PCI-DSS-4 + OCP4-CIS)¶
Status: ✅ APPROVED | Owner: Security Lead
| Aspect | Details |
|---|---|
| Chosen | Two ScanSettingBinding CRs running: pci-dss-4 profile + ocp4-cis profile, both on a weekly schedule (initial) + on-demand ad-hoc |
| Rationale | PCI-DSS-4 covers bank-specific controls; OCP4-CIS covers general platform hardening. ~60-70% overlap, but each catches what the other misses. Running both = broadest evidence for BRAC |
| Artifact | Compliance reports (HTML+XML) exported to MinIO for BRAC demo. Remediation CRs auto-generated for FAILs. |
| Approval | Security Lead |
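A sketch of the binding (profile names follow the Compliance Operator's ocp4 bundle; confirm with `oc get profiles.compliance -n openshift-compliance`):

```yaml
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: pci-dss-and-cis
  namespace: openshift-compliance
profiles:
- apiGroup: compliance.openshift.io/v1alpha1
  kind: Profile
  name: ocp4-pci-dss-4-0   # assumed profile name, verify in your bundle
- apiGroup: compliance.openshift.io/v1alpha1
  kind: Profile
  name: ocp4-cis
settingsRef:
  apiGroup: compliance.openshift.io/v1alpha1
  kind: ScanSetting
  name: default            # weekly schedule lives on the ScanSetting
```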
Phase 2 Decisions¶
(Updated as made)
Phase 3 Decisions¶
(Updated as made)
Scope Changes Log¶
Any changes to the original scope are logged here.
Scope-change template
Scope Change #NNN — short title
| Field | Value |
|---|---|
| Date | when requested |
| Requested by | BRAC or internal |
| Request | what they asked for |
Impact analysis
- Effort: X hours
- Timeline: slips to date · OR delays component
- Risk: what could go wrong
Options
- A) Add it now → timeline slips X hours
- B) Add it, cut [component] → timeline stays the same
- C) Defer to Phase 2 → timeline stays the same
Decision: Option [A / B / C]
Approved by: Project Lead · BRAC Bank (if applicable)
Reason: why this option was chosen
Trade-Off Registry¶
Every major decision has trade-offs. Document them here.
| Decision | Benefit | Cost | Acceptable? |
|---|---|---|---|
| OTel instead of Datadog | Vendor neutrality | Operational overhead | ✅ YES |
| Terraform for IaC | Automation, repeatability | Learning curve, updates | ✅ YES |
| Single etcd master | Simpler setup | No K8s HA (acceptable for POC) | ✅ YES |
| KRaft instead of ZooKeeper | Simpler | Newer technology | ✅ YES |
| No load testing | Saves 2 days | Unknown performance | ✅ YES (Phase 2) |
| Both GitLab + Jenkins | Showcases both | More work | ✅ YES |
| 6-day timeline | Aggressive, demo ready | High risk, no buffer | ✅ YES (risk mitigated) |
| Domain mode JBoss | BRAC requirement | Complex learning curve | ✅ YES (fallback available) |
Questions BRAC Might Ask (And Answers)¶
Q: Why not use [competing technology]?¶
Answer: Check Decision Log #[X]. We evaluated it; the entry explains why we chose the other option.
Q: Why are you doing it this way?¶
Answer: Check ADR [X] in docs/adr/. Explains context, alternatives, and rationale.
Q: Why did you skip [component]?¶
Answer: Check Scope Changes Log or Decision #[X]. Here's the impact analysis and why.
Q: What if [component] fails?¶
Answer: Check Risk Register and Assumptions & Constraints. Here's our mitigation strategy and fallback plan.
Q: Why these design choices?¶
Answer: Check Architecture.md. Every component is justified. ADRs explain the why.
How to Use This Log¶
During POC:

- Record every significant decision
- Before approving, check: What are the trade-offs? Is it documented?
- When BRAC asks questions, reference this log

At POC End:

- Review all decisions
- Document lessons learned
- Use as input to Phase 2 planning

For BRAC Bank:

- Share this log with the final report
- Shows thoughtful decision-making
- Demonstrates risk awareness
Decision Log Created: 2026-04-24
Status: Active (decisions being added)
Owner: Project Lead
Updated: As decisions are made