Gap Analysis: BRAC POC Implementation Plan

Analysis Date: 2026-04-24
Analyst: Project Lead
Status: Pre-Day 1 Review


Executive Summary

Planning Quality: ⭐⭐⭐⭐⭐ Excellent (15 strategic docs, 13 issues, 3 ADRs)
Execution Readiness: ⚠️ WEAK — 0% of code exists, many technical decisions deferred
Overall Risk: HIGH — Strong planning masks significant execution gaps

Top 5 Critical Gaps

  1. Zero implementation artifacts — No Terraform modules, K8s manifests, or sample apps written
  2. No cloud provider chosen — AWS vs GCP vs on-prem undecided
  3. No team members named — Roles defined but unassigned to real people
  4. No cost estimate or budget — Cloud bill unknown, no spend alerts
  5. No fallback demo — If live demo fails, nothing pre-recorded

🔴 CRITICAL GAPS (Block Day 1)

C1. No Implementation Code Exists

Gap: Repository has 15 docs but zero infrastructure code.

Missing:
- terraform/openshift/: not created
- terraform/kafka/: not created
- terraform/redis/: not created
- terraform/cicd/: not created
- k8s/: entire directory missing
- scripts/deploy.sh, scripts/validate.sh: not created
- Sample OTel-instrumented app: not written

Impact: Day 1 will start from scratch on every component — no head start.

Mitigation: Pre-select and download Terraform modules + Helm charts today (Day 0).
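The missing layout listed above could be scaffolded in one step so that Day 1 starts from a skeleton rather than an empty repository. The following is a minimal sketch, not the project's actual tooling: the helper name `scaffold_repo` and the stub script contents are assumptions, and only the paths come from the list above.

```bash
# Sketch: create the directory skeleton named in gap C1.
# scaffold_repo is a hypothetical helper; the stub script bodies are placeholders.

scaffold_repo() {
  root="${1:-.}"   # repository root; defaults to the current directory

  # Terraform component directories listed as missing
  for component in openshift kafka redis cicd; do
    mkdir -p "$root/terraform/$component"
  done

  # Kubernetes manifests and helper scripts
  mkdir -p "$root/k8s" "$root/scripts"

  # Create executable stubs without overwriting existing scripts
  for script in deploy.sh validate.sh; do
    target="$root/scripts/$script"
    if [ ! -e "$target" ]; then
      printf '#!/usr/bin/env bash\nset -euo pipefail\necho "TODO: implement %s"\n' "$script" > "$target"
      chmod +x "$target"
    fi
  done
}
```

Running `scaffold_repo .` from the repository root creates the tree. Git does not track empty directories, so a placeholder file such as `.gitkeep` may be needed in each one before committing.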


C2. Cloud Provider Not Chosen

Gap: Plan references "AWS/GCP/VMs" but no specific provider committed.

Missing:
- Which cloud? (AWS, GCP, Azure, on-prem, hybrid?)
- Region selection
- Account/project ID
- Quota verification
- Pricing model (on-demand, reserved, spot?)

Impact: Cannot provision on Day 1 — Terraform modules differ per provider.

Mitigation: Lock cloud provider TODAY. Recommend AWS (most mature Terraform/OpenShift tooling).


C3. No Team Members Named

Gap: Roles defined (Infrastructure Lead, Platform Lead, etc.) but no actual people assigned.

Missing:
- Who is the Infrastructure Lead?
- Who has OpenShift experience?
- Who has WSO2 expertise?
- Backup/on-call coverage
- Skill matrix (who knows what)

Impact: On Day 1, "Infrastructure Lead starts OpenShift" is undefined work.

Mitigation: Assign names to roles before Day 1. Verify each person has minimum skills for their component.


C4. No Cost/Budget Controls

Gap: No cost estimate or spend controls.

Missing:
- Projected 6-day cloud cost (rough estimate: $500-2,000 on AWS)
- Budget alerts (e.g., notify at 80% of $X)
- Resource cleanup script for post-POC
- Spend approval from BRAC or internal

Impact: POC could run up surprise bill. Resources left running after Day 6 = ongoing cost.

Mitigation:
- Estimate cost: 3 × t3.2xlarge at ~$1/hr each × 144 h = ~$432, plus storage/egress = ~$600-800 total
- Set billing alerts at $500 and $800
- Write scripts/teardown.sh for Day 7 cleanup
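The compute part of that estimate is simple enough to keep as a small script, so it can be re-run if the node count or runtime changes. This is a sketch only: `estimate_compute_cost` is a hypothetical helper, and the ~$1/hr rate is the rough figure used above, not a quoted AWS price.

```bash
# Sketch: reproduce the back-of-envelope compute estimate from the mitigation above.
# estimate_compute_cost is a hypothetical helper; the rate is the document's rough figure.

estimate_compute_cost() {
  nodes="$1"          # number of instances
  rate_per_hour="$2"  # USD per instance-hour (may be fractional)
  hours="$3"          # total runtime in hours
  # awk handles fractional rates, which shell integer arithmetic cannot
  awk -v n="$nodes" -v r="$rate_per_hour" -v h="$hours" \
    'BEGIN { printf "%.2f\n", n * r * h }'
}

# 6 days x 24 h = 144 h; 3 nodes at ~$1/hr each
estimate_compute_cost 3 1 144   # prints 432.00
```

Storage and egress are not modelled here; the $600-800 total above adds them as a rough margin on top of the compute figure.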


C5. No Pre-Recorded Backup Demo

Gap: Plan assumes live demo on Day 6 will work flawlessly.

Missing:
- Pre-recorded video of key flows (observability, SSO, compliance scan)
- Screenshots of every component working
- Exported SigNoz dashboards (JSON)
- Sample traces/metrics frozen in time

Impact: If something breaks 1 hour before demo (Murphy's law on live systems), no fallback.

Mitigation: Day 5 afternoon: record 10-min video of all working flows. Use as "Plan B" if live demo fails.


🟠 HIGH-PRIORITY GAPS (Fix by Day 2)

H1. Technical Versions Not Pinned

Gap: No specific versions locked for tools/charts.

Missing:
- OpenShift version (4.12? 4.13? 4.14?)
- Kafka version (3.6? 3.7?)
- SigNoz Helm chart version
- WSO2 APIM version
- Terraform version
- Kubernetes version

Impact: Different team members may use different versions → inconsistent results.

Mitigation: Create VERSIONS.md pinning every major dependency.
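A pinning file only helps if unpinned entries are noticed, so a small check could run locally or in CI. The sketch below assumes a hypothetical VERSIONS.md format of one `- name: version` bullet per dependency; the helper name `check_versions_pinned` and the rule that empty, `latest`, or `TBD` means unpinned are illustrative assumptions, not part of the plan.

```bash
# Sketch: fail when VERSIONS.md contains an unpinned entry.
# Assumed (hypothetical) format, one dependency per line:  "- name: version"
# An entry counts as unpinned if the version is empty, "latest", or "TBD".

check_versions_pinned() {
  file="${1:-VERSIONS.md}"
  awk '
    /^- / {
      line = $0
      sub(/^- /, "", line)                          # drop the bullet
      name = line;    sub(/:.*/, "", name)          # text before the colon
      version = line; sub(/^[^:]*:?[ \t]*/, "", version)  # text after it
      if (version == "" || version == "latest" || version == "TBD") {
        printf "UNPINNED: %s\n", name
        bad = 1
      }
    }
    END { exit bad }                                # non-zero if anything is unpinned
  ' "$file"
}
```

Calling `check_versions_pinned VERSIONS.md` exits non-zero and lists the offending names when any dependency lacks a concrete version.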


H2. No End-to-End Integration Test Scenario

Gap: Validation script checks components individually, not together.

Missing:
- Single test scenario that exercises sample app → OTel → Kafka → SigNoz → dashboard
- Test user for WSO2 SSO flow
- Canary routing scenario (10/90 split with measurable proof)
- ArgoCD demo app that shows GitOps value

Impact: Components work individually, but end-to-end story falls apart on Day 6 demo.

Mitigation: Day 3: Write scripts/e2e-test.sh that runs ONE flow proving integration.
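Telemetry takes time to travel through a pipeline such as app → OTel → Kafka → SigNoz, so an end-to-end script usually has to poll for each result instead of asserting immediately. The sketch below shows one such polling helper. The name `wait_for` is hypothetical, and the actual check commands are left out because they depend on the deployed endpoints.

```bash
# Sketch: poll a check command until it succeeds or attempts run out.
# wait_for is a hypothetical helper; real checks depend on the deployed endpoints.

wait_for() {
  description="$1"; attempts="$2"; delay_seconds="$3"
  shift 3                          # remaining arguments form the check command
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      echo "PASS: $description (attempt $i)"
      return 0
    fi
    # Sleep only when another attempt will follow
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay_seconds"
    fi
    i=$((i + 1))
  done
  echo "FAIL: $description after $attempts attempts" >&2
  return 1
}
```

A step in scripts/e2e-test.sh might then read `wait_for "trace visible in SigNoz" 30 10 ./check_trace.sh`, where `check_trace.sh` is a placeholder for whatever query proves the trace arrived.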


H3. No CI/CD for the POC Repo Itself

Gap: The repo has no GitHub Actions to lint Terraform, validate YAML, or test scripts.

Missing:
- .github/workflows/terraform-lint.yml
- .github/workflows/yaml-validate.yml
- .github/workflows/shellcheck.yml
- Pre-commit hooks for secret scanning

Impact: Broken Terraform/YAML is caught at apply time on the live cluster instead of in PR review.

Mitigation: Add 3 basic GitHub Actions workflows in first hour of Day 1.


H4. No Architecture Diagrams

Gap: ARCHITECTURE.md is text-only. No visual diagrams for BRAC.

Missing:
- System context diagram (C4 Level 1)
- Container diagram (all 9 components + flows)
- Data flow diagrams (observability pipeline)
- Network topology diagram
- Deployment diagram

Impact: BRAC's stakeholders (non-technical) can't understand the architecture from text alone.

Mitigation: Use Mermaid (free, in Markdown) or draw.io. Add diagrams to ARCHITECTURE.md.


H5. No Runbooks for Component Operations

Gap: TROUBLESHOOTING-GUIDE.md covers failures, but no operational runbooks.

Missing:
- Component startup order (if cluster reboots)
- Graceful shutdown procedure
- Restart commands per component
- Health check dashboards/commands

Impact: If cluster reboots on Day 4, team scrambles to bring things back up.

Mitigation: Add docs/RUNBOOKS.md with start/stop/restart for each component.


H6. Terraform State Backend Not Decided

Gap: Plan mentions "local or cloud" state but no decision.

Missing:
- Where does Terraform state live?
- Is it shared between team members?
- Is it backed up?
- State lock mechanism (prevent concurrent applies)

Impact: Two team members run terraform apply → state corruption → hours lost.

Mitigation:
- Recommend: Terraform Cloud, now HCP Terraform (free tier available, handles state locking)
- Alternative: S3 backend with DynamoDB lock
- Document in terraform/README.md


H7. No BRAC Handoff Plan

Gap: After Day 6, what does BRAC get? How do they run it?

Missing:
- Handoff package (code, docs, credentials, training)
- Training session plan for BRAC team
- Post-POC support model (how long do we support?)
- Phase 2 proposal/roadmap for BRAC

Impact: BRAC receives POC code but can't operate it → POC success irrelevant.

Mitigation: Day 6 afternoon: Prepare handoff package. Schedule 1-hour training with BRAC team.


🟡 MEDIUM-PRIORITY GAPS (Fix During Execution)

M1. No Disaster Recovery Plan

Gap: If cluster dies Day 3, what's the recovery plan?

Mitigation: Add DR section to RISK-REGISTER.md. Include "rebuild from Terraform" procedure.


M2. No Meta-Observability

Gap: We're deploying observability tools but not monitoring them.

Mitigation: Use Kubernetes native metrics (kubectl top pods) + minimal alerting.


M3. No Load Testing Even for Basic Validation

Gap: Plan says "100k messages/sec" Kafka test but no script exists.

Mitigation: Write 20-line Kafka load test script. Even 10 min of traffic proves it works.
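One way to keep that script short is to wrap the producer performance tool that ships with Apache Kafka. The sketch below only builds and prints the command. It assumes `kafka-producer-perf-test.sh` is on PATH, and the topic name, record size, and broker address are placeholders rather than values from the plan.

```bash
# Sketch: build (and optionally run) a short Kafka producer load test.
# Assumptions: kafka-producer-perf-test.sh (shipped with Apache Kafka) is on PATH,
# and the topic name, record size, and broker address below are placeholders.

records_for() {
  # total records = target messages/sec x duration in seconds
  echo $(( $1 * $2 ))
}

kafka_load_test_cmd() {
  rate="$1"; duration_seconds="$2"; bootstrap="$3"
  echo "kafka-producer-perf-test.sh" \
    "--topic poc-load-test" \
    "--num-records $(records_for "$rate" "$duration_seconds")" \
    "--record-size 512" \
    "--throughput $rate" \
    "--producer-props bootstrap.servers=$bootstrap"
}

# Dry run: print the command for 100k msg/s over 10 minutes (600 s)
kafka_load_test_cmd 100000 600 localhost:9092
```

Printing first makes the parameters reviewable; piping the output to `sh` runs the test. At 100k messages/sec, ten minutes corresponds to 60,000,000 records, so a shorter duration may be enough for a first smoke test.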


M4. Communication Channel Not Chosen

Gap: Daily standup format specified, but platform (Slack/Teams/email) not chosen.

Mitigation: Pick one. Slack recommended (most teams use it).


M5. PR Review Turnaround Not Defined

Gap: CONTRIBUTING.md mentions PR reviews but no SLA.

Mitigation: Target: 2-hour review turnaround during POC days. Document in CONTRIBUTING.md.


M6. No Realistic Sample Data

Gap: Demo will use "test data" but no dataset prepared.

Mitigation: Create synthetic dataset: 1000 fake transactions, 100 users. Mark "SAMPLE DATA" clearly.
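A deterministic generator keeps the dataset reproducible across demo rehearsals. The following sketch emits 1000 transactions spread over 100 users as CSV. The column names, ID formats, and amount formula are assumptions for illustration, and `generate_sample_transactions` is a hypothetical helper.

```bash
# Sketch: generate a deterministic synthetic dataset (1000 transactions, 100 users).
# Column names, ID formats, and value ranges are assumptions for illustration only.

generate_sample_transactions() {
  count="${1:-1000}"; users="${2:-100}"
  awk -v n="$count" -v u="$users" 'BEGIN {
    print "# SAMPLE DATA - synthetic, not real customer records"
    print "transaction_id,user_id,amount"
    for (i = 1; i <= n; i++) {
      user = ((i - 1) % u) + 1          # cycle through users 1..u
      amount = ((i * 37) % 5000) + 1    # pseudo-varied whole amount, 1..5000
      printf "TXN-%05d,USER-%03d,%d.00\n", i, user, amount
    }
  }'
}
```

Redirecting the output, for example `generate_sample_transactions > sample-transactions.csv`, yields a file whose first line carries the SAMPLE DATA marker; a loader that does not accept comment lines would need to skip it.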


M7. Helm Charts Not Pre-Validated

Gap: Plan assumes the Helm charts for SigNoz, WSO2, and GitLab "just work", but they haven't been tested.

Mitigation: Day 0 evening: helm template each chart to verify values work.
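That check could be looped over every pre-downloaded chart. The sketch below assumes the charts were unpacked under helm-charts/, one directory per chart, and renders each with default values. `validate_charts` and the `HELM_BIN` override are hypothetical names introduced here so the loop can be exercised without Helm installed.

```bash
# Sketch: render every pre-downloaded chart to catch template/values errors early.
# Assumptions: charts were unpacked under helm-charts/ (one directory per chart);
# HELM_BIN is a hypothetical override so the loop can be exercised without Helm.

validate_charts() {
  charts_dir="${1:-helm-charts}"
  helm_bin="${HELM_BIN:-helm}"
  failed=0
  for chart in "$charts_dir"/*/; do
    [ -d "$chart" ] || continue        # skip when the glob matches nothing
    name="$(basename "$chart")"
    # helm template NAME CHART renders manifests locally without a cluster
    if "$helm_bin" template "$name" "$chart" > /dev/null; then
      echo "OK: $name"
    else
      echo "FAILED: $name" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Rendering with default values only proves the chart itself is intact; project-specific values files would need to be passed with `-f` once they exist.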


M8. No Pre-Commit Hooks

Gap: Security relies on "remember to check for secrets" but nothing enforces it.

Mitigation: Install pre-commit with detect-secrets hook. 10-min setup.


🟢 LOW-PRIORITY GAPS (Nice-to-Have)

L1. No Architecture Decision Records for All Decisions

Gap: Only 3 ADRs exist. Full system has ~10 major decisions deserving ADRs.

L2. No Public/Private Documentation Separation

Gap: Some docs are internal-only (RISK-REGISTER) while others are meant for BRAC (ARCHITECTURE), but the two sets are not separated.

L3. No Legal/IP Documentation

Gap: Who owns the POC code? What can BRAC do with it?

L4. No Demo Script

Gap: Live demo will be ad-hoc. Script it for smooth flow.

L5. No Metrics/KPIs for POC Success

Gap: DEFINITION-OF-DONE.md has acceptance criteria, but no quantifiable KPIs.


📊 Gap Analysis Summary by Category

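| Category | Critical | High | Medium | Low | Total |
|---|---|---|---|---|---|
| Implementation | 2 | 2 | 2 | 1 | 7 |
| Operations | 1 | 3 | 2 | 1 | 7 |
| Integration | 0 | 1 | 1 | 1 | 3 |
| Documentation | 0 | 2 | 0 | 2 | 4 |
| Cost/Resources | 1 | 0 | 0 | 0 | 1 |
| Team | 1 | 0 | 1 | 0 | 2 |
| Demo/Handoff | 1 | 1 | 1 | 0 | 3 |
| Total | 6 | 9 | 7 | 5 | 27 |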

🎯 Remediation Priority Action Plan

Do TODAY (Day 0 — Pre-POC)

  1. [C2] Choose cloud provider (AWS recommended)
  2. [C3] Assign team members to roles
  3. [C4] Estimate cost + set budget alerts
  4. [H1] Pin technical versions in VERSIONS.md
  5. [H6] Decide Terraform state backend
  6. [M7] Pre-validate Helm charts
  7. [C1] Download/pre-stage Terraform modules for Day 1

Do Day 1 (First 2 Hours)

  1. [H3] Add 3 GitHub Actions workflows
  2. [M8] Install pre-commit hooks with secret detection
  3. [C1] Create terraform/ and k8s/ directory structure
  4. [H4] Draft architecture diagram in Mermaid

Do by Day 3

  1. [H2] Write end-to-end integration test script
  2. [H5] Add RUNBOOKS.md
  3. [M3] Write Kafka load test script
  4. [M6] Create synthetic sample data

Do by Day 5

  1. [C5] Record backup demo video
  2. [L4] Write demo script

Do by Day 6

  1. [H7] Prepare BRAC handoff package
  2. [C4] Write teardown script for cleanup

🚨 Highest-Leverage Action

If you do ONE thing before Day 1:

👉 Pre-stage ALL Terraform modules and Helm charts TODAY.

```bash
# Pre-download public Terraform modules
mkdir -p terraform/modules
git -C terraform/modules clone https://github.com/terraform-redhat/terraform-rhcs-rosa-classic.git
# ... (or equivalent for chosen provider)

# Pre-download Helm charts
# (assumes the signoz, bitnami, and wso2 repos were already added with `helm repo add`)
mkdir -p helm-charts
helm repo update
helm pull signoz/signoz --untar --destination helm-charts/
helm pull bitnami/kafka --untar --destination helm-charts/
helm pull bitnami/redis --untar --destination helm-charts/
helm pull wso2/apim --untar --destination helm-charts/
```

Impact: Saves 4-6 hours on Day 1. Catches "chart not found" issues before they block work.


📌 Gaps I'm NOT Worried About

  • Team morale: Good planning = team confidence
  • Documentation quality: Excellent, even for BRAC audit
  • Risk awareness: Well-covered in RISK-REGISTER
  • Scope creep: Scope lock in place
  • Security basics: Guide is comprehensive

🎓 Lessons for Future POCs

  1. Write code DURING planning, not after — Stub out directory structure, basic files
  2. Assign people to roles before writing role docs — Roles without people are fiction
  3. Estimate cost first — Before any cloud provisioning
  4. Pre-record demo — Live demo is high-risk
  5. Pin versions on Day 0 — Prevent "works on my machine"

Status: Gap analysis complete. 25 gaps identified (5 critical, 7 high, 8 medium, 5 low).
Recommendation: Address critical gaps C1-C4 before Day 1; most can be done in 4-6 hours. C5 (backup demo) is scheduled for Day 5.
Next Review: After Day 2 (verify HIGH gaps addressed)