Assumptions & Constraints

Document what we're betting on and the hard limits we're working within.


Key Assumptions

Infrastructure Assumptions

Assumption | Impact | Verified? | Risk if Wrong
12+ vCPU and 24+ GB RAM available in cloud/on-prem | High | ❌ VERIFY BEFORE DAY 1 | Cannot provision cluster → entire POC blocked
Cloud account/environment accessible (API keys valid, networking open) | High | ❌ VERIFY BEFORE DAY 1 | Cannot start provisioning → Day 1 lost
Internet access to Docker registries, Helm repos, GitHub | High | ❓ Likely | Cannot pull images → deployment fails
Terraform state backend working (local or cloud) | Medium | ❌ VERIFY BEFORE DAY 1 | State corruption → reprovisioning painful
Pull secret for OpenShift is current | Medium | ❌ VERIFY BEFORE DAY 1 | Registry auth fails → cannot deploy

Architecture Assumptions

Assumption | Impact | Verified? | Risk if Wrong
OpenTelemetry SDKs work for sample apps (Node.js/Go) | High | ✅ YES (standard, widely used) | Traces not visible → observability demo fails
Kafka KRaft mode is stable (no ZooKeeper) | Medium | ✅ YES (GA in Kafka 3.3+) | Broker crashes → pipeline unstable
WSO2 APIM Helm chart has working SAML/OIDC | Medium | ✅ PARTIALLY (depends on values) | SSO doesn't work → skip that demo
ClickHouse cold archiving works to S3-compatible storage (MinIO) | Low | ❌ TEST EARLY | Archiving fails → retention policy doesn't work
NGINX canary routing via weight distribution works | Low | ✅ YES (standard Istio/NGINX feature) | Traffic not split correctly → demo misleading
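
The canary assumption can be checked during rehearsal by comparing the observed traffic split with the configured weight. The sketch below is only an illustration: the function name, the request counts, and the three-standard-error tolerance are assumptions, not part of the POC tooling. With ingress-nginx the weight would typically be set through the nginx.ingress.kubernetes.io/canary-weight annotation, but how the weight is configured here is not confirmed by this document.

```python
import math


def split_within_tolerance(canary_hits: int, total_hits: int,
                           weight_percent: float, z: float = 3.0) -> bool:
    """Return True if the observed canary share is consistent with the weight.

    Uses a simple binomial standard error; `z` (hypothetical default 3.0)
    controls how much sampling noise is tolerated.
    """
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    if not 0 <= canary_hits <= total_hits:
        raise ValueError("canary_hits must be between 0 and total_hits")
    if not 0.0 <= weight_percent <= 100.0:
        raise ValueError("weight_percent must be between 0 and 100")

    expected = weight_percent / 100.0
    observed = canary_hits / total_hits
    std_error = math.sqrt(expected * (1.0 - expected) / total_hits)
    return abs(observed - expected) <= z * std_error
```

For example, 105 canary responses out of 1000 requests is consistent with a 10% weight, while 300 out of 1000 is not, which would indicate the split is misconfigured.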

Project Assumptions

Assumption | Impact | Verified? | Risk if Wrong
6-day timeline is tight but achievable | Critical | ❌ UNTESTED | Slip by 1-2 days → miss BRAC deadline
Team can work in parallel (no blocking dependencies within Phase 1) | High | ✅ YES (Phase 1 parallelizable) | Sequential work → 2x timeline
GitHub issues & PRs are sufficient for coordination (no extra tooling needed) | Medium | ✅ YES (standard workflow) | Communication gaps → rework
Terraform modules from Registry are stable | Medium | ✅ MOSTLY (mature projects) | Module breaking change → hours of rework
BRAC Bank will provide demo time after Day 6 | High | ❓ TO CONFIRM | No time to present → missed opportunity

Scope Assumptions

Assumption | Impact | Verified? | Risk if Wrong
"All 9 components" means each component is deployable, not necessarily all talking together | High | ⚠️ CLARIFY WITH BRAC | BRAC expects complex integration → deliverable judged insufficient
Demo success = "All components accessible and basic flow working" | Medium | ⚠️ ASSUME YES | BRAC expects production-grade features → demo judged a failure
Compliance scan passing = "Few findings", not "Zero findings" | Medium | ✅ YES (realistic for POC) | BRAC expects a clean scan → scan judged a failure
Performance testing not required (just functional) | Low | ✅ YES (POC, not prod) | BRAC expects load test → not planned

Hard Constraints

Timeline

Constraint | Impact | Mitigation
6 days total (Days 1-6) | Critical | No scope creep, parallelize, cut non-essential features
Phase 1 must be done by EOD Day 2 | Critical | Start infrastructure on Day 0 evening if possible
Phase 2 must be demo-ready by EOD Day 5 | Critical | All 4 components functional, even if not all features
Phase 3 is 1.5 days (Day 5 evening → Day 6 EOD) | Medium | Pre-stage manifests, parallelize deployment

Resource Limits

Constraint | Impact | Mitigation
3-node cluster (8 vCPU, 32 GB per node) | High | Be aggressive with pod limits, skip non-critical services if needed
One person managing infrastructure (if that's the case) | High | Pre-automate Terraform, script everything, document heavily
Limited storage (50-500 GB depending on environment) | Medium | Monitor usage, enable ClickHouse cold archiving early
No GPU (CPU-only) | Low | Machine learning components not in scope anyway

External Dependencies

Constraint | Impact | Mitigation
Docker Hub rate limits (100 pulls/6h for anonymous) | Low | Use authenticated pulls, use Nexus mirror early
Helm chart availability (some charts may not exist) | Medium | Pre-download charts, have fallback manual YAML
GitHub API rate limits (60 req/h for unauthenticated) | Low | Authenticate CLI: gh auth login
Terraform Registry availability | Low | Pre-download modules, pin versions in code
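
To see why the Nexus mirror and authenticated pulls matter, the anonymous Docker Hub limit quoted above can be turned into a rough pull budget. This is a back-of-the-envelope sketch with assumptions stated loudly: it treats the limit as a fixed window rather than a rolling one, assumes all nodes share one egress IP, and assumes the worst case where every node pulls every image. The image count in the usage note is hypothetical.

```python
import math

ANON_PULL_LIMIT = 100   # anonymous pulls per window, figure from the table above
WINDOW_HOURS = 6        # window length, figure from the table above


def worst_case_pulls(image_count: int, node_count: int) -> int:
    """Every node pulls every image once (no shared cache or mirror)."""
    if image_count < 0 or node_count < 0:
        raise ValueError("counts must be non-negative")
    return image_count * node_count


def windows_needed(total_pulls: int, limit: int = ANON_PULL_LIMIT) -> int:
    """Number of rate-limit windows required to complete all pulls."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return math.ceil(total_pulls / limit)


def hours_blocked(total_pulls: int, limit: int = ANON_PULL_LIMIT,
                  window_hours: int = WINDOW_HOURS) -> int:
    """Hours spent waiting for the limit to reset after the first window."""
    return max(0, windows_needed(total_pulls, limit) - 1) * window_hours
```

With a hypothetical 40 distinct images on the 3-node cluster, the worst case is 120 pulls, which needs 2 windows and costs 6 hours of waiting. That is a large share of a 6-day schedule, so setting up the mirror early is cheap insurance.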

Network & Security

Constraint | Impact | Mitigation
No public IP/domain (running locally or private cloud) | Medium | Use port-forwarding, LoadBalancer type services
Self-signed certificates only (no real TLS) | Low | Accept browser warnings, document for demo
No persistent public DNS | Low | Use IP addresses, port numbers in docs

Operational Constraints

Constraint | Impact | Mitigation
Cluster will be destroyed/recreated after POC | Low | All config as code (Terraform, K8s YAML) in Git
No prod-like backup/disaster recovery | Low | Single-environment (no dev/staging/prod)
Manual scaling (no autoscaling configured) | Low | Scale down non-essential services if CPU constrained

What We're NOT Doing (Out of Scope)

Item | Limiting Factor | Detail
Load Testing / Performance Benchmarks | Time | 6 days insufficient for proper load test
High Availability for Kubernetes Control Plane | Time | 3-node cluster has 1 etcd instance (not HA)
Multi-Cluster / Federation | Scope | Single cluster only
Advanced Networking / Service Mesh | Time | Basic routing only (no Istio)
Advanced RBAC / Multi-Tenancy | Time | Basic permissions only
Advanced Storage / Multi-Region | Time | Local storage only
Disaster Recovery Testing | Time | No backup/restore procedures
Security Hardening (beyond basics) | Time | Functional security, not production hardened
Custom Metrics / Business Analytics | Time | Standard metrics only
Advanced Log Correlation | Time | Logs visible, not cross-referenced

Verification Checklist (Before Day 1)

Must verify these 7 things or POC is at risk:

  • Compute Available: <cloud-cli> describe-instance-type t3.2xlarge shows resource availability for 3 nodes
  • Credentials Valid: aws configure list or gcloud config list shows the expected account, and an authenticated call (for example aws sts get-caller-identity) succeeds
  • Network Open: Ping cluster nodes from test machine, verify no firewall blocks
  • Pull Secret Current: ls -l /path/to/pull-secret.txt shows a recent modification time, and cat /path/to/pull-secret.txt shows valid JSON for a secret that has not expired
  • Terraform State Backend: terraform init succeeds, state readable
  • GitHub Access: gh auth status shows authenticated, gh repo view comptech-lab/brac-poc works
  • Internet Connectivity: curl https://registry.terraform.io and curl https://docker.io succeed

If ANY of these fail, escalate immediately. Do not start Day 1.
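
Part of the checklist can be scripted so the go/no-go decision does not rest on memory. The sketch below is a minimal illustration, not an existing project script. It covers only the items whose commands are spelled out above (Terraform, GitHub, registry reachability); compute, credentials, network, and pull-secret checks are left out because their commands depend on the environment. The names Check, CHECKS, run_command, and preflight are hypothetical, and the curl flags and the /dev/null target assume a Unix-like shell environment.

```python
import shutil
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Check:
    name: str
    command: Sequence[str]


# Commands mirror the checklist above; extra flags are assumptions made so
# the commands run non-interactively and fail on HTTP errors.
CHECKS: List[Check] = [
    Check("Terraform State Backend", ["terraform", "init", "-input=false"]),
    Check("GitHub Access", ["gh", "auth", "status"]),
    Check("GitHub Repo Visible", ["gh", "repo", "view", "comptech-lab/brac-poc"]),
    Check("Terraform Registry Reachable",
          ["curl", "-fsSL", "-o", "/dev/null", "https://registry.terraform.io"]),
    Check("Docker Hub Reachable",
          ["curl", "-fsSL", "-o", "/dev/null", "https://docker.io"]),
]


def run_command(command: Sequence[str], timeout_s: int = 60) -> bool:
    """Return True only if the tool exists and exits with status 0."""
    if shutil.which(command[0]) is None:
        return False
    try:
        result = subprocess.run(list(command), capture_output=True,
                                timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def preflight(checks: Sequence[Check],
              runner: Callable[[Sequence[str]], bool] = run_command) -> List[str]:
    """Run every check and return the names of the ones that failed."""
    return [check.name for check in checks if not runner(check.command)]
```

Calling preflight(CHECKS) from the directory that holds the Terraform configuration returns the names of the failed checks. An empty list means the scripted items passed; anything else should trigger the escalation described above. The runner parameter exists so the logic can be exercised without touching the network.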


Scope Lock Date

After this date, no new features will be added:

Date: 2026-04-24 (Now)
Reason: 6-day POC, no buffer for scope creep
Policy: New requests documented as "future work" for post-POC phase


Known Limitations

Observability

  • Traces sampled at 10% (not 100%) to avoid data volume overload
  • ClickHouse hot retention is 2 days, not 30 days (cost/performance tradeoff)
  • No machine learning for anomaly detection
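
The 10% trace sampling can be pictured as a deterministic decision on the trace ID, so every service that sees the same trace makes the same keep/drop choice. The sketch below is a pure-Python illustration in the spirit of a trace-ID ratio sampler; it is not the OpenTelemetry SDK code, and the function and constant names are assumptions. In the POC itself the SDK's built-in sampler would normally be used, for example through the standard OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG environment variables.

```python
import secrets

SAMPLE_RATIO = 0.10            # 10% sampling, from the limitation above
_MASK_64 = (1 << 64) - 1       # only the low 64 bits of the trace ID are compared


def should_sample(trace_id: int, ratio: float = SAMPLE_RATIO) -> bool:
    """Keep the trace if its low 64 bits fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = round(ratio * (1 << 64))
    return (trace_id & _MASK_64) < bound


def new_trace_id() -> int:
    """Generate a random 128-bit trace ID (illustrative helper)."""
    return secrets.randbits(128)
```

Because the decision depends only on the trace ID, a trace is either recorded in full or dropped in full. During the demo this means roughly 9 out of 10 requests will not appear in the trace view, which is worth stating up front.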

API Management

  • Basic rate limiting only (no quota by user, region, etc.)
  • No advanced security features (mTLS, attestation, etc.)
  • SSO demo covers SAML or OIDC, not necessarily both (there may not be time for both)

Infrastructure

  • Cluster nodes not hardened beyond Compliance Operator scan
  • No multi-tenancy (all workloads in same cluster)
  • No CA-issued certificates for ingress TLS (self-signed certs only)

CI/CD

  • Basic pipeline (build → push → deploy)
  • No advanced features (code signing, supply chain security, etc.)
  • GitOps pull-based only (no push-based fallback)

Escalation Path for Assumption Violations

If assumptions are violated:

  1. Immediately notify Project Lead
  2. Document the violation: What assumption? What's different?
  3. Assess impact: How much does it change the timeline/scope?
  4. Propose mitigation: What's the workaround?
  5. Escalate to BRAC Bank if needed: External factors beyond control

Example:

  • Assumption: 24 GB RAM available
  • Reality: Only 16 GB available (quota exceeded)
  • Impact: Cannot run all 9 components simultaneously
  • Mitigation: Deploy components sequentially, skip Nexus (lower priority)
  • Escalation: Inform BRAC that Nexus demo will be skipped
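
The mitigation step in this example amounts to a priority-ordered fit against the memory that is actually available. The sketch below shows one simple way to work that out; it is an illustration only. The Component type, the plan_deployment function, the priorities, and every RAM figure in the usage note are hypothetical and not measured values from this POC.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    ram_gb: float
    priority: int  # 1 = must demo; larger numbers are skipped first


def plan_deployment(components: Sequence[Component],
                    available_gb: float) -> Tuple[List[str], List[str]]:
    """Greedily deploy by priority; skip anything that no longer fits."""
    deploy: List[str] = []
    skip: List[str] = []
    remaining = available_gb
    for comp in sorted(components, key=lambda c: (c.priority, c.name)):
        if comp.ram_gb <= remaining:
            deploy.append(comp.name)
            remaining -= comp.ram_gb
        else:
            skip.append(comp.name)
    return deploy, skip


# Hypothetical sizing used only to illustrate the example above.
EXAMPLE = [
    Component("Kafka", 4, 1),
    Component("ClickHouse", 6, 1),
    Component("WSO2 APIM", 4, 2),
    Component("Nexus", 4, 3),
]
```

With these made-up figures, plan_deployment(EXAMPLE, 16) deploys ClickHouse, Kafka, and WSO2 APIM and skips Nexus, matching the mitigation above. With 24 GB available nothing is skipped. The skip list is what goes into the escalation message to BRAC.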


Decision Log (Assumptions Made)

Date | Assumption | Decision | Owner | Rationale
2026-04-24 | 6-day timeline achievable | ACCEPTED | Project Lead | Team committed, Phase 1-2 parallelizable
2026-04-24 | Terraform modules stable | ACCEPTED | Infrastructure | Modules are GA, widely used
2026-04-24 | OTel SDK instrumentation works | ACCEPTED | Platform | Standard SDK, proven in production
2026-04-24 | SSO demo = SAML OR OIDC | ACCEPTED | Security | Both not essential, either demonstrates capability
2026-04-24 | Load testing out of scope | ACCEPTED | Project Lead | Time constraint, not in BRAC requirements

How to Use This Document

  1. Before Phase 1 Starts: Verify all 7 items in "Verification Checklist"
  2. Daily Standups: If any assumption is violated, escalate immediately
  3. PR Reviews: If code breaks an assumption, reject and discuss
  4. Phase Sign-Off: Confirm all assumptions still hold before phase gates
  5. Post-POC: Lessons learned → input to next POC planning


Document Created: 2026-04-24
Last Updated: 2026-04-24
Owner: Project Lead
Status: Active (Ready for Phase 1)
Next Review: EOD Day 1 (after verification)