Definition of Done (DoD)
Phase-by-phase acceptance criteria. A phase is considered DONE only when ALL DoD criteria are met.
Phase 1: Foundation & Infrastructure (Days 1-3)
Issue #1: OpenShift Cluster - DONE When:
Infrastructure:
- [ ] 3 nodes in "Ready" state (oc get nodes)
- [ ] All cluster operators "Available" (oc get clusteroperators)
- [ ] OpenShift console accessible via URL
- [ ] kubeconfig exported and tested (oc whoami works)
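The node check above could be automated roughly as follows. This is a minimal sketch, assuming the node list is captured with `oc get nodes -o json`; the sample payload and the function names are hypothetical and only illustrate the JSON shape being read.

```python
import json


def ready_nodes(nodes_json: str) -> list[str]:
    """Return the names of nodes whose Ready condition has status "True"."""
    ready = []
    for item in json.loads(nodes_json).get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            ready.append(item["metadata"]["name"])
    return ready


def node_criterion_met(nodes_json: str, expected: int = 3) -> bool:
    """True when exactly `expected` nodes report Ready (3 per this DoD)."""
    return len(ready_nodes(nodes_json)) == expected


# Hypothetical, heavily trimmed sample; real output carries many more fields.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "node-a"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "node-b"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "node-c"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    ]
})

print(ready_nodes(SAMPLE), node_criterion_met(SAMPLE))
```

In practice the JSON would come from the cluster rather than from `SAMPLE`, for example by piping the command output into a small script that calls `node_criterion_met`.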
Storage:
- [ ] ODF storage classes exist: ocs-storagecluster-ceph-rbd, ocs-storagecluster-ceph-rgw
- [ ] Test PVC created and bound successfully
- [ ] Storage test passed: Can read/write to persistent volume
Compliance:
- [ ] Compliance Operator installed
- [ ] PCI-DSS scan profile created
- [ ] At least one scan completed (PASSED or with known remediations)
Image Security:
- [ ] ACS (Advanced Cluster Security) installed
- [ ] Policy configured to block "Critical" vulnerabilities
- [ ] Test: Pod with critical CVE rejected
Artifacts:
- [ ] Cluster info documented: cluster-info.txt (console URL, API server, kubeconfig path)
- [ ] Terraform code committed to infra/openshift-terraform branch
- [ ] PR merged to develop
Validation:
- [ ] scripts/validate.sh passes OpenShift checks
- [ ] Cluster accessible to all team members (if needed)
Issue #2: Kafka KRaft Cluster - DONE When:
Broker Deployment:
- [ ] 3 Kafka brokers running and ready
- [ ] Broker connectivity verified: kafka-broker-api-versions.sh --bootstrap-server kafka:9092 returns all 3 brokers
- [ ] No broker errors in logs
Topics:
- [ ] 4 topics created:
  - [ ] telemetry.logs (3 partitions, replication-factor 2)
  - [ ] telemetry.metrics (3 partitions, replication-factor 2)
  - [ ] telemetry.traces (3 partitions, replication-factor 2)
  - [ ] telemetry.dlq (1 partition, replication-factor 2)
- [ ] Topic validation: kafka-topics.sh --describe --topic telemetry.logs shows correct configuration
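The topic criteria above can be compared against `kafka-topics.sh --describe` output with a small parser. The sketch below assumes the summary line of that output contains `Topic:`, `PartitionCount:` and `ReplicationFactor:` fields, which is how recent Kafka releases print it; the helper names are hypothetical.

```python
import re

# Expected (partitions, replication factor) per topic, taken from the checklist.
EXPECTED = {
    "telemetry.logs": (3, 2),
    "telemetry.metrics": (3, 2),
    "telemetry.traces": (3, 2),
    "telemetry.dlq": (1, 2),
}

# Matches only the per-topic summary line, not the per-partition lines.
SUMMARY = re.compile(
    r"Topic:\s*(\S+).*?PartitionCount:\s*(\d+).*?ReplicationFactor:\s*(\d+)"
)


def parse_describe(output: str) -> dict[str, tuple[int, int]]:
    """Map topic name -> (partition count, replication factor)."""
    found = {}
    for line in output.splitlines():
        match = SUMMARY.search(line)
        if match:
            found[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return found


def topic_mismatches(output: str, expected=EXPECTED) -> dict:
    """Return {topic: (expected, actual)} for missing or misconfigured topics."""
    actual = parse_describe(output)
    return {
        name: (spec, actual.get(name))
        for name, spec in expected.items()
        if actual.get(name) != spec
    }
```

An empty dictionary from `topic_mismatches` means all four topics match the checklist; a value of `None` in the "actual" position means the topic was not found in the output.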
Schema Registry:
- [ ] Schema Registry pod running and ready
- [ ] Schema Registry endpoint accessible (curl test returns 200)
- [ ] Test schema registered successfully
- [ ] Validation working (invalid payload rejected)
Artifacts:
- [ ] Bootstrap servers documented: kafka-0:9092,kafka-1:9092,kafka-2:9092
- [ ] Terraform code committed
- [ ] PR merged to develop
Validation:
- [ ] scripts/validate.sh passes Kafka checks
- [ ] Can produce to topic: kafka-console-producer.sh --topic telemetry.logs --bootstrap-server ...
- [ ] Can consume from topic: kafka-console-consumer.sh --topic telemetry.logs ...
Issue #3: Redis Sentinel HA - DONE When:
Nodes:
- [ ] 3 Redis data nodes running and ready
- [ ] 3 Redis Sentinel nodes running and ready
- [ ] All pods in Running state
Replication:
- [ ] Master elected and stable
- [ ] Replication working: SET on master, GET on replica returns value
- [ ] Replication lag < 10ms (via info replication)
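One way to inspect replication state is to parse the text returned by `INFO replication` on the master. The sketch below is illustrative only: it assumes the usual `slaveN:ip=...,port=...,state=...,offset=...,lag=...` lines and a `master_repl_offset` field, and the function name is hypothetical. Note that the `lag` field appears to be reported in whole seconds, so a sub-10 ms threshold would likely need a separate measurement (for example a timed SET/GET round trip); the byte offset gap is shown here as a proxy.

```python
import re


def replica_status(info: str) -> list[dict]:
    """Summarise each replica listed in `INFO replication` output from a master."""
    fields = {}
    for line in info.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and ":" in line:
            key, value = line.split(":", 1)
            fields[key] = value

    master_offset = int(fields["master_repl_offset"])
    replicas = []
    for key, value in fields.items():
        # Replica entries keep the legacy "slaveN" key in the INFO output.
        if re.fullmatch(r"slave\d+", key):
            attrs = dict(pair.split("=", 1) for pair in value.split(","))
            replicas.append({
                "name": key,
                "state": attrs["state"],
                "offset_gap_bytes": master_offset - int(attrs["offset"]),
                "lag_seconds": int(attrs["lag"]),
            })
    return replicas
```

A replica that is `online` with an offset gap of zero has applied everything the master has written so far.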
Failover:
- [ ] Simulate failover: Kill master pod
- [ ] New master elected within 30 seconds
- [ ] No data loss (test key still retrievable)
- [ ] Sentinels monitor new master correctly
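The 30-second election window can be measured with a simple polling loop. This is a sketch under stated assumptions: `get_master` is a hypothetical callable supplied by the tester (for instance a wrapper around `redis-cli -p 26379 SENTINEL get-master-addr-by-name <master-name>`) that returns the current master address or `None`; the clock and sleep functions are injectable so the logic can be exercised without a live cluster.

```python
import time
from typing import Callable, Optional


def wait_for_new_master(
    get_master: Callable[[], Optional[str]],
    old_master: str,
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until a master other than `old_master` is reported.

    Returns the elapsed seconds at the first poll that sees a new master,
    or None if that does not happen within `timeout_s`.
    """
    start = clock()
    while True:
        current = get_master()
        elapsed = clock() - start
        if elapsed > timeout_s:
            return None
        if current is not None and current != old_master:
            return elapsed
        sleep(interval_s)
```

The returned value is only as precise as the polling interval, so a shorter `interval_s` gives a tighter measurement at the cost of more Sentinel queries.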
Connectivity:
- [ ] Connection string documented: redis-sentinel.redis.svc.cluster.local:26379
- [ ] Test connection from external pod works
- [ ] SSL/TLS tested (if configured)
Artifacts:
- [ ] Terraform code committed
- [ ] PR merged to develop
Validation:
- [ ] scripts/validate.sh passes Redis checks
- [ ] Failover test documented in validation report
Issue #4: GitLab & Jenkins HA - DONE When:
GitLab:
- [ ] 2 GitLab pod replicas running
- [ ] PostgreSQL backend running
- [ ] GitLab console accessible and responsive
- [ ] Container registry working: Can push/pull images
- [ ] Initial admin credentials documented
- [ ] Monorepo created with sample structure
Jenkins:
- [ ] 2 Jenkins replica pods + agents running
- [ ] Jenkins console accessible and responsive
- [ ] Kubernetes plugin configured
- [ ] Test job created and executed successfully
- [ ] Initial admin credentials documented
CI/CD Integration:
- [ ] .gitlab-ci.yml in monorepo triggers on push
- [ ] Sample pipeline runs and completes
- [ ] Build artifacts generated
Artifacts:
- [ ] Terraform code committed
- [ ] Monorepo initialized with README and sample structure
- [ ] PR merged to develop
- [ ] GitLab & Jenkins URLs documented
Validation:
- [ ] scripts/validate.sh passes GitLab/Jenkins checks
- [ ] Both consoles accessible from test machine
Phase 1 Sign-Off
Phase 1 is DONE when:
- ✅ All 4 issues meet their individual DoD
- ✅ scripts/validate.sh passes ALL Phase 1 checks
- ✅ No critical errors in logs across all components
- ✅ All infrastructure documented and accessible
- ✅ All code merged to develop
- ✅ Ready for Phase 2 (teams can proceed with Kubernetes deployments)
Sign-Off Owner: Infrastructure Lead
Approval Required: Project Lead
Target Date: EOD Day 2
Phase 2: Kubernetes Components (Days 3-5)
Issue #5: Compliance Operator - DONE When:
- Compliance Operator pod running in openshift-compliance
- PCI-DSS scan profile available
- Scan completed with DONE status
- Results exported (JSON format)
- Remediation report generated
- All remediations understood (passed or acknowledged risk)
Validation:
- [ ] PDF report generated and reviewed
- [ ] Compliance scan results in validation report
Issue #6: OpenTelemetry Stack - DONE When:
OTel Collector:
- [ ] DaemonSet deployed on all 3 nodes (DESIRED=3, READY=3)
- [ ] Collector endpoints responsive:
  - [ ] OTLP receiver (port 4317): grpcurl -plaintext localhost:4317 list works
  - [ ] Prometheus receiver (port 8888): curl localhost:8888/metrics returns data
SigNoz:
- [ ] SigNoz pods running (web UI, backend)
- [ ] SigNoz web UI accessible and loads
- [ ] Services registered in UI
ClickHouse:
- [ ] ClickHouse pod running
- [ ] Database initialized
- [ ] Tables created: logs, metrics, traces
- [ ] Test data inserted successfully
Kafka Integration:
- [ ] OTel Collector configured to export to Kafka
- [ ] Kafka topics receiving data (verified with consumer)
- [ ] No export errors in collector logs
Sample Application:
- [ ] Sample app deployed (both with and without SDK)
- [ ] App generating traffic (curl returns 200)
- [ ] Traces appearing in SigNoz within 10 seconds of request
Dashboards:
- [ ] At least 3 dashboards visible in SigNoz:
  - [ ] Application dashboard (service metrics)
  - [ ] System dashboard (node metrics)
  - [ ] Tracing dashboard (traces visible)
- [ ] Traces show proper span hierarchy
Retention:
- [ ] 2-day hot retention policy verified in ClickHouse
- [ ] Old data successfully archived (or archival to a mock destination works)
Artifacts:
- [ ] K8s manifests in k8s/observability/
- [ ] PR merged to develop
- [ ] SigNoz URL documented
Validation:
- [ ] E2E test: Sample app → OTel Collector → Kafka → SigNoz ✅
- [ ] Load test: 100+ spans flow through pipeline without loss
- [ ] scripts/validate.sh shows "OTel Stack: PASSING"
Issue #7: WSO2 APIM - DONE When:
Deployment:
- [ ] 2 APIM pod replicas running
- [ ] Identity Server pod running
- [ ] Replicas healthy (no errors in logs)
API Gateway:
- [ ] APIM console accessible (login works)
- [ ] API gateway endpoint responsive
- [ ] Sample API created and deployed
- [ ] Can invoke API: Returns 200 with correct response
SSO Integration:
- [ ] Identity Server configured
- [ ] SAML provider configured
- [ ] OIDC provider configured
SSO Testing (either SAML or OIDC must work):
- [ ] SAML: Login redirects to IS, auth works, returns to APIM authenticated ✅
OR
- [ ] OIDC: Token endpoint returns valid JWT, can use token for API calls ✅
Rate Limiting:
- [ ] Rate limiting policy created (e.g., 10 req/min)
- [ ] Test: Requests 1-10 succeed, requests 11+ return 429
- [ ] Rate limit headers present in response
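The rate-limit test above reduces to a check over the sequence of HTTP status codes observed within one window. A minimal sketch, assuming "succeed" means any 2xx response and that the status codes have already been collected by whatever client sends the requests; the function name is hypothetical.

```python
def rate_limit_violations(status_codes: list[int], limit: int = 10) -> list[str]:
    """Return human-readable violations; an empty list means the test passed."""
    problems = []
    if len(status_codes) <= limit:
        problems.append(f"need more than {limit} requests to observe throttling")
    for index, code in enumerate(status_codes, start=1):
        if index <= limit and not 200 <= code < 300:
            problems.append(f"request {index}: expected 2xx, got {code}")
        elif index > limit and code != 429:
            problems.append(f"request {index}: expected 429, got {code}")
    return problems
```

For example, `rate_limit_violations([200] * 10 + [429, 429])` returns an empty list, while a run in which the eleventh request still succeeds reports that request as a violation. All requests need to fall inside the same rate-limit window for the result to be meaningful.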
Artifacts:
- [ ] K8s manifests in k8s/wso2/
- [ ] PR merged to develop
- [ ] APIM console URL documented
- [ ] Sample API documented
Validation:
- [ ] scripts/validate.sh shows "WSO2 APIM: PASSING"
- [ ] SSO flow documented/recorded
Issue #8: Middleware - DONE When:
Open Liberty:
- [ ] Open Liberty pod(s) running
- [ ] App accessible on /app endpoint (returns 200)
- [ ] Health check endpoint responsive (/health)
NGINX Load Balancer:
- [ ] NGINX ingress controller running
- [ ] Routes traffic to Open Liberty
- [ ] Can access: http://nginx-ingress/app
Canary Routing:
- [ ] Canary weights set: 90/10 (stable/canary)
- [ ] Traffic distribution verified (send 100 requests, verify ~90 hit stable)
- [ ] Can adjust weights without downtime
Observability:
- [ ] Middleware metrics exported to OTel Collector
- [ ] Metrics visible in SigNoz (e.g., HTTP requests, latency)
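Because weighted routing is random per request, "~90 of 100" needs an explicit tolerance. The sketch below is one possible way to define it, not a prescribed method: it treats each request as an independent draw and accepts the observed count if it lies within a chosen number of binomial standard deviations of the expected value. The function name and the default of 3 standard deviations are assumptions.

```python
import math


def canary_split_ok(
    stable_hits: int,
    total: int,
    stable_weight: float = 0.90,
    sigmas: float = 3.0,
) -> bool:
    """True when the stable hit count is within `sigmas` binomial standard
    deviations of the expected count for the configured weight."""
    if total <= 0 or not 0 <= stable_hits <= total:
        return False
    expected = total * stable_weight
    std_dev = math.sqrt(total * stable_weight * (1.0 - stable_weight))
    return abs(stable_hits - expected) <= sigmas * std_dev
```

With 100 requests and a 90/10 split the standard deviation is about 3 requests, so counts such as 88 or 93 stable hits pass while 75 does not. Sending more requests narrows the relative tolerance.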
Artifacts:
- [ ] K8s manifests in k8s/middleware/
- [ ] PR merged to develop
Validation:
- [ ] scripts/validate.sh shows "Middleware: PASSING"
- [ ] Canary traffic split verified in logs
Phase 2 Sign-Off
Phase 2 is DONE when:
- ✅ All 4 issues meet their individual DoD
- ✅ End-to-end observability working (trace → SigNoz visible)
- ✅ API gateway protecting endpoints with rate limiting
- ✅ Canary routing operational
- ✅ scripts/validate.sh passes ALL Phase 2 checks
- ✅ All code merged to develop
Sign-Off Owner: Platform Lead
Approval Required: Project Lead
Target Date: EOD Day 4
Phase 3: Supporting Components (Days 5-6)
Issue #9: Trivy Dashboard - DONE When:
- Dashboard deployed and accessible
- Image scans showing vulnerabilities
- SBOM generated for sample app
- Reports downloadable
Issue #10: ArgoCD - DONE When:
- ArgoCD deployed and accessible
- Sample app defined in Git
- ArgoCD synced app state with Git ✅
- Can trigger sync manually
Issue #11: Nexus - DONE When:
- Nexus deployed and accessible
- Docker repo: Can push/pull images ✅
- Maven repo: Accessible ✅
- NPM repo: Accessible ✅
Issue #12: JBoss - DONE When:
- Domain controller running
- Managed servers running
- Domain console accessible
- Sample app deployable
Issue #13: Validation Report - DONE When:
- All 9 components validated ✅
- Validation report generated (PDF/Markdown)
- Demo video recorded (5 min walkthrough)
- All success criteria documented
Phase 3 Sign-Off
Phase 3 is DONE when:
- ✅ All 5 issues meet their individual DoD
- ✅ Validation report completed and approved
- ✅ Demo video ready
- ✅ All code merged to develop
- ✅ POC COMPLETE: Ready to present to BRAC Bank
Sign-Off Owner: Project Lead
Approval Required: BRAC Bank (Aman Ullah Sarker)
Target Date: EOD Day 6
Overall POC Completion Criteria
POC is DONE when:
✅ All Phases Complete (DoD met for all 13 issues)
✅ All Components Accessible:
- OpenShift console
- GitLab console
- Jenkins console
- SigNoz dashboards
- APIM console
- Trivy dashboard
- ArgoCD UI
- Nexus console
- JBoss console
✅ All Key Flows Working:
- OpenTelemetry trace: App → OTel → Kafka → SigNoz ✅
- API Request: Client → NGINX → Liberty → WSO2 Gateway ✅
- Deployment: Git commit → GitLab CI → Nexus → ArgoCD → K8s ✅
✅ Compliance Met:
- PCI-DSS scan completed
- Compliance report generated
- No critical findings unaddressed
✅ Documentation:
- All deployment steps documented
- All validation procedures executed
- Demo video ready
✅ BRAC Sign-Off:
- Aman Ullah Sarker approves POC
- Feedback documented for next phase
- Success criteria met
How to Use DoD
- Per Issue: Before merging PR, verify all DoD items ✅
- Per Phase: Before phase sign-off, run scripts/validate.sh
- Project Lead: Reviews DoD checklist before BRAC presentation
- BRAC Bank: Verifies all components accessible during demo
DoD is Non-Negotiable: If criteria are not met, the issue stays open. No "technical debt" exceptions.
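Since an issue stays open while any checkbox is unticked, the per-issue review can be supported by listing the open items in the checklist text. A minimal sketch, assuming the checklist is kept as Markdown with `- [ ]` and `- [x]` items; the function name is hypothetical and the file holding the checklist is not specified here.

```python
import re

# "- [ ] text" is open, "- [x] text" (or "[X]") is done; leading indent allowed.
CHECKBOX = re.compile(r"^\s*- \[( |x|X)\] (.+)$")


def open_items(markdown: str) -> list[str]:
    """Return the text of every unchecked checklist item, in document order."""
    items = []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items
```

An empty result for the section covering an issue indicates that every listed criterion has been ticked; it does not replace running scripts/validate.sh.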
DoD Created: 2026-04-24
Last Updated: 2026-04-24
Owner: Project Lead
Status: Active (Ready for Phase 1)