Definition of Done (DoD)

Phase-by-phase acceptance criteria. A phase is considered DONE only when ALL DoD criteria are met.


Phase 1: Foundation & Infrastructure (Days 1-3)

Issue #1: OpenShift Cluster - DONE When:

Infrastructure:
- [ ] 3 nodes in "Ready" state (oc get nodes)
- [ ] All cluster operators "Available" (oc get clusteroperators)
- [ ] OpenShift console accessible via URL
- [ ] kubeconfig exported and tested (oc whoami works)
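The node and operator checks above can be scripted by counting rows in the `oc` output. The following is a minimal sketch, not the project's actual validation code: the function names are hypothetical, and it assumes the default column layout of `oc get nodes` (STATUS in column 2) and `oc get clusteroperators` (AVAILABLE in column 3, which shifts if an operator reports no version).

```shell
# Count nodes whose STATUS column is exactly "Ready".
# Reads `oc get nodes --no-headers` output on stdin.
count_ready_nodes() {
    awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Count cluster operators whose AVAILABLE column is not "True".
# Reads `oc get clusteroperators --no-headers` output on stdin.
count_unavailable_operators() {
    awk '$3 != "True" { n++ } END { print n + 0 }'
}

# Example usage (requires a logged-in `oc` session):
#   oc get nodes --no-headers | count_ready_nodes                        # DoD expects 3
#   oc get clusteroperators --no-headers | count_unavailable_operators   # DoD expects 0
```

Reading from stdin keeps the parsing logic separate from cluster access, so it can be exercised with captured output before a cluster exists.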

Storage:
- [ ] ODF storage classes exist: ocs-storagecluster-ceph-rbd, ocs-storagecluster-ceph-rgw
- [ ] Test PVC created and bound successfully
- [ ] Storage test passed: Can read/write to persistent volume
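One way to create the test PVC is to generate a small manifest and pipe it to `oc apply`. This sketch is illustrative only: the function name, the claim name `dod-test-pvc`, and the 1Gi size are assumptions. It uses the ceph-rbd class because, as far as I know, the ceph-rgw class provisions object buckets rather than volumes.

```shell
# Print a PVC manifest for the given claim name ($1) and storage class ($2).
# The 1Gi size and ReadWriteOnce access mode are illustrative choices.
print_test_pvc() {
    cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: $2
EOF
}

# Example usage (requires a logged-in `oc` session):
#   print_test_pvc dod-test-pvc ocs-storagecluster-ceph-rbd | oc apply -f -
#   oc get pvc dod-test-pvc -o jsonpath='{.status.phase}'
```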

Compliance:
- [ ] Compliance Operator installed
- [ ] PCI-DSS scan profile created
- [ ] At least one scan completed (PASSED or with known remediations)

Image Security:
- [ ] ACS (Advanced Cluster Security) installed
- [ ] Policy configured to block "Critical" vulnerabilities
- [ ] Test: Pod with critical CVE rejected

Artifacts:
- [ ] Cluster info documented: cluster-info.txt (console URL, API server, kubeconfig path)
- [ ] Terraform code committed to infra/openshift-terraform branch
- [ ] PR merged to develop

Validation:
- [ ] scripts/validate.sh passes OpenShift checks
- [ ] Cluster accessible to all team members (if needed)


Issue #2: Kafka KRaft Cluster - DONE When:

Broker Deployment:
- [ ] 3 Kafka brokers running and ready
- [ ] Broker connectivity verified: kafka-broker-api-versions.sh --bootstrap-server kafka:9092 returns all 3 brokers
- [ ] No broker errors in logs

Topics:
- [ ] 4 topics created:
  - [ ] telemetry.logs (3 partitions, replication-factor 2)
  - [ ] telemetry.metrics (3 partitions, replication-factor 2)
  - [ ] telemetry.traces (3 partitions, replication-factor 2)
  - [ ] telemetry.dlq (1 partition, replication-factor 2)
- [ ] Topic validation: kafka-topics.sh --describe --topic telemetry.logs shows correct configuration
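The four topics can be created from a single spec table so that partition counts and replication factors stay in one place. This is a sketch under assumptions: the `TOPIC_SPECS` variable and `print_topic_commands` function are hypothetical names, and the function only prints the `kafka-topics.sh` commands (a dry run) rather than running them.

```shell
# Topic spec from the checklist: name, partitions, replication factor.
TOPIC_SPECS='telemetry.logs 3 2
telemetry.metrics 3 2
telemetry.traces 3 2
telemetry.dlq 1 2'

# Print one `kafka-topics.sh --create` command per spec line.
# $1 is the bootstrap server, for example kafka:9092.
print_topic_commands() {
    printf '%s\n' "$TOPIC_SPECS" | while read -r name partitions rf; do
        printf 'kafka-topics.sh --bootstrap-server %s --create --if-not-exists --topic %s --partitions %s --replication-factor %s\n' \
            "$1" "$name" "$partitions" "$rf"
    done
}

# Example usage:
#   print_topic_commands kafka:9092        # review the commands
#   print_topic_commands kafka:9092 | sh   # run them from a host with the Kafka CLI
```

Printing first and executing second makes the commands easy to review in a PR before they touch the cluster.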

Schema Registry:
- [ ] Schema Registry pod running and ready
- [ ] Schema Registry endpoint accessible (curl test returns 200)
- [ ] Test schema registered successfully
- [ ] Validation working (invalid payload rejected)

Artifacts:
- [ ] Bootstrap servers documented: kafka-0:9092,kafka-1:9092,kafka-2:9092
- [ ] Terraform code committed
- [ ] PR merged to develop

Validation:
- [ ] scripts/validate.sh passes Kafka checks
- [ ] Can produce to topic: kafka-console-producer.sh --topic telemetry.logs --bootstrap-server ...
- [ ] Can consume from topic: kafka-console-consumer.sh --topic telemetry.logs ...


Issue #3: Redis Sentinel HA - DONE When:

Nodes:
- [ ] 3 Redis data nodes running and ready
- [ ] 3 Redis Sentinel nodes running and ready
- [ ] All pods in Running state

Replication:
- [ ] Master elected and stable
- [ ] Replication working: SET on master, GET on replica returns value
- [ ] Replication lag < 10ms (via info replication)

Failover:
- [ ] Simulate failover: Kill master pod
- [ ] New master elected within 30 seconds
- [ ] No data loss (test key still retrievable)
- [ ] Sentinels monitor new master correctly
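The 30-second election criterion can be timed with a small polling helper. The sketch below is one possible approach, not the project's test: `wait_until` and `master_changed` are hypothetical helpers, and `SENTINEL_HOST` and `MASTER_NAME` are assumed variables that must match the actual Sentinel configuration.

```shell
# Run "$@" once per second until it succeeds or $1 seconds have elapsed.
# Prints the number of seconds waited; returns 1 on timeout.
wait_until() {
    timeout_s=$1
    shift
    waited=0
    until "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout_s" ]; then
            echo "$waited"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$waited"
}

# Succeeds when Sentinel reports a master IP different from $1 (the old master).
# SENTINEL_HOST and MASTER_NAME are assumptions for illustration.
master_changed() {
    current=$(redis-cli -h "$SENTINEL_HOST" -p 26379 \
        SENTINEL get-master-addr-by-name "$MASTER_NAME" | head -n 1)
    [ -n "$current" ] && [ "$current" != "$1" ]
}

# Example usage, after recording the old master IP and deleting its pod:
#   wait_until 30 master_changed "$old_master_ip" && echo "failover within limit"
```

The helper's one-second granularity is coarse, so a result near the 30-second boundary deserves a manual re-run.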

Connectivity:
- [ ] Connection string documented: redis-sentinel.redis.svc.cluster.local:26379
- [ ] Test connection from external pod works
- [ ] SSL/TLS tested (if configured)

Artifacts:
- [ ] Terraform code committed
- [ ] PR merged to develop

Validation:
- [ ] scripts/validate.sh passes Redis checks
- [ ] Failover test documented in validation report


Issue #4: GitLab & Jenkins HA - DONE When:

GitLab:
- [ ] 2 GitLab pod replicas running
- [ ] PostgreSQL backend running
- [ ] GitLab console accessible and responsive
- [ ] Container registry working: Can push/pull images
- [ ] Initial admin credentials documented
- [ ] Monorepo created with sample structure

Jenkins:
- [ ] 2 Jenkins replica pods + agents running
- [ ] Jenkins console accessible and responsive
- [ ] Kubernetes plugin configured
- [ ] Test job created and executed successfully
- [ ] Initial admin credentials documented

CI/CD Integration:
- [ ] .gitlab-ci.yml in monorepo triggers on push
- [ ] Sample pipeline runs and completes
- [ ] Build artifacts generated

Artifacts:
- [ ] Terraform code committed
- [ ] Monorepo initialized with README and sample structure
- [ ] PR merged to develop
- [ ] GitLab & Jenkins URLs documented

Validation:
- [ ] scripts/validate.sh passes GitLab/Jenkins checks
- [ ] Both consoles accessible from test machine


Phase 1 Sign-Off

Phase 1 is DONE when:
- ✅ All 4 issues meet their individual DoD
- ✅ scripts/validate.sh passes ALL Phase 1 checks
- ✅ No critical errors in logs across all components
- ✅ All infrastructure documented and accessible
- ✅ All code merged to develop
- ✅ Ready for Phase 2 (teams can proceed with Kubernetes deployments)

Sign-Off Owner: Infrastructure Lead
Approval Required: Project Lead
Target Date: EOD Day 2


Phase 2: Kubernetes Components (Days 3-5)

Issue #5: Compliance Operator - DONE When:

  • Compliance Operator pod running in openshift-compliance
  • PCI-DSS scan profile available
  • Scan completed with DONE status
  • Results exported (JSON format)
  • Remediation report generated
  • All remediations understood (passed or acknowledged risk)

Validation:
- [ ] PDF report generated and reviewed
- [ ] Compliance scan results in validation report


Issue #6: OpenTelemetry Stack - DONE When:

OTel Collector:
- [ ] DaemonSet deployed on all 3 nodes (DESIRED=3, READY=3)
- [ ] Collector endpoints responsive:
  - [ ] OTLP receiver (port 4317): grpcurl -plaintext localhost:4317 list works
  - [ ] Collector metrics endpoint (port 8888): curl localhost:8888/metrics returns data

SigNoz:
- [ ] SigNoz pods running (web UI, backend)
- [ ] SigNoz web UI accessible and loads
- [ ] Services registered in UI

ClickHouse:
- [ ] ClickHouse pod running
- [ ] Database initialized
- [ ] Tables created: logs, metrics, traces
- [ ] Test data inserted successfully

Kafka Integration:
- [ ] OTel Collector configured to export to Kafka
- [ ] Kafka topics receiving data (verified with consumer)
- [ ] No export errors in collector logs

Sample Application:
- [ ] Sample app deployed (both with and without SDK)
- [ ] App generating traffic (curl returns 200)
- [ ] Traces appearing in SigNoz within 10 seconds of request

Dashboards:
- [ ] At least 3 dashboards visible in SigNoz:
  - [ ] Application dashboard (service metrics)
  - [ ] System dashboard (node metrics)
  - [ ] Tracing dashboard (traces visible)
- [ ] Traces show proper span hierarchy

Retention:
- [ ] 2-day hot retention policy verified in ClickHouse
- [ ] Old data successfully archived (or archival to a mock destination works)

Artifacts:
- [ ] K8s manifests in k8s/observability/
- [ ] PR merged to develop
- [ ] SigNoz URL documented

Validation:
- [ ] E2E test: Sample app → OTel Collector → Kafka → SigNoz ✅
- [ ] Load test: 100+ spans flow through pipeline without loss
- [ ] scripts/validate.sh shows "OTel Stack: PASSING"


Issue #7: WSO2 APIM - DONE When:

Deployment:
- [ ] 2 APIM pod replicas running
- [ ] Identity Server pod running
- [ ] Replicas healthy (no errors in logs)

API Gateway:
- [ ] APIM console accessible (login works)
- [ ] API gateway endpoint responsive
- [ ] Sample API created and deployed
- [ ] Can invoke API: Returns 200 with correct response

SSO Integration:
- [ ] Identity Server configured
- [ ] SAML provider configured
- [ ] OIDC provider configured

SSO Testing (Either SAML OR OIDC must work):
- [ ] SAML: Login redirects to IS, auth works, returns to APIM authenticated ✅
OR
- [ ] OIDC: Token endpoint returns valid JWT, can use token for API calls ✅

Rate Limiting:
- [ ] Rate limiting policy created (e.g., 10 req/min)
- [ ] Test: Requests 1-10 succeed, request 11+ return 429
- [ ] Rate limit headers present in response
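The rate-limit test above reduces to checking a sequence of HTTP status codes. A minimal sketch follows, assuming "succeed" means exactly 200 and that all requests land inside one rate window. `check_rate_limit` and `API_URL` are hypothetical names.

```shell
# Reads one HTTP status code per line on stdin, in request order.
# Succeeds only if the first $1 codes are 200 and every later code is 429.
# Fails if no request beyond the limit was sent (the test would be inconclusive).
check_rate_limit() {
    awk -v limit="$1" '
        NR <= limit && $1 != 200 { bad = 1 }
        NR >  limit && $1 != 429 { bad = 1 }
        END { if (bad || NR <= limit) exit 1 }
    '
}

# Example usage against a 10 req/min policy (API_URL is an assumption):
#   for i in $(seq 1 12); do
#       curl -s -o /dev/null -w '%{http_code}\n' "$API_URL"
#   done | check_rate_limit 10 && echo "rate limiting enforced"
```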

Artifacts:
- [ ] K8s manifests in k8s/wso2/
- [ ] PR merged to develop
- [ ] APIM console URL documented
- [ ] Sample API documented

Validation:
- [ ] scripts/validate.sh shows "WSO2 APIM: PASSING"
- [ ] SSO flow documented/recorded


Issue #8: Middleware - DONE When:

Open Liberty:
- [ ] Open Liberty pod(s) running
- [ ] App accessible on /app endpoint (returns 200)
- [ ] Health check endpoint responsive (/health)

NGINX Load Balancer:
- [ ] NGINX ingress controller running
- [ ] Routes traffic to Open Liberty
- [ ] Can access: http://nginx-ingress/app

Canary Routing:
- [ ] Canary weights set: 90/10 (stable/canary)
- [ ] Traffic distribution verified (send 100 requests, verify ~90 hit stable)
- [ ] Can adjust weights without downtime
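The "~90 hit stable" check needs a tolerance, because 100 requests at a 90/10 weight will rarely split exactly. A sketch under assumptions: each response can be labelled `stable` or `canary` (how the backend is identified, for example by a version header, is not specified in this document), and `check_canary_split` is a hypothetical helper.

```shell
# Reads one backend label per line on stdin ("stable" or "canary").
# Succeeds if the stable share is within $2 percentage points of $1 percent.
check_canary_split() {
    awk -v want="$1" -v tol="$2" '
        $1 == "stable" { stable++ }
        END {
            if (NR == 0) exit 1
            pct = 100 * stable / NR
            printf "stable=%d total=%d (%.1f%%)\n", stable, NR, pct
            diff = pct - want
            if (diff < 0) diff = -diff
            if (diff > tol) exit 1
        }
    '
}

# Example usage, given a file with one label per request:
#   check_canary_split 90 10 < labels.txt
```

For 100 independent requests at a 90% weight, the standard deviation of the stable count is about 3, so a tolerance near 10 points should seldom fail by chance.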

Observability:
- [ ] Middleware metrics exported to OTel Collector
- [ ] Metrics visible in SigNoz (e.g., HTTP requests, latency)

Artifacts:
- [ ] K8s manifests in k8s/middleware/
- [ ] PR merged to develop

Validation:
- [ ] scripts/validate.sh shows "Middleware: PASSING"
- [ ] Canary traffic split verified in logs


Phase 2 Sign-Off

Phase 2 is DONE when:
- ✅ All 4 issues meet their individual DoD
- ✅ End-to-end observability working (trace → SigNoz visible)
- ✅ API gateway protecting endpoints with rate limiting
- ✅ Canary routing operational
- ✅ scripts/validate.sh passes ALL Phase 2 checks
- ✅ All code merged to develop

Sign-Off Owner: Platform Lead
Approval Required: Project Lead
Target Date: EOD Day 4


Phase 3: Supporting Components (Days 5-6)

Issue #9: Trivy Dashboard - DONE When:

  • Dashboard deployed and accessible
  • Image scans showing vulnerabilities
  • SBOM generated for sample app
  • Reports downloadable

Issue #10: ArgoCD - DONE When:

  • ArgoCD deployed and accessible
  • Sample app defined in Git
  • ArgoCD synced app state with Git ✅
  • Can trigger sync manually

Issue #11: Nexus - DONE When:

  • Nexus deployed and accessible
  • Docker repo: Can push/pull images ✅
  • Maven repo: Accessible ✅
  • NPM repo: Accessible ✅

Issue #12: JBoss - DONE When:

  • Domain controller running
  • Managed servers running
  • Domain console accessible
  • Sample app deployable

Issue #13: Validation Report - DONE When:

  • All 9 components validated ✅
  • Validation report generated (PDF/Markdown)
  • Demo video recorded (5 min walkthrough)
  • All success criteria documented

Phase 3 Sign-Off

Phase 3 is DONE when:
- ✅ All 5 issues meet their individual DoD
- ✅ Validation report completed and approved
- ✅ Demo video ready
- ✅ All code merged to develop
- ✅ POC COMPLETE: Ready to present to BRAC Bank

Sign-Off Owner: Project Lead
Approval Required: BRAC Bank (Aman Ullah Sarker)
Target Date: EOD Day 6


Overall POC Completion Criteria

POC is DONE when:

All Phases Complete (DoD met for all 13 issues)

All Components Accessible:
- OpenShift console
- GitLab console
- Jenkins console
- SigNoz dashboards
- APIM console
- Trivy dashboard
- ArgoCD UI
- Nexus console
- JBoss console

All Key Flows Working:
- OpenTelemetry trace: App → OTel → Kafka → SigNoz ✅
- API Request: Client → NGINX → Liberty → WSO2 Gateway ✅
- Deployment: Git commit → GitLab CI → Nexus → ArgoCD → K8s ✅

Compliance Met:
- PCI-DSS scan completed
- Compliance report generated
- No critical findings unaddressed

Documentation:
- All deployment steps documented
- All validation procedures executed
- Demo video ready

BRAC Sign-Off:
- Aman Ullah Sarker approves POC
- Feedback documented for next phase
- Success criteria met


How to Use DoD

  1. Per Issue: Before merging PR, verify all DoD items ✅
  2. Per Phase: Before phase sign-off, run scripts/validate.sh
  3. Project Lead: Reviews DoD checklist before BRAC presentation
  4. BRAC Bank: Verifies all components accessible during demo

DoD is Non-Negotiable: If any criterion is not met, the issue stays open. No "technical debt" exceptions.
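The contents of scripts/validate.sh are not shown in this document. The sketch below illustrates one way a check runner could produce the "<component>: PASSING" lines quoted in the checklists. The `run_check` function, the `failures` counter, and the `check_otel_stack` name are hypothetical and are not taken from the real script.

```shell
failures=0

# Run a named check; print "<name>: PASSING" or "<name>: FAILING"
# and count failures so the caller can decide the overall exit status.
run_check() {
    name=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "${name}: PASSING"
    else
        echo "${name}: FAILING"
        failures=$((failures + 1))
    fi
}

# Example usage (check_otel_stack is a hypothetical check function):
#   run_check "OTel Stack" check_otel_stack
#   run_check "Middleware" curl -fsS http://nginx-ingress/app
#   [ "$failures" -eq 0 ] || exit 1
```

Exiting non-zero when any check fails lets a CI job block a phase sign-off automatically, in line with the non-negotiable rule above.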


DoD Created: 2026-04-24
Last Updated: 2026-04-24
Owner: Project Lead
Status: Active (Ready for Phase 1)