Team & Execution Guide
Team structure, expertise requirements, coordination patterns, and knowledge transfer for a smooth 6-day execution.
Team structure
Recommended roles (minimum 3 people; ideal 5+)
| Role | Responsibilities | Skills required | Time commitment |
|---|---|---|---|
| Infrastructure Lead | OpenShift provisioning, IaC | Terraform, OpenShift, networking | Days 1-3 full · Days 4-6 support |
| Platform Lead | OTel stack, SigNoz, Kafka, Redis | Kubernetes, observability, databases | Days 2-5 full |
| DevOps Lead | GitLab, Jenkins, ArgoCD, Nexus | CI/CD, container registries, Git | Days 1-2 full · Days 5-6 full |
| Integration Lead | WSO2 APIM + IS, SSO | WSO2, auth protocols, API gateways | Days 2-4 |
| Security Lead | Compliance scanning, ACS, Trivy | K8s security, compliance, scanning | Days 2-3 · Day 5 |
| Project Lead | Coordination, BRAC comms, risk mgmt | Leadership, communication | All 6 days |
If you're solo
Working alone is high risk. Strategy:
| Phase | Action |
|---|---|
| Day 0 | Pre-stage everything possible |
| Days 1-2 | Critical path only: OpenShift + Kafka + OTel |
| Days 3-4 | WSO2 + middleware; start Phase 3 prep |
| Days 5-6 | Finish Phase 3, demo, report |
Solo risks
- Single point of failure — you get sick, POC stops
- Context switching between components is slow
- No peer review / sanity checks
- Knowledge transfer is near-impossible (only you know how it works)
Solo mitigations: document heavily as you build · record yourself explaining components · automate everything (no manual steps) · line up an on-call backup (friend/colleague).
Expertise requirements
Must-have skills for everyone
- Git (branching, commits, PRs) — ~70% of collaboration
- Bash scripting for deployments
- Kubernetes basics (kubectl)
- Reading error logs
Per-component skill matrix
| Component | Required skill | Nice-to-have |
|---|---|---|
| OpenShift | openshift-install, oc CLI, VM provisioning | IaaS automation |
| Kafka | Kubernetes, distributed-systems concepts | KRaft mode, Kafka ops |
| Redis | Kubernetes, caching patterns | Sentinel, failover concepts |
| GitLab/Jenkins | CI/CD pipeline design, container registries | Git workflows, plugin config |
| OTel/SigNoz | Observability concepts, time-series DBs | APM, tracing, metrics |
| WSO2 | API gateway concepts, SAML/OIDC | OAuth2, SSO configuration |
| NGINX/Liberty | Load balancing, app servers | Canary deployment, routing |
| Trivy | Container security scanning | Supply-chain security, SBOM |
| JBoss | App server concepts | Domain mode, JNDI, datasources |
If the team lacks expertise
- Pre-POC: 2-3h demo session per tool
- During: Pair experienced person with learner
- Always: whoever deploys also documents for the team
- Recorded walkthroughs: screen-record yourself, share with the team
Example — nobody knows WSO2:
- Day 0: watch a 30-min WSO2 tutorial
- Day 2: pair-program with WSO2 community help
- Day 2 evening: recorded walkthrough of what you learned
- Team review: confirm everyone understands the setup
Team communication
Daily standup (15 min, 09:00)
Three questions per person:
- What did I complete yesterday?
- What am I working on today?
- What's blocking me?
Template:
Infrastructure Lead
- Today: OpenShift provisioning (8h, 60% done)
- Tomorrow: Finish OpenShift (4h), validate storage (1h)
- Blocker: None

Platform Lead
- Today: Kafka KRaft setup (2h), started OTel collector (1h)
- Tomorrow: OTel collector (3h), SigNoz (2h)
- Blocker: Need Docker images for OTel, downloading overnight
Rule: if you report a blocker, also propose a solution; don't just state the problem.
Escalation protocol (stuck > 30 min)
| Step | Action | Time budget |
|---|---|---|
| 1 | Slack/Chat: post the problem + what you've tried | 5 min |
| 2 | Buddy reviews — another team member takes a look | 15 min |
| 3 | Pair program (screen share) | 30 min |
| 4 | Skip the component, document as "blocked on X", move on | — |
Never block the whole team
Keep forward momentum. A blocked component gets parked; others keep shipping.
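A minimal sketch of the protocol as a lookup, in Python. It assumes the step budgets run back to back and only start once someone has been stuck for 30 minutes; the table does not say whether steps overlap. `escalation_step` and its constants are hypothetical, not an existing tool.

```python
# Hypothetical helper: map minutes stuck to the escalation step in the table.
# Assumption: the 30-min threshold comes first, then step budgets run back to back.
THRESHOLD_MIN = 30
STEP_BUDGETS = [
    (1, "Post the problem + what you've tried in chat", 5),
    (2, "Buddy review", 15),
    (3, "Pair program (screen share)", 30),
]


def escalation_step(minutes_stuck: int) -> tuple[int, str]:
    """Return (step number, action) for how long someone has been stuck."""
    if minutes_stuck <= THRESHOLD_MIN:
        return (0, "Keep working solo")
    elapsed = THRESHOLD_MIN
    for step, action, budget in STEP_BUDGETS:
        elapsed += budget
        if minutes_stuck <= elapsed:
            return (step, action)
    # All budgets exhausted: park the component and move on.
    return (4, "Skip: document as 'blocked on X', move on")


if __name__ == "__main__":
    for minutes in (20, 33, 45, 70, 90):
        print(minutes, escalation_step(minutes))
```

Under these assumptions a component is parked after at most 80 minutes (30 + 5 + 15 + 30).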
Knowledge transfer
During execution (continuous)
Every build session should produce documentation. Examples:
Day 1 — Infrastructure Lead creates docs/OPENSHIFT-SETUP.md:
- What the installer does
- How to debug provisioning
- How to access the cluster
- What worked, what didn't
Day 2 — Platform Lead creates docs/OBSERVABILITY-SETUP.md:
- How OTel Collector sends to Kafka
- How SigNoz queries ClickHouse
- Troubleshooting: traces missing? Check these 5 things
- How to scale OTel (replicas)
For every component: whoever deploys it also documents it, as they go.
Post-POC knowledge transfer (Day 6 evening + after)
Recorded walkthroughs (~10-15 min each):
- Infrastructure Lead: "How the OpenShift cluster works"
- Platform Lead: "How the OTel pipeline flows"
- DevOps Lead: "How to deploy new apps via ArgoCD"
Wiki entry per component:
- How to access it (URL / credentials)
- How to deploy it (commands)
- How to troubleshoot it (5-10 common issues)
- How to scale it (replicas? memory?)
- How to extend it (custom dashboards, new policies)
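A small Python sketch that scaffolds one of these wiki entries so the owner only fills in the blanks. The file naming follows the docs/OPENSHIFT-SETUP.md pattern used above; the helper names (`doc_path`, `render_component_doc`, `scaffold_component_doc`) and the TODO placeholders are hypothetical.

```python
from pathlib import Path

# Sections taken from the per-component wiki-entry checklist.
SECTIONS = [
    "How to access it (URL / credentials)",
    "How to deploy it (commands)",
    "How to troubleshoot it (common issues)",
    "How to scale it (replicas, memory)",
    "How to extend it (custom dashboards, new policies)",
]


def doc_path(component: str, docs_dir: str = "docs") -> Path:
    """Follow the docs/OPENSHIFT-SETUP.md naming pattern, e.g. docs/KAFKA-SETUP.md."""
    name = component.upper().replace(" ", "-")
    return Path(docs_dir) / f"{name}-SETUP.md"


def render_component_doc(component: str) -> str:
    """Return a Markdown skeleton with one empty section per checklist item."""
    lines = [f"# {component} setup", ""]
    for section in SECTIONS:
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)


def scaffold_component_doc(component: str, docs_dir: str = "docs") -> Path:
    """Write the skeleton once; never overwrite notes the owner already wrote."""
    path = doc_path(component, docs_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_text(render_component_doc(component), encoding="utf-8")
    return path
```

For example, `scaffold_component_doc("Kafka")` would create docs/KAFKA-SETUP.md; calling it again leaves the existing file untouched, so it is safe to run at the start of every build session.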
One-page runbooks:
- How to restart a component
- How to restore from backup
- How to handle a component failure
- How to monitor for issues
Coordination & dependencies
Phase 1 — fully parallelizable
All 4 issues start on Day 1 and run in parallel across owners; the Platform Lead handles Kafka and Redis back to back:
| Issue | Owner | Duration |
|---|---|---|
| OpenShift | Infrastructure Lead | 8h |
| GitLab | DevOps Lead | 3h |
| Kafka | Platform Lead | 2h |
| Redis | Platform Lead | 1.5h |
No waiting. Action: everyone starts Day 1 morning.
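To see why Day 1 needs no waiting, here is a rough Python model of the table, assuming each owner works through their own issues one after another and different owners work in parallel. `load_per_owner` and `wall_clock_hours` are illustrative names.

```python
from collections import defaultdict

# Phase 1 issues from the table: (issue, owner, hours).
PHASE1 = [
    ("OpenShift", "Infrastructure Lead", 8.0),
    ("GitLab", "DevOps Lead", 3.0),
    ("Kafka", "Platform Lead", 2.0),
    ("Redis", "Platform Lead", 1.5),
]


def load_per_owner(issues):
    """Sum hours per owner; one person works their own issues back to back."""
    load = defaultdict(float)
    for _issue, owner, hours in issues:
        load[owner] += hours
    return dict(load)


def wall_clock_hours(issues):
    """Owners work in parallel, so the phase lasts as long as the busiest owner."""
    return max(load_per_owner(issues).values())


if __name__ == "__main__":
    print(load_per_owner(PHASE1))
    print(wall_clock_hours(PHASE1))  # 8.0, set by OpenShift
```

Under that model OpenShift alone sets the length of Day 1 (8h); the Platform Lead finishes with 4.5h to spare and the DevOps Lead with 5h.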
Phase 2 — partially sequential
OpenShift must be ready before Phase 2 kicks off. Then:
| Dependency | Duration | Notes |
|---|---|---|
| OpenShift ready (Day 1 EOD) | — | Prerequisite |
| Compliance scan | 1h | Runs in background |
| OTel + SigNoz + ClickHouse | 4-5h | 🔴 Critical path |
| WSO2 APIM + IS | 2-3h | Start once OTel is more than 50% deployed, to avoid overloading the cluster |
Infrastructure Lead watches cluster health during Phase 2. If resource exhaustion looms, Platform Lead reduces OTel replicas.
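A rough Python model of the Phase 2 timeline, in hours after OpenShift is ready. It assumes OTel progress is linear, WSO2 starts exactly at the 50% mark, and the compliance scan runs in the background from hour 0; `phase2_timeline` is a hypothetical helper, and the durations come from the table.

```python
# Hypothetical Phase 2 timeline, in hours after OpenShift is ready.
# Assumptions: OTel progresses linearly, WSO2 starts at the 50% mark,
# and the compliance scan runs in the background from hour 0.
def phase2_timeline(otel_h: float, wso2_h: float, scan_h: float = 1.0) -> dict:
    otel_end = otel_h
    wso2_start = 0.5 * otel_h  # gate from the table: WSO2 waits for OTel to pass 50%
    wso2_end = wso2_start + wso2_h
    return {
        "scan_end": scan_h,
        "otel_end": otel_end,
        "wso2_start": wso2_start,
        "wso2_end": wso2_end,
        "phase_end": max(otel_end, wso2_end, scan_h),
    }


if __name__ == "__main__":
    # Best and worst case from the table's ranges.
    print(phase2_timeline(otel_h=4, wso2_h=2))
    print(phase2_timeline(otel_h=5, wso2_h=3))
```

With the table's ranges this gives a best case of 4h (OTel 4h, WSO2 2h) and a worst case of 5.5h (OTel 5h, WSO2 3h). More generally, WSO2 finishes last whenever it takes longer than half the OTel duration, so under this model the 50% gate can shift the critical path onto WSO2.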
Phase 3 — parallel again
All Phase 3 components are independent and start Day 5 morning; the DevOps Lead runs ArgoCD and Nexus back to back.
| Issue | Owner | Duration |
|---|---|---|
| Trivy | Security Lead | 1-2h |
| ArgoCD | DevOps Lead | 1.5-2h |
| Nexus | DevOps Lead | 1-1.5h |
| JBoss | Middleware Lead | 1-2h |
Common execution mistakes (and fixes)
| # | Mistake | Reality | Fix |
|---|---|---|---|
| 1 | "I'll document after" | Never happens — team forgets, post-POC becomes painful | Document as you build. 5 min/day = 30 min total. |
| 2 | "Just do it my way, I'll explain later" | Inconsistency, confusion, rework | 2-min huddle on approach before starting a component |
| 3 | Working in silos | Multiple people hit the same problem separately | Daily standup, pair-program for blockers, shared chat |
| 4 | Skipping tests to save time | Broken deployment → 4-hour debug → lost timeline | Always validate before moving on (each issue has a Definition of Done) |
| 5 | "We'll fix it later" | "Later" never comes; tech debt ships | Fix immediately, or document it as a known limitation |
| 6 | Ignoring resource constraints | Cluster OOM → evictions → cascading failures | Run kubectl top nodes daily; escalate when any node exceeds 80% memory |
| 7 | Solo-debugging complicated issues | 2 hours wasted solo — 20 min with two brains | If stuck > 30 min, call for pair programming |
| 8 | Feature creep | "Let me add one more thing" — 1h each, 6h lost | Scope lock enforced; new ideas → Phase 2 list |
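The daily resource check from mistake #6 can be scripted. A minimal Python sketch, assuming memory % is the last column of `kubectl top nodes` output, that the cluster's metrics API is available, and that kubectl is on PATH and logged in; the function names are hypothetical.

```python
import subprocess

MEMORY_THRESHOLD_PCT = 80  # escalation point from the table


def parse_top_nodes(output: str) -> dict[str, int]:
    """Map node name -> memory % from `kubectl top nodes` text output.

    Assumes memory % is the last column; skips the header row and nodes
    whose metrics are not available yet (e.g. '<unknown>').
    """
    usage = {}
    for line in output.strip().splitlines():
        cols = line.split()
        if not cols or cols[0] == "NAME":
            continue
        mem = cols[-1]
        if mem.endswith("%") and mem[:-1].isdigit():
            usage[cols[0]] = int(mem[:-1])
    return usage


def nodes_to_escalate(usage: dict[str, int], threshold: int = MEMORY_THRESHOLD_PCT) -> list[str]:
    """Return nodes strictly above the threshold, highest usage first."""
    hot = [node for node, pct in usage.items() if pct > threshold]
    return sorted(hot, key=lambda node: usage[node], reverse=True)


def check_cluster() -> list[str]:
    """Run kubectl and return the nodes that need escalating."""
    out = subprocess.run(
        ["kubectl", "top", "nodes"], capture_output=True, text=True, check=True
    ).stdout
    return nodes_to_escalate(parse_top_nodes(out))
```

Calling `check_cluster()` returns the nodes above 80% memory, highest first; an empty list means nothing to escalate.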
Phase handoffs
Phase 1 → Phase 2 handoff
- Who: Infrastructure Lead → Platform Lead
- When: End of Day 1, 17:00
- Duration: 30 min
Infrastructure Lead presents:
- How the cluster was set up
- Where the kubeconfig is
- How to access the console
- Known issues / quirks
- How to scale nodes if needed
Platform Lead asks:
- Can I deploy 20 pods without issues?
- How much free memory do we have?
- Any known problems?
- How do I get to the cluster if something breaks?
Deliverables: cluster info document · kubeconfig file · access credentials · infrastructure code with docs.
Phase 2 → Phase 3 handoff
- Who: Platform Lead + DevOps Lead → Phase 3 team
- When: End of Day 4, 17:00
- Duration: 30 min
Platform Lead presents:
- OTel pipeline setup
- Kafka topology
- ClickHouse queries
- Common bottlenecks
DevOps Lead presents:
- GitLab / Jenkins setup
- Container registry access
- How to push images
- How to trigger CI/CD
Morale & burnout prevention
6 days is intense. Actively prevent burnout:
Daily:
- 15-min standup (keeps everyone aligned, breaks isolation)
- Pair-program for blockers (less frustrating than solo debugging)
- Celebrate small wins — "OTel traces working!" 🎉
Days 3-4:
- Take a real break — 2h lunch, walk outside
- If ahead of schedule: ease off, don't over-invest
- Socialize: team lunch or dinner
Day 6:
- Wrap up calmly, don't panic
- Demo is not your grade — you did your best
- Celebrate completion: team dinner after POC
If someone burns out
- Redistribute work immediately
- No shame
- Say: "We can shift your tasks to others. Rest — you're valuable."
Escalation for team issues
Team member overwhelmed
- Person tells Project Lead
- Project Lead asks: "What's overloading you?"
- Immediate action: shift tasks to others or reduce scope
- Don't wait until they break
Two members disagree on approach
- Quick 15-min discussion
- If deadlocked: Project Lead decides (not by vote — by timeline/risk)
- Decision made: everyone commits (no undermining)
- Learn from it post-POC
Team member not delivering
- Private conversation: "Here's what I'm seeing. What's going on?"
- Understand the blocker (stuck? unclear? task too hard?)
- Adjust: more support, different task, or escalate if needed
- Give them 24h to improve
Per-role Definition of Done
Infrastructure Lead
- OpenShift cluster: 3-node, all operators healthy
- Infrastructure code clean, documented, in Git
- Cluster monitoring set up
- Any resource issues escalated to team
- Setup documented in docs/
Platform Lead
- OTel pipeline flowing data
- SigNoz accessible with traces visible
- Kafka topics created and validated
- Redis HA tested with failover
- Documented in docs/
DevOps Lead
- GitLab + Jenkins running
- Sample pipeline triggered and working
- ArgoCD syncing apps
- Nexus repos accessible
- Documented in docs/
Integration Lead
- WSO2 APIM deployed
- At least one SSO method (SAML or OIDC) working
- Rate-limiting policy applied
- Documented in docs/
Security Lead
- Compliance scan completed and report generated
- Trivy dashboard running
- SBOM generated
- Documented in docs/
Project Lead
- All issues tracked and progressing
- Daily standups happening
- BRAC updates sent daily
- Blockers escalated within 2 hours
- Demo scheduled and prepared
- All documentation complete
Created: 2026-04-24 · Owner: Project Lead · Next step: recruit team, assign roles, kickoff training