Team & Execution Guide¶

Team structure, expertise requirements, coordination patterns, and knowledge transfer for a smooth 6-day execution.

Team structure¶

Recommended roles (minimum 3 people; ideal 5+)¶

Role	Responsibilities	Skills required	Time commitment
Infrastructure Lead	OpenShift provisioning, IaC	Terraform, OpenShift, networking	Days 1-3 full · Days 4-6 support
Platform Lead	OTel stack, SigNoz, Kafka, Redis	Kubernetes, observability, databases	Day 1 partial (Kafka, Redis) · Days 2-5 full
DevOps Lead	GitLab, Jenkins, ArgoCD, Nexus	CI/CD, container registries, Git	Days 1-2 full · Days 5-6 full
Integration Lead	WSO2 APIM + IS, SSO	WSO2, auth protocols, API gateways	Days 2-4
Security Lead	Compliance scanning, ACS, Trivy	K8s security, compliance, scanning	Days 2-3 · Day 5
Project Lead	Coordination, BRAC comms, risk mgmt	Leadership, communication	All 6 days

If you're solo¶

Working alone is high risk. Strategy:

Phase	Action
Day 0	Pre-stage everything possible
Days 1-2	Critical path only: OpenShift + Kafka + OTel
Days 3-4	WSO2 + middleware; start Phase 3 prep
Days 5-6	Finish Phase 3, demo, report

Solo risks

Single point of failure — you get sick, POC stops
Context switching between components is slow
No peer review / sanity checks
Knowledge transfer is near-impossible (only you know how it works)

Solo mitigations: document heavily as you build · record yourself explaining components · automate everything (no manual steps) · line up an on-call backup (friend/colleague).

Expertise requirements¶

Must-have skills for everyone¶

Git (branching, commits, PRs) — ~70% of collaboration
Bash scripting for deployments
Kubernetes basics (kubectl)
Reading error logs

Per-component skill matrix¶

Component	Required skill	Nice-to-have
OpenShift	`openshift-install`, `oc` CLI, VM provisioning	IaaS automation
Kafka	Kubernetes, distributed-systems concepts	KRaft mode, Kafka ops
Redis	Kubernetes, caching patterns	Sentinel, failover concepts
GitLab/Jenkins	CI/CD pipeline design, container registries	Git workflows, plugin config
OTel/SigNoz	Observability concepts, time-series DBs	APM, tracing, metrics
WSO2	API gateway concepts, SAML/OIDC	OAuth2, SSO configuration
NGINX/Liberty	Load balancing, app servers	Canary deployment, routing
Trivy	Container security scanning	Supply-chain security, SBOM
JBoss	App server concepts	Domain mode, JNDI, datasources
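
The matrix above can double as a quick gap check before kickoff. The following is a minimal sketch, not a prescribed tool: the skill tags are abbreviated from the matrix, and the roster (`alice`, `bob`) is entirely hypothetical.

```python
# Required skills per component, abbreviated from the matrix above.
REQUIRED = {
    "OpenShift": {"oc", "vm-provisioning"},
    "Kafka": {"kubernetes", "distributed-systems"},
    "WSO2": {"api-gateway", "saml-oidc"},
}

# Hypothetical roster (illustration only): who claims which skills.
TEAM = {
    "alice": {"oc", "vm-provisioning", "kubernetes"},
    "bob": {"kubernetes", "distributed-systems"},
}


def coverage_gaps(required, team):
    """Return {component: missing skills} for skills nobody on the team has."""
    available = set().union(*team.values())
    gaps = {}
    for component, skills in required.items():
        missing = skills - available
        if missing:
            gaps[component] = sorted(missing)
    return gaps


print(coverage_gaps(REQUIRED, TEAM))
```

With this sample roster the only gap reported is WSO2, which is the situation the "nobody knows WSO2" example below walks through.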

If the team lacks expertise¶

Pre-POC: 2-3h demo session per tool
During: pair an experienced person with a learner
Always: whoever deploys also documents for the team
Recorded walkthroughs: screen-record yourself and share with the team

Example — nobody knows WSO2:

Day 0: watch a 30-min WSO2 tutorial
Day 2: pair-program the deployment, leaning on WSO2 community help
Day 2 evening: record a walkthrough of what you learned
Team review: everyone understands the setup

Team communication¶

Daily standup (15 min, 09:00)¶

Three questions per person:

What did I complete yesterday?
What am I working on today?
What's blocking me?

Template:

Infrastructure Lead
Today: OpenShift provisioning (8h, 60% done)
Tomorrow: Finish OpenShift (4h), validate storage (1h)
Blocker: None

Platform Lead
Today: Kafka KRaft setup (2h), started OTel collector (1h)
Tomorrow: OTel collector (3h), SigNoz (2h)
Blocker: Need Docker images for OTel, downloading overnight

Rule: if you report a blocker, propose a solution with it; don't just state the problem.
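
If updates are posted in chat rather than spoken, the template and the blocker rule can be enforced mechanically. This is one possible sketch: the `StandupUpdate` class and its `proposed_fix` field are assumptions introduced for illustration, not part of the template.

```python
from dataclasses import dataclass


@dataclass
class StandupUpdate:
    role: str
    today: str
    tomorrow: str
    blocker: str = "None"
    proposed_fix: str = ""  # hypothetical extra field, not in the template

    def __post_init__(self):
        # Enforce the rule: a blocker must come with a proposed solution.
        has_blocker = self.blocker.strip().lower() != "none"
        if has_blocker and not self.proposed_fix.strip():
            raise ValueError(f"{self.role}: blocker needs a proposed fix")

    def render(self) -> str:
        """Format the update in the same line order as the template."""
        lines = [
            self.role,
            f"Today: {self.today}",
            f"Tomorrow: {self.tomorrow}",
            f"Blocker: {self.blocker}",
        ]
        if self.proposed_fix:
            lines.append(f"Proposed fix: {self.proposed_fix}")
        return "\n".join(lines)


update = StandupUpdate(
    role="Platform Lead",
    today="Kafka KRaft setup (2h), started OTel collector (1h)",
    tomorrow="OTel collector (3h), SigNoz (2h)",
    blocker="Need Docker images for OTel",
    proposed_fix="Download overnight",
)
print(update.render())
```

Creating an update with a blocker but no proposed fix raises `ValueError`, so the problem-only report never reaches the channel.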

Escalation protocol (stuck > 30 min)¶

Step	Action	Time budget
1	Slack/Chat: post the problem + what you've tried	5 min
2	Buddy reviews — another team member takes a look	15 min
3	Pair program (screen share)	30 min
4	Skip the component, document as "blocked on X", move on	—
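
The ladder can be read as a simple lookup from elapsed time to the current step. The sketch below assumes the time budgets are spent one after another (5, then 15, then 30 minutes, 50 in total), which the table implies but does not state outright.

```python
# (step number, action, time budget in minutes) from the table above.
STEPS = [
    (1, "Post the problem and what you've tried in chat", 5),
    (2, "Buddy review", 15),
    (3, "Pair program over screen share", 30),
]
PARK = (4, 'Park it as "blocked on X" and move on')


def escalation_step(minutes_since_escalation):
    """Return (step, action) for the time elapsed since escalation began."""
    elapsed = 0
    for step, action, budget in STEPS:
        elapsed += budget
        if minutes_since_escalation < elapsed:
            return step, action
    # All budgets are used up: stop digging and park the component.
    return PARK


print(escalation_step(12))
```

Under that assumption, a component is parked at most 50 minutes after escalation starts, on top of the initial 30 minutes spent stuck.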

Never block the whole team

Keep forward momentum. A blocked component gets parked; others keep shipping.

Knowledge transfer¶

During execution (continuous)¶

Every build session should produce documentation. Examples:

Day 1 — Infrastructure Lead creates docs/OPENSHIFT-SETUP.md:

What the installer does
How to debug provisioning
How to access the cluster
What worked, what didn't

Day 2 — Platform Lead creates docs/OBSERVABILITY-SETUP.md:

How OTel Collector sends to Kafka
How SigNoz queries ClickHouse
Troubleshooting: traces missing? Check these 5 things
How to scale OTel (replicas)

For every component: whoever deploys it also documents it, as they go.
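
A small check can make "documents it as they go" visible at standup. This is a sketch under assumptions: the two paths come from the examples above, while the idea of one setup file per component and the `missing_docs` helper are illustrative.

```python
from pathlib import Path

# Component -> expected setup doc. These two paths come from the examples
# above; entries for other components would be your own naming choice.
EXPECTED_DOCS = {
    "OpenShift": "docs/OPENSHIFT-SETUP.md",
    "Observability": "docs/OBSERVABILITY-SETUP.md",
}


def missing_docs(repo_root, expected=EXPECTED_DOCS):
    """List components whose doc file is absent or empty."""
    root = Path(repo_root)
    missing = []
    for component, rel_path in expected.items():
        doc = root / rel_path
        if not doc.is_file() or doc.stat().st_size == 0:
            missing.append(component)
    return missing
```

Calling `missing_docs(".")` from the repository root returns the components that still need a write-up; an empty list means every expected file exists and has content.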

Post-POC knowledge transfer (Day 6 evening + after)¶

Recorded walkthroughs (~10-15 min each):

Infrastructure Lead: "How the OpenShift cluster works"
Platform Lead: "How the OTel pipeline flows"
DevOps Lead: "How to deploy new apps via ArgoCD"

Wiki entry per component:

How to access it (URL / credentials)
How to deploy it (commands)
How to troubleshoot it (5-10 common issues)
How to scale it (replicas? memory?)
How to extend it (custom dashboards, new policies)

One-page runbooks:

How to restart a component
How to restore from backup
How to handle a component failure
How to monitor for issues

Coordination & dependencies¶

Phase 1 — fully parallelizable¶

All 4 issues run concurrently on Day 1. The Platform Lead owns both Kafka and Redis (3.5h combined):

Issue	Owner	Duration
OpenShift	Infrastructure Lead	8h
GitLab	DevOps Lead	3h
Kafka	Platform Lead	2h
Redis	Platform Lead	1.5h

No waiting. Action: everyone starts Day 1 morning.
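
To see where Phase 1 time actually goes, the table can be summed per owner. A minimal sketch, assuming each owner works through their own issues one at a time:

```python
from collections import defaultdict

# (issue, owner, hours) from the Phase 1 table above.
PHASE_1 = [
    ("OpenShift", "Infrastructure Lead", 8.0),
    ("GitLab", "DevOps Lead", 3.0),
    ("Kafka", "Platform Lead", 2.0),
    ("Redis", "Platform Lead", 1.5),
]


def hours_per_owner(issues):
    """Sum hours per owner, assuming each owner works their issues serially."""
    load = defaultdict(float)
    for _issue, owner, hours in issues:
        load[owner] += hours
    return dict(load)


load = hours_per_owner(PHASE_1)
bottleneck = max(load, key=load.get)
print(load)
print(f"Phase 1 wall-clock is set by {bottleneck}: {load[bottleneck]}h")
```

The OpenShift install is the longest single item at 8h, so the other leads have slack on Day 1 that can go to documentation or Phase 2 preparation.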

Phase 2 — partially sequential¶

OpenShift must be ready before Phase 2 kicks off. Then:

Dependency	Duration	Notes
OpenShift ready (Day 1 EOD)	—	Prerequisite
Compliance scan	1h	Runs in background
OTel + SigNoz + ClickHouse	4-5h	🔴 Critical path
WSO2 APIM + IS	2-3h	Start when OTel > 50% to avoid cluster overload

Infrastructure Lead watches cluster health during Phase 2. If resource exhaustion looms, Platform Lead reduces OTel replicas.
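
The effect of starting WSO2 partway through the OTel work can be estimated with a little arithmetic. The sketch below reads "OTel > 50%" as "half of OTel's expected hours have elapsed", which is an assumption; progress may not track time that neatly.

```python
def phase2_span(otel_hours, wso2_hours, start_fraction=0.5):
    """Hours from the start of OTel work until both OTel and WSO2 are done.

    Assumes WSO2 starts once OTel has used `start_fraction` of its duration.
    """
    wso2_start = otel_hours * start_fraction
    return max(otel_hours, wso2_start + wso2_hours)


# Low and high ends of the duration ranges in the table above.
best = phase2_span(otel_hours=4.0, wso2_hours=2.0)
worst = phase2_span(otel_hours=5.0, wso2_hours=3.0)
print(best, worst)
```

Under these assumptions the overlapped work takes roughly 4 to 5.5 hours, compared with 6 to 8 hours if WSO2 waited for OTel to finish.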

Phase 3 — parallel again¶

All Phase 3 components are independent. All start Day 5 morning.

Issue	Owner	Duration
Trivy	Security Lead	1-2h
ArgoCD	DevOps Lead	1.5-2h
Nexus	DevOps Lead	1-1.5h
JBoss	Middleware Lead	1-2h

Common execution mistakes (and fixes)¶

#	Mistake	Reality	Fix
1	"I'll document after"	Never happens — team forgets, post-POC becomes painful	Document as you build. 5 min/day = 30 min total.
2	"Just do it my way, I'll explain later"	Inconsistency, confusion, rework	2-min huddle on approach before starting a component
3	Working in silos	Multiple people hit the same problem separately	Daily standup, pair-program for blockers, shared chat
4	Skipping tests to save time	Broken deployment → 4-hour debug → lost timeline	Always validate before moving on (each issue has a Definition of Done)
5	"We'll fix it later"	"Later" never comes — tech debt ships	Fix immediately · or document as limitation
6	Ignoring resource constraints	Cluster OOM → evictions → cascading failures	Daily `kubectl top nodes`; escalate at > 80% memory
7	Solo-debugging complicated issues	2 hours wasted solo — 20 min with two brains	If stuck > 30 min, call for pair programming
8	Feature creep	"Let me add one more thing" — 1h each, 6h lost	Scope lock enforced; new ideas → post-POC backlog
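
The daily memory check from row 6 is easy to script. The sketch below parses the text that `kubectl top nodes` prints and flags nodes above the 80% line; the node names and figures in `SAMPLE` are made up for illustration, and the function name is hypothetical.

```python
def nodes_over_threshold(top_output, threshold=80):
    """Return [(node, memory_percent)] for nodes above `threshold` % memory."""
    hot = []
    for line in top_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything without a numeric MEMORY% column.
        if len(fields) < 5 or not fields[4].rstrip("%").isdigit():
            continue
        mem_pct = int(fields[4].rstrip("%"))
        if mem_pct > threshold:
            hot.append((fields[0], mem_pct))
    return hot


# Made-up sample in the column layout `kubectl top nodes` uses.
SAMPLE = """\
NAME      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master-0  1200m        30%    9800Mi          62%
worker-0  2500m        62%    13900Mi         87%
worker-1  900m         22%    12800Mi         80%
"""
print(nodes_over_threshold(SAMPLE))
```

Against a live cluster, the input could come from `subprocess.run(["kubectl", "top", "nodes"], capture_output=True, text=True, check=True).stdout`. Any node returned is a reason to escalate before evictions start.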

Phase handoffs¶

Phase 1 → Phase 2 handoff¶

Who: Infrastructure Lead → Platform Lead
When: End of Day 1, 17:00
Duration: 30 min

Infrastructure Lead presents:

How the cluster was set up
Where the kubeconfig is
How to access the console
Known issues / quirks
How to scale nodes if needed

Platform Lead asks:

Can I deploy 20 pods without issues?
How much free memory do we have?
Any known problems?
How do I get to the cluster if something breaks?

Deliverables: cluster info document · kubeconfig file · access credentials · infrastructure code with docs.

Phase 2 → Phase 3 handoff¶

Who: Platform Lead + DevOps Lead → Phase 3 team
When: End of Day 4, 17:00
Duration: 30 min

Platform Lead presents:

OTel pipeline setup
Kafka topology
ClickHouse queries
Common bottlenecks

DevOps Lead presents:

GitLab / Jenkins setup
Container registry access
How to push images
How to trigger CI/CD

Morale & burnout prevention¶

6 days is intense. Actively prevent burnout:

Daily:

15-min standup (keeps everyone aligned, breaks isolation)
Pair-program for blockers (less frustrating than solo debugging)
Celebrate small wins — "OTel traces working!" 🎉

Days 3-4:

Take a real break — 2h lunch, walk outside
If ahead of schedule: ease off, don't over-invest
Socialize: team lunch or dinner

Day 6:

Wrap up calmly, don't panic
The demo is not a grade on you; you did your best
Celebrate completion: team dinner after POC

If someone burns out

Redistribute work immediately
No shame
Say: "We can shift your tasks to others. Rest — you're valuable."

Escalation for team issues¶

Team member overwhelmed¶

Person tells Project Lead
Project Lead asks: "What's overloading you?"
Immediate action: shift tasks to others or reduce scope
Don't wait until they break

Two members disagree on approach¶

Quick 15-min discussion
If deadlocked: Project Lead decides (not by vote — by timeline/risk)
Decision made: everyone commits (no undermining)
Learn from it post-POC

Team member not delivering¶

Private conversation: "Here's what I'm seeing. What's going on?"
Understand the blocker (stuck? unclear? task too hard?)
Adjust: more support, different task, or escalate if needed
Allow 24h for things to improve before revisiting

Per-role Definition of Done¶

Infrastructure Lead¶

OpenShift cluster: 3-node, all operators healthy
Infrastructure code clean, documented, in Git
Cluster monitoring set up
Any resource issues escalated to team
Setup documented in docs/

Platform Lead¶

OTel pipeline flowing data
SigNoz accessible with traces visible
Kafka topics created and validated
Redis HA tested with failover
Documented in docs/

DevOps Lead¶

Integration Lead¶

WSO2 APIM deployed
At least one SSO method (SAML or OIDC) working
Rate-limiting policy applied
Documented in docs/

Security Lead¶

Compliance scan completed and report generated
Trivy dashboard running
SBOM generated
Documented in docs/

Project Lead¶

All issues tracked and progressing
Daily standups happening
BRAC updates sent daily
Blockers escalated within 2 hours
Demo scheduled and prepared
All documentation complete

Created: 2026-04-24 · Owner: Project Lead · Next step: recruit team, assign roles, kickoff training