Security & Compliance Guide¶
Covers credentials management, incident response, data privacy, and how to prevent compliance violations during the POC.
Credentials & Secrets Management¶
The Problem¶
Six days of intense work + multiple people + multiple systems = HIGH RISK of credential leaks

Typical mistakes:

- ❌ Hardcoding credentials in Terraform
- ❌ Committing a Docker Hub token to Git
- ❌ Sharing credentials in Slack/email
- ❌ OpenShift pull-secret stored in the repo
- ❌ Admin passwords in logs
- ❌ Database credentials in manifests

Cost of a leak: compromised cluster, stolen data, and days of cleanup
The Rules¶
RULE 1: Never Commit Secrets to Git¶
What gets committed:

- ✅ Terraform code (but NOT values)
- ✅ Kubernetes manifests (but NOT secrets)
- ✅ Docker Compose files (but NOT credentials)
What NEVER gets committed:
- ❌ .tfvars files with values
- ❌ kubeconfig with tokens
- ❌ Docker credentials
- ❌ Database passwords
- ❌ API keys, SSH keys, certs
- ❌ Pull secrets (OpenShift, Docker Hub)
How to handle:

```bash
# WRONG ❌
terraform.tfvars          # Contains all secrets
git add terraform.tfvars
git commit -m "config"

# RIGHT ✅
terraform.tfvars.example  # Template, no real values
git add terraform.tfvars.example

# Add to .gitignore:
terraform.tfvars
*.key
*.pem
kubeconfig
```
Check before every commit:

```bash
git diff --cached | grep -iE "password|token|secret|key"

# If it shows secrets → DON'T COMMIT
git reset HEAD <filename>
```
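The same check can be expressed as a small script, which is easier to extend and test than a one-line grep. This is a minimal sketch of a pre-commit style scan; the regex patterns and the sample diff are illustrative only, not an exhaustive secret detector.

```python
import re

# Illustrative credential patterns (not exhaustive)
SECRET_PATTERNS = [
    re.compile(r"password\s*[=:]", re.IGNORECASE),
    re.compile(r"(api[_-]?key|token|secret)\s*[=:]", re.IGNORECASE),
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
]

def find_suspect_lines(diff_text: str) -> list[str]:
    """Return added lines from a unified diff that look like secrets."""
    hits = []
    for line in diff_text.splitlines():
        if not line.startswith("+") or line.startswith("+++"):
            continue  # only inspect newly added lines, skip file headers
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append(line)
    return hits

# Simulated staged diff (in real use, feed `git diff --cached` output)
staged_diff = """\
+++ b/terraform.tfvars
+db_password = "hunter2"
+region = "ap-southeast-1"
"""
for hit in find_suspect_lines(staged_diff):
    print("BLOCKED:", hit)
```

If this finds anything, abort the commit and move the value out of the file.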
RULE 2: Use Environment Variables or Secret Management¶
Option A: Environment Variables (simplest for POC)

```bash
# .env.local (NOT committed to Git)
export OPENSHIFT_API_TOKEN="xxxx"
export DOCKER_USERNAME="user"
export DOCKER_TOKEN="xxxx"
export DB_PASSWORD="xxxx"
```

In Terraform, mark the variable as sensitive:

```hcl
variable "openshift_token" {
  type      = string
  sensitive = true
}

# Access via: var.openshift_token
```
Option B: Secrets Manager (better for production)

- Store in: AWS Secrets Manager, Vault, Kubernetes Secrets
- Access via: API call at runtime
- Rotate frequently
For this POC: environment variables are fine, but use them consistently
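A fail-fast loader helps enforce that consistency: if a credential is missing, the tooling stops immediately instead of failing mid-deploy. This is a sketch only; the variable names follow the `.env.local` example above, and the fake environment stands in for `os.environ`.

```python
import os

# Credentials the POC tooling expects (names from the .env.local example)
REQUIRED_VARS = ["OPENSHIFT_API_TOKEN", "DOCKER_USERNAME", "DOCKER_TOKEN", "DB_PASSWORD"]

def load_credentials(env: dict) -> dict:
    """Return the required credentials, raising if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required credentials: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}

# Simulated environment (in real use, pass os.environ)
fake_env = {
    "OPENSHIFT_API_TOKEN": "xxxx",
    "DOCKER_USERNAME": "user",
    "DOCKER_TOKEN": "xxxx",
    "DB_PASSWORD": "xxxx",
}
creds = load_credentials(fake_env)
print(sorted(creds))
```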
RULE 3: Rotate Credentials Frequently¶
Pull Secret: Check the expiration date before Day 1

```bash
# If it expires in < 7 days: refresh from Red Hat
# Download a new pull-secret.json
# Add it to .env.local (NOT to Git)
```
Admin Passwords: Change them after the POC

```bash
# Admin password for OpenShift? Change it Day 6 evening
# Admin password for SigNoz? Change it Day 6 evening
# Don't leave defaults
```
Git/Docker Tokens: Create new ones just for this POC

```bash
# Create a new GitHub token (repo read/write only)
# Create a new Docker Hub token (push/pull images only)
# Delete both after the POC
```
Access Control & Data Privacy¶
Who Can Access What?¶
Golden Rule: Least privilege

- Infrastructure Lead: OpenShift cluster access (needs it)
- Platform Lead: OTel/Kafka/Redis (needs it)
- Project Lead: Everything (oversight)
- BRAC Bank: Read-only access to GitHub, demo environment only

Do NOT give:

- ❌ BRAC Bank access to cluster credentials
- ❌ Team members access they don't need
- ❌ Permanent API keys (use session tokens instead)
Data in Transit¶
Data generated during POC:

- [ ] Sample app logs (no real data, OK)
- [ ] Traces through OTel (no PII, OK)
- [ ] Kubernetes event logs (OK)
- [ ] Git commits (code only, OK)

Do NOT include:

- ❌ Real BRAC Bank data (credit cards, customer info, account numbers)
- ❌ Real user data in sample apps
- ❌ Real credentials in the demo

If the demo needs "realistic" data:

- Use fake/synthetic data
- Example: "Customer ID 123" (not a real account)
- Example: "$100 transaction" (example amount)
- Mark it clearly as "SAMPLE DATA"
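A generator makes the synthetic-data rule self-enforcing: every record carries the SAMPLE DATA marker, and a fixed seed keeps demo runs reproducible. The field names and value ranges below are illustrative assumptions, not a schema from the POC.

```python
import random

def make_sample_transaction(rng: random.Random) -> dict:
    """Return one clearly-labeled fake transaction for demo traffic."""
    return {
        "marker": "SAMPLE DATA",                   # explicit synthetic label
        "customer_id": rng.randint(100, 999),      # not a real account number
        "amount_usd": rng.choice([10, 50, 100]),   # example amounts only
        "type": rng.choice(["deposit", "withdrawal", "transfer"]),
    }

rng = random.Random(42)  # fixed seed → reproducible demo data
batch = [make_sample_transaction(rng) for _ in range(3)]
for tx in batch:
    print(tx)
```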
Data at Rest¶
Logs & Volumes:

- [ ] ClickHouse data: POC only, deleted Day 6
- [ ] Kafka messages: test data, will be deleted
- [ ] Cluster storage: ephemeral, deleted when the cluster is destroyed

Backups:

- Do NOT back up this POC cluster
- No snapshot persistence beyond Day 6
Security Scanning & Compliance¶
PCI-DSS Compliance (Required)¶
What BRAC requires:

- Compliance Operator scan against the PCI-DSS baseline
- Report generated showing:
    - ✅ Passed checks
    - ⚠️ Failed checks (with remediation plan)
    - ℹ️ Manual review items

Acceptable findings:

- ✅ 0 Critical findings
- ✅ 0-3 High findings (with a plan to fix)
- ✅ Any number of Medium/Low (document for Phase 2)

Not acceptable:

- ❌ Unaddressed Critical findings
- ❌ No plan for High findings
- ❌ No compliance report at all
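The acceptance criteria above reduce to a simple predicate, which is worth writing down so reviewers apply it consistently. This is a sketch of that rule only; severity labels are assumed to match the scanner's output verbatim.

```python
from collections import Counter

def poc_acceptable(findings: list, high_has_plan: bool = True) -> bool:
    """Apply the POC acceptance rule: 0 Critical, at most 3 High (with a
    remediation plan), any number of Medium/Low."""
    counts = Counter(findings)
    if counts["Critical"] > 0:
        return False
    if counts["High"] > 3 or (counts["High"] > 0 and not high_has_plan):
        return False
    return True

print(poc_acceptable(["High", "Medium", "Low", "Low"]))  # within limits → True
print(poc_acceptable(["Critical", "High"]))              # Critical present → False
print(poc_acceptable(["High"] * 4))                      # too many High → False
```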
Timeline:

- Day 2: Run compliance scan (30 min)
- Day 2: Review report (30 min)
- Day 2-3: Fix any Critical findings (2-4 hours)
- Day 6: Final report ready
If compliance fails:
1. Identify failure: "PCI-DSS check X failed"
2. Understand it: "This means [security implication]"
3. Fix it: "We're implementing [solution]"
4. Verify fix: "Scan again to confirm pass"
5. If unfixable: "This is a Phase 2 item, POC limitation"
6. Document: "Compliance report shows [what was done]"
Image Security (ACS Policy)¶
Requirement: Block deployment of images with "Critical" vulnerabilities
How to test:

```bash
# Use an intentionally vulnerable image
docker pull library/alpine:3.8   # This image has known CVEs

# Try to deploy it
kubectl apply -f pod-with-cve.yaml

# Expected: pod rejected with an ACS policy violation
```
If the policy is not enforcing:

```bash
# Check ACS status
kubectl get deployment -n stackrox

# Check the policy
kubectl get clustersecuritypolicies

# If broken: fix it or document it as a limitation
```
Incident Response¶
What If Something Breaks?¶
Incident = "Component not working, affecting demo"
Response Protocol (15 minutes or less):

1. Immediate (minute 1):
    - Pause demo/work
    - Take a screenshot of the error
    - Document what triggered the incident
2. Diagnose (minutes 1-5):
    - Check logs: `kubectl logs -f [pod]`
    - Check events: `kubectl describe pod [pod]`
    - Check cluster health: `kubectl top nodes`
    - Restart the component if applicable: `kubectl rollout restart deployment/[name]`
3. Assess (minute 5):
    - Can you fix it in < 15 min?
    - YES → Fix it, test, continue
    - NO → Go to Workaround
4. Workaround (minute 6):
    - Skip this component's demo
    - Show screenshots instead
    - Explain: "Known limitation identified during POC"
5. Document (after immediate resolution):
    - What broke
    - Why it broke
    - How you fixed it (or worked around it)
    - How to prevent it next time
Critical vs. Non-Critical Incidents¶
CRITICAL (affects demo, must fix):

- OpenShift cluster down
- OTel/SigNoz not receiving traces
- API gateway not working
- Kubernetes networking broken

Response: Stop everything, fix immediately (up to 30 min)

NON-CRITICAL (can work around):

- JBoss domain mode not deploying
- Trivy dashboard slow
- WSO2 SSO supports only one method (OIDC, not SAML)
- One pod evicted but others running

Response: Document, continue demo, note for Phase 2
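The triage above can be encoded as a lookup, so whoever is on call gives the same response every time. The component identifiers are made up for this sketch; map them to whatever names your runbook uses.

```python
# Components whose failure blocks the demo (names are illustrative)
CRITICAL_COMPONENTS = {"openshift-cluster", "otel-signoz", "api-gateway", "k8s-networking"}

def triage(component: str) -> str:
    """Return the agreed response for a failing component."""
    if component in CRITICAL_COMPONENTS:
        return "CRITICAL: stop everything, fix immediately (up to 30 min)"
    return "NON-CRITICAL: document, continue demo, note for Phase 2"

print(triage("api-gateway"))
print(triage("trivy-dashboard"))
```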
Compliance Violations & How to Prevent¶
VIOLATION 1: Credentials Committed to Git¶
Prevention:

- [ ] .gitignore covers all sensitive files
- [ ] Pre-commit hook checks for patterns (optional)
- [ ] Code review catches secrets before merge
If it happens:

```bash
# DO NOT just delete the file and commit again (the leak stays in history)

# Instead:
# 1. Rotate the credential immediately
# 2. Use git-filter-repo or BFG to remove it from history
# 3. Force push (⚠️ only if the repo is not public)
# 4. Document the incident
```
Better: Use GitHub's "Secret Scanning" to detect leaks
VIOLATION 2: Unencrypted Connections¶
Prevention:

- [ ] All external APIs use HTTPS (✅ GitHub and Docker Hub do)
- [ ] K8s cluster API uses TLS (✅ OpenShift does by default)
- [ ] Database connections encrypted (check WSO2 and ClickHouse configs)
Check:

```bash
# Verify HTTPS
curl https://api.github.com   # Should work
curl http://api.github.com    # Should fail or redirect

# Verify K8s TLS
kubectl config view | grep certificate-authority
# Should show a cert path
```
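If your deployment config lists external endpoints, the HTTPS rule can also be checked mechanically before the demo. A minimal sketch, assuming the endpoints live in a plain list (the URLs below are examples):

```python
from urllib.parse import urlparse

def insecure_endpoints(endpoints: list) -> list:
    """Return every configured endpoint that is not using the https scheme."""
    return [u for u in endpoints if urlparse(u).scheme != "https"]

endpoints = [
    "https://api.github.com",
    "https://hub.docker.com",
    "http://legacy.example.internal",  # plain HTTP → would be flagged
]
print(insecure_endpoints(endpoints))
```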
VIOLATION 3: Unencrypted Secrets in Kubernetes¶
Prevention:

- [ ] Use Kubernetes Secrets (and enable etcd encryption so they are encrypted at rest; by default they are only base64-encoded)
- [ ] DO NOT put secrets in ConfigMaps
- [ ] DO NOT put secrets in Pod definitions
Right way:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-secret
type: Opaque
stringData:
  password: "my-password"
---
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: app
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: my-secret
              key: password
```
Wrong way ❌:

```yaml
env:
  - name: DB_PASSWORD
    value: "my-password"  # ❌ EXPOSED IN POD DEFINITION
```
VIOLATION 4: Inadequate Logging¶
Prevention:

- [ ] All components log to stdout (captured by K8s)
- [ ] Logs forwarded to a centralized location (OTel pipeline)
- [ ] Logs retained for 7+ days
- [ ] Audit logging enabled
Check:

```bash
# View API server logs
kubectl logs -n openshift-kube-apiserver -l component=kube-apiserver

# Should show audit events: "who did what, when"
```
VIOLATION 5: No Access Control¶
Prevention:

- [ ] RBAC configured (who can do what in the cluster)
- [ ] Network policies in place (which pods can talk)
- [ ] API gateway requires authentication
Check:

```bash
# Verify RBAC
kubectl get rolebindings -A

# Verify network policies
kubectl get networkpolicies -A
```
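The least-privilege table from "Who Can Access What?" can also be checked mechanically: compare what each role has actually been granted against what it is allowed. A sketch with made-up role and system names matching that table:

```python
# Allowed systems per role, per the least-privilege table (names illustrative)
ALLOWED = {
    "infrastructure-lead": {"openshift-cluster"},
    "platform-lead": {"otel", "kafka", "redis"},
    "project-lead": {"openshift-cluster", "otel", "kafka", "redis", "github"},
    "brac-bank": {"github-readonly", "demo-env"},
}

def excess_grants(grants: dict) -> dict:
    """Return, per role, any granted systems beyond the allowed set."""
    return {
        role: extra
        for role, systems in grants.items()
        if (extra := systems - ALLOWED.get(role, set()))
    }

grants = {
    "platform-lead": {"otel", "kafka"},
    "brac-bank": {"github-readonly", "openshift-cluster"},  # over-privileged
}
print(excess_grants(grants))
```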
BRAC Bank Compliance Requirements¶
What BRAC likely cares about:

1. PCI-DSS Compliance
    - Scope: Cluster stores no real financial data
    - Proof: Compliance Operator scan report
    - Status: ✅ Scanned, findings addressed
2. Data Privacy
    - Scope: Demo uses synthetic data only
    - Proof: Sample data policy documented
    - Status: ✅ No real customer data
3. Access Control
    - Scope: Cluster access limited to the team
    - Proof: RBAC/Network policies
    - Status: ✅ Configured
4. Logging & Audit
    - Scope: All actions logged
    - Proof: OTel logging pipeline
    - Status: ✅ Operational
5. Incident Response
    - Scope: Documented process
    - Proof: This guide + incident log
    - Status: ✅ Ready
What BRAC does NOT require:

- ❌ Production-grade backups
- ❌ Multi-region disaster recovery
- ❌ Penetration testing
- ❌ Formal security audit
- ❌ SOC 2 certification
Checklist: Security Hardening¶
Before Day 1:

- [ ] .gitignore covers all sensitive files
- [ ] Pull secret is valid and up to date
- [ ] Team briefed on "never commit secrets"
- [ ] Environment variables set up for credentials
- [ ] Incident response plan documented (this guide)

Days 1-6:

- [ ] Daily: Check for credential commits (pre-commit hook or code review)
- [ ] Daily: Monitor cluster access logs
- [ ] Day 2: PCI-DSS compliance scan completed
- [ ] Day 2: ACS policy tested and enforcing
- [ ] Day 5: Review all logs for unauthorized access
- [ ] Day 6: Document any security findings

After Day 6:

- [ ] Rotate all temporary credentials
- [ ] Delete the demo environment
- [ ] Collect security findings for Phase 2
- [ ] Capture security lessons learned
Escalation: Security Incident¶
If you discover a security issue:

1. Immediately (stop work):
    - Isolate the problem (don't let it propagate further)
    - Don't panic; don't fix without thinking
2. Assess (5 minutes):
    - Severity: Critical / High / Medium / Low
    - Impact: Data leaked? Access compromised? Demo affected?
    - Fixable? (in < 1 hour? > 1 hour?)
3. Notify:
    - Project Lead: "Security issue found: [description]"
    - If critical: Escalate to BRAC Bank
4. Fix:
    - Critical: Fix immediately (up to 4 hours)
    - High: Fix if time allows, document as a limitation
    - Medium/Low: Document for Phase 2
5. Document:
    - What was the issue?
    - How did it happen?
    - How was it fixed?
    - What's the fix for Phase 2?
Example Security Incident Log¶
```
INCIDENT REPORT - POC DAY 2

Date: 2026-04-XX, 2pm
Severity: HIGH
Status: RESOLVED

ISSUE: OpenShift pull-secret found in Git history
(accidentally committed in an early version)

ROOT CAUSE: Missing .gitignore entry for pull-secret.txt

IMPACT:
- Secret exposed in public/semi-public repo (if public)
- Attack window: ~2 hours before discovery

IMMEDIATE ACTION:
1. Revoked old pull-secret
2. Generated new pull-secret
3. Used BFG to remove it from history
4. Force-pushed (only 1 commit, safe)

VERIFICATION:
- Ran: git log --all --grep="secret" → CLEAN
- Ran: grep -r "pull-secret" * → NO MATCHES
- Tested: New pull-secret works ✅

PREVENTION:
- Added to .gitignore: pull-secret*
- Added to .gitignore: *.tfvars
- Added to .gitignore: kubeconfig*
- Code review checklist: "Any secrets?" ✅

LESSON LEARNED: Set up .gitignore BEFORE the first commit (not after an incident)

Reported by: [Name]
Reviewed by: [Project Lead]
Approved: ✅
```
Security Guide Created: 2026-04-24
Status: Ready to deploy
Owner: Project Lead + Security Lead
Next Step: Brief team on security rules before Day 1