
Security & Compliance Guide

Covers credentials management, incident response, data privacy, and compliance violations during the POC.


Credentials & Secrets Management

The Problem

6-day intense work + multiple people + multiple systems = HIGH RISK of credential leaks

Typical Mistakes:
- ❌ Hardcoding credentials in Terraform
- ❌ Committing Docker Hub token to Git
- ❌ Sharing credentials in Slack/email
- ❌ OpenShift pull-secret stored in repo
- ❌ Admin passwords in logs
- ❌ Database credentials in manifests

Cost of leak: Cluster compromised, data stolen, cleanup takes days

The Rules

RULE 1: Never Commit Secrets to Git

What gets committed:
- ✅ Terraform code (but NOT values)
- ✅ Kubernetes manifests (but NOT secrets)
- ✅ Docker compose (but NOT credentials)

What NEVER gets committed:
- ❌ .tfvars files with values
- ❌ kubeconfig with tokens
- ❌ Docker credentials
- ❌ Database passwords
- ❌ API keys, SSH keys, certs
- ❌ Pull secrets (OpenShift, Docker Hub)

How to handle:
```bash
# WRONG ❌
# terraform.tfvars contains all secrets
git add terraform.tfvars
git commit -m "config"

# RIGHT ✅
# terraform.tfvars.example is a template with no real values
git add terraform.tfvars.example

# Add to .gitignore:
cat >> .gitignore <<'EOF'
terraform.tfvars
*.key
*.pem
kubeconfig
EOF
```

Check before every commit:
```bash
# -E enables alternation, so "|" means "or" instead of a literal pipe character
git diff --cached | grep -iE "password|token|secret|key"

# If it shows secrets → DON'T COMMIT
git reset HEAD filename
```
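The manual check can be automated. The following is a minimal sketch of a pre-commit check that reuses the same four keywords as the grep above; `scan_diff_for_secrets` and `SECRET_PATTERN` are illustrative names, not part of any existing tooling.

```bash
# Keywords copied from the manual check; extend for your own credential formats.
SECRET_PATTERN='password|token|secret|key'

# Reads a unified diff on stdin and reports added lines that look like secrets.
# Returns 1 if anything suspicious was found, 0 otherwise.
scan_diff_for_secrets() {
  # Only inspect added lines ("+..."), skipping the "+++ b/file" headers.
  _hits=$(grep -E '^\+' | grep -vE '^\+\+\+ ' | grep -iE "$SECRET_PATTERN" || true)
  if [ -n "$_hits" ]; then
    printf 'Possible secrets in staged changes:\n%s\n' "$_hits" >&2
    return 1
  fi
  return 0
}

# As a hook, .git/hooks/pre-commit would define the function and end with:
#   git diff --cached | scan_diff_for_secrets
```

A keyword match is deliberately broad, so expect false positives (for example a variable named `keyboard`). For a 6-day POC that trade-off is probably acceptable, because a blocked commit costs seconds and a leaked token costs days.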

RULE 2: Use Environment Variables or Secret Management

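Option A: Environment Variables (simplest for POC)
```bash
# .env.local (NOT committed to Git)
export OPENSHIFT_API_TOKEN="xxxx"
export DOCKER_USERNAME="user"
export DOCKER_TOKEN="xxxx"
export DB_PASSWORD="xxxx"
# Terraform only picks up environment variables named TF_VAR_<variable name>
export TF_VAR_openshift_token="$OPENSHIFT_API_TOKEN"

# In Terraform (declared in a .tf file, not in the shell):
variable "openshift_token" { sensitive = true }

# Access via: var.openshift_token
```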

Option B: Secrets Manager (better for production)
- Store in: AWS Secrets Manager, Vault, Kubernetes Secrets
- Access via: API call at runtime
- Rotate frequently

For this POC: Environment variables are fine, but use them consistently.
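Using environment variables consistently is easier when every script fails fast on a missing value. A minimal sketch, assuming the variable names from Option A; `require_env` is a hypothetical helper name.

```bash
# Reports any unset or empty variables by name; returns 1 if any are missing.
require_env() {
  _missing=0
  for _name in "$@"; do
    # Look up the variable whose name is stored in $_name.
    # eval is acceptable here because the names are hard-coded by the caller.
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "Missing required variable: $_name" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Typical use at the top of a deploy script:
#   set -a; . ./.env.local; set +a
#   require_env OPENSHIFT_API_TOKEN DOCKER_USERNAME DOCKER_TOKEN DB_PASSWORD || exit 1
```

The helper prints only variable names, never values, so its output is safe to paste into Slack or a ticket.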

RULE 3: Rotate Credentials Frequently

Pull Secret: Check expiration date before Day 1
```bash
# If expires in < 7 days: Refresh from Red Hat
# Download new pull-secret.json
# Add to .env.local (NOT to Git)
```

Admin Passwords: Change after POC
```bash
# Admin password for OpenShift? Change it Day 6 evening
# Admin password for SigNoz? Change it Day 6 evening
# Don't leave defaults
```

Git/Docker Tokens: Create new ones just for this POC
```bash
# Create new GitHub token (repo read/write only)
# Create new Docker Hub token (push/pull images only)
# Delete after POC
```


Access Control & Data Privacy

Who Can Access What?

Golden Rule: Least privilege
- Infrastructure Lead: OpenShift cluster access (needs it)
- Platform Lead: OTel/Kafka/Redis (needs it)
- Project Lead: Everything (oversight)
- BRAC Bank: Read-only access to GitHub, demo environment only

Do NOT give:
- ❌ BRAC Bank access to cluster credentials
- ❌ Team members access they don't need
- ❌ Permanent API keys (use session tokens instead)

Data in Transit

Data generated during POC:
- [ ] Sample app logs (no real data, OK)
- [ ] Traces through OTel (no PII, OK)
- [ ] Kubernetes event logs (OK)
- [ ] Git commits (code only, OK)

Do NOT include:
- ❌ Real BRAC Bank data (credit cards, customer info, account numbers)
- ❌ Real user data in sample apps
- ❌ Real credentials in demo

If demo needs "realistic" data:
- Use fake/synthetic data
- Example: "Customer ID 123" (not real account)
- Example: "$100 transaction" (example amount)
- Mark clearly as "SAMPLE DATA"
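One way to follow the synthetic-data rule is to generate the demo records instead of typing them by hand. This is a sketch only: the CSV columns, the `SAMPLE-` ID prefix, and the function name `generate_sample_transactions` are assumptions for illustration.

```bash
# Prints a small CSV of clearly labelled synthetic transactions.
generate_sample_transactions() {
  _count=${1:-5}
  echo "# SAMPLE DATA - synthetic, not real customer records"
  echo "customer_id,amount_usd,note"
  _i=1
  while [ "$_i" -le "$_count" ]; do
    # Deterministic fake values: no randomness, nothing derived from real accounts.
    echo "SAMPLE-$(printf '%04d' "$_i"),$((_i * 100)).00,SAMPLE DATA"
    _i=$((_i + 1))
  done
}

# Example: generate_sample_transactions 3 > sample-transactions.csv
```

Every row carries the "SAMPLE DATA" marker, so a screenshot of a single record is still recognisable as synthetic.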

Data at Rest

Logs & Volumes:
- [ ] ClickHouse data: POC only, will be deleted Day 6
- [ ] Kafka messages: Test data, will be deleted
- [ ] Cluster storage: Ephemeral, deleted when cluster destroyed

Backups:
- Do NOT back up this POC cluster
- No snapshot persistence beyond Day 6


Security Scanning & Compliance

PCI-DSS Compliance (Required)

What BRAC requires:
- Compliance Operator scan against PCI-DSS baseline
- Report generated showing:
  - ✅ Passed checks
  - ⚠️ Failed checks (with remediation plan)
  - ℹ️ Manual review items

Acceptable findings:
- ✅ 0 Critical findings
- ✅ 0-3 High findings (with plan to fix)
- ✅ Any number of Medium/Low (document for Phase 2)

Not acceptable:
- ❌ Unaddressed Critical findings
- ❌ No plan for High findings
- ❌ No compliance report at all

Timeline:
- Day 2: Run compliance scan (30 min)
- Day 2: Review report (30 min)
- Day 2-3: Fix any Critical findings (2-4 hours)
- Day 6: Final report ready

If compliance fails:
1. Identify failure: "PCI-DSS check X failed"
2. Understand it: "This means [security implication]"
3. Fix it: "We're implementing [solution]"
4. Verify fix: "Scan again to confirm pass"
5. If unfixable: "This is a Phase 2 item, POC limitation"
6. Document: "Compliance report shows [what was done]"
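The acceptance thresholds above are simple enough to encode as a gate. A minimal sketch, assuming the finding counts are read off the report by hand; `compliance_gate` is a hypothetical name, and parsing the Compliance Operator output is intentionally left out because the report layout is not specified in this guide.

```bash
# Usage: compliance_gate <critical_count> <high_count>
# Returns 0 when the counts meet the acceptance rule (0 Critical, at most 3 High).
compliance_gate() {
  _critical=$1
  _high=$2
  if [ "$_critical" -gt 0 ]; then
    echo "FAIL: $_critical Critical finding(s) must be fixed"
    return 1
  fi
  if [ "$_high" -gt 3 ]; then
    echo "FAIL: $_high High findings exceeds the limit of 3"
    return 1
  fi
  echo "PASS: $_high High finding(s) need a remediation plan; Medium/Low go to Phase 2"
  return 0
}

# Example: compliance_gate 0 2
```

A passing result still requires a written plan for each High finding; the gate only checks the counts.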

Image Security (ACS Policy)

Requirement: Block deployment of images with "Critical" vulnerabilities

How to test:
```bash
# Create intentionally vulnerable image
docker pull library/alpine:3.8   # This has known CVEs

# Try to deploy it
kubectl apply -f pod-with-cve.yaml

# Expected: Pod rejected with ACS policy violation
```
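The test refers to `pod-with-cve.yaml` without showing it. One possible version is sketched below; the pod name, the `sleep` command, and the helper name `write_cve_test_pod` are assumptions, and only the `alpine:3.8` image comes from this guide.

```bash
# Writes a minimal Pod manifest that runs the old alpine image.
# Usage: write_cve_test_pod [output-file]
write_cve_test_pod() {
  cat > "${1:-pod-with-cve.yaml}" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: acs-policy-test
spec:
  restartPolicy: Never
  containers:
    - name: vulnerable
      image: docker.io/library/alpine:3.8
      command: ["sleep", "300"]
EOF
}

# Example: write_cve_test_pod && kubectl apply -f pod-with-cve.yaml
```

Depending on how ACS enforcement is configured, the pod may be rejected at admission or admitted and then scaled down, so check both the `kubectl apply` output and the ACS violations view.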

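If policy not enforcing:
```bash
# Check ACS status
kubectl get deployment -n stackrox

# Check policy
# (if this resource type is not available in your ACS version,
#  review the policy in the ACS console instead)
kubectl get clustersecuritypolicies

# If broken: Fix or document as limitation
```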


Incident Response

What If Something Breaks?

Incident = "Component not working, affecting demo"

Response Protocol (15 minutes or less):

1. Immediate (minute 1):
   - Pause demo/work
   - Take screenshot of error
   - Document what triggered the incident

2. Diagnose (minutes 1-5):
   - Check logs: `kubectl logs -f [pod]`
   - Check events: `kubectl describe pod [pod]`
   - Check cluster health: `kubectl top nodes`
   - Restart component if applicable: `kubectl rollout restart deployment/[name]`

3. Assess (minute 5):
   - Can you fix it in < 15 min?
     - YES → Fix it, test, continue
     - NO → Go to Workaround

4. Workaround (minute 6):
   - Skip this component's demo
   - Show screenshots instead
   - Explain: "Known limitation identified during POC"

5. Document (after immediate resolution):
   - What broke
   - Why it broke
   - How you fixed it (or worked around)
   - How to prevent next time
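The diagnostic commands in the Diagnose step can be captured in one go, which also provides the raw material for the Document step. A minimal sketch, assuming a folder naming scheme and a helper name (`collect_incident_evidence`) chosen here for illustration.

```bash
# KUBECTL can be overridden, for example with "oc".
KUBECTL=${KUBECTL:-kubectl}

# Usage: collect_incident_evidence <pod> [namespace] [output-dir]
# Prints the directory that holds the collected output.
collect_incident_evidence() {
  _pod=$1
  _namespace=${2:-default}
  _outdir=${3:-incident-$(date +%Y%m%d-%H%M%S)}
  mkdir -p "$_outdir"
  # Keep going if one command fails: partial evidence is better than none.
  $KUBECTL logs "$_pod" -n "$_namespace" --tail=500 > "$_outdir/logs.txt" 2>&1 || true
  $KUBECTL describe pod "$_pod" -n "$_namespace" > "$_outdir/describe.txt" 2>&1 || true
  $KUBECTL top nodes > "$_outdir/top-nodes.txt" 2>&1 || true
  echo "$_outdir"
}

# Example: collect_incident_evidence my-pod demo
```

The restart step is left out on purpose: restarting before the logs are saved can destroy the evidence needed to explain why the component broke.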

Critical vs. Non-Critical Incidents

CRITICAL (Affects demo, must fix):
- OpenShift cluster down
- OTel/SigNoz not receiving traces
- API gateway not working
- Kubernetes networking broken

Response: Stop everything, fix immediately (up to 30 min)

NON-CRITICAL (Can work around):
- JBoss domain mode not deploying
- Trivy dashboard slow
- WSO2 SSO only one method (OIDC, not SAML)
- One pod evicted but others running

Response: Document, continue demo, note for Phase 2


Compliance Violations & How to Prevent

VIOLATION 1: Credentials Committed to Git

Prevention:
- [ ] .gitignore covers all sensitive files
- [ ] Pre-commit hook checks for patterns (optional)
- [ ] Code review catches secrets before merge

If it happens:
```bash
# DO NOT just delete and commit again (leak still in history)
# Instead:
#   1. Rotate the credential immediately
#   2. Use git-filter-repo or BFG to remove from history
#   3. Force push (⚠️ only if repo not public)
#   4. Document incident
```

Better: Use GitHub's "Secret Scanning" to detect leaks
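After rewriting history, it is worth confirming that the file is really gone from every branch and tag. A minimal sketch using plain `git log`; the helper name `path_absent_from_history` and the file name `pull-secret.txt` in the comments are illustrative.

```bash
# Returns 0 if no commit on any ref ever touched the given path.
path_absent_from_history() {
  [ -z "$(git log --all --oneline -- "$1")" ]
}

# Typical sequence (git-filter-repo must be installed separately):
#   git filter-repo --invert-paths --path pull-secret.txt
#   path_absent_from_history pull-secret.txt && echo "history is clean"
```

This checks by path rather than by commit message, so it still finds the file when the commit message never mentioned a secret. It does not replace rotating the credential, which remains the first step.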


VIOLATION 2: Unencrypted Connections

Prevention:
- [ ] All external APIs use HTTPS (✅ GitHub, Docker Hub do)
- [ ] K8s cluster API uses TLS (✅ OpenShift does by default)
- [ ] Database connections encrypted (check WSO2, ClickHouse configs)

Check:
```bash
# Verify HTTPS
curl https://api.github.com   # Should work
curl http://api.github.com    # Should fail or redirect

# Verify K8s TLS
kubectl config view | grep certificate-authority
# Should show cert path
```


VIOLATION 3: Unencrypted Secrets in Kubernetes

Prevention:
- [ ] Use Kubernetes Secrets, and enable etcd encryption (by default Secrets are only base64-encoded, not encrypted at rest)
- [ ] DO NOT put secrets in ConfigMaps
- [ ] DO NOT put secrets in Pod definitions

Right way:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-secret
type: Opaque
stringData:
  password: "my-password"  # placeholder; keep the real value out of Git
---
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: app
      image: my-app:latest  # placeholder image name
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: my-secret
              key: password
```

Wrong way ❌:
```yaml
env:
  - name: DB_PASSWORD
    value: "my-password"  # ❌ EXPOSED IN POD DEFINITION
```
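The wrong pattern is easy to search for mechanically. The sketch below flags env entries whose name looks sensitive and whose value is written inline; the name pattern and the helper name `find_inline_secrets` are assumptions, and the check only understands the simple layout where `value:` directly follows `- name:`.

```bash
# Usage: find_inline_secrets <manifest.yaml>...
# Prints one line per suspicious entry; returns 1 if any were found.
find_inline_secrets() {
  awk '
    /^[ \t]*-[ \t]*name:/ {
      name = $NF
      sensitive = (toupper(name) ~ /PASSWORD|TOKEN|SECRET|KEY/)
      next
    }
    /^[ \t]*value:/ && sensitive {
      printf "%s:%d: %s has an inline value\n", FILENAME, FNR, name
      found = 1
    }
    { sensitive = 0 }   # the flag only applies to the line right after the name
    END { exit (found ? 1 : 0) }
  ' "$@"
}

# Example: find_inline_secrets manifests/*.yaml || echo "fix before committing"
```

Entries that use `valueFrom` with a `secretKeyRef` pass the check, because `valueFrom:` does not match the `value:` pattern.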


VIOLATION 4: Inadequate Logging

Prevention:
- [ ] All components log to stdout (captured by K8s)
- [ ] Logs forwarded to centralized location (OTel pipeline)
- [ ] Logs retained for 7+ days
- [ ] Audit logging enabled

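Check:
```bash
# View API server logs
kubectl logs -n openshift-kube-apiserver -l component=kube-apiserver

# Audit events ("who did what, when") are written to log files on the
# control plane nodes; on OpenShift, list them with:
oc adm node-logs --role=master --path=kube-apiserver/
```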


VIOLATION 5: No Access Control

Prevention:
- [ ] RBAC configured (who can do what in cluster)
- [ ] Network policies in place (which pods can talk)
- [ ] API gateway requires authentication

Check:
```bash
# Verify RBAC
kubectl get rolebindings -A

# Verify network policies
kubectl get networkpolicies -A
```
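If the network policy check returns nothing, a default-deny ingress policy is a common starting point. The sketch below writes one for a given namespace; the file name, the policy name, and the helper name `write_default_deny_policy` are assumptions for illustration.

```bash
# Usage: write_default_deny_policy <namespace> [output-file]
# The empty podSelector selects every pod in the namespace, and listing
# Ingress with no rules blocks all inbound traffic to those pods.
write_default_deny_policy() {
  cat > "${2:-default-deny.yaml}" <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
    - Ingress
EOF
}

# Example:
#   write_default_deny_policy demo && kubectl apply -f default-deny.yaml
# RBAC can be spot-checked per user (user name is hypothetical):
#   kubectl auth can-i get secrets -n demo --as=some-user
```

Applying a default-deny policy will break pod-to-pod traffic until explicit allow rules are added, so test it in one namespace before the demo, not during it.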


BRAC Bank Compliance Requirements

What BRAC likely cares about:

1. PCI-DSS Compliance
   - Scope: Cluster stores no real financial data
   - Proof: Compliance Operator scan report
   - Status: ✅ Scanned, findings addressed

2. Data Privacy
   - Scope: Demo uses synthetic data only
   - Proof: Sample data policy documented
   - Status: ✅ No real customer data

3. Access Control
   - Scope: Cluster access limited to team
   - Proof: RBAC/Network policies
   - Status: ✅ Configured

4. Logging & Audit
   - Scope: All actions logged
   - Proof: OTel logging pipeline
   - Status: ✅ Operational

5. Incident Response
   - Scope: Documented process
   - Proof: This guide + incident log
   - Status: ✅ Ready

What BRAC does NOT require:
- ❌ Production-grade backups
- ❌ Multi-region disaster recovery
- ❌ Pentesting
- ❌ Formal security audit
- ❌ SOC 2 certification


Checklist: Security Hardening

Before Day 1:
- [ ] .gitignore covers all sensitive files
- [ ] Pull secret is valid and up-to-date
- [ ] Team trained on "never commit secrets"
- [ ] Environment variables set up for credentials
- [ ] Incident response plan documented (this guide)

Days 1-6:
- [ ] Daily: Check for credential commits (pre-commit hook or code review)
- [ ] Daily: Monitor cluster access logs
- [ ] Day 2: PCI-DSS compliance scan completed
- [ ] Day 2: ACS policy tested and enforcing
- [ ] Day 5: Review all logs for unauthorized access
- [ ] Day 6: Document any security findings

After Day 6:
- [ ] Rotate all temporary credentials
- [ ] Delete demo environment
- [ ] Collect security findings for Phase 2
- [ ] Lessons learned on security


Escalation: Security Incident

If you discover a security issue:

1. Immediately (stop work):
   - Isolate the problem (don't propagate further)
   - Don't panic, don't fix without thinking

2. Assess (5 minutes):
   - Severity: Critical / High / Medium / Low
   - Impact: Data leaked? Access compromised? Demo affected?
   - Fixable? (in < 1 hour? > 1 hour?)

3. Notify:
   - Project Lead: "Security issue found: [description]"
   - If critical: Escalate to BRAC Bank

4. Fix:
   - Critical: Fix immediately (up to 4 hours)
   - High: Fix if time allows, document as limitation
   - Medium/Low: Document for Phase 2

5. Document:
   - What was the issue?
   - How did it happen?
   - How was it fixed?
   - What's the fix for Phase 2?

Example Security Incident Log

```
INCIDENT REPORT - POC DAY 2

Date: 2026-04-XX, 2pm
Severity: HIGH
Status: RESOLVED

ISSUE:
OpenShift pull-secret found in Git history
(Accidentally committed in early version)

ROOT CAUSE:
Missing .gitignore entry for pull-secret.txt

IMPACT:
- Secret exposed in public/semi-public repo (if public)
- Attack window: ~2 hours before discovered

IMMEDIATE ACTION:
1. Revoked old pull-secret
2. Generated new pull-secret
3. Used BFG to remove from history
4. Force-pushed (only 1 commit, safe)

VERIFICATION:
- Ran: git log --all --oneline -- pull-secret.txt → NO COMMITS
- Ran: grep -r "pull-secret" * → NO MATCHES
- Tested: New pull-secret works ✅

PREVENTION:
- Added to .gitignore: pull-secret*
- Added to .gitignore: *.tfvars
- Added to .gitignore: kubeconfig*
- Code review checklist: "Any secrets?" ✅

LESSON LEARNED:
Set up .gitignore BEFORE first commit
(Not after incident)

Reported by: [Name]
Reviewed by: [Project Lead]
Approved: ✅
```


Security Guide Created: 2026-04-24
Status: Ready to deploy
Owner: Project Lead + Security Lead
Next Step: Brief team on security rules before Day 1