Security & Compliance Guide¶
Covers credentials management, incident response, data privacy, and how to prevent compliance violations during the POC.
Credentials & Secrets Management¶
The Problem¶
Six days of intense work + multiple people + multiple systems = HIGH RISK of credential leaks

Typical mistakes:

- ❌ Hardcoding credentials in Terraform
- ❌ Committing a Docker Hub token to Git
- ❌ Sharing credentials in Slack/email
- ❌ OpenShift pull-secret stored in the repo
- ❌ Admin passwords in logs
- ❌ Database credentials in manifests

Cost of a leak: compromised cluster, stolen data, and days of cleanup
The Rules¶
RULE 1: Never Commit Secrets to Git¶
What gets committed:

- ✅ Terraform code (but NOT values)
- ✅ Kubernetes manifests (but NOT secrets)
- ✅ Docker Compose files (but NOT credentials)
What NEVER gets committed:
- ❌ .tfvars files with values
- ❌ kubeconfig with tokens
- ❌ Docker credentials
- ❌ Database passwords
- ❌ API keys, SSH keys, certs
- ❌ Pull secrets (OpenShift, Docker Hub)
How to handle:

```bash
# WRONG ❌
terraform.tfvars          # Contains all secrets
git add terraform.tfvars
git commit -m "config"

# RIGHT ✅
terraform.tfvars.example  # Template, no real values
git add terraform.tfvars.example

# Add to .gitignore:
terraform.tfvars
*.key
*.pem
kubeconfig
```
Check before every commit:

```bash
git diff --cached | grep -iE "password|token|secret|key"

# If it shows secrets → DON'T COMMIT
git reset HEAD <filename>
```
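The same check can be expressed as a small script, which is easier to extend and test than a one-line grep. This is a minimal sketch of a pre-commit style scan; the regex patterns and the sample diff are illustrative only, not an exhaustive secret detector.

```python
import re

# Illustrative credential patterns (not exhaustive)
SECRET_PATTERNS = [
    re.compile(r"password\s*[=:]", re.IGNORECASE),
    re.compile(r"(api[_-]?key|token|secret)\s*[=:]", re.IGNORECASE),
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
]

def find_suspect_lines(diff_text: str) -> list[str]:
    """Return added lines from a unified diff that look like secrets."""
    hits = []
    for line in diff_text.splitlines():
        if not line.startswith("+") or line.startswith("+++"):
            continue  # only inspect newly added lines, skip file headers
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append(line)
    return hits

# Simulated staged diff (in real use, feed `git diff --cached` output)
staged_diff = """\
+++ b/terraform.tfvars
+db_password = "hunter2"
+region = "ap-southeast-1"
"""
for hit in find_suspect_lines(staged_diff):
    print("BLOCKED:", hit)
```

If this finds anything, abort the commit and move the value out of the file.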
RULE 2: Use Environment Variables or Secret Management¶
Option A: Environment Variables (simplest for POC)

```bash
# .env.local (NOT committed to Git)
export OPENSHIFT_API_TOKEN="xxxx"
export DOCKER_USERNAME="user"
export DOCKER_TOKEN="xxxx"
export DB_PASSWORD="xxxx"
```

In Terraform, mark the variable as sensitive:

```hcl
variable "openshift_token" {
  type      = string
  sensitive = true
}

# Access via: var.openshift_token
```
Option B: Secrets Manager (better for production)

- Store in: AWS Secrets Manager, Vault, Kubernetes Secrets
- Access via: API call at runtime
- Rotate frequently
For this POC: environment variables are fine, but use them consistently
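A fail-fast loader helps enforce that consistency: if a credential is missing, the tooling stops immediately instead of failing mid-deploy. This is a sketch only; the variable names follow the `.env.local` example above, and the fake environment stands in for `os.environ`.

```python
import os

# Credentials the POC tooling expects (names from the .env.local example)
REQUIRED_VARS = ["OPENSHIFT_API_TOKEN", "DOCKER_USERNAME", "DOCKER_TOKEN", "DB_PASSWORD"]

def load_credentials(env: dict) -> dict:
    """Return the required credentials, raising if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required credentials: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}

# Simulated environment (in real use, pass os.environ)
fake_env = {
    "OPENSHIFT_API_TOKEN": "xxxx",
    "DOCKER_USERNAME": "user",
    "DOCKER_TOKEN": "xxxx",
    "DB_PASSWORD": "xxxx",
}
creds = load_credentials(fake_env)
print(sorted(creds))
```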
RULE 3: Rotate Credentials Frequently¶
Pull Secret: Check the expiration date before Day 1

```bash
# If it expires in < 7 days: refresh from Red Hat
# Download a new pull-secret.json
# Add it to .env.local (NOT to Git)
```
Admin Passwords: Change them after the POC

```bash
# Admin password for OpenShift? Change it Day 6 evening
# Admin password for SigNoz? Change it Day 6 evening
# Don't leave defaults
```
Git/Docker Tokens: Create new ones just for this POC

```bash
# Create a new GitHub token (repo read/write only)
# Create a new Docker Hub token (push/pull images only)
# Delete both after the POC
```
Access Control & Data Privacy¶
Who Can Access What?¶
Golden Rule: Least privilege

- Infrastructure Lead: OpenShift cluster access (needs it)
- Platform Lead: OTel/Kafka/Redis (needs it)
- Project Lead: Everything (oversight)
- BRAC Bank: Read-only access to GitHub, demo environment only

Do NOT give:

- ❌ BRAC Bank access to cluster credentials
- ❌ Team members access they don't need
- ❌ Permanent API keys (use session tokens instead)
Data in Transit¶
Data generated during POC:

- [ ] Sample app logs (no real data, OK)
- [ ] Traces through OTel (no PII, OK)
- [ ] Kubernetes event logs (OK)
- [ ] Git commits (code only, OK)

Do NOT include:

- ❌ Real BRAC Bank data (credit cards, customer info, account numbers)
- ❌ Real user data in sample apps
- ❌ Real credentials in the demo

If the demo needs "realistic" data:

- Use fake/synthetic data
- Example: "Customer ID 123" (not a real account)
- Example: "$100 transaction" (example amount)
- Mark it clearly as "SAMPLE DATA"
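A generator makes the synthetic-data rule self-enforcing: every record carries the SAMPLE DATA marker, and a fixed seed keeps demo runs reproducible. The field names and value ranges below are illustrative assumptions, not a schema from the POC.

```python
import random

def make_sample_transaction(rng: random.Random) -> dict:
    """Return one clearly-labeled fake transaction for demo traffic."""
    return {
        "marker": "SAMPLE DATA",                   # explicit synthetic label
        "customer_id": rng.randint(100, 999),      # not a real account number
        "amount_usd": rng.choice([10, 50, 100]),   # example amounts only
        "type": rng.choice(["deposit", "withdrawal", "transfer"]),
    }

rng = random.Random(42)  # fixed seed → reproducible demo data
batch = [make_sample_transaction(rng) for _ in range(3)]
for tx in batch:
    print(tx)
```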
Data at Rest¶
Logs & Volumes:

- [ ] ClickHouse data: POC only, deleted Day 6
- [ ] Kafka messages: test data, will be deleted
- [ ] Cluster storage: ephemeral, deleted when the cluster is destroyed

Backups:

- Do NOT back up this POC cluster
- No snapshot persistence beyond Day 6
Security Scanning & Compliance¶
PCI-DSS Compliance (Required)¶
What BRAC requires:

- Compliance Operator scan against the PCI-DSS baseline
- Report generated showing:
    - ✅ Passed checks
    - ⚠️ Failed checks (with remediation plan)
    - ℹ️ Manual review items

Acceptable findings:

- ✅ 0 Critical findings
- ✅ 0-3 High findings (with a plan to fix)
- ✅ Any number of Medium/Low (document for Phase 2)

Not acceptable:

- ❌ Unaddressed Critical findings
- ❌ No plan for High findings
- ❌ No compliance report at all
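The acceptance criteria above reduce to a simple predicate, which is worth writing down so reviewers apply it consistently. This is a sketch of that rule only; severity labels are assumed to match the scanner's output verbatim.

```python
from collections import Counter

def poc_acceptable(findings: list, high_has_plan: bool = True) -> bool:
    """Apply the POC acceptance rule: 0 Critical, at most 3 High (with a
    remediation plan), any number of Medium/Low."""
    counts = Counter(findings)
    if counts["Critical"] > 0:
        return False
    if counts["High"] > 3 or (counts["High"] > 0 and not high_has_plan):
        return False
    return True

print(poc_acceptable(["High", "Medium", "Low", "Low"]))  # within limits → True
print(poc_acceptable(["Critical", "High"]))              # Critical present → False
print(poc_acceptable(["High"] * 4))                      # too many High → False
```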
Timeline:

- Day 2: Run compliance scan (30 min)
- Day 2: Review report (30 min)
- Day 2-3: Fix any Critical findings (2-4 hours)
- Day 6: Final report ready
If compliance fails:
1. Identify failure: "PCI-DSS check X failed"
2. Understand it: "This means [security implication]"
3. Fix it: "We're implementing [solution]"
4. Verify fix: "Scan again to confirm pass"
5. If unfixable: "This is a Phase 2 item, POC limitation"
6. Document: "Compliance report shows [what was done]"
Image Security (ACS Policy)¶
Requirement: Block deployment of images with "Critical" vulnerabilities
How to test:

```bash
# Use an intentionally vulnerable image
docker pull library/alpine:3.8   # This image has known CVEs

# Try to deploy it
kubectl apply -f pod-with-cve.yaml

# Expected: pod rejected with an ACS policy violation
```
If the policy is not enforcing:

```bash
# Check ACS status
kubectl get deployment -n stackrox

# Check the policy
kubectl get clustersecuritypolicies

# If broken: fix it or document it as a limitation
```
Incident Response¶
What If Something Breaks?¶
Incident = "Component not working, affecting demo"
Response Protocol (15 minutes or less):

1. Immediate (minute 1):
    - Pause demo/work
    - Take a screenshot of the error
    - Document what triggered the incident
2. Diagnose (minutes 1-5):
    - Check logs: `kubectl logs -f [pod]`
    - Check events: `kubectl describe pod [pod]`
    - Check cluster health: `kubectl top nodes`
    - Restart the component if applicable: `kubectl rollout restart deployment/[name]`
3. Assess (minute 5):
    - Can you fix it in < 15 min?
    - YES → Fix it, test, continue
    - NO → Go to Workaround
4. Workaround (minute 6):
    - Skip this component's demo
    - Show screenshots instead
    - Explain: "Known limitation identified during POC"
5. Document (after immediate resolution):
    - What broke
    - Why it broke
    - How you fixed it (or worked around it)
    - How to prevent it next time
Critical vs. Non-Critical Incidents¶
CRITICAL (affects demo, must fix):

- OpenShift cluster down
- OTel/SigNoz not receiving traces
- API gateway not working
- Kubernetes networking broken

Response: Stop everything, fix immediately (up to 30 min)

NON-CRITICAL (can work around):

- JBoss domain mode not deploying
- Trivy dashboard slow
- WSO2 SSO supports only one method (OIDC, not SAML)
- One pod evicted but others running

Response: Document, continue demo, note for Phase 2
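The triage above can be encoded as a lookup, so whoever is on call gives the same response every time. The component identifiers are made up for this sketch; map them to whatever names your runbook uses.

```python
# Components whose failure blocks the demo (names are illustrative)
CRITICAL_COMPONENTS = {"openshift-cluster", "otel-signoz", "api-gateway", "k8s-networking"}

def triage(component: str) -> str:
    """Return the agreed response for a failing component."""
    if component in CRITICAL_COMPONENTS:
        return "CRITICAL: stop everything, fix immediately (up to 30 min)"
    return "NON-CRITICAL: document, continue demo, note for Phase 2"

print(triage("api-gateway"))
print(triage("trivy-dashboard"))
```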
Compliance Violations & How to Prevent¶
VIOLATION 1: Credentials Committed to Git¶
Prevention:

- [ ] .gitignore covers all sensitive files
- [ ] Pre-commit hook checks for patterns (optional)
- [ ] Code review catches secrets before merge
If it happens:

```bash
# DO NOT just delete the file and commit again (the leak stays in history)

# Instead:
# 1. Rotate the credential immediately
# 2. Use git-filter-repo or BFG to remove it from history
# 3. Force push (⚠️ only if the repo is not public)
# 4. Document the incident
```
Better: Use GitHub's "Secret Scanning" to detect leaks
VIOLATION 2: Unencrypted Connections¶
Prevention:

- [ ] All external APIs use HTTPS (✅ GitHub and Docker Hub do)
- [ ] K8s cluster API uses TLS (✅ OpenShift does by default)
- [ ] Database connections encrypted (check WSO2 and ClickHouse configs)
Check:

```bash
# Verify HTTPS
curl https://api.github.com   # Should work
curl http://api.github.com    # Should fail or redirect

# Verify K8s TLS
kubectl config view | grep certificate-authority
# Should show a cert path
```
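If your deployment config lists external endpoints, the HTTPS rule can also be checked mechanically before the demo. A minimal sketch, assuming the endpoints live in a plain list (the URLs below are examples):

```python
from urllib.parse import urlparse

def insecure_endpoints(endpoints: list) -> list:
    """Return every configured endpoint that is not using the https scheme."""
    return [u for u in endpoints if urlparse(u).scheme != "https"]

endpoints = [
    "https://api.github.com",
    "https://hub.docker.com",
    "http://legacy.example.internal",  # plain HTTP → would be flagged
]
print(insecure_endpoints(endpoints))
```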
VIOLATION 3: Unencrypted Secrets in Kubernetes¶
Prevention:

- [ ] Use Kubernetes Secrets (and enable etcd encryption so they are encrypted at rest; by default they are only base64-encoded)
- [ ] DO NOT put secrets in ConfigMaps
- [ ] DO NOT put secrets in Pod definitions
Right way:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-secret
type: Opaque
stringData:
  password: "my-password"
---
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: app
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: my-secret
              key: password
```
Wrong way ❌:

```yaml
env:
  - name: DB_PASSWORD
    value: "my-password"  # ❌ EXPOSED IN POD DEFINITION
```
VIOLATION 4: Inadequate Logging¶
Prevention:

- [ ] All components log to stdout (captured by K8s)
- [ ] Logs forwarded to a centralized location (OTel pipeline)
- [ ] Logs retained for 7+ days
- [ ] Audit logging enabled
Check:

```bash
# View API server logs
kubectl logs -n openshift-kube-apiserver -l component=kube-apiserver

# Should show audit events: "who did what, when"
```
VIOLATION 5: No Access Control¶
Prevention:

- [ ] RBAC configured (who can do what in the cluster)
- [ ] Network policies in place (which pods can talk)
- [ ] API gateway requires authentication
Check:

```bash
# Verify RBAC
kubectl get rolebindings -A

# Verify network policies
kubectl get networkpolicies -A
```
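The least-privilege table from "Who Can Access What?" can also be checked mechanically: compare what each role has actually been granted against what it is allowed. A sketch with made-up role and system names matching that table:

```python
# Allowed systems per role, per the least-privilege table (names illustrative)
ALLOWED = {
    "infrastructure-lead": {"openshift-cluster"},
    "platform-lead": {"otel", "kafka", "redis"},
    "project-lead": {"openshift-cluster", "otel", "kafka", "redis", "github"},
    "brac-bank": {"github-readonly", "demo-env"},
}

def excess_grants(grants: dict) -> dict:
    """Return, per role, any granted systems beyond the allowed set."""
    return {
        role: extra
        for role, systems in grants.items()
        if (extra := systems - ALLOWED.get(role, set()))
    }

grants = {
    "platform-lead": {"otel", "kafka"},
    "brac-bank": {"github-readonly", "openshift-cluster"},  # over-privileged
}
print(excess_grants(grants))
```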
BRAC Bank Compliance Requirements¶
What BRAC likely cares about:

1. PCI-DSS Compliance
    - Scope: Cluster stores no real financial data
    - Proof: Compliance Operator scan report
    - Status: ✅ Scanned, findings addressed
2. Data Privacy
    - Scope: Demo uses synthetic data only
    - Proof: Sample data policy documented
    - Status: ✅ No real customer data
3. Access Control
    - Scope: Cluster access limited to the team
    - Proof: RBAC/Network policies
    - Status: ✅ Configured
4. Logging & Audit
    - Scope: All actions logged
    - Proof: OTel logging pipeline
    - Status: ✅ Operational
5. Incident Response
    - Scope: Documented process
    - Proof: This guide + incident log
    - Status: ✅ Ready
What BRAC does NOT require:

- ❌ Production-grade backups
- ❌ Multi-region disaster recovery
- ❌ Penetration testing
- ❌ Formal security audit
- ❌ SOC 2 certification
Checklist: Security Hardening¶
Before Day 1:

- [ ] .gitignore covers all sensitive files
- [ ] Pull secret is valid and up to date
- [ ] Team briefed on "never commit secrets"
- [ ] Environment variables set up for credentials
- [ ] Incident response plan documented (this guide)

Days 1-6:

- [ ] Daily: Check for credential commits (pre-commit hook or code review)
- [ ] Daily: Monitor cluster access logs
- [ ] Day 2: PCI-DSS compliance scan completed
- [ ] Day 2: ACS policy tested and enforcing
- [ ] Day 5: Review all logs for unauthorized access
- [ ] Day 6: Document any security findings

After Day 6:

- [ ] Rotate all temporary credentials
- [ ] Delete the demo environment
- [ ] Collect security findings for Phase 2
- [ ] Capture security lessons learned
Escalation: Security Incident¶
If you discover a security issue:

1. Immediately (stop work):
    - Isolate the problem (don't let it propagate further)
    - Don't panic; don't fix without thinking
2. Assess (5 minutes):
    - Severity: Critical / High / Medium / Low
    - Impact: Data leaked? Access compromised? Demo affected?
    - Fixable? (in < 1 hour? > 1 hour?)
3. Notify:
    - Project Lead: "Security issue found: [description]"
    - If critical: Escalate to BRAC Bank
4. Fix:
    - Critical: Fix immediately (up to 4 hours)
    - High: Fix if time allows, document as a limitation
    - Medium/Low: Document for Phase 2
5. Document:
    - What was the issue?
    - How did it happen?
    - How was it fixed?
    - What's the fix for Phase 2?
Example Security Incident Log¶
```
INCIDENT REPORT - POC DAY 2

Date: 2026-04-XX, 2pm
Severity: HIGH
Status: RESOLVED

ISSUE: OpenShift pull-secret found in Git history
(accidentally committed in an early version)

ROOT CAUSE: Missing .gitignore entry for pull-secret.txt

IMPACT:
- Secret exposed in public/semi-public repo (if public)
- Attack window: ~2 hours before discovery

IMMEDIATE ACTION:
1. Revoked old pull-secret
2. Generated new pull-secret
3. Used BFG to remove it from history
4. Force-pushed (only 1 commit, safe)

VERIFICATION:
- Ran: git log --all --grep="secret" → CLEAN
- Ran: grep -r "pull-secret" * → NO MATCHES
- Tested: New pull-secret works ✅

PREVENTION:
- Added to .gitignore: pull-secret*
- Added to .gitignore: *.tfvars
- Added to .gitignore: kubeconfig*
- Code review checklist: "Any secrets?" ✅

LESSON LEARNED: Set up .gitignore BEFORE the first commit (not after an incident)

Reported by: [Name]
Reviewed by: [Project Lead]
Approved: ✅
```
Security Guide Created: 2026-04-24
Status: Ready to deploy
Owner: Project Lead + Security Lead
Next Step: Brief team on security rules before Day 1