
Observability Demo Applications

Three workloads that drive the observability stack (logs, metrics, traces). Together they produce realistic telemetry so BRAC can see the whole pipeline end-to-end on Day 6.


What we deploy

| Demo | Purpose | Why it matters |
| --- | --- | --- |
| OpenTelemetry Demo (Astronomy Shop) | Canonical multi-language microservice reference with a built-in traffic generator | Traces across Java/Go/.NET/Python/Ruby/JS services; rich enough to show real-world distributed-tracing patterns |
| Bookinfo (Istio sample) | 4-service canonical mesh app | Simplest demo for canary routing + service-mesh patterns (if we enable mesh later) |
| brac-poc-demo-app (custom) | Banking-flavored traffic + error generator | Shows BRAC-themed telemetry: loan-approval latencies, payment flows, "account not found" error spikes |

All three run on spoke-dc (and spoke-dr via ApplicationSet), deployed via GitOps.


1. OpenTelemetry Demo (Astronomy Shop)

Source: github.com/open-telemetry/opentelemetry-demo

What it is: a reference e-commerce app with ~15 services in 10+ languages. Simulates customers browsing and buying astronomy products. Has its own load generator, feature flags for injecting failures, and full OTel instrumentation.

Deployment:

  • Pull the upstream Helm chart; vendor the rendered manifests into openshift-platform-gitops/components/workloads/otel-demo/
  • Version-pin to a specific chart version (e.g. 0.36.x — verify current latest)
  • Configure it to send traces to our OTel Collector (hub-managed Tempo + spoke-local collector)
  • Enable the built-in load generator + feature-flag service
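A sketch of the Helm values override for the collector endpoint. The exact value keys vary between chart versions, so treat every key here as an assumption to verify against the pinned chart's values schema; the collector service name and namespace are also placeholders. `OTEL_EXPORTER_OTLP_ENDPOINT` itself is the standard OTel SDK environment variable.

```yaml
# values.yaml sketch for the opentelemetry-demo chart.
# ALL keys are illustrative — verify against the pinned chart version.
opentelemetry-collector:
  enabled: false   # use our spoke-local collector, not the bundled one
default:
  envOverrides:
    # Standard OTel SDK variable; service/namespace below are assumed.
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: "http://otel-collector.observability.svc:4317"
```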

What to show on demo day:

  • In SigNoz: end-to-end trace of the "purchase" flow spanning frontend → cart → payment → shipping
  • In Tempo (OCP Observe console): the same trace, Red Hat-native view
  • In Loki: correlated logs for the same request ID
  • In Prometheus / COO: RED metrics (Rate, Errors, Duration) per service
  • Inject a latency failure via feature flag → watch alerts + traces highlight the slow service

Resource footprint: ~2 GB RAM + ~2 CPU across the 15 services. Fits easily on spoke-dc.


2. Bookinfo (Istio sample)

Source: github.com/istio/istio/tree/master/samples/bookinfo

What it is: 4 services — productpage, details, reviews (v1/v2/v3), ratings. The reviews service has 3 versions: v1 (no stars), v2 (black stars), v3 (red stars) — perfect for canary/A-B routing.

Deployment:

  • Vendor the manifests into openshift-platform-gitops/components/workloads/bookinfo/
  • Deploy productpage as a Deployment + Service + OCP Route
  • Deploy reviews v1/v2/v3 with a traffic split (80% v1, 10% v2, 10% v3 via ingress or service-mesh rules)
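One way to express the 80/10/10 split without a mesh is OpenShift's weighted Route backends, sketched below. Note the caveat: a Route only splits traffic that enters through the Route; in-cluster calls from productpage to reviews would need service-mesh VirtualService rules instead, which is why the bullet says "ingress or service-mesh". The namespace is an assumption.

```yaml
# Route-level traffic split sketch (namespace is illustrative).
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: reviews
  namespace: bookinfo
spec:
  to:
    kind: Service
    name: reviews-v1
    weight: 80
  alternateBackends:
    - kind: Service
      name: reviews-v2
      weight: 10
    - kind: Service
      name: reviews-v3
      weight: 10
```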

What to show:

  • Hit productpage repeatedly — see the distribution across v1/v2/v3 reviews
  • In SigNoz/Tempo traces: which version served which request
  • Shift traffic weights via a Git commit (change YAML → ArgoCD syncs → new routing) — demonstrates GitOps-driven canary

Optional: enable OpenShift Service Mesh (Red Hat's Istio) later; Bookinfo is the canonical mesh demo. Out of POC scope unless specifically needed; mesh adds substantial complexity.


3. brac-poc-demo-app — custom traffic generator

Source: we write it. Lives in a new repo brac-poc-demo-app on GitLab CE.

Purpose: tell BRAC's story. A small app that mimics banking flows so the observability shows domain-relevant signal, not just "astronomy shop".

Shape

  • Language: Go (small, fast, easy OTel instrumentation, fits on modest compute)
  • Single HTTP server + in-process workers
  • 4 "logical services" simulated via code (not separate pods):
      • customer-api: GET /customer/:id
      • loan-service: POST /loan/apply, GET /loan/:id
      • payment-service: POST /payment
      • reporting-service: GET /report
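The single-server, logical-services layout above can be sketched as one HTTP handler that tags each request with a service name derived from the route prefix. This is an illustrative skeleton, not the actual implementation: the endpoint names come from the list above, everything else (handler shape, response body) is assumed. The `main` here exercises the handler once via `httptest` so the sketch runs to completion; the real binary would call `http.ListenAndServe`.

```go
package main

import (
	"encoding/json"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"strings"
)

// serviceFor maps a request path to the logical service name that will
// label logs, spans, and metrics for that request.
func serviceFor(path string) string {
	switch {
	case strings.HasPrefix(path, "/customer/"):
		return "customer-api"
	case strings.HasPrefix(path, "/loan"):
		return "loan-service"
	case strings.HasPrefix(path, "/payment"):
		return "payment-service"
	case strings.HasPrefix(path, "/report"):
		return "reporting-service"
	default:
		return "unknown"
	}
}

// handler is a stand-in for the real per-endpoint handlers: it just
// echoes which logical service would have handled the request.
func handler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{
		"service": serviceFor(r.URL.Path),
		"path":    r.URL.Path,
	})
}

func main() {
	// Real binary: http.ListenAndServe(":8080", nil). Here we hit the
	// handler once through an in-process test server instead.
	srv := httptest.NewServer(http.HandlerFunc(handler))
	defer srv.Close()
	res, err := http.Get(srv.URL + "/loan/apply")
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	io.Copy(os.Stdout, res.Body)
}
```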

What it emits

| Signal | Content |
| --- | --- |
| Logs (stdout JSON) | Structured: {ts, level, correlation_id, service, msg, amount, customer_id, ...} — ingested by Loki + forwarded to Splunk |
| Traces (OTLP) | One trace per incoming request, spans for internal "services"; includes simulated DB calls (fake pg_query spans) |
| Metrics (Prometheus /metrics) | requests_total{endpoint}, request_duration_seconds{endpoint}, loan_amount_sum, payment_failures_total |
| Synthetic errors | Configurable failure rate per endpoint (e.g., 2% of payment-service requests → 5xx) — gives error-rate visibility |
| Traffic shape | Configurable RPS with a diurnal pattern (higher during "business hours", simulated by minute-of-hour) |

How it runs

  • Single Go binary, ~500 LOC
  • Deployed as a Deployment with 2-3 replicas on spoke-dc (mirrored to spoke-dr via ApplicationSet)
  • Uses OTel SDK for traces + metrics; logs to stdout
  • Built via GitLab CI pipeline → image pushed to Nexus → ArgoCD pulls from Nexus
  • A separate "client" Deployment calls it continuously (basic curl loop or a hey/vegeta-style loadgen)

Demo value

  • BRAC sees banking-named metrics in Prometheus dashboards
  • SigNoz traces show realistic flows: "Customer 1234 applied for a loan → credit check → approved → notification"
  • Deliberate error spike demo: flip a feature flag, watch error rate climb in SigNoz, OCP Observe console, and Splunk Enterprise Free dashboard
  • Logs in Loki + Splunk show correlated request IDs across services

Repo layout

```
brac-poc-demo-app/
├── main.go
├── handlers/
│   ├── customer.go
│   ├── loan.go
│   ├── payment.go
│   └── reporting.go
├── instrumentation/
│   └── otel.go            # one-time OTel setup
├── Dockerfile
├── .gitlab-ci.yml         # build + push image to Nexus
├── k8s/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── servicemonitor.yaml
└── README.md
```

Pipeline: commit → CI builds → pushes to Nexus → webhook to ArgoCD → redeploy on spoke-dc/dr.
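The build-and-push leg of that pipeline might look like the sketch below. The builder image, variable names (`NEXUS_REGISTRY`, `NEXUS_USER`, `NEXUS_PASS`), and stage layout are all assumptions; `CI_COMMIT_SHORT_SHA` is a standard GitLab CI predefined variable.

```yaml
# .gitlab-ci.yml sketch — builder image and variable names are illustrative.
stages: [build]

build-and-push:
  stage: build
  image: quay.io/buildah/stable   # assumed rootless-build-capable image
  script:
    - buildah login -u "$NEXUS_USER" -p "$NEXUS_PASS" "$NEXUS_REGISTRY"
    - buildah bud -t "$NEXUS_REGISTRY/brac-poc-demo-app:$CI_COMMIT_SHORT_SHA" .
    - buildah push "$NEXUS_REGISTRY/brac-poc-demo-app:$CI_COMMIT_SHORT_SHA"
```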

Version

Self-versioned (we own it). SemVer, starting at v0.1.0 for the Day-1 release. No upstream release channel to track — it's our code.


Integration with the observability stacks

All three demos feed both observability stacks (per Decision #031):

| Signal | Red Hat native (on OCP) | SigNoz (VM tier) | Splunk (PCI forwarder) |
| --- | --- | --- | --- |
| Logs | Logging + Loki + OCP console | Log viewer, via OTel → Kafka → ClickHouse | Via OCP Logging ClusterLogForwarder → Splunk HEC |
| Traces | Tempo + OCP console | Tracing viewer, via OTel → ClickHouse | n/a (Splunk doesn't handle traces) |
| Metrics | Prometheus + COO + Grafana (OCP built-in) | Via OTel → ClickHouse | n/a (Splunk doesn't handle metrics) |

This is why we have both stacks: OCP admins use the Red Hat UIs (familiar, supported, integrated); developers use SigNoz (more feature-rich APM); security uses Splunk (compliance-mandated).


Demo day script (~5 min for observability section)

  1. Hit brac-poc-demo-app endpoint a few times from a browser → shows live traffic
  2. Open OCP Observe → Traces → drill into one trace → show the call chain
  3. Open SigNoz → Traces → same request via correlation_id → show the graph view
  4. Open Loki → filter by that correlation_id → show all logs
  5. Open Splunk dashboard → show same logs are also there for PCI audit trail
  6. Flip a failure flag via n8n webhook → watch error rate spike in SigNoz in real-time
  7. Show the ACM Observability UI on hub-dc → consolidated metrics across all managed clusters

Version pins (to confirm at install)

  • OpenTelemetry Demo: Helm chart 0.36.x (verify latest at install)
  • Bookinfo: release-1.27 tag from istio repo (or current)
  • brac-poc-demo-app: we build; pin base image to registry.redhat.io/ubi9/ubi-minimal:9.6 + Go toolchain via Nexus mirror

Created: 2026-04-24 · Owner: Platform Lead · Decision: #033