
Observability Demo Applications

Three workloads that drive the observability stack (logs, metrics, traces). Together they produce realistic telemetry so BRAC can see the whole pipeline end-to-end on Day 6.


What we deploy

| Demo | Purpose | Why it matters |
| --- | --- | --- |
| OpenTelemetry Demo (Astronomy Shop) | Canonical multi-language microservice reference with a built-in traffic generator | Traces across Java/Go/.NET/Python/Ruby/JS services; rich enough to show real-world distributed-tracing patterns |
| Bookinfo (Istio sample) | 4-service canonical mesh app | Simplest demo for canary routing + service-mesh patterns (if we enable mesh later) |
| brac-poc-demo-app (custom) | Banking-flavored traffic + error generator | Shows BRAC-themed telemetry: loan-approval latencies, payment flows, "account not found" error spikes |

All three run on spoke-dc (and spoke-dr via ApplicationSet), deployed via GitOps.


1. OpenTelemetry Demo (Astronomy Shop)

Source: github.com/open-telemetry/opentelemetry-demo

What it is: a reference e-commerce app with ~15 services in 10+ languages. Simulates customers browsing and buying astronomy products. Has its own load generator, feature flags for injecting failures, and full OTel instrumentation.

Deployment:

  • Pull the upstream Helm chart; vendor the rendered manifests into openshift-platform-gitops/components/workloads/otel-demo/
  • Version-pin to a specific chart version (e.g. 0.36.x — verify current latest)
  • Configure it to send traces to our OTel Collector (hub-managed Tempo + spoke-local collector)
  • Enable the built-in load generator + feature-flag service
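A sketch of the Helm values override for the collector endpoint. The exact value keys vary between chart versions, so treat every key here as an assumption to verify against the pinned chart's values schema; the collector service name and namespace are also placeholders. `OTEL_EXPORTER_OTLP_ENDPOINT` itself is the standard OTel SDK environment variable.

```yaml
# values.yaml sketch for the opentelemetry-demo chart.
# ALL keys are illustrative — verify against the pinned chart version.
opentelemetry-collector:
  enabled: false   # use our spoke-local collector, not the bundled one
default:
  envOverrides:
    # Standard OTel SDK variable; service/namespace below are assumed.
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: "http://otel-collector.observability.svc:4317"
```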

What to show on demo day:

  • In SigNoz: end-to-end trace of the "purchase" flow spanning frontend → cart → payment → shipping
  • In Tempo (OCP Observe console): the same trace, Red Hat-native view
  • In Loki: correlated logs for the same request ID
  • In Prometheus / COO: RED metrics (Rate, Errors, Duration) per service
  • Inject a latency failure via feature flag → watch alerts + traces highlight the slow service

Resource footprint: ~2 GB RAM + ~2 CPU across the 15 services. Fits easily on spoke-dc.


2. Bookinfo (Istio sample)

Source: github.com/istio/istio/tree/master/samples/bookinfo

What it is: 4 services — productpage, details, reviews (v1/v2/v3), ratings. The reviews service has 3 versions: v1 (no stars), v2 (black stars), v3 (red stars) — perfect for canary/A-B routing.

Deployment:

  • Vendor the manifests into openshift-platform-gitops/components/workloads/bookinfo/
  • Deploy productpage as a Deployment + Service + OCP Route
  • Deploy reviews v1/v2/v3 with a traffic split (80% v1, 10% v2, 10% v3 via ingress or service-mesh rules)
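One way to express the 80/10/10 split without a mesh is OpenShift's weighted Route backends, sketched below. Note the caveat: a Route only splits traffic that enters through the Route; in-cluster calls from productpage to reviews would need service-mesh VirtualService rules instead, which is why the bullet says "ingress or service-mesh". The namespace is an assumption.

```yaml
# Route-level traffic split sketch (namespace is illustrative).
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: reviews
  namespace: bookinfo
spec:
  to:
    kind: Service
    name: reviews-v1
    weight: 80
  alternateBackends:
    - kind: Service
      name: reviews-v2
      weight: 10
    - kind: Service
      name: reviews-v3
      weight: 10
```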

What to show:

  • Hit productpage repeatedly — see the distribution across v1/v2/v3 reviews
  • In SigNoz/Tempo traces: which version served which request
  • Shift traffic weights via a Git commit (change YAML → ArgoCD syncs → new routing) — demonstrates GitOps-driven canary

Optional: enable OpenShift Service Mesh (Red Hat's Istio) later; Bookinfo is the canonical mesh demo. Out of POC scope unless specifically needed; mesh adds substantial complexity.


3. brac-poc-demo-app — custom traffic generator

Source: we write it. Lives in a new repo brac-poc-demo-app on GitLab CE.

Purpose: tell BRAC's story. A small app that mimics banking flows so the observability shows domain-relevant signal, not just "astronomy shop".

Shape

  • Language: Go (small, fast, easy OTel instrumentation, fits on modest compute)
  • Single HTTP server + in-process workers
  • 4 "logical services" simulated via code (not separate pods):
      • customer-api: GET /customer/:id
      • loan-service: POST /loan/apply, GET /loan/:id
      • payment-service: POST /payment
      • reporting-service: GET /report
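The single-server, logical-services layout above can be sketched as one HTTP handler that tags each request with a service name derived from the route prefix. This is an illustrative skeleton, not the actual implementation: the endpoint names come from the list above, everything else (handler shape, response body) is assumed. The `main` here exercises the handler once via `httptest` so the sketch runs to completion; the real binary would call `http.ListenAndServe`.

```go
package main

import (
	"encoding/json"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"strings"
)

// serviceFor maps a request path to the logical service name that will
// label logs, spans, and metrics for that request.
func serviceFor(path string) string {
	switch {
	case strings.HasPrefix(path, "/customer/"):
		return "customer-api"
	case strings.HasPrefix(path, "/loan"):
		return "loan-service"
	case strings.HasPrefix(path, "/payment"):
		return "payment-service"
	case strings.HasPrefix(path, "/report"):
		return "reporting-service"
	default:
		return "unknown"
	}
}

// handler is a stand-in for the real per-endpoint handlers: it just
// echoes which logical service would have handled the request.
func handler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{
		"service": serviceFor(r.URL.Path),
		"path":    r.URL.Path,
	})
}

func main() {
	// Real binary: http.ListenAndServe(":8080", nil). Here we hit the
	// handler once through an in-process test server instead.
	srv := httptest.NewServer(http.HandlerFunc(handler))
	defer srv.Close()
	res, err := http.Get(srv.URL + "/loan/apply")
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	io.Copy(os.Stdout, res.Body)
}
```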

What it emits

| Signal | Content |
| --- | --- |
| Logs (stdout JSON) | Structured: {ts, level, correlation_id, service, msg, amount, customer_id, ...} — ingested by Loki + forwarded to Splunk |
| Traces (OTLP) | One trace per incoming request, spans for internal "services"; includes simulated DB calls (fake pg_query spans) |
| Metrics (Prometheus /metrics) | requests_total{endpoint}, request_duration_seconds{endpoint}, loan_amount_sum, payment_failures_total |
| Synthetic errors | Configurable failure rate per endpoint (e.g., 2% of payment-service requests → 5xx) — gives error-rate visibility |
| Traffic shape | Configurable RPS with a diurnal pattern (higher during "business hours", simulated by minute-of-hour) |

How it runs

  • Single Go binary, ~500 LOC
  • Deployed as a Deployment with 2-3 replicas on spoke-dc (mirrored to spoke-dr via ApplicationSet)
  • Uses OTel SDK for traces + metrics; logs to stdout
  • Built via GitLab CI pipeline → image pushed to Nexus → ArgoCD pulls from Nexus
  • A separate "client" Deployment calls it continuously (basic curl loop or a hey/vegeta-style loadgen)

Demo value

  • BRAC sees banking-named metrics in Prometheus dashboards
  • SigNoz traces show realistic flows: "Customer 1234 applied for a loan → credit check → approved → notification"
  • Deliberate error spike demo: flip a feature flag, watch error rate climb in SigNoz, OCP Observe console, and Splunk Enterprise Free dashboard
  • Logs in Loki + Splunk show correlated request IDs across services

Repo layout

```
brac-poc-demo-app/
├── main.go
├── handlers/
│   ├── customer.go
│   ├── loan.go
│   ├── payment.go
│   └── reporting.go
├── instrumentation/
│   └── otel.go            # one-time OTel setup
├── Dockerfile
├── .gitlab-ci.yml         # build + push image to Nexus
├── k8s/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── servicemonitor.yaml
└── README.md
```

Pipeline: commit → CI builds → pushes to Nexus → webhook to ArgoCD → redeploy on spoke-dc/dr.
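The build-and-push leg of that pipeline might look like the sketch below. The builder image, variable names (`NEXUS_REGISTRY`, `NEXUS_USER`, `NEXUS_PASS`), and stage layout are all assumptions; `CI_COMMIT_SHORT_SHA` is a standard GitLab CI predefined variable.

```yaml
# .gitlab-ci.yml sketch — builder image and variable names are illustrative.
stages: [build]

build-and-push:
  stage: build
  image: quay.io/buildah/stable   # assumed rootless-build-capable image
  script:
    - buildah login -u "$NEXUS_USER" -p "$NEXUS_PASS" "$NEXUS_REGISTRY"
    - buildah bud -t "$NEXUS_REGISTRY/brac-poc-demo-app:$CI_COMMIT_SHORT_SHA" .
    - buildah push "$NEXUS_REGISTRY/brac-poc-demo-app:$CI_COMMIT_SHORT_SHA"
```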

Version

Self-versioned (we own it). SemVer, starting at v0.1.0 for the Day-1 release. No upstream release channel to track — it's our code.


Integration with the observability stacks

All three demos feed both observability stacks (per Decision #031):

| Signal | Red Hat native (on OCP) | SigNoz (VM tier) | Splunk (PCI forwarder) |
| --- | --- | --- | --- |
| Logs | Logging + Loki + OCP console | Log viewer, via OTel → Kafka → ClickHouse | Via OCP Logging ClusterLogForwarder → Splunk HEC |
| Traces | Tempo + OCP console | Tracing viewer, via OTel → ClickHouse | n/a (Splunk doesn't handle traces) |
| Metrics | Prometheus + COO + Grafana (OCP built-in) | Via OTel → ClickHouse | n/a (Splunk doesn't handle metrics) |

This is why we have both stacks: OCP admins use the Red Hat UIs (familiar, supported, integrated); developers use SigNoz (more feature-rich APM); security uses Splunk (compliance-mandated).


Demo day script (~5 min for observability section)

  1. Hit brac-poc-demo-app endpoint a few times from a browser → shows live traffic
  2. Open OCP Observe → Traces → drill into one trace → show the call chain
  3. Open SigNoz → Traces → same request via correlation_id → show the graph view
  4. Open Loki → filter by that correlation_id → show all logs
  5. Open Splunk dashboard → show same logs are also there for PCI audit trail
  6. Flip a failure flag via n8n webhook → watch error rate spike in SigNoz in real-time
  7. Show the ACM Observability UI on hub-dc → consolidated metrics across all managed clusters

Version pins (to confirm at install)

  • OpenTelemetry Demo: Helm chart 0.36.x (verify latest at install)
  • Bookinfo: release-1.27 tag from istio repo (or current)
  • brac-poc-demo-app: we build; pin base image to registry.redhat.io/ubi9/ubi-minimal:9.6 + Go toolchain via Nexus mirror

Created: 2026-04-24 · Owner: Platform Lead · Decision: #033