Three Pillars of Observability
$ prime install @community/principle-three-pillars-observability

Projection
Always in _index.xml · the agent never has to ask for this.
ThreePillarsObservability [principle] v1.0.0
A production system must emit three complementary signal types — metrics (numeric time-series, aggregable), logs (timestamped event records, contextual), and traces (causal request chains across services) — because each answers questions the others cannot.
Metrics tell you what is happening at scale (RED, USE, four golden signals); logs tell you what happened in a specific event (errors, audit, debug); traces tell you why a request was slow or failed across service boundaries (causal path + per-span timing). Eliminate any one and a class of questions becomes answerable only by guessing. Modern practice unifies emission via OpenTelemetry: a single SDK with common attributes (service.name, trace_id, span_id), routed either to three backends (Prometheus/Mimir, Loki/ELK, Tempo/Jaeger) or to one unified backend (Datadog, Honeycomb, Grafana).
Loaded when retrieval picks the atom as adjacent / supporting.
ThreePillarsObservability [principle] v1.0.0
A production system must emit three complementary signal types — metrics (numeric time-series, aggregable), logs (timestamped event records, contextual), and traces (causal request chains across services) — because each answers questions the others cannot.
Metrics tell you what is happening at scale (RED, USE, four golden signals); logs tell you what happened in a specific event (errors, audit, debug); traces tell you why a request was slow or failed across service boundaries (causal path + per-span timing). Eliminate any one and a class of questions becomes answerable only by guessing. Modern practice unifies emission via OpenTelemetry: a single SDK with common attributes (service.name, trace_id, span_id), routed either to three backends (Prometheus/Mimir, Loki/ELK, Tempo/Jaeger) or to one unified backend (Datadog, Honeycomb, Grafana).
Attributed To
Cindy Sridharan, 'Distributed Systems Observability' (O'Reilly 2018) — popularized the 'three pillars' framing; OpenTelemetry project (2019, merger of OpenTracing + OpenCensus).
Applies To
- Every microservice or production application
- Cloud-managed services (RDS, ElastiCache, Kafka MSK) — infra metrics + slow-query logs + cross-service traces
- Mobile and web clients — RUM (real-user monitoring) extends traces to client-side; logs go to Sentry / Honeybadger
- Batch jobs, ETL pipelines — emit metrics on rows processed, logs on row-level errors, traces on stage timings
- Edge functions (Cloudflare Workers, Lambda@Edge) — same pillars apply; OpenTelemetry has runtimes for both
Counter Examples
- Logs-only observability: every incident requires `grep -r ERROR | wc -l` — works for small services, breaks at >5 services or >100 RPS.
- Metrics-only: dashboard shows error rate spike at 14:32 — but no way to see the specific failing requests without traces or logs.
- Traces-only: per-request detail is rich, but no aggregate view of 'how many requests crossed our SLO this hour'.
- Three siloed teams (logs team, metrics team, traces team) without correlation IDs — each tool tells half the story; on-call assembles the rest by hand.
Loaded when retrieval picks the atom as a focal / direct hit.
ThreePillarsObservability [principle] v1.0.0
A production system must emit three complementary signal types — metrics (numeric time-series, aggregable), logs (timestamped event records, contextual), and traces (causal request chains across services) — because each answers questions the others cannot.
Metrics tell you what is happening at scale (RED, USE, four golden signals); logs tell you what happened in a specific event (errors, audit, debug); traces tell you why a request was slow or failed across service boundaries (causal path + per-span timing). Eliminate any one and a class of questions becomes answerable only by guessing. Modern practice unifies emission via OpenTelemetry: a single SDK with common attributes (service.name, trace_id, span_id), routed either to three backends (Prometheus/Mimir, Loki/ELK, Tempo/Jaeger) or to one unified backend (Datadog, Honeycomb, Grafana).
Attributed To
Cindy Sridharan, 'Distributed Systems Observability' (O'Reilly 2018) — popularized the 'three pillars' framing; OpenTelemetry project (2019, merger of OpenTracing + OpenCensus).
Applies To
- Every microservice or production application
- Cloud-managed services (RDS, ElastiCache, Kafka MSK) — infra metrics + slow-query logs + cross-service traces
- Mobile and web clients — RUM (real-user monitoring) extends traces to client-side; logs go to Sentry / Honeybadger
- Batch jobs, ETL pipelines — emit metrics on rows processed, logs on row-level errors, traces on stage timings
- Edge functions (Cloudflare Workers, Lambda@Edge) — same pillars apply; OpenTelemetry has runtimes for both
Counter Examples
- Logs-only observability: every incident requires `grep -r ERROR | wc -l` — works for small services, breaks at >5 services or >100 RPS.
- Metrics-only: dashboard shows error rate spike at 14:32 — but no way to see the specific failing requests without traces or logs.
- Traces-only: per-request detail is rich, but no aggregate view of 'how many requests crossed our SLO this hour'.
- Three siloed teams (logs team, metrics team, traces team) without correlation IDs — each tool tells half the story; on-call assembles the rest by hand.
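The traces-only gap is just missing aggregation: spans carry the data, but answering "how many requests crossed our SLO this hour" means rolling per-request records up into a metric. A hypothetical sketch (the span records and the 500 ms threshold are made up for illustration):

```javascript
// Hypothetical span records (one per request) collected over an hour.
const spans = [
  { name: "GET /search", duration_ms: 120 },
  { name: "GET /search", duration_ms: 950 },
  { name: "GET /search", duration_ms: 80 },
  { name: "GET /search", duration_ms: 1400 },
];

// Roll per-request trace data up into an aggregate metric:
// the count and ratio of requests breaching a 500 ms latency SLO.
const SLO_MS = 500;
const breaches = spans.filter((s) => s.duration_ms > SLO_MS).length;
const breachRatio = breaches / spans.length;

console.log(
  `${breaches}/${spans.length} requests breached the ${SLO_MS} ms SLO ` +
  `(${(breachRatio * 100).toFixed(1)}%)`
);
```

This roll-up is what a metrics backend does continuously; with traces alone, someone has to rebuild it by hand for every question.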
Examples
- OpenTelemetry SDK in a Node service: `@opentelemetry/auto-instrumentations-node` instruments HTTP, DB, and queue clients automatically; emits OTLP to a Collector; the Collector fans out to Prometheus, Loki, and Tempo.
- Datadog APM: traces auto-link to logs via `trace_id` and `span_id` injected by the agent; metrics, logs, and traces share dashboards.
- Honeycomb / wide-event observability: instead of separate signals, every event is a high-cardinality structured log that can be aggregated into metrics and stitched into traces — an alternative to the three-pillar split.
- Grafana stack: Prometheus (metrics) + Loki (logs) + Tempo (traces) + Pyroscope (profiles); Grafana UI correlates by trace_id.
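The Collector fan-out pattern from the first example can be sketched in-memory (no real backends; the three sinks and the `ingest` router are illustrative stand-ins for Prometheus, Loki, and Tempo):

```javascript
// Three in-memory stand-ins for the real backends.
const prometheus = new Map(); // metric name -> running sum
const loki = [];              // append-only log records
const tempo = new Map();      // trace_id -> list of spans

// A Collector-like router: one ingest path, fan-out by signal kind.
function ingest(signal) {
  switch (signal.kind) {
    case "metric":
      prometheus.set(signal.name, (prometheus.get(signal.name) ?? 0) + signal.value);
      break;
    case "log":
      loki.push(signal);
      break;
    case "span": {
      const spans = tempo.get(signal.trace_id) ?? [];
      spans.push(signal);
      tempo.set(signal.trace_id, spans);
      break;
    }
  }
}

// One request's worth of telemetry, sharing a trace_id.
ingest({ kind: "metric", name: "http.server.requests", value: 1 });
ingest({ kind: "log", msg: "card declined", trace_id: "abc123" });
ingest({ kind: "span", name: "POST /pay", trace_id: "abc123", duration_ms: 412 });
```

The value of the pattern is the single ingest path: services emit one stream with common attributes, and routing to per-signal storage is a deployment decision rather than an instrumentation decision.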
Relations
requires: @community/rule-slo-required-for-prod
Source
- Cindy Sridharan, 'Distributed Systems Observability' (O'Reilly Free Report, 2018) — 'logs, metrics, and traces are referred to as the three pillars of observability'
- OpenTelemetry specification — https://opentelemetry.io/docs/specs/otel/ — unified data model for the three signals
- Google SRE Book Chapter 6 — 'Monitoring Distributed Systems' — defines four golden signals (latency, traffic, errors, saturation)
- Brendan Gregg, 'USE Method' (utilization, saturation, errors) — resource-oriented metrics taxonomy
- Tom Wilkie 'RED Method' (rate, errors, duration) — service-oriented metrics for request-driven services
Requires
@community/rule-slo-required-for-prod
Source
prime-system/examples/frontend-design/primes/compiled/@community/principle-three-pillars-observability/atom.yaml