fact @community/fact-test-pyramid-ratios

Test Pyramid Ratios

Skill: @community
Domain: testing
Version: 1.0.0
Quality: 4.0
Edges: 6 out · 6 in
Tokens: 358/885/1546

$ prime install @community/fact-test-pyramid-ratios

Projection

3 levels · agent picks one per query

Always in _index.xml · the agent never has to ask for this.

TestPyramidRatios [fact] v1.0.0

Empirically observed test-suite ratios that produce fast, reliable CI: 70/20/10 (unit/integration/E2E) is the canonical pyramid; the Testing Trophy variant (5/30/55/10 = static/unit/integration/E2E) better fits frontend code; Google's published data shows their backends average ~80% unit, ~15% integration, ~5% E2E.

Common observed test-suite distributions across mature codebases: (1) Mike Cohn's classical pyramid: 70% unit / 20% integration / 10% E2E, by test count. (2) Kent C. Dodds' Testing Trophy (frontend): 5% static analysis / 30% unit / 55% integration / 10% E2E; it emphasizes integration-style tests with React Testing Library + msw because most frontend bugs occur at component boundaries. (3) Google's published distribution (2017 internal blog): ~80% unit / ~15% integration / ~5% E2E for backend services. (4) Microsoft Research data on flake rates: gating CI suites in which E2E tests exceed 20% of the test count have 5-10x higher flake rates than pyramid-shaped suites. Per-test execution times are even more skewed: unit tests typically take about 10ms each, integration tests 1-3s, and E2E tests 10-30s, so a suite that is a pyramid by test count is usually inverted by execution time, with the small E2E tier accounting for most of the run. Coverage targets that are both achievable and meaningful: 80-90% line coverage on unit-test-friendly code (pure logic), 60-80% on glue code, and intentionally not 100% (the last 10-20% mostly consists of defensive code paths, error handlers, and dead branches).

Loaded when retrieval picks the atom as adjacent / supporting.

TestPyramidRatios [fact] v1.0.0

Common observed test-suite distributions across mature codebases: (1) Mike Cohn's classical pyramid: 70% unit / 20% integration / 10% E2E, by test count. (2) Kent C. Dodds' Testing Trophy (frontend): 5% static analysis / 30% unit / 55% integration / 10% E2E; it emphasizes integration-style tests with React Testing Library + msw because most frontend bugs occur at component boundaries. (3) Google's published distribution (2017 internal blog): ~80% unit / ~15% integration / ~5% E2E for backend services. (4) Microsoft Research data on flake rates: gating CI suites in which E2E tests exceed 20% of the test count have 5-10x higher flake rates than pyramid-shaped suites. Per-test execution times are even more skewed: unit tests typically take about 10ms each, integration tests 1-3s, and E2E tests 10-30s, so a suite that is a pyramid by test count is usually inverted by execution time, with the small E2E tier accounting for most of the run. Coverage targets that are both achievable and meaningful: 80-90% line coverage on unit-test-friendly code (pure logic), 60-80% on glue code, and intentionally not 100% (the last 10-20% mostly consists of defensive code paths, error handlers, and dead branches).
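
To make the count-versus-time relationship concrete, the sketch below multiplies a 70/20/10 split by per-test durations taken from the ranges quoted above. The suite size of 1,000 tests and the specific durations (10ms, 2s, 20s) are illustrative assumptions, and the calculation assumes serial execution with no parallelism.

```python
# Hypothetical suite of 1,000 tests split 70/20/10 by count. Durations are
# assumed mid-range values picked from the per-test figures quoted above.
COUNTS = {"unit": 700, "integration": 200, "e2e": 100}
SECONDS_PER_TEST = {"unit": 0.010, "integration": 2.0, "e2e": 20.0}


def time_share(counts, seconds_per_test):
    """Return each tier's fraction of total serial execution time."""
    totals = {tier: counts[tier] * seconds_per_test[tier] for tier in counts}
    grand_total = sum(totals.values())
    return {tier: total / grand_total for tier, total in totals.items()}


shares = time_share(COUNTS, SECONDS_PER_TEST)
for tier, share in shares.items():
    print(f"{tier:12s} {COUNTS[tier]:4d} tests  {share:6.1%} of run time")
```

With these assumed numbers the E2E tier holds 10% of the tests but roughly 83% of serial run time, while the unit tier holds 70% of the tests and well under 1% of the time. Parallelism changes wall-clock time but not these proportions of compute.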

Confidence

medium

Applies To

Setting initial test-strategy targets for new projects
Auditing existing test suites for shape pathology (ice cream cone, cupcake)
Capacity planning CI infrastructure (parallelism, timeout budgets)
Code-coverage targets per repo / package

Quantitative

Classical Pyramid: 70% unit / 20% integration / 10% E2E by test count
Testing Trophy Frontend: 5% static / 30% unit / 55% integration / 10% E2E
Google Backend Typical: ~80% unit / ~15% integration / ~5% E2E
Execution Time Targets: unit < 50ms each; integration < 5s each; E2E < 60s each
Flake Rate Target: gating CI < 0.5% (1 flake per 200 runs); above 1% blocks merges until fixed
Coverage Targets: 80-90% line coverage on pure-logic packages; 60-80% on integration code; 100% target is usually a smell
Ice Cream Cone Symptoms: E2E > 30% of test count → CI > 30min, flake > 5%, engineers stop running locally
Cupcake Symptoms: high coverage, mostly mocks, tests break on refactor — see @community/anti-pattern-test-the-mock
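
The count-based and flake-rate thresholds above can be turned into a small audit helper. This is a minimal sketch, not a standard tool: the function names, the "pyramid" and "other" labels, and the "above target" band between 0.5% and 1% are assumptions of the example. Only the 30% E2E cutoff and the 0.5% / 1% flake thresholds come from the list above. Cupcake suites cannot be detected from counts alone, so they are not covered.

```python
def tier_fractions(counts):
    """Convert per-tier test counts into fractions of the whole suite."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("suite has no tests")
    return {tier: n / total for tier, n in counts.items()}


def classify_shape(counts):
    """Label a suite by test-count shape.

    Expects the keys "unit", "integration", and "e2e". The 30% E2E cutoff
    is the ice-cream-cone threshold listed above; the other labels are
    assumptions of this sketch.
    """
    f = tier_fractions(counts)
    if f["e2e"] > 0.30:
        return "ice cream cone"
    if f["unit"] >= f["integration"] >= f["e2e"]:
        return "pyramid"
    return "other"


def flake_status(flaky_runs, total_runs):
    """Compare an observed gating-CI flake rate with the targets above."""
    rate = flaky_runs / total_runs
    if rate > 0.01:        # above 1%: merges are blocked until fixed
        return "block merges"
    if rate < 0.005:       # under 0.5%: within the stated target
        return "within target"
    return "above target"  # assumed label for the 0.5% to 1% band


print(classify_shape({"unit": 700, "integration": 200, "e2e": 100}))
print(classify_shape({"unit": 200, "integration": 300, "e2e": 500}))
print(flake_status(flaky_runs=1, total_runs=400))
print(flake_status(flaky_runs=3, total_runs=200))
```

A frontend suite following the Testing Trophy would come out as "other" here because integration tests outnumber unit tests; that is expected rather than a defect.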

Counter Conditions

Frontend codebases benefit from MORE integration testing than backend — see Testing Trophy. UI bugs concentrate at component boundaries (state, props, accessibility), not in pure logic.
Pure-functional cores (compilers, interpreters, parsers, financial-math libraries) often run 95%+ unit tests because the surface area is unit-friendly. The pyramid still applies, with the upper tiers heavily compressed.
Microservice architectures use contract tests (Pact, Spring Cloud Contract) as a 4th tier between unit and integration — these are not E2E.
Coverage is only a leading indicator: high coverage achieved with mock-heavy tests gives little real confidence (see @community/anti-pattern-test-the-mock and @community/anti-pattern-vanity-metrics).
Modern E2E tooling (Cypress, Playwright) runs faster (5-15s/test) than Selenium-era tests (60s/test), so the 10% E2E budget can stretch in modern toolchains.
Teams should measure their distribution explicitly via tagging or directory structure; gut-feel ratios are usually wrong by 2x.
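
One way to measure the distribution from directory structure is to tally test files by the folder they live in. The sketch below assumes a hypothetical layout of tests/unit, tests/integration, and tests/e2e. It counts files rather than individual test cases, so the result is only a rough proxy for the test-count ratios discussed here.

```python
from collections import Counter
from pathlib import PurePosixPath

TIERS = ("unit", "integration", "e2e")  # hypothetical directory names


def tally_by_directory(test_paths, root="tests"):
    """Count test files per tier.

    The tier is read from the path segment directly below `root`, e.g.
    tests/unit/test_cart.py -> "unit". Anything else is "untagged".
    """
    counts = Counter()
    for raw in test_paths:
        parts = PurePosixPath(raw).parts
        if len(parts) >= 3 and parts[0] == root and parts[1] in TIERS:
            counts[parts[1]] += 1
        else:
            counts["untagged"] += 1
    return counts


# Example file list (made up for illustration).
paths = [
    "tests/unit/test_cart.py",
    "tests/unit/test_pricing.py",
    "tests/integration/test_checkout_api.py",
    "tests/e2e/test_purchase_flow.py",
    "tests/test_misc.py",
]
counts = tally_by_directory(paths)
total = sum(counts.values())
for tier, n in counts.most_common():
    print(f"{tier:12s} {n:3d}  {n / total:.0%}")
```

Reporting an explicit "untagged" bucket keeps files that fall outside the convention visible instead of silently dropping them from the ratio.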

Loaded when retrieval picks the atom as a focal / direct hit.

Sources

Confidence

medium

Source

Cohn, Mike — 'Succeeding with Agile' (Addison-Wesley 2009) — original pyramid
Mike Wacker, Google Testing Blog — 'Just Say No to More End-to-End Tests' (April 2015)
Kent C. Dodds — 'Write Tests. Not Too Many. Mostly Integration.' (2016) — Testing Trophy
Memon et al., 'Taming Google-scale Continuous Testing' (ICSE-SEIP 2017)
Luo et al., 'An Empirical Analysis of Flaky Tests' (FSE 2014)
Spadini et al., 'Mock Objects for Testing Java Systems' (EMSE 2019)

Derived From

@community/principle-test-pyramid

Source

where the compiled artifact came from

prime-system/examples/frontend-design/primes/compiled/@community/fact-test-pyramid-ratios/atom.yaml

Compiled at 2026-05-07