The testing pyramid is an outdated economic model

Tom Akehurst

CTO and Co-founder

January 20, 2025

The testing pyramid is one of the most famous concepts in agile. But how relevant is it for developers today?

Mike Cohn first proposed a concept he called the “test automation pyramid” in 2004 before formalizing it in his 2009 book Succeeding With Agile. It sets out a three-layer model. At the bottom sit broad, basic unit tests covering individual functions or components; these are fast, cheap, and easily automated.

In the traditional model, the next stage involves integration testing to verify interactions between subsystems and assess whether application modules work together as required. These tests are slower and more complex than those at the foundation of the pyramid, so are carried out less often.

Finally, the pinnacle of the structure covers E2E (end-to-end) testing that validates the entire application and simulates user scenarios to ensure the system behaves correctly in real-world workflows. These tests are expensive and take time to get right. They sit at the top of the pyramid because they are run less often than the tests in the layers below.

The shape of the pyramid emphasizes the need for devs to carry out many fast, low-cost unit tests, fewer integration tests, and an even smaller number of rigorous end-to-end tests. Now this familiar edifice is crumbling.

Why it’s time to rethink the test pyramid

Fundamentally, the test pyramid is an outdated economic model. Each stage is a tradeoff between cost and the information that can potentially be uncovered. This simple framework hides real complexity, because the cost of each test is made up of several factors: the effort to write and maintain it, how long it takes to run, the time taken to trace a failure to its root cause, and how well it survives refactoring without breaking.

The pyramid is also an artifact of the era in which it was created. Computers were slower, testing and debugging tools were rudimentary, and developer infrastructure was bloated, cumbersome, and inefficient. This made narrow, isolated unit tests the most cost-effective and practical approach for ensuring code quality.

Since then, significant progress in both technology and development practices has transformed testing in three key ways:

1) It’s now possible to run a wide range of tests on an application very quickly through its public interface, enabling a broader scope of testing without excessive time or resource constraints.

2) Improved test frameworks have brought down the cost and effort required to write robust, maintainable integration tests, offering accessible, scalable ways of validating the interplay between components.

3) The development of sophisticated debugging tools and enhanced observability platforms has made it much easier to identify the root causes of failures. This improvement reduces the reliance on narrowly focused unit tests.

The expanding middle (i.e. the test trophy / diamond / vase)

At WireMock, our test codebases are more like vases than pyramids: fattest in the middle. (When I posted about this on LinkedIn, several commenters were quick to point out that this shape has previously been called a trophy or a diamond, which also seems appropriate. The “testing trophy” term was coined by Kent C. Dodds.)

Unit tests still sit at the bottom and are used for complex logic or large example sets. But the majority of tests are of the entire service, via its API, with dependencies mocked outside the process. In the spirit of eating our own dogfood, we will often use WireMock itself to create these mocks.

By decoupling our tests from external systems, we can ensure they run quickly, don’t depend on third-party data that may have changed, and don’t fail spuriously because of external factors we don’t control.

Meanwhile, by testing through the public API and exercising all of the code in a very similar manner to production usage, we maximise our chances of finding a wide variety of problems while also being able to perform large refactors safely (we could in principle swap out the entire web framework with the protection of these tests).
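To make the shape of such a test concrete, here is a minimal sketch in Python. It is not WireMock code: a tiny standard-library HTTP server stands in for the mock, and it runs in a background thread rather than a separate process to keep the example short. The service itself is reduced to a single function, and every name in it (convert, the /rates/EUR endpoint, the canned rate) is a hypothetical chosen for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubRatesHandler(BaseHTTPRequestHandler):
    """Stand-in for a mocked third-party API: always serves one canned rate."""

    def do_GET(self):
        if self.path == "/rates/EUR":
            body = json.dumps({"currency": "EUR", "rate": 0.5}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_stub():
    """Start the stub on a free local port and return the server object."""
    server = HTTPServer(("127.0.0.1", 0), StubRatesHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def convert(amount_usd, rates_base_url):
    """Hypothetical service entry point: converts USD to EUR using a remote rate.

    The dependency's base URL is injected, so a test can point it at the stub.
    """
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{rates_base_url}/rates/EUR", timeout=5) as resp:
        rate = json.load(resp)["rate"]
    return {"amount_eur": amount_usd * rate}


def run_example():
    """Exercise the service through its public entry point against the stub."""
    stub = start_stub()
    try:
        base_url = f"http://127.0.0.1:{stub.server_port}"
        return convert(10, base_url)
    finally:
        stub.shutdown()
        stub.server_close()


if __name__ == "__main__":
    print(run_example())
```

The test only knows the service's public entry point and the URL of its dependency, so the internals of convert can be rewritten freely without touching the test. In a real project the stub would typically be replaced by a proper mocking tool, and the service would be called over its actual HTTP API.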

What are the practical implications?

The new look of testing in 2025 isn’t necessarily a target or even a heuristic. Other projects will have a different shape because testing isn’t a one-size-fits-all endeavor. But the future direction of travel is clear: the pyramids of yore will be confined to the history books. And these changing economics are actually great news for developers: improvements in mocking tools, debuggers, application frameworks and raw compute performance enable testing strategies that return far more on the effort invested than was possible when the test pyramid was conceived.

>> Connect with Tom on LinkedIn

>> Learn more about WireMock Cloud