The real reason you have 29 dev and test environments

Tom Akehurst
CTO and Co-founder
August 19, 2024

One company I spoke to recently told me they manage 18 separate dev and test environments. This would have surprised me more had I not recently spoken to another, which had 29. The latter is paying more for its dev/test environments than for production. These numbers might seem extraordinary, but I’ve seen and heard of many similar cases.

Anyone with even a passing familiarity with how this works will know that creating, updating, maintaining, and managing data within this many environments is costly and labor-intensive. Each environment can add thousands of dollars to a monthly cloud bill, and keeping them healthy can require the full-time effort of a dedicated ops team.

Why do software engineering organizations end up in this situation? 

The root cause: overreliance on end-to-end testing

The root cause (or at least a root cause) is that these organizations rely heavily or exclusively on end-to-end testing for their QA processes. In end-to-end testing, tests are run in an environment that is usually a scaled-down replica of production, where each service or component is a copy of the real thing. When we test a feature of service A, which depends on services B and C, it will make calls out to real running copies of those services. These in turn may call out to services D and E, and so on. This “fan-out” effect often means every service must be available for comprehensive testing to be possible.

When does it become a problem? When we’re talking about large organizations with many simultaneous streams of work in flight. In these cases, e2e testing in environments shared by many teams becomes untenable beyond a certain scale for reasons such as:

  • Different version combinations required per team.
  • Unstable software is deployed, destabilizing the environment for everyone using it.
  • Different teams have conflicting data requirements.
  • One team’s state changes pollute or invalidate another’s.
  • The need for exclusive access to an environment during load or reliability testing.

A consequence of these issues is that many teams (or groupings with similar needs) will demand to have a dedicated environment created for them consisting of the specific software versions they specify, their own test data and no destabilizing deployments from other teams.

But while many companies would benefit from a frank conversation about when e2e testing is actually needed, and the tradeoffs involved, this is also just a symptom of a deeper issue – one that goes to the heart of how today’s dev organizations approach testing.

The root root cause: beliefs about testing

If we dig deeper still, the reason many organizations choose an e2e-centric testing approach despite the enormous cost and difficulty is the prevailing belief that “tests performed against anything other than the full, real system can’t be trusted”.

This belief has been repeatedly disproved by some of the most successful software engineering orgs in the world, but it takes some unpacking to show why, and what the alternative might look like.

Testing in this context is essentially the act of exercising all of the system’s functions from an end-user perspective and confirming that they perform correctly. While this may seem sensible at face value, in practice it means we’re bundling the management of every type of quality risk into a single activity.

How to avoid testing environment sprawl

Naturally, I’m writing this article because I believe there’s a better way. Here’s what I suggest if you don’t want to end up with 15, 20, or 30 environments and all the headaches that come with them.

In a nutshell, there are many cases where you can replace e2e testing with more targeted, isolated tests. Let’s talk about how we identify these scenarios.

In executing a single test case we’re checking that:

  • Each cooperating service makes correct API calls to others.
  • Each service’s API accepts correctly formed requests and returns correctly formed responses.
  • Each individual service is functionally correct.
  • Infrastructure and networks are configured correctly so that services can communicate with each other.

Only the last of these risks requires a fully integrated environment to uncover. Each of the others can be addressed via targeted, isolated testing - contract testing to ensure that client and server API usage is correct, and functional testing of individual services against mocks to ensure correct behavior.
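To make this concrete, here’s a minimal sketch of what isolated testing against a mock can look like, using only the Python standard library. The service names, endpoint, and response shape are hypothetical, and a real setup would more likely use a dedicated mocking tool - but the idea is the same: the stub stands in for a downstream dependency, recording the requests it receives so we can check the client’s side of the contract, and returning canned responses so we can check the client’s behavior.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A stub of a downstream dependency (a hypothetical "pricing" service).
# It records incoming request paths so the test can assert the client's
# side of the contract, and returns a canned JSON response.
class PricingStub(BaseHTTPRequestHandler):
    requests_seen = []

    def do_GET(self):
        PricingStub.requests_seen.append(self.path)
        body = json.dumps({"sku": "ABC-1", "price_cents": 1299}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

# The code under test: a client function from our hypothetical service A.
def fetch_price(base_url, sku):
    with urllib.request.urlopen(f"{base_url}/prices/{sku}") as resp:
        data = json.load(resp)
    return data["price_cents"] / 100

# Start the stub on an ephemeral port in a background thread.
server = HTTPServer(("127.0.0.1", 0), PricingStub)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

# Functional check: service A behaves correctly given a valid response.
price = fetch_price(base_url, "ABC-1")
assert price == 12.99

# Contract check: service A called the endpoint the way the API expects.
assert PricingStub.requests_seen == ["/prices/ABC-1"]

server.shutdown()
```

Because the stub is just an in-process HTTP server, a test like this runs in milliseconds in any pipeline, with no shared environment involved.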

Mature, high-performing engineering orgs will often run these isolated tests in a pipeline against every code or configuration change, only deploying to a fully integrated environment after these have passed. There, only a minimal set of e2e tests are run, specifically targeted to the types of risk that can only be detected in the system as a whole.

This way, far fewer of these e2e environments are needed, as most testing occurs within minimal-footprint environments that may only exist for the lifetime of the test run. This substantially reduces infrastructure cost and ops effort.

There are additional benefits to this approach. A common negative side effect of overreliance on e2e testing is that it tends to promote monolithic, high-risk, high-effort releases - “we’ve tested these service versions together so we must now release them all together”. Isolated contract testing and functional testing against mocks can be used to ensure forward and backward compatibility between service versions, meaning that individual services can be released with high confidence. This leads to higher new-feature throughput, shorter lead times, and less system downtime, among other benefits.
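As a sketch of what a backward-compatibility check might look like (the contract format and field names here are hypothetical; consumer-driven contract testing tools formalize this idea far more thoroughly): the consumer publishes the response fields it depends on, and a candidate provider version is verified against that list before release.

```python
# The fields a hypothetical consumer declares it relies on, with their
# expected types - a minimal stand-in for a published consumer contract.
CONSUMER_CONTRACT = {
    "required_fields": {"id": str, "status": str},
}

def is_backward_compatible(contract, candidate_response):
    """A new provider version stays compatible if every field the
    consumer depends on is still present with the expected type.
    Added fields are fine; removals or type changes are breaking."""
    for field, expected_type in contract["required_fields"].items():
        if field not in candidate_response:
            return False
        if not isinstance(candidate_response[field], expected_type):
            return False
    return True

# v2 adds a new field: still compatible, so it can ship independently.
v2_ok = is_backward_compatible(
    CONSUMER_CONTRACT,
    {"id": "o-1", "status": "SHIPPED", "eta": "2024-08-20"},
)
assert v2_ok

# v3 renames "status" to "state": breaking, caught before release.
v3_ok = is_backward_compatible(
    CONSUMER_CONTRACT,
    {"id": "o-1", "state": "SHIPPED"},
)
assert not v3_ok
```

Run against every candidate build in the pipeline, a check like this lets each service prove compatibility on its own, without a coordinated big-bang release.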

Organizations that adopt a more nuanced approach to testing, using isolated tests where possible and reserving end-to-end testing for specific risks, can avoid the headaches of environment sprawl and deliver better software faster – while also keeping their number of environments at more manageable levels.
