Testing Under the Hood or Behind the Wheel

In a room full of architects, no two will agree about their exact job description, as the joke goes. In a similar vein, someone on our team had a refreshing solution to another persistent bone of contention: how do you define an integration test? Don’t try to reconcile differing opinions; just stop using the term altogether. I liked it very much. Rather than think in terms of unit and integration tests, it’s more helpful to distinguish between tests that validate code and tests that validate public interfaces, and to recognize that within each category the scope varies from one test to the next.

Here’s the first confusion that the word ‘integration’ throws up. Is it the integrated parts we target, or their points of integration? The first approach checks that parts cooperate and fulfill (part of) the implementation of some business functionality, for example submitting a new order and making sure that it is validated and saved. The second is only about checking contracts: what exact messages do we expect component A to receive, and how should it react? Older readers may be reminded of WSDL and XML Schema, which are no longer popular but still around in many enterprises.

I find it more helpful to discuss test scope in terms of which parts we choose to ignore, versus which we cannot ignore and must somehow stub out. Only for the simplest of applications will an end-to-end test alone suffice. For any large system, we must use scripted interactions executed against the constituent parts, both in isolation and in cooperation. You can try to draw hard lines between categories of such tests (remember the test pyramid), but they lie on a continuum. The smaller their focus, the more detailed and exhaustive the coverage, and the more of the outside world we must take for granted. The focus of any given test is therefore determined by how many dependencies we must bring under explicit control through mocks.

Only Fake It if You Must

This brings me to mocks, stubs, test doubles, and sandboxes; in short, the tools for faking it. Why fake parts of our system just for the sake of testing? Stability, speed, and convenience, that’s why. Third-party systems, say a payment provider, can be slow or hard to set up with test accounts. They may even charge you for each use. Databases with a production-like schema are another good example. Data should be in a predictable state and fast to access. But the benefits of mocking should always outweigh its cost in terms of the extra lines of code to write and maintain. If there is little gain in speed of execution, ease of coding, or stability, you had better use the real thing instead of a mock.
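To make the payment provider case concrete, here is a minimal sketch of a hand-written test double in plain Java. All names in it (PaymentProvider, FakePaymentProvider, OrderService, the decline threshold) are hypothetical and invented for illustration; a real provider's API will look different. The point is only that the fake is fast, free of charge, and always in a predictable state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical port to a third-party payment provider (assumed for this sketch).
interface PaymentProvider {
    boolean charge(String customerId, long amountInCents);
}

// Hand-written test double: no network, no test account, no per-call fee.
class FakePaymentProvider implements PaymentProvider {
    final List<String> charges = new ArrayList<>(); // records accepted charges
    private final long declineAbove;                // amounts above this are declined

    FakePaymentProvider(long declineAbove) {
        this.declineAbove = declineAbove;
    }

    @Override
    public boolean charge(String customerId, long amountInCents) {
        if (amountInCents > declineAbove) {
            return false;
        }
        charges.add(customerId + ":" + amountInCents);
        return true;
    }
}

// Production code under test; it only knows the interface, not the fake.
class OrderService {
    private final PaymentProvider payments;

    OrderService(PaymentProvider payments) {
        this.payments = payments;
    }

    String submit(String customerId, long amountInCents) {
        if (amountInCents <= 0) {
            return "REJECTED"; // validation happens before any payment call
        }
        return payments.charge(customerId, amountInCents) ? "PAID" : "DECLINED";
    }
}

class OrderServiceDemo {
    public static void main(String[] args) {
        FakePaymentProvider fake = new FakePaymentProvider(10_000);
        OrderService service = new OrderService(fake);
        System.out.println(service.submit("alice", 2_500));  // prints PAID
        System.out.println(service.submit("bob", 50_000));   // prints DECLINED
        System.out.println(fake.charges);                    // prints [alice:2500]
    }
}
```

Even this small fake is extra code to write and maintain, which is the cost side of the trade-off.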

Orthodox fans of unit testing disagree with that last statement. They argue that mocking out code dependencies within the same executable is required as a matter of principle. The unit should always be a single source file, a class, or a standalone function. I think that’s neither practical nor reasonable. Do you mock calls to the very stable Apache StringUtils library? Please don’t. What about file access? Well, it’s a good rule of thumb to mock out connections to the network and file system, but if it’s only a small configuration file, why mock it? Mocks lead to tests that are tightly coupled to implementation details, which makes it hard to change production code while leaving the public API intact. And that’s the whole idea behind refactoring!
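The small configuration file case might look like the sketch below, which reads a real temporary file instead of mocking the file system. RetryConfig and the max.retries key are hypothetical names made up for this example. Because the check only goes through load(), the parsing code can be rewritten freely without touching the test.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical production class: reads a retry limit from a properties file.
class RetryConfig {
    final int maxRetries;

    private RetryConfig(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    static RetryConfig load(Path file) {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Falls back to 3 retries when the key is absent.
        return new RetryConfig(Integer.parseInt(props.getProperty("max.retries", "3")));
    }
}

class RetryConfigDemo {
    // Writes a real temporary file; nothing about the file system is mocked.
    static Path writeTempConfig(String content) {
        try {
            Path file = Files.createTempFile("retry", ".properties");
            file.toFile().deleteOnExit();
            return Files.writeString(file, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(RetryConfig.load(writeTempConfig("max.retries=5\n")).maxRetries); // prints 5
        System.out.println(RetryConfig.load(writeTempConfig("")).maxRetries);                // prints 3
    }
}
```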

Under the Hood or Behind the Wheel

Popular libraries like Mockito and WireMock are very different tools. The first lets you fake behavior at the class level through bytecode manipulation; the other is a fully functional HTTP server with configurable responses. Functionally, they each emulate parts of the system under test for the benefit of speed and reliability. The difference in their applicability is analogous to testing under the hood versus testing behind the wheel. Mockito helps you write white-box tests that target classes and methods. WireMock or Karate come to your aid if you target a component’s public interface and must use test doubles for some or all of its (networked) dependencies.
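To illustrate the "behind the wheel" style without pulling in any library, the sketch below uses the JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in for a tool like WireMock. It is not WireMock's API, just the same idea in miniature: a real HTTP server on a local port that returns a canned response. StockClient, StubStockServer, and the /stock/ path are hypothetical names assumed for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client code under test: asks a remote stock service for a quantity.
class StockClient {
    private final HttpClient http = HttpClient.newHttpClient();
    private final String baseUrl;

    StockClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    int quantityOf(String sku) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/stock/" + sku)).build();
        try {
            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
            // Anything other than 200 is treated as "nothing in stock" in this sketch.
            return response.statusCode() == 200 ? Integer.parseInt(response.body().trim()) : 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}

class StubStockServer {
    // Starts a stub on a free local port that answers every /stock/* request with the given body.
    static HttpServer start(String cannedBody) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/stock/", exchange -> {
                byte[] body = cannedBody.getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class StockClientDemo {
    public static void main(String[] args) {
        HttpServer stub = StubStockServer.start("42");
        try {
            String baseUrl = "http://127.0.0.1:" + stub.getAddress().getPort();
            System.out.println(new StockClient(baseUrl).quantityOf("sku-1")); // prints 42
        } finally {
            stub.stop(0); // always release the port and the server thread
        }
    }
}
```

The client code runs unchanged against the stub, including the real HTTP round trip, which a class-level mock would skip entirely.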

I prefer to define the dichotomy as code tests versus interface tests, but the community favors unit and component tests, so let’s stick with that. Bear in mind, however, that outside the technical arena unit and component are effectively synonymous, like aspect and dimension, or property and feature. And though these types of tests are written differently, from a functional perspective it doesn’t matter much whether the faked behavior flows through a Mockito mock, a WireMock instance, or an in-memory database. What varies is the extent of mocking.

The public interface in a component test needn’t even be a REST API or event bus. It can be a web GUI too. In the Cypress testing framework, you can test an Angular or React frontend with all backend calls replaced by hardcoded responses, defined in the Cypress test code. While GUI tests with a browser are usually used for end-to-end scenarios, a fully functional backend is certainly not a prerequisite.

So, here are my thoughts about unit tests. Feel free to disagree.

  • Unit tests target units of code and they do it explicitly, meaning that the writer of the test has access to and understanding of the source code.
  • A test scenario targets the public interface of a single class, but the scope of the code involved may extend well beyond the single class. Strictly speaking, it already does when you invoke a library method on the String class.
  • Mocking out dependencies adds complexity to the test suites and hampers refactoring. It should only be considered if you cannot use the real thing, like an external networked dependency in a local context.

And about component tests:

  • They have as their point of entry a network API (REST, SOAP, event-based) or a graphical web/mobile interface.
  • They typically rely on a server runtime. The test is not executing lines of code in its own thread but is a client to a separately running system.
  • The scope of a component is usually a single deployable artifact, but if it’s more practicable to deploy it alongside some dependencies, don’t be held back on a matter of principle.
  • You should only mock out dependencies that act as clients to networked components outside your control. If you mock anything that is managed by the component’s runtime under test, you no longer have a component test, but a unit test.