Revisiting Unit Testing and Mocking in Python

My previous blog post, Python Mocking 101: Fake It Before You Make It, discussed the basic mechanics of mocking and unit testing in Python. This post covers some higher-level software engineering principles demonstrated in my experience with Python testing over the past year and a half. In particular, I want to revisit the idea of patching mock objects in unit tests.

 

Patching External Clients

 

Clients in this post refer to any objects that create side effects, such as disk or network I/O. Consider a class, CloudCreator, that receives messages over HTTP, generates some side effects by creating cloud infrastructure, and sends messages over HTTP in response:

<code class="language-python">import http_client


class CloudCreator:
    def __init__(self):
        self.network_client = http_client.HTTPClient()</code>

We can test CloudCreator as follows:

<code class="language-python">import unittest
from unittest.mock import patch

# Assumes CloudCreator has been imported from the module under test.


class TestCloudCreator(unittest.TestCase):
    # autospec=True makes the patched class, and the instance it returns,
    # mirror the real HTTPClient interface
    @patch('http_client.HTTPClient', autospec=True)
    def setUp(self, mock_http_client_call):
        # Recall that patch patches the _initialization_ call of classes,
        # so CloudCreator receives mock_http_client_call.return_value
        self.mock_http_client = mock_http_client_call.return_value
        self.cloud_creator = CloudCreator()</code>

patch has given us the ability to test our CloudCreator class without creating any network side effects. However, this design has some flaws. If CloudCreator uses a lot of external clients, we need to stack a lot of patch calls. Furthermore, CloudCreator and its unit tests are strongly dependent on HTTPClient, which makes changing the network client difficult.

 

Fugue engineer Josh Einhorn notes some other disadvantages of patch:

 

  • Using it means there are implicit dependencies somewhere in the class -- another developer wouldn't ever know this. Constructor args make dependencies explicit.
  • When refactoring underlying implementations, using patch will require updating multiple unrelated unit tests, and it is not always clear which unit tests will require changes due to patch's use of hard-coded strings rather than more strongly "typed" references (which can be caught by linters/IDEs).
  • Using patch is a code smell because it means that the class under test has been coupled to one or more other concrete classes.
  • Code that is written that depends on patch to unit test is not portable to other languages. Statically typed languages with compilers don't allow monkey patching (without some serious work). A refactor of the structure would be required to properly unit test such a class in another language.
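The point about hard-coded strings can be seen in a small sketch. It uses the standard library's os module as a stand-in for a real dependency, and the misspelled target is deliberate. Nothing flags the typo until the patch is entered at runtime:

```python
import os
from unittest.mock import patch

error = None
try:
    # Typo: the real function is os.getcwd. A linter sees only a string here.
    with patch("os.getcwdd"):
        os.getcwd()
except AttributeError as exc:
    # patch refuses to replace an attribute that does not exist
    error = exc

print(type(error).__name__)
```

A renamed class or method fails the same way, and only in the tests that happen to exercise that patch.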

 

In general, if testing a class requires a lot of patching of external clients, it's a sign that a refactor is needed. Experienced software engineers will see that this example is a prime opportunity for dependency inversion.

 

Dependency Inversion and Injection

 

Dependency inversion, and specifically dependency injection in this case, are well-worn subjects in software engineering circles; for the uninitiated, dependency injection is the idea that a class or function should be given any external clients it depends on, rather than creating them itself. This allows code to operate in multiple contexts, depending on what clients it is given.

 

In our example, CloudCreator's core functionality of creating cloud infrastructure is not dependent on any particular means of sending and receiving messages. Therefore, it's logical to write the class in such a way that network I/O is handled by a client that is injected at runtime, rather than hard-coded (this code uses Python's type-hinting syntax):

<code class="language-python">class CloudCreator:
    def __init__(self, network_client: NetworkClient):
        self.network_client = network_client</code>

This encapsulation allows the class to be used with an HTTP client, a TCP/IP client, a ZMQ client, or an SQS/SNS client, so long as the network I/O conforms to a pre-specified interface defined in NetworkClient, such as NetworkClient.recv() and NetworkClient.send(data). This is a great simplification of the process, and astute observers will note that the specification of the client interface becomes of paramount importance, but that's another blog post.
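As a rough sketch of what such an interface might look like, the block below defines NetworkClient as an abstract base class with the recv() and send(data) methods mentioned above. The InMemoryClient class and the handle_one method are hypothetical, added only so the example runs end to end. They are not part of the original design.

```python
from abc import ABC, abstractmethod
from collections import deque


class NetworkClient(ABC):
    """Interface that any injected network client is assumed to satisfy."""

    @abstractmethod
    def recv(self) -> bytes:
        """Return the next incoming message."""

    @abstractmethod
    def send(self, data: bytes) -> None:
        """Send a message."""


class InMemoryClient(NetworkClient):
    """Hypothetical client that keeps messages in memory instead of doing I/O."""

    def __init__(self, incoming):
        self._incoming = deque(incoming)
        self.sent = []

    def recv(self) -> bytes:
        return self._incoming.popleft()

    def send(self, data: bytes) -> None:
        self.sent.append(data)


class CloudCreator:
    def __init__(self, network_client: NetworkClient):
        self.network_client = network_client

    def handle_one(self) -> None:
        # Hypothetical method: acknowledge one request. Real infrastructure
        # creation is omitted.
        request = self.network_client.recv()
        self.network_client.send(b"created: " + request)


client = InMemoryClient([b"vpc"])
CloudCreator(client).handle_one()
print(client.sent)
```

CloudCreator never learns which transport it is talking to. Any class that implements recv() and send(data) can be passed in.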

 

One of the primary advantages of dependency injection is that it allows the developer to easily pass in mock objects when unit testing. Now we can set up our tests like this:

<code class="language-python">import unittest
from unittest.mock import create_autospec


class TestCloudCreator(unittest.TestCase):
    def setUp(self):
        # create_autospec builds a mock restricted to NetworkClient's interface
        self.mock_network_client = create_autospec(NetworkClient, instance=True)
        self.cloud_creator = CloudCreator(self.mock_network_client)</code>

We create a mock network client for unit testing with create_autospec, which builds a mock object that adheres to the NetworkClient interface: accessing an attribute that NetworkClient lacks raises AttributeError, and calls are checked against the real method signatures.
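To make this concrete, here is a self-contained sketch of a test against an injected mock. It uses create_autospec from unittest.mock. The bodies of NetworkClient and the handle_one method are assumptions made for illustration, not the original implementation.

```python
import unittest
from unittest.mock import create_autospec


class NetworkClient:
    """Stand-in interface; the method bodies are placeholders."""

    def recv(self):
        raise NotImplementedError

    def send(self, data):
        raise NotImplementedError


class CloudCreator:
    def __init__(self, network_client):
        self.network_client = network_client

    def handle_one(self):
        # Hypothetical method: read one request and reply to it
        request = self.network_client.recv()
        self.network_client.send({"status": "created", "request": request})


class TestCloudCreator(unittest.TestCase):
    def setUp(self):
        self.mock_network_client = create_autospec(NetworkClient, instance=True)
        self.cloud_creator = CloudCreator(self.mock_network_client)

    def test_replies_to_request(self):
        self.mock_network_client.recv.return_value = "make-vpc"
        self.cloud_creator.handle_one()
        self.mock_network_client.send.assert_called_once_with(
            {"status": "created", "request": "make-vpc"}
        )

    def test_mock_enforces_interface(self):
        # Attributes outside the interface are rejected
        with self.assertRaises(AttributeError):
            self.mock_network_client.sned("typo")
        # So are calls that do not match the real signature
        with self.assertRaises(TypeError):
            self.mock_network_client.send()
```

No patch call appears anywhere: the test controls the dependency simply by passing it to the constructor.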

 

In this simple example, patch allowed us to paper over an inflexible design by creating an even more inflexible unit test. As I noted earlier, if you're patching more than a few calls, it's a sign that you should refactor. Note that patch is still useful, for example, to patch calls to time.time() or other side-effect-free library calls.
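A minimal sketch of that remaining use case follows. The seconds_since helper is hypothetical. patch pins time.time() to a fixed value so the result is deterministic:

```python
import time
from unittest.mock import patch


def seconds_since(start: float) -> float:
    # Hypothetical helper that depends on the current time
    return time.time() - start


# Inside the with block, time.time() always returns 1000.0
with patch("time.time", return_value=1000.0):
    elapsed = seconds_since(990.0)

print(elapsed)
```

Here the patched call is a single, stable library function rather than a collaborator of the class under test, so the coupling concerns above largely do not apply.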

 

More Dependency Injection

 

Dependency injection is useful, but what should we do when our class needs a large number of clients? We can add more parameters and inject them all separately:

<code class="language-python">class CloudCreator:
    # In a type-hinted parameter, the default value follows the annotation
    def __init__(
        self,
        network_client: NetworkClient = None,
        authz_client: AuthzClient = None,
        accts_client: AccountsClient = None,
        log_writer: LogWriter = None,
        health_check: HealthCheckClient = None,
        metrics: MetricsWriter = None,
        database_client: DatabaseClient = None,
    ):
        self.network_client = network_client
        self.authz_client = authz_client
        self.accts_client = accts_client
        self.log_writer = log_writer
        self.health_check = health_check
        self.metrics = metrics
        self.database_client = database_client</code>

By adding default values, it's possible to selectively initialize clients, which makes unit testing easier (more on this later). However, this form is still unwieldy, especially when doing integration testing. Consider what happens if we make a change to our message handler and, subsequently, want to test that it communicates correctly with the server. Initializing a CloudCreator takes a lot of tedious work creating and initializing client objects. One of Python's strengths is its interactive interpreter, which enables an iterative development process, and preserving the ability to easily use the REPL makes developers' lives easier. Requiring developers to create and initialize a dozen client objects before they can test a small change to the core class creates frustration.

 

One solution is to encapsulate the client creation in a separate object. We can even encode information about order of operation and dependencies in the constructor:

<code class="language-python">class CloudCreatorServices:
    def __init__(self):
        self.network_client = HTTPClient()
        self.database_client = SQLClient()
        # The authz and accounts clients depend on the database client,
        # so it is created first
        self.authz_client = AuthzClient(self.database_client)
        self.accts_client = AcctsClient(self.database_client)
        self.log_writer = LogWriter()
        self.health_check = HealthCheck()
        self.metrics = Metrics()


def main():
    services = CloudCreatorServices()
    cc = CloudCreator(
        services.network_client,
        services.authz_client,
        services.accts_client,
        services.log_writer,
        services.health_check,
        services.metrics,
        services.database_client,
    )</code>

Note that CloudCreator is still initialized with explicit references to the services it requires. This makes it easy for future developers to understand which services CloudCreator requires. It's possible to make the argument for a design where CloudCreator's constructor only expects a CloudCreatorServices object:

<code class="language-python">class CloudCreator:
    def __init__(self, services: CloudCreatorServices):
        self.services = services</code>

However, this ties CloudCreator to a specific implementation of CloudCreatorServices, with exactly the services that CloudCreator requires. If CloudCreatorServices is generalized to create services for multiple classes, the caller must assume that every single service is required by any class that uses the generalized Service class.

 

Unfortunately, this naïve implementation loses the flexibility of raw dependency injection. One step forward, one step backward. This is easily remedied:

<code class="language-python">class CloudCreatorServices:
    def __init__(
        self,
        network_client: NetworkClient,
        authz_client: AuthzClient,
        accts_client: AccountsClient,
        log_writer: LogWriter,
        health_check: HealthCheckClient,
        metrics: MetricsWriter,
        database_client: DatabaseClient,
    ):
        self.network_client = network_client
        self.authz_client = authz_client
        self.accts_client = accts_client
        self.log_writer = log_writer
        self.health_check = health_check
        self.metrics = metrics
        self.database_client = database_client</code>

At this point, all we've done is move the complexity around. The developer is still responsible for initializing all the clients. We haven't provided any tools to make their life easier. We can do that by building complex initialization procedures into CloudCreatorServices:

<code class="language-python">class CloudCreatorServices:
    def __init__(
        self,
        network_client: NetworkClient = None,
        authz_client: AuthzClient = None,
        accts_client: AccountsClient = None,
        log_writer: LogWriter = None,
        health_check: HealthCheckClient = None,
        metrics: MetricsWriter = None,
        database_client: DatabaseClient = None,
    ):
        # Use the injected client if one was given; otherwise build a default
        self.database_client = database_client or self._get_database_client()
        self.network_client = network_client or self._get_network_client()
        self.authz_client = authz_client or self._get_authz_client(self.database_client)
        self.accts_client = accts_client or self._get_accts_client(self.database_client)
        self.log_writer = log_writer or self._get_log_writer()
        self.health_check = health_check or self._get_health_check()
        self.metrics = metrics or self._get_metrics()</code>

Now we've hidden client creation in these initialization methods. This seems like a good solution, but if we look more closely, there is a downside. When CloudCreatorServices is initialized, we create every client, even if we know we won't be using it. What do we do if one of our client services is misbehaving and times out, but we still want to test other functionality? Is there room for even more flexibility?

 

We can use getter methods to defer each client's initialization until it is first requested:

<code class="language-python">class CloudCreatorServices:
    def __init__(
        self,
        network_client: NetworkClient = None,
        authz_client: AuthzClient = None,
        accts_client: AccountsClient = None,
        log_writer: LogWriter = None,
        health_check: HealthCheckClient = None,
        metrics: MetricsWriter = None,
        database_client: DatabaseClient = None,
    ):
        self._database_client = database_client
        self._network_client = network_client
        self._authz_client = authz_client
        self._accts_client = accts_client
        self._log_writer = log_writer
        self._health_check = health_check
        self._metrics = metrics

    def get_database_client(self) -> DatabaseClient:
        # Create the client only on first use, then reuse it
        if self._database_client is None:
            self._database_client = DatabaseClient()
        return self._database_client

    def get_authz_client(self) -> AuthzClient:
        if self._authz_client is None:
            self._authz_client = AuthzClient(self.get_database_client())
        return self._authz_client

    ...</code>

This solution gives us lazy loading: clients are initialized only when they are first requested, and we keep the ability to swap in our own clients. However, getter methods are not very Pythonic. Is there a language feature we could exploit to find a more Pythonic way?

 

@property

 

The feature we're looking for is @property:

<code class="language-python">class CloudCreatorServices:
    def __init__(
        self,
        network_client: NetworkClient = None,
        authz_client: AuthzClient = None,
        accts_client: AccountsClient = None,
        log_writer: LogWriter = None,
        health_check: HealthCheckClient = None,
        metrics: MetricsWriter = None,
        database_client: DatabaseClient = None,
    ):
        self._database_client = database_client
        self._network_client = network_client
        self._authz_client = authz_client
        self._accts_client = accts_client
        self._log_writer = log_writer
        self._health_check = health_check
        self._metrics = metrics

    @property
    def database_client(self) -> DatabaseClient:
        if self._database_client is None:
            self._database_client = DatabaseClient()
        return self._database_client

    @property
    def authz_client(self) -> AuthzClient:
        # Store the instance on the private attribute, not on the property
        if self._authz_client is None:
            self._authz_client = AuthzClient(self.database_client)
        return self._authz_client

    ...</code>

This looks like our previous solution with getter methods, but we've dropped the get_ and we've added the @property decorator. @property turns a getter method into a property. A property is accessed directly on an instance, like services.database_client, without parentheses. Furthermore, using @property gives us the option to add a setter in the future, by decorating the setter function for a property with @<property_name>.setter, for example:

<code class="language-python">class CloudCreatorServices:
    ...

    @property
    def database_client(self) -> DatabaseClient:
        if self._database_client is None:
            self._database_client = DatabaseClient()
        return self._database_client

    # The setter must share the property's name, and it must assign to the
    # private attribute to avoid calling itself recursively
    @database_client.setter
    def database_client(self, database_client: DatabaseClient):
        self._database_client = database_client</code>

The setter will be transparently called when we assign a value to database_client:

<code class="language-python">services = CloudCreatorServices()
# calls the database_client setter
services.database_client = DatabaseClient()</code>

Using @property preserves the Python standard of accessing instance attributes directly, while giving us the flexibility of wrapping attribute access in getters and setters.
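The following self-contained sketch shows lazy loading and injection working together. DatabaseClient and AuthzClient are reduced to stand-ins here, and the instances counter exists only to make the behavior observable. Neither detail comes from the original code.

```python
class DatabaseClient:
    instances = 0  # counts constructions, for demonstration only

    def __init__(self):
        DatabaseClient.instances += 1


class AuthzClient:
    def __init__(self, database_client):
        self.database_client = database_client


class CloudCreatorServices:
    def __init__(self, database_client=None, authz_client=None):
        self._database_client = database_client
        self._authz_client = authz_client

    @property
    def database_client(self):
        if self._database_client is None:
            self._database_client = DatabaseClient()
        return self._database_client

    @database_client.setter
    def database_client(self, database_client):
        self._database_client = database_client

    @property
    def authz_client(self):
        if self._authz_client is None:
            self._authz_client = AuthzClient(self.database_client)
        return self._authz_client


# Nothing is built when the services object is created
services = CloudCreatorServices()
built_up_front = DatabaseClient.instances

# First access builds AuthzClient and, through it, DatabaseClient
authz = services.authz_client
built_after_access = DatabaseClient.instances

# An injected client is used as-is; no default DatabaseClient is created
fake_db = object()
injected = CloudCreatorServices(database_client=fake_db)
uses_fake = injected.authz_client.database_client is fake_db
```

Because the dependency between AuthzClient and DatabaseClient is resolved through the property, an injected database client automatically flows into any client built on top of it.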

 

Mocking With Dependency Injection

 

One of the advantages of dependency inversion is that it makes unit testing much simpler. Recall that CloudCreator's initialization arguments have default values, which allows us to selectively mock client service objects for specific tests:

<code class="language-python">import unittest
from unittest.mock import create_autospec


class TestCloudCreator(unittest.TestCase):
    def test_network_write_method(self):
        # create_autospec builds a mock restricted to NetworkClient's interface
        self.mock_network_client = create_autospec(NetworkClient, instance=True)
        self.cloud_creator = CloudCreator(network_client=self.mock_network_client)
        ...</code>

Since other client services are not initialized, it's easy to tell if the network code path touches objects outside of its scope, which is usually a sign that something's not right. Of course, comprehensive unit tests would require mocks for every service, but those are easy to add.
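As an illustration of that failure mode, the sketch below uses a pared-down CloudCreator with two hypothetical methods, reply and reply_and_count. The second reaches for a metrics writer that the test never supplied, so the stray dependency surfaces immediately as an AttributeError on None:

```python
from unittest.mock import MagicMock


class CloudCreator:
    def __init__(self, network_client=None, metrics=None):
        self.network_client = network_client
        self.metrics = metrics

    def reply(self, data):
        # Hypothetical method that should only touch the network client
        self.network_client.send(data)

    def reply_and_count(self, data):
        # Hypothetical method that also touches the metrics writer
        self.network_client.send(data)
        self.metrics.increment("replies")


# Only the network client is mocked; metrics stays None
cloud_creator = CloudCreator(network_client=MagicMock())
cloud_creator.reply("ok")

leak = None
try:
    cloud_creator.reply_and_count("ok")
except AttributeError as exc:
    # The code path reached for a service this test did not provide
    leak = exc
```

With patch-based tests, the same stray call would typically have landed on a real or globally patched object and gone unnoticed.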

 

By using dependency inversion, we've eliminated the need for patching in our unit tests, while giving developers a powerful, time-saving tool for integration testing.

 
