Testing on Kotlin Multiplatform and a Strategy to Speed Up Development Time (2023 Update)

This is the new and improved version of this article; the previous one is still available here.

The main focus of Kotlin Multiplatform (KMP) is to avoid duplicating domain logic on different platforms. You write it once and reuse it on different targets.

If the shared code is broken, then every platform that uses it will work incorrectly. In my opinion, the best way to ensure that something works correctly is to write tests covering all edge cases.

💡
Because the KMP code base is the heart of multiple platforms, it's important that it contains as few bugs as possible. 

In this article, I'll share my experience writing tests for Kotlin Multiplatform, along with my strategy for speeding up development using tests. At FootballCo we use this strategy for our app, and we see that it helps our development cycle.

Even though the article focuses on Kotlin Multiplatform, a lot of the principles can also be applied to plain Kotlin applications or any other type of applications for that matter. Before starting, let me ask this question:

Why bother with testing at all?

Some engineers see tests as a waste of time. Let me give some examples that might change their mind.

  • Tests provide a fast feedback loop: after a couple of seconds you know whether something works or not. Verification through the UI requires building the whole app, navigating to the correct screen and performing the action, which, as you can imagine, takes a lot more time.
  • It's easier to catch edge cases by looking at the code; a lot of the time the UI does not reflect all the cases that might happen.
  • Sometimes it's hard to set up the app in the correct state for testing (e.g. network timeout, receiving a socket). It might be possible, but setting up the correct state in tests is much faster and easier.
  • A well-written test suite is a safety net before the app is released. With a good CI set-up, regressions / bugs don't even reach the main branch because they are caught on PRs.
  • Tests are built-in documentation that has to reflect the actual implementation of the app. If the tests aren't updated along with the code, they fail.

The Kotlin Multiplatform testing ecosystem

Testing Framework

Compared to the JVM, the Kotlin Multiplatform ecosystem is still relatively young. JUnit can only be used on JVM platforms; other platforms depend on the Kotlin standard library testing framework.

An alternative way of testing Kotlin Multiplatform code would be to use a different testing framework like Kotest. I don't have much experience with it, but I found it to be less reliable than writing tests using the standard testing framework (kotlin.test). For example, I was unable to run a single test case (a function) through the IDE.

The standard testing library does lack some cool JUnit 5 features like parameterized tests or nesting, however it is possible to add them with some additional boilerplate:

Kotlin Multiplatform Parameterized Tests and Grouping Using The Standard Kotlin Testing Framework
Keeping Kotlin Multiplatform tests clean while using the standard kotlin.test framework
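
To give a feel for the amount of boilerplate involved, here is a minimal sketch of a parameterized test built only on kotlin.test. The parameterizedTest helper and the isLeapYear function are hypothetical names made up for this illustration; they are not part of kotlin.test or of the linked article.

```kotlin
import kotlin.test.Test
import kotlin.test.assertEquals

// Hypothetical function under test, used only for illustration.
fun isLeapYear(year: Int): Boolean =
    year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)

// Hypothetical helper: runs the same assertion for every (input, expected) pair.
fun <I, E> parameterizedTest(
    vararg cases: Pair<I, E>,
    assertion: (input: I, expected: E) -> Unit,
) {
    cases.forEach { (input, expected) -> assertion(input, expected) }
}

class LeapYearParameterizedTest {

    @Test
    fun leapYearsAreDetected() = parameterizedTest(
        1900 to false,
        2000 to true,
        2023 to false,
        2024 to true,
    ) { year, expected ->
        // The message identifies which case failed.
        assertEquals(expected, isLeapYear(year), "year: $year")
    }
}
```

One trade-off of this approach is that the first failing case stops the whole test function, so later cases are not reported until the earlier one is fixed.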

Assertions

Kotest also has a great assertion library in addition to the testing framework, which works flawlessly and can be used alongside the Kotlin standard library testing framework. Another library is Atrium, however it doesn't support Kotlin / Native.

Mocking

For a long time, Kotlin Multiplatform did not have a mocking framework. Things seem to have changed, because MockK now supports Kotlin Multiplatform. However, there still might be issues on the Kotlin / Native side. An alternative to using a mocking framework is writing the mock, or any other test double, by hand, which will be explained in more detail in the next section.

Mocking is prevalent in tests that treat a single class as a unit, so let's touch on what can be defined as a unit before diving into the Kotlin Multiplatform testing strategy.

💬
The mock keyword is overloaded and misused in a lot of places. I won't get into the reasons why, but if you're interested in learning more about it, take a look at the article below
Mocks Aren’t Stubs
Explaining the difference between Mock Objects and Stubs (together with other forms of Test Double). Also the difference between classical and mockist styles of unit testing.

Definition of a Unit

One unit, one class

On Android, a unit is usually considered to be one class whose dependencies are all mocked, probably using a framework like Mockito or MockK. These frameworks are really easy to use, but they can easily be abused, which leads to brittle tests that are coupled to the implementation details of the system under test. The upside is that these types of unit tests are the easiest to write and read (provided the amount of mocking logic is not too high).

Another benefit is that the APIs of all the internal dependencies (class names, function signatures etc.) are more refined, because they are used inside the tests (e.g. for set-up or verification) through mocks. The downside is that these mocks often make refactoring harder, since changing implementation details (for example, extracting a class) will most likely break the test (because the extracted class needs to be mocked), even though the behavior of the feature did not change.

These types of tests work in isolation: they only verify that one unit (in this case, a class) works correctly. In order to verify that a group of units behaves correctly together, additional integration tests are needed.

One unit, multiple classes

An alternative way of thinking about a unit could be a cohesive group of classes for a given feature. These tests try to use real dependencies instead of mocks; however, awkward, complex or boundary dependencies (e.g. networks, persistence etc.) are still replaced with a test double (usually written by hand instead of mocked by a framework).

The most frequent test doubles are Fakes, which resemble the real implementation but in a simpler form to allow testing (e.g. replacing a real database with an in-memory one).

There are also Stubs, which are set up before the action (the Arrange step of Arrange-Act-Assert), allowing the system under test to use predefined values (e.g. instead of returning system time, a predefined time value is returned).
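
To make the difference concrete, here is a minimal sketch of both kinds of test double. All the names (Note, NoteStorage, TimeProvider and their implementations) are hypothetical and exist only for this illustration.

```kotlin
// Hypothetical domain type and boundary interfaces.
data class Note(val id: Int, val text: String)

interface NoteStorage {
    fun save(note: Note)
    fun findById(id: Int): Note?
}

interface TimeProvider {
    fun nowEpochMillis(): Long
}

// Fake: a working but simplified implementation (a map instead of a real database).
class InMemoryNoteStorage : NoteStorage {
    private val notes = mutableMapOf<Int, Note>()

    override fun save(note: Note) {
        notes[note.id] = note
    }

    override fun findById(id: Int): Note? = notes[id]
}

// Stub: has no logic of its own, it returns whatever the test arranged up front.
class StubTimeProvider(var fixedEpochMillis: Long = 0L) : TimeProvider {
    override fun nowEpochMillis(): Long = fixedEpochMillis
}
```

The fake can be shared by many tests without any per-test configuration, while the stub is typically configured in the arrange step of each test that cares about the returned value.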

💡
Having a modularized project allows for creating a "core-test" module which contains all public Test Doubles. Thanks to this, they can be re-used across all other modules without having to duplicate implementations
bliki: TestDouble
Test Double is generic term for fakes, mocks, stubs, dummies and spies.
Mocking is not practical — Use fakes
This article talks about the benefits fakes provide over mocks in testing software. Fakes lead to better API and readable/robust tests.

My strategy for testing Kotlin Multiplatform

Because mocking in Kotlin Multiplatform is far from perfect, I went down the path of writing test doubles by hand. The problem with this is that if we wanted to write every Kotlin Multiplatform unit test like on Android (unit == class), we would be forced to create an interface for every class along with a test double for it, which would add unnecessary complexity just for testing purposes.

This is why I decided, for the most part, to treat a unit as a feature / behavior (a group of classes). This way, there are fewer test doubles involved, and the system is tested in a more "production"-like setting.

Depending on the complexity, the tests might become integration tests rather than unit tests, but in the grand scheme of things it's not that important as long as the system is properly tested.

The system under test

Most of the time, the system under test would be the public domain class that the Kotlin Multiplatform module exposes or maybe some other complex class which delegates to other classes.

If we had a feature that allowed the user to input a keyword and get a search result based on that keyword, the Contract / API could have the following signature:

fun performSearch(input: String): List<String>

This could be a function of an interface, a use case or anything else. The point is that the class behind it contains some complex logic.

Tests for this feature could look like this:

class SuccesfulSearchTest
class NetworkErrorSearchTest
class InvalidKeywordSearchTest

Each test class exercises a different path that the system could take. In this case, one for the happy path and two for unhappy paths. They could focus only on the domain layer, where the network API is faked, or they could also include the data layer, where the real network layer is used but mocked somehow (e.g. Ktor MockEngine, SQLDelight In-Memory database, GraphQL Mock Interceptor).

The keyword validation might contain a lot of edge cases, which may be hard to test through the InvalidKeywordSearchTest. That test could focus only on the domain aspects of what happens when a keyword is invalid, while all the edge cases could be tested in a separate class:

class KeywordValidatorTest {

	fun `"ke" is invalid`()
	fun `"key" is valid`()
	fun `" ke " is invalid`()
	fun `" key " is valid`()
}
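
A runnable version of such a granular test could look like the sketch below. The validation rule is my assumption, inferred from the test names above (at least three characters once surrounding whitespace is trimmed), and the KeywordValidator shown here is a hypothetical implementation, not the real one.

```kotlin
import kotlin.test.Test
import kotlin.test.assertFalse
import kotlin.test.assertTrue

// Assumed rule: a keyword is valid when it has at least `minLength`
// characters after surrounding whitespace is trimmed.
class KeywordValidator(private val minLength: Int = 3) {
    fun isValid(keyword: String): Boolean = keyword.trim().length >= minLength
}

class KeywordValidatorTest {

    private val sut = KeywordValidator()

    @Test
    fun twoCharacterKeywordIsInvalid() = assertFalse(sut.isValid("ke"))

    @Test
    fun threeCharacterKeywordIsValid() = assertTrue(sut.isValid("key"))

    @Test
    fun paddedTwoCharacterKeywordIsInvalid() = assertFalse(sut.isValid(" ke "))

    @Test
    fun paddedThreeCharacterKeywordIsValid() = assertTrue(sut.isValid(" key "))
}
```

I used plain camelCase function names here to stay on the safe side across targets; backtick names like the ones above read better when all your targets accept them.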

The example above is pretty simple, however, testing the KMP "public Contract / API" is a good start.

💡
For complex logic that does not involve orchestrating other classes (e.g. conversions, calculations, validation), try to extract a separate class which can be tested in isolation. This way you have one granular test which covers all the complex edge cases while keeping the "integration" tests simpler by only caring about one or two cases for the complex logic (making sure it is called)

Test set-up with Object mothers

Because this strategy might involve creating multiple test classes, the system under test and its dependencies need to be created multiple times. Repeating the same set-up boilerplate is tedious and hard to maintain, because a single change (e.g. a new constructor parameter) has to be applied in multiple places.

To keep things DRY, object mothers can be created, removing boilerplate and making the test set-up simpler:

import kotlin.test.BeforeTest

class SuccesfulSearchTest {

    // Hand-written fake standing in for the real network API
    private lateinit var api: Api
    private lateinit var sut: SearchEngine

    @BeforeTest
    fun setUp() {
        api = FakeApi()
        sut = createSearchEngine(api)
    }

    // ...
}

// Object mother: the single place that knows how to build a SearchEngine
fun createSearchEngine(
    api: Api,
    keywordValidator: KeywordValidator = KeywordValidator()
) =
    SearchEngine(api, keywordValidator)

The top-level function, createSearchEngine, can be used in all the SearchTests for creating the system under test. An added bonus of such object mothers is that irrelevant implementation details like the KeywordValidator are hidden from the test class.
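
For completeness, here is a self-contained sketch of what could sit behind such an object mother. Everything in it is an assumption made for illustration: the shape of Api, the FakeApi behavior, and the decision that invalid keywords and network failures both produce an empty result.

```kotlin
import kotlin.test.BeforeTest
import kotlin.test.Test
import kotlin.test.assertEquals

// Hypothetical production types.
interface Api {
    fun search(keyword: String): List<String>
}

class KeywordValidator {
    fun isValid(keyword: String): Boolean = keyword.trim().length >= 3
}

class SearchEngine(
    private val api: Api,
    private val keywordValidator: KeywordValidator,
) {
    // Assumption: invalid keywords and network failures both yield an empty list.
    fun performSearch(input: String): List<String> {
        if (!keywordValidator.isValid(input)) return emptyList()
        return try {
            api.search(input.trim())
        } catch (e: Exception) {
            emptyList()
        }
    }
}

// Hand-written fake: filters a fixed list, or fails on demand.
class FakeApi(
    private val entries: List<String> = listOf("kotlin", "ktor", "koin"),
    var shouldFail: Boolean = false,
) : Api {
    override fun search(keyword: String): List<String> {
        if (shouldFail) throw IllegalStateException("Simulated network error")
        return entries.filter { it.contains(keyword) }
    }
}

// Object mother shared by all search tests.
fun createSearchEngine(
    api: Api = FakeApi(),
    keywordValidator: KeywordValidator = KeywordValidator(),
) = SearchEngine(api, keywordValidator)

class NetworkErrorSearchTest {

    private lateinit var api: FakeApi
    private lateinit var sut: SearchEngine

    @BeforeTest
    fun setUp() {
        api = FakeApi(shouldFail = true)
        sut = createSearchEngine(api)
    }

    @Test
    fun networkErrorResultsInEmptyList() {
        assertEquals(emptyList<String>(), sut.performSearch("kot"))
    }
}
```

If SearchEngine later gets another constructor parameter, only createSearchEngine has to change, as long as the new parameter has a sensible default for tests.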

💡
Such object mothers can also be used for creating complex data structures like REST schemas or GraphQL queries
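
As a sketch of that idea, the object mothers below build a response model with sensible defaults, so a test overrides only the fields it cares about. The SearchResponse and SearchResultDto types and their fields are hypothetical.

```kotlin
// Hypothetical response model; field names are assumptions for illustration.
data class SearchResultDto(val id: Int, val title: String, val url: String)

data class SearchResponse(
    val query: String,
    val results: List<SearchResultDto>,
    val totalCount: Int,
)

// Defaults may depend on earlier parameters, which keeps the data consistent.
fun createSearchResultDto(
    id: Int = 1,
    title: String = "title-$id",
    url: String = "https://example.com/$id",
) = SearchResultDto(id, title, url)

fun createSearchResponse(
    query: String = "key",
    results: List<SearchResultDto> = listOf(createSearchResultDto()),
    totalCount: Int = results.size,
) = SearchResponse(query, results, totalCount)
```

A test for the empty state would then only need createSearchResponse(results = emptyList()), and the remaining fields stay valid without being spelled out.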

Test set-up with Koin

Another way to achieve the set-up would be to use dependency injection. Luckily, Koin allows for easy test integration, which more or less comes down to this:

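import kotlin.test.AfterTest
import kotlin.test.BeforeTest
import org.koin.core.context.startKoin
import org.koin.core.context.stopKoin
import org.koin.test.KoinTest
import org.koin.test.inject

class SuccesfulSearchTest : KoinTest {

    // Resolved lazily from the Koin graph started in setUp()
    private val sut: SearchEngine by inject()

    @BeforeTest
    fun setUp() {
        startKoin {
            modules(systemUnderTestModule, testDoubleModule)
        }
    }

    @AfterTest
    fun teardown() {
        // Stop Koin so the next test starts with a fresh graph
        stopKoin()
    }

    // ...
}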

The test needs a Koin module which will provide all the needed dependencies. If the Kotlin Multiplatform code base is modularized, the systemUnderTestModule could be the public Koin module that is attached to the dependency graph (e.g. module, dependency graph). An example test suite which uses Koin for test set-up can be found in my ktor-mock-tests repository.
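
The two modules could be sketched roughly like this. The feature types are hypothetical stand-ins, and the split between a production module and a test double module is just one way to organize it; only module, factory, single and get() come from Koin.

```kotlin
import org.koin.dsl.module

// Hypothetical types standing in for the real feature code.
interface Api {
    fun search(keyword: String): List<String>
}

class FakeApi : Api {
    override fun search(keyword: String): List<String> = listOf(keyword)
}

class KeywordValidator {
    fun isValid(keyword: String): Boolean = keyword.trim().length >= 3
}

class SearchEngine(private val api: Api, private val validator: KeywordValidator) {
    fun performSearch(input: String): List<String> =
        if (validator.isValid(input)) api.search(input.trim()) else emptyList()
}

// Production wiring of the feature; it does not know which Api is bound.
val systemUnderTestModule = module {
    factory { KeywordValidator() }
    factory { SearchEngine(get(), get()) }
}

// Test-only wiring that satisfies the boundary dependency with a fake.
val testDoubleModule = module {
    single<Api> { FakeApi() }
}
```

With this split, the test exercises the same wiring that production uses, and only the boundary dependency is swapped out.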

Contract tests for Test Doubles

When creating test doubles, there might be a point when they start becoming complex, simply because the production code they replace is also complex. Writing tests for test helpers might seem unnecessary, but how else would you prove that a test double behaves like its production counterpart? Contract tests serve that exact purpose: they verify that multiple implementations behave in the same way (that their contract is preserved).

For example, say the system under test uses a database to persist its data. Using a real database in every test would make the tests run a lot longer, so a fake database could be written to make them faster. This would result in the real database being used in only one test class and the fake one in all other cases.

Let's say that the real database has the following rules (contract):

  • adding a new item updates a "reactive stream"
  • new items cannot overwrite an existing item if their id is the same

The contract base test could look like this:

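import io.kotest.assertions.assertSoftly
import io.kotest.matchers.collections.shouldContain
import io.kotest.matchers.collections.shouldNotContain
import kotlin.test.Test

abstract class DatabaseContractTest {

   // Supplied by each concrete test class (real or fake implementation)
   abstract var sut: Dao

   @Test
   fun `New items are correctly added`() {
        val item = Item(1, "name")

        sut.addItem(item)

        sut.items shouldContain item
   }

   @Test
   fun `Items with the same id are not overwritten`() {
        val existingItem = Item(1, "name")
        sut.addItem(existingItem)
        val newItem = Item(1, "new item")

        sut.addItem(newItem)

        // Report both assertions even if the first one fails
        assertSoftly {
            sut.items shouldNotContain newItem
            sut.items shouldContain existingItem
        }
   }
}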

The base class contains the tests which will be run on the implementations (real and fake database):

class SqlDelightDatabaseContractTest : DatabaseContractTest() {
    override var sut: Dao = createRealDatabase()
}

class FakeDatabaseContractTest : DatabaseContractTest() {
    override var sut: Dao = createFakeDatabase()
}
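
A fake that satisfies this contract can be very small. The sketch below is hypothetical: the article does not show the real Dao, so I'm assuming that items is exposed as a plain list snapshot (not a reactive stream) and that addItem silently ignores an item whose id already exists.

```kotlin
// Hypothetical types; the real Dao and Item may look different.
data class Item(val id: Int, val name: String)

interface Dao {
    val items: List<Item>
    fun addItem(item: Item)
}

// Fake that follows the contract: an existing id is never overwritten.
class FakeDao : Dao {
    // Keyed by id; a linked map keeps insertion order for predictable assertions.
    private val storage = linkedMapOf<Int, Item>()

    override val items: List<Item>
        get() = storage.values.toList()

    override fun addItem(item: Item) {
        if (item.id !in storage) {
            storage[item.id] = item
        }
    }
}

fun createFakeDatabase(): Dao = FakeDao()
```

If someone later "simplifies" the fake to always overwrite, the contract test class for the fake fails, which is exactly the safety net contract tests are meant to provide.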

This is just a trivial example to show a glimpse of what can be done with contract tests. If you'd like to learn more about this, feel free to check out these resources:

bliki: ContractTest
Test Doubles avoid non-deterministic errors, but you need Contract Tests to ensure they remain consistent with the real services.
Outside-In TDD - Search Functionality 6 (The Contract Tests)
In this video, we are leveraging a fake test-double to emerge contract tests that we could use later on to make sure that every new shipping implementation a...

Big shout-out to Jov Mit for creating so much Android-related testing content, which inspired this testing strategy (if you're interested in Test Driven Development, be sure to check out his insightful screencast series on YouTube).

Benefits of the Strategy

Development speed

The strategy I'm proposing verifies the correctness of a KMP feature / module at a larger scale instead of focusing on individual classes. This more closely resembles how the code behaves in production, which gives us more confidence that the feature will work correctly in the application. This in turn means that there is less need to actually open up the application every time.

Building applications using Kotlin Multiplatform usually takes longer than building their fully native counterparts. The Android app can be built relatively fast thanks to incremental compilation on the JVM; for iOS, however, the story is different. Kotlin / Native compilation in itself is pretty fast. The issue arises when creating the Objective-C binary, where the Gradle tasks linkDebugFrameworkIos and linkReleaseFrameworkIos are called. Luckily, tests avoid that because they only compile Kotlin / Native without creating the Objective-C binary.

Ignoring the build speed issues, let's say that the build didn't take longer. Building the whole application means building all of its parts. But when we work on a feature, we typically only want to focus and verify a small portion of the entire application. Tests allow just that, verifying only a portion of the app without needing to build everything. When we're finished working on a feature, we can plug the code into the application and verify that it correctly integrates with other parts of the application.

Test function / test case names

Because these tests focus more on the end result of the feature rather than on implementation details of a single class, the test function names reflect the behavior of the feature. A lot of the time, this behavior also represents the business requirements of the system.

Refactoring

With this testing strategy, refactoring is easier because the tests don't dive into the implementation details of the system under test (like mocks tend to do). They only focus on the end result: as long as the behavior remains the same*, the tests don't care how it was achieved.

* And it should be the same, since that's what refactoring is all about.

Kotlin Multiplatform threading

The new memory model is the default now, so there are no limitations like in the old memory model. To read more about the old one, you can visit the previous version of this article which covers that.

Test double reusability

The last thing I want to touch on is test double reusability. To keep the code DRY, the test doubles could also be moved to a common testing module, which helps with their reusability. For example, the data layer test doubles (e.g. network or persistence) can often be reused for UI tests. An example of this can be found in my Ktor Mock Engine article, where the integration tests and UI tests use the same engine for returning predefined data (not strictly a test double, but you get the idea). The repository in the article is Android only, but the approach can easily be applied to Kotlin Multiplatform since Ktor has great support for it.

Downsides of the Strategy

Test speed*

The first thing I want to address is test speed, because no one wants to wait too long for the tests to complete. Tests which treat a unit as a class are super fast, but only when they use hand-written test doubles. Mocking frameworks, with all their magic, take up a lot of time and make the tests much slower compared to a test double written by hand.

The test strategy I'm proposing does not use any mocking framework, only test doubles written by hand. However, the tests might use a lot more production code in a single test case, which does take more time. From my experience working with these types of tests on Kotlin Multiplatform, I didn't see anything worrying about the test speed (besides Kotlin / Native taking longer). Additionally, if the KMP code base is modularized, then only the tests from a given module are executed, which is a much smaller portion of the code base.

* Test speed is a subjective topic on which everyone has a different opinion. Martin Fowler has an interesting article which touches on this topic:

bliki: UnitTest
Unit Tests are focused on small parts of a code-base, defined in regular programming tools, and fast. There is disagreement on whether units should be solitary or sociable.

Test readability

As I said before, these tests tend to be more on the integration side than the unit side (depending on how you define it). This means that more dependencies are involved, and more set-up is required.

To combat this, I recommend splitting the tests into multiple files, each focusing on a distinctive part of the behavior. Along with object mothers, the set-up boilerplate can be reduced and implementation details hidden.

To understand these tests, more internal knowledge about the system under test is required. This is a double-edged sword: the tests are not as easy to understand, but once you understand them you'll most likely know how the system under test works along with its collaborators.

Hard to define what should be tested

Tests where a unit is a class are easy to write because they always focus on a single class. When a unit is a group of classes, it is hard to define what the group should be: how deep should the test go?

Unfortunately, there is no rule that works in every case. Every system is different, and has different business requirements. If you start noticing that the test class is becoming too big and too complex, this might be a sign that the test goes too deep.

💡
There might be a feature which just keeps growing, and the test becomes really hard to understand. In such cases, it might be good to refactor the SUT and extract some cohesive group of logic to a separate class which can be tested more granularly. Then the original test uses a Test Double for the extracted class, making it easier to understand. This does require refactoring test code, but makes the Test Suite easier to understand

CI / CD

Tests are useless when they are not executed, and it's not a good idea to rely on human memory for that. The best way would be to integrate tests into your Continuous Integration, so they are executed more frequently.

For example, tests could be run:

  • On every PR, making sure nothing broken is merged to the main branch.
  • Before starting the Release process.
  • Once a day.
  • All of the above combined.

💡
For Kotlin Multiplatform, it is important to execute the tests for all targets. Sometimes everything works on the JVM, but Kotlin / Native fails (e.g. Regex, NSDate). So the best way of making sure every platform behaves in the same way is to run tests for all targets

Summary

In my opinion, Kotlin Multiplatform should be the most heavily tested part of the whole application. It is used by multiple platforms, so it should be as bulletproof as possible.

Writing tests during development can cut down on compilation time and give confidence that any future regressions (even 5 minutes later) will be caught by the test suite.

I hope this article was informative for you. Let me know what you think in the comments below!