
Structured Concurrency & Cancellation


With coroutines, we can write asynchronous operations in a regular sequential way, without explicit verbose code that waits for completion. This raises a few questions: if many operations run asynchronously, how do we know when they have all finished or, more importantly, whether any of them failed? If we start downloading a bunch of images for a gallery app but the authentication fails or the user switches to another screen, what happens to all that loading in the background? We can control this in a nice way with structured concurrency.

Scopes

Coroutines offer scopes to define the boundaries within which operations run. Images for one screen should be loaded in one scope, images for another screen in a second scope, and the application as a whole may define its own scope that includes both. As you can see, scopes can and often should be nested.
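As a rough sketch of such nesting, the example below runs one scope per screen inside an application-wide scope. The names loadScreenImages and loadApplication are made up for illustration; coroutineScope, launch, and delay come from kotlinx.coroutines. Since a scope returns only after all of its children have finished, the final message is printed last.

```kotlin
import kotlinx.coroutines.*

// Hypothetical loader: one scope per screen, one child coroutine per image.
suspend fun loadScreenImages(screen: String, count: Int): Unit = coroutineScope {
    repeat(count) { index ->
        launch {
            delay(10L * (index + 1)) // imitate loading
            println("$screen: loaded image_${index + 1}")
        }
    }
}

// Hypothetical application scope that contains both screen scopes.
suspend fun loadApplication(): Unit = coroutineScope {
    launch { loadScreenImages("gallery", 2) }
    launch { loadScreenImages("profile", 1) }
}

suspend fun main() {
    loadApplication() // suspends until every nested scope is done
    println("All screens loaded")
}
```

The order of the "loaded" lines may vary between runs, but "All screens loaded" always comes after them.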

Stopping scoped jobs

Check out the following code, which loads images for some application: two mandatory ones and one that may be needed in the future (pre-caching).

import kotlinx.coroutines.*

suspend fun loadImage(name: String) {
    delay(50) // imitate long-running operation
    println("Loaded $name")
}

suspend fun preCache(name: String) {
    delay(100) // imitate long-running operation
    println("Cached $name")
}

suspend fun loadScreenInGlobalScope() {
    GlobalScope.launch { loadImage("image_1") }
    GlobalScope.launch { loadImage("image_2") }
    GlobalScope.launch { preCache("image_3") }
    throw Exception("Unexpected failure") // simulate crash in main code
}

suspend fun main() {
    runCatching { // shortcut to ignore all exceptions with try/catch
        loadScreenInGlobalScope()
    }
    delay(200) // wait long enough to see the results
}

We are launching everything in the global scope of the application, which means that launched operations will stop only when they are done or when the whole application stops. Nothing that happens after they were launched affects them. Here we simulate an error, which won't affect the loading process. This app will print (in no particular order):

Loaded image_2
Loaded image_1
Cached image_3

Let's create our own scope now with the handy builder function coroutineScope.

suspend fun loadScreenInOwnScope(): Unit = coroutineScope {
    launch { loadImage("image_1") }
    launch { loadImage("image_2") }
    launch { preCache("image_3") }
    throw Exception("Unexpected failure") // simulate crash in scope
}

suspend fun main() {
    runCatching { // catch all exceptions
        loadScreenInOwnScope()
    }
    delay(200) // wait long enough to see the results
}

Now it prints nothing, even with the 200ms delay at the end, because the exception inside the scope cancels all unfinished operations. Remember that cancellation of coroutines is cooperative: a coroutine stops only if it actively checks for cancellation or calls a cancellable suspending function. The suspending functions in kotlinx.coroutines, such as delay, are cancellable and perform that check for us. In our case, the delay call is what makes cancellation work as expected. If we insert a 75ms delay before the exception, we can see how both loadImage calls still manage to finish while preCache gets stopped.
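For code that never suspends, the check has to be written by hand. The sketch below is a hypothetical busy loop (countUntilCancelled is an illustrative name) that polls isActive on every iteration. It assumes the standard isActive property and the cancelAndJoin function from kotlinx.coroutines; without the isActive check, cancelAndJoin would never return.

```kotlin
import kotlinx.coroutines.*

// Hypothetical CPU-bound work with no suspension points inside the loop.
suspend fun countUntilCancelled(): Long = coroutineScope {
    var iterations = 0L
    val worker = launch(Dispatchers.Default) {
        while (isActive) { // explicit cooperative check
            iterations++
        }
    }
    delay(50)              // let the worker spin for a while
    worker.cancelAndJoin() // request cancellation and wait for the loop to exit
    iterations
}

suspend fun main() {
    println("Worker stopped after ${countUntilCancelled()} iterations")
}
```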

Manual cancellation

With access to the coroutine context, we get even more control. It offers two functions to cancel work: cancel() and cancelChildren(). Let's see how they work. If we replace the exception in the previous example with a cancel() call, we get the same result: nothing is printed.

suspend fun loadScreenInOwnScope(): Unit = coroutineScope {
    launch { loadImage("image_1") }
    launch { loadImage("image_2") }
    launch { preCache("image_3") }
    this.coroutineContext.cancel()
}

The difference between cancel() and cancelChildren() becomes visible when there is suspended work both in the scope itself and in its children (created by the launch builder). The following example will only print 6 lines and won't finish loading:

suspend fun doSelfCancelingJob() = coroutineScope {
    launch { preCache("image_3") }
    for (i in 1..10) {
        println("Running long operation for ${i * 10}ms")
        delay(10)
        if (i == 5) {
            this.coroutineContext.cancel()
        }
    }
}

And it's 6 lines, not 5, because of the cooperative nature of coroutines: the loop will continue after the job has been "already" canceled until the next suspending function call, which happens after println. Swap println and delay, and now it will stop after 5 iterations as expected.

So, if we need to stop all the background activity but don't want to break our main loop, we should replace this.coroutineContext.cancel() with this.coroutineContext.cancelChildren(). It will still stop the caching job but let the loop complete and print all 10 messages.
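The cancelChildren() variant could look like the sketch below. The function name doChildCancelingJob and the returned iteration counter are additions made for illustration; everything else mirrors the example above.

```kotlin
import kotlinx.coroutines.*

suspend fun preCache(name: String) {
    delay(100) // imitate long-running operation
    println("Cached $name")
}

// Variant of doSelfCancelingJob that cancels only the children of the scope.
suspend fun doChildCancelingJob(): Int = coroutineScope {
    val cacheJob = launch { preCache("image_3") }
    var finishedIterations = 0
    for (i in 1..10) {
        println("Running long operation for ${i * 10}ms")
        delay(10)
        if (i == 5) {
            this.coroutineContext.cancelChildren() // stops preCache only
        }
        finishedIterations++
    }
    println("Cache job cancelled: ${cacheJob.isCancelled}")
    finishedIterations
}

suspend fun main() {
    println("Finished iterations: ${doChildCancelingJob()}")
}
```

The loop runs all 10 iterations because the job of the scope itself stays active; only the launched child is canceled.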

Failures in children

What happens when one of the children fails with an exception, but we need all of them to finish to get the job done, for example, when we are downloading different chunks of a large file in parallel? The scope cancels all the work, the same way it did before, because exceptions from children are propagated up to the parent. Let's modify the example slightly:

suspend fun loadImage(name: String) {
    if (name.endsWith("2")) {
        throw Exception("Error loading image")
    }
    delay(50) // imitate long-running operation
    println("Loaded $name")
}

// ... the rest is the same

suspend fun loadScreenInOwnScope(): Unit = coroutineScope {
    launch { loadImage("image_1") }
    launch { loadImage("image_2") }
    launch { preCache("image_3") }
}

There is no more explicit cancellation, but the second loading operation fails. This code will also print nothing because the parent job will cancel loading of image 1 and caching image 3 immediately. That's what we want, except that we do not want to cancel everything when pre-caching fails, as it is an optional operation. How can we tell the scope to treat it differently?
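The propagation can be observed from the caller's side as well. In this minimal sketch (loadScreenWithFailure is a hypothetical name, and IllegalStateException is chosen arbitrarily), the exception thrown by one child cancels its sibling and is then rethrown by coroutineScope to whoever called it.

```kotlin
import kotlinx.coroutines.*

// Hypothetical screen loader in which the second child fails right away.
suspend fun loadScreenWithFailure(): Unit = coroutineScope {
    launch {
        delay(50) // canceled before it can print
        println("Loaded image_1")
    }
    launch { throw IllegalStateException("Error loading image") }
}

suspend fun main() {
    val result = runCatching { loadScreenWithFailure() }
    // prints "Scope failed with: Error loading image"
    println("Scope failed with: ${result.exceptionOrNull()?.message}")
}
```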

Supervised jobs

To control the cancellation policy, we need to change the type of the job of our root function and make it supervised. Each coroutine is represented by a Job object. That's what the launch coroutine builder returns, and that's what actually gets canceled under the hood when we call CoroutineScope.cancel. A regular Job propagates errors both down and up the hierarchy, canceling both children and parents. If we replace it with a SupervisorJob, we take control of cancellation ourselves. An error in a child job no longer affects the parent and consequently doesn't cancel the other children of that parent, while everything under the failed child is still canceled. Let's look at this slightly modified example:

suspend fun loadScreenInSupervisedScope() = supervisorScope {
    val job1 = launch { loadImage("image_1") }
    val job2 = launch { loadImage("image_2") }
    val job3 = launch { throw Exception("Fail optional job") }
}

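supervisorScope is another scope builder. Under the hood, it puts a SupervisorJob instead of a regular Job into the context, so a failure in job3 no longer affects job1 and job2. The exception itself is not swallowed: with no CoroutineExceptionHandler in the context, it is reported as an uncaught exception (on the JVM, its stack trace is printed by default). We can also create such a scope manually for later use by calling CoroutineScope(SupervisorJob()).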
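A manually created supervised scope could look roughly like this. ScreenLoader and its methods are hypothetical names, and the CoroutineExceptionHandler is an assumption added here so that the failure is logged instead of being reported as an uncaught exception.

```kotlin
import kotlinx.coroutines.*

// Hypothetical owner of a long-lived supervised scope, e.g. one per screen.
class ScreenLoader {
    // Assumption: failures of jobs in this scope are only logged.
    private val handler = CoroutineExceptionHandler { _, error ->
        println("Job failed: ${error.message}")
    }
    private val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default + handler)

    fun loadRequired(): Job = scope.launch {
        delay(50) // imitate long-running operation
        println("Loaded image_1")
    }

    fun loadOptional(): Job = scope.launch {
        throw IllegalStateException("pre-caching failed")
    }

    fun close() = scope.cancel() // cancels everything still running in the scope
}

suspend fun main() {
    val loader = ScreenLoader()
    val required = loader.loadRequired()
    val optional = loader.loadOptional()
    joinAll(required, optional)
    println("required cancelled: ${required.isCancelled}") // false
    println("optional cancelled: ${optional.isCancelled}") // true
    loader.close()
}
```

Unlike supervisorScope, a scope created this way does not wait for its children, so its owner has to call cancel() (here through close()) when the work is no longer needed.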

Execution status

The Job interface has handy members to check the current status of execution (isActive, isCompleted, isCancelled), to start a job that is not running yet (start()), and to cancel it when it is no longer needed (cancel()). It also allows waiting for a job to finish: job1.join() suspends until job1 completes. So we can wait for the non-optional tasks as follows:

job1.join()
job2.join()

This code will first wait for job1, then for job2; if we want to wait for all jobs regardless of their order, there is a handy helper joinAll:

joinAll(job1, job2)
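To see the status properties change over the lifetime of a job, here is a small sketch. inspectJobStates is a hypothetical helper; it uses CoroutineStart.LAZY so that the job stays in the new state until start() is called.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper that records the state of one job at three moments.
suspend fun inspectJobStates(): List<String> = coroutineScope {
    val states = mutableListOf<String>()
    val job = launch(start = CoroutineStart.LAZY) { delay(50) }
    states += "created: isActive=${job.isActive}" // lazy job has not started yet
    job.start()
    states += "started: isActive=${job.isActive}"
    job.join() // suspend until the job completes
    states += "joined: isCompleted=${job.isCompleted}, isCancelled=${job.isCancelled}"
    states
}

suspend fun main() {
    inspectJobStates().forEach(::println)
    // created: isActive=false
    // started: isActive=true
    // joined: isCompleted=true, isCancelled=false
}
```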

Conclusion

With structured concurrency, we can avoid a lot of typical issues with asynchronous code, like failures in the main code when a background operation fails, or failure to cancel background work when it is no longer needed. SupervisorJob, in turn, gives precise control over what should or should not be canceled when an error occurs.
