
This topic focuses on how threads communicate with each other and on their role in multithreaded applications in Python. It's important to note that threads belonging to the same process in Python don't run exactly simultaneously: a single-core processor can perform only one computation at a time.

However, even before multicore processors appeared, it was possible to have several programs running concurrently. This is similar to how you can type a new address in your browser while a website is loading.

In this section, we will provide examples of how threads communicate and synchronize with each other.

Synchronizing threads

A common problem in multithreaded applications concerns critical sections.

A critical section is a segment of code where threads access shared resources, such as common variables and files, and perform write operations on them.

But what is so critical about this part of the code? You can imagine a critical section as a workplace where different workers read from and write to the same notepad, and none of them should erase another's work.

At this point, it is crucial to synchronize the threads that run simultaneously in this part of the code.

Synchronizing threads ensures that two or more concurrent threads do not execute the same critical section simultaneously.

Now we should define the concept of the race condition.

A race condition occurs when two or more threads access shared data and try to change it at the same time. As a result, the final values of the variables depend on which thread ran first and which ran last.
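
As a quick sketch of such a lost update, consider two threads incrementing the same counter; the counter name, the number of iterations, and the artificial delay below are made up purely for illustration:

import time
from threading import Thread

counter = 0


def increment():
    global counter
    for _ in range(5):
        value = counter       # read the shared value
        time.sleep(0.001)     # give the other thread a chance to run in between
        counter = value + 1   # write back, possibly overwriting the other thread's update


threads = [Thread(target=increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Two threads doing five increments each should give 10,
# but lost updates usually leave the counter below that.
print("Expected 10, got", counter)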

Now let's look at a more concrete example of a race condition and the problems it causes.

Understanding the problem of a race condition

Let's explain the first example – a function called calc_price that prints the name and the price of an item three times. Of course, this example is just a demonstration of the issue with multiple threads, since the function is very basic and essentially consists of print instructions.

The race here is for a global variable named total. We initialize two threads, t1 and t2, which both write to and then read the shared variable. They both call calc_price, but with different arguments (name and price):

import time
from threading import Thread

total = 0


def calc_price(name, item_price):
    global total          # both threads write to this shared global variable
    for i in range(3):
        print("Item: ", name)
        time.sleep(2)
        total = item_price
        print("Price: ", total)


t1 = Thread(target=calc_price, args=("Shirt", 5))
t2 = Thread(target=calc_price, args=("Jeans", 10))

t1.start()
t2.start()

In this application, each thread prints the item's name and price. The output shows which thread ran first and which ran last. Note that the printed results are not regular: the values of the two threads are interleaved. Each time we run the application, the result depends on which thread runs first.

Item:  Shirt
Item:  Jeans
Price: Price:  10
Item:  Jeans
 5
Item:  Shirt
Price:  10
Item:  Jeans
Price:  5
Item:  Shirt
Price:  10
Price:  5

To solve this problem, we need to synchronize the threads. Both t1 and t2 access the total variable in order to write to it. The value written by the thread that runs first is "lost", because the thread that runs last overwrites it.

What is a lock?

A lock is one of the synchronization techniques. It is an abstraction that only one thread can own at a time. Holding a lock is how a thread tells the other threads: "This thing is mine, don't touch it right now."

Locks have two main functions (a short sketch of both follows this list):

  • Acquire allows a thread to take ownership of a lock. If a thread tries to acquire a lock currently owned by another thread, it blocks until the other thread releases the lock. At that point, it will contend with any other threads that are trying to acquire the lock. Only one thread can own the lock at a time.

  • Release relinquishes ownership of the lock, allowing another thread to take ownership of it.
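
These two operations correspond to the acquire() and release() methods of threading.Lock. Here is a minimal single-threaded sketch of how they behave; locked() and the blocking parameter are part of the standard Lock API:

from threading import Lock

lock = Lock()

lock.acquire()                        # the calling thread now owns the lock
print(lock.locked())                  # True: the lock is currently held
print(lock.acquire(blocking=False))   # False: a non-blocking acquire fails while the lock is held
lock.release()                        # give up ownership so another thread can take the lock
print(lock.locked())                  # False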

A solution to the race condition problem is to protect the critical section with the acquire() and release() calls of a Lock object. In this example, we import the Lock class from the threading module and define a new Lock object — l. The acquire() and release() calls will lock the instructions where total is accessed.

from threading import Thread, Lock
import time

l = Lock()
total = 0


def calc_price(name, item_price):
    global total          # the shared variable is now protected by the lock
    for i in range(3):
        l.acquire()
        print("Item:", name)
        time.sleep(2)
        total = item_price
        print("Price:", total)
        l.release()


t1 = Thread(target=calc_price, args=("Shirt", 5))
t2 = Thread(target=calc_price, args=("Jeans", 10))

t1.start()
t2.start()

In the console, we will see a result like the one shown below. Note that in this run, the second thread only got to work after the first one had finished all of its iterations.

Item: Shirt
Price: 5
Item: Shirt
Price: 5
Item: Shirt
Price: 5
Item: Jeans
Price: 10
Item: Jeans
Price: 10
Item: Jeans
Price: 10
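
A Lock also supports the with statement, just like a file object: the lock is acquired when the block is entered and released when it is left, even if an exception occurs. As a sketch, calc_price() could equally be written this way:

def calc_price(name, item_price):
    global total
    for i in range(3):
        with l:                # acquire() on entry, release() on exit
            print("Item:", name)
            time.sleep(2)
            total = item_price
            print("Price:", total)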

With a Lock object, only one thread at a time is allowed to execute the critical section, but occasionally we need to let a particular number of threads run it simultaneously.

We can wait for a thread to finish its execution by calling the join() method. It blocks the calling thread until the thread it is called on has finished.

...

t1.start()
t1.join()

t2.start()
t2.join()
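
Note that joining each thread right after starting it makes the threads run strictly one after another. If you want them to keep running concurrently and only wait for both at the end, start them first and join them afterwards:

t1.start()
t2.start()

# both threads are running concurrently at this point;
# join() only makes the main thread wait for them to finish
t1.join()
t2.join()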

Let's modify this example: we'll add five items with prices, each handled by its own thread, and allow three threads to access the total price simultaneously. A single Lock is not enough for this, so we should go further and use the concept of a semaphore.

Semaphore

The semaphore concept is one of the oldest synchronization primitives in the history of computer science, invented by the early Dutch computer scientist Edsger W. Dijkstra. He used the names P() and V() instead of acquire() and release().

A semaphore manages an internal counter which is decremented by each acquire() call and incremented by each release() call. The counter can never go below zero; when acquire() finds that it is zero, it blocks, waiting until some other thread calls release().

Semaphores can be of two types (a short sketch of both follows this list):

  1. Binary Semaphore — this semaphore can have only two values – 0 or 1. Its value is initialized to 1.

  2. Counting Semaphore — its value can be 0, 1, or other integer values. It is used to control access to a resource that has multiple instances.
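
Here is a minimal single-threaded sketch of both types and of the counter behavior described above; acquire() returns True on success, and acquire(blocking=False) returns False instead of blocking when the counter is zero:

from threading import Semaphore

binary_sem = Semaphore(1)     # binary semaphore: initial value 1, behaves much like a lock
counting_sem = Semaphore(3)   # counting semaphore: up to three acquisitions at the same time

print(counting_sem.acquire())                # True, the counter goes 3 -> 2
print(counting_sem.acquire())                # True, the counter goes 2 -> 1
print(counting_sem.acquire())                # True, the counter goes 1 -> 0
print(counting_sem.acquire(blocking=False))  # False: the counter is 0, a blocking call would wait here
counting_sem.release()                       # the counter goes back from 0 to 1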

In the following example, we will create five threads, and the counter of the Semaphore object will be three, so at most three threads can access the shared variable simultaneously.

from threading import Thread, Semaphore
import time

# creating Semaphore, where count = 3
sem = Semaphore(3)
total = 0


def calc_price(name, item_price):
    global total
    sem.acquire()
    for i in range(2):
        print("Item:", name)
        time.sleep(10)
        total = item_price
        print("Price:", total)
    sem.release()


# creating multiple threads
t1 = Thread(target=calc_price, args=("Shirt", 5))
t2 = Thread(target=calc_price, args=("Jeans", 10))
t3 = Thread(target=calc_price, args=("Dress", 12))
t4 = Thread(target=calc_price, args=("Belt", 3))
t5 = Thread(target=calc_price, args=("Bag", 20))

# calling the threads
t1.start()
t2.start()
t3.start()
t4.start()
t5.start()

As with files and Lock objects, you can also use a semaphore as a context manager in a with statement. This allows you to omit the explicit calls to acquire() and release(), since they are handled automatically. calc_price() may also look this way:

def calc_price(name, item_price):
    global total
    with sem:
        for i in range(2):
            print("Item:", name)
            time.sleep(10)
            total = item_price
            print("Price:", total)

In this example, we've created an instance of the Semaphore class called sem with a counter of 3. This means that up to three threads can hold sem at a time.

After we call start() on all five threads, only three of them can acquire the semaphore, and hence only three threads execute the calc_price() function at a time.

In the console, you can see a result like this one:

Item: Shirt
Item: Jeans
Item: Dress
Price: 12
Price: 5
Item: Shirt
Price: 10
Item: Jeans
Item: Dress
Item: Bag
Item: Belt
Price: 12
Price: 5
Price: 20
Item: Bag
Price: 10
Price: 3
Item: Belt
Price: 20
Price: 3

In this example, every time we run the application, we get the (item, price) pairs in a different, unordered sequence. Most probably, the result will begin with three "Item" lines; this is because three different threads are calling calc_price() simultaneously.

Lock vs. semaphore

Let's compare the two mechanisms: lock and semaphore. In the comparison below, a buffer is a piece of shared data that the threads work on, and each buffer holds a single item.

An overview of the main features of a lock and a semaphore is shown below.

| Lock | Semaphore |
|------|-----------|
| Only one thread at a time can own a given lock. | Multiple threads of the same process can share the same semaphore. |
| Only one thread works with the entire buffer at a given moment. | Several threads can work on different buffers at the same time. |
| A lock is an object. | A semaphore is essentially an integer counter. |
| Locks do not have any subtypes. | A semaphore can be binary or counting. |
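
To make the contrast concrete, here is a small sketch; the printer and the pool of database connections are made-up examples of a single-instance resource and a multi-instance resource:

from threading import Lock, Semaphore

printer_lock = Lock()            # a resource with a single instance: one thread at a time
connection_slots = Semaphore(3)  # a resource with three identical instances


def print_report(report):
    with printer_lock:           # exactly one thread prints at any moment
        print(report)


def query_database(query):
    with connection_slots:       # at most three threads hold a connection slot at once
        pass                     # work with one of the three connections here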

Conclusion

The role of synchronization is crucial in a multithreaded program. In this topic, we've introduced the Lock object, which allows only one thread at a time to read or write a shared variable. Another mechanism for thread synchronization is the semaphore, which allows a limited number of threads to access a shared variable simultaneously.

The examples we have seen in this topic are deliberately simple, performing basic tasks such as setting the price of a product. In real applications, you should consider these techniques and apply them to organize reliable communication between threads.
