When trying to improve the performance of a program in Python, concurrency is often considered. Choosing the right concurrency model can drastically impact your application, depending on the type of software you're developing.
In this article, we will explore async/await vs threads in Python. Understanding these concepts will help you write better code and know when to use each.
Async/await
Asynchronous programming in Python is relatively recent but is evolving rapidly. Asynchronous programming is particularly useful for tasks that involve waiting for external responses, such as network requests, database access, or I/O operations like reading or writing files. These operations can block the execution of your program unnecessarily. With asynchronous programming, you can continue executing other tasks while waiting for a response or feedback.
In Python, asynchronous programming is provided through the asyncio library using the async and await keywords. Let's dive into the key components of asynchronous programming in Python:
Event Loop: The event loop is the heart of asynchronous programming in Python. It manages the execution flow and task switching, keeping track of tasks to be run asynchronously.
Coroutines: Coroutines are special functions that can be paused and resumed, allowing other tasks to run while waiting. Each await marks a point where a coroutine may hand control back to the event loop. Calling a coroutine function creates a coroutine object; the event loop runs it once it is awaited or scheduled as a task.
Futures: Futures are placeholders for results that are not available yet. A future stores either the result or the exception raised while producing it. When a coroutine is scheduled on the event loop, it is wrapped in a Task, a kind of future that holds the coroutine's result or exception once it finishes.
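To make the relationship between these three pieces more concrete, here is a minimal sketch. The coroutine name compute and its return value are made up for illustration; asyncio.create_task and asyncio.run are the standard library calls.

```python
import asyncio


async def compute() -> int:
    # Hypothetical coroutine: pauses at the await, then returns a value.
    await asyncio.sleep(0.1)
    return 42


async def main() -> int:
    # Calling a coroutine function only creates a coroutine object.
    coro = compute()
    # Wrapping it in a Task schedules it on the running event loop.
    task = asyncio.create_task(coro)
    # A Task is a kind of Future.
    is_future = isinstance(task, asyncio.Future)
    # Awaiting the task hands control to the loop until the result is stored.
    result = await task
    print(is_future, task.done(), result)
    return result


result = asyncio.run(main())
```

The coroutine does not start running when compute() is called; it only runs once the event loop gets control at the await.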
Example of Asynchronous Programming
Here is what asynchronous programming looks like in Python code:
import asyncio

async def fetch_data():
    print("Start fetching")
    await asyncio.sleep(2)  # Simulate network delay
    print("Done fetching")

async def main():
    await asyncio.gather(fetch_data(), fetch_data(), fetch_data())

asyncio.run(main())
In this example:
The fetch_data function is a coroutine because it is defined with async def. It prints a start message, waits for 2 seconds (simulating a network delay), and then prints a done message.
The main function runs three fetch_data coroutines concurrently using asyncio.gather.
asyncio.run(main()) starts the event loop and executes the main coroutine.
This example demonstrates how asynchronous programming allows multiple tasks to be executed concurrently without blocking each other.
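One way to see the concurrency is to time it. The sketch below is a variation of the example above, with a delay parameter added and shorter waits so it finishes quickly; the exact timing will vary by machine.

```python
import asyncio
import time


async def fetch_data(delay: float) -> float:
    await asyncio.sleep(delay)  # Stand-in for a network call
    return delay


async def main() -> float:
    start = time.perf_counter()
    # The three waits overlap instead of running one after another.
    await asyncio.gather(fetch_data(0.2), fetch_data(0.2), fetch_data(0.2))
    return time.perf_counter() - start


elapsed = asyncio.run(main())
print(f"Three 0.2 s waits finished in {elapsed:.2f} s")
```

If the coroutines ran one after another, the total would be about 0.6 seconds. Because they all wait at the same time, the total should be close to a single wait.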
Now that we've explored asynchronous programming with async/await, let's shift our focus to another concurrency model in Python: threading.
Threading in Python
Another way to improve the performance of your Python program is through threading. Threading lets you run several flows of execution within a single process. Each thread runs independently, enabling you to perform tasks concurrently.
In Python, threads are managed using the threading module. Threads run concurrently and share the same memory space, which means they can exchange data with each other. Access to shared data should be synchronized carefully to avoid race conditions.
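As a minimal sketch of guarding shared data, the example below uses threading.Lock around a shared counter. The counter, the increment function, and the thread count are illustrative choices, not part of the original example.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment(times: int) -> None:
    global counter
    for _ in range(times):
        # The lock makes the read-modify-write of counter atomic.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)
```

Without the lock, two threads could read the same old value and one update could be lost. With it, the final count matches the number of increments.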
Threads are particularly useful for I/O-bound tasks with blocking operations when the API or libraries you are dealing with do not support asynchronous programming, which makes them an alternative to asyncio. They are also useful for background tasks, such as periodic tasks for monitoring or synchronization. For CPU-bound operations that require heavy computation, such as data processing, image processing, or machine learning, threads bring limited benefit for pure Python code because of the GIL, discussed later in this article. They mainly help there when the heavy work runs inside libraries that release the GIL.
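Here is a small sketch of running blocking calls in threads with concurrent.futures.ThreadPoolExecutor. The blocking_fetch function is hypothetical; time.sleep stands in for a library call that has no async version.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_fetch(item_id: int) -> str:
    # Hypothetical blocking call; time.sleep simulates waiting on I/O.
    time.sleep(0.2)
    return f"item-{item_id}"


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    # map preserves input order in the results.
    results = list(pool.map(blocking_fetch, [1, 2, 3]))
elapsed = time.perf_counter() - start

print(results, f"{elapsed:.2f} s")
```

A blocking wait such as time.sleep releases the GIL, so the three calls overlap and the total time should be close to a single call.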
Example of Threading
Here is how you can achieve threading in Python:
import threading

def worker():
    print("Worker thread is running")
    for _ in range(5):
        print("Working...")

threads = []
for i in range(3):
    thread = threading.Thread(target=worker)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()
In this example:
The worker function prints a message and runs a loop to print "Working..." five times.
We create and start three threads, each running the worker function. These threads are added to a list so we can manage their execution and join them later.
Each thread.join() call waits for one thread to complete, so the final loop makes the main program wait until all three have finished.
This example demonstrates how threading allows multiple tasks to be executed concurrently within a single process.
Having discussed both async/await and threading, it's important to understand how to choose between them based on your specific use case.
Async/Await vs Threads
When deciding between using threads or async/await in Python, consider the following points:
Are there I/O-bound tasks?
- If yes, and the libraries you depend on support async, use asyncio.
- If the code relies on blocking libraries that do not support async, use threads.
Do you need high concurrency without blocking?
- If yes, use asyncio. It allows for handling multiple tasks concurrently and efficiently without blocking the execution.
Is the task CPU-bound and requires heavy computation?
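- If yes, threads alone will not speed up pure Python code because of the GIL (see the note that follows this list). For data processing, image processing, machine learning, and other CPU-intensive tasks, prefer multiprocessing, or libraries that release the GIL while they compute.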
Are you working with existing blocking APIs or libraries?
- If yes, use threads. They can help manage blocking operations when async/await isn't an option.
Do you have background tasks or need periodic monitoring?
- If yes, use threads. They are effective for running tasks in the background without blocking the main program.
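The two models can also be combined. As a sketch, assuming Python 3.9 or later, asyncio.to_thread runs a blocking function in a worker thread so that it can be awaited from async code. The blocking_read function and the file names are hypothetical stand-ins.

```python
import asyncio
import time


def blocking_read(name: str) -> str:
    # Hypothetical blocking call from a library without async support.
    time.sleep(0.2)
    return f"contents of {name}"


async def main() -> list[str]:
    # Each blocking call runs in its own worker thread; the event loop stays free.
    return await asyncio.gather(
        asyncio.to_thread(blocking_read, "a.txt"),
        asyncio.to_thread(blocking_read, "b.txt"),
    )


results = asyncio.run(main())
print(results)
```

This keeps the rest of the program in async/await style while still using a blocking library where no async alternative exists.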
It is important to note that due to the Global Interpreter Lock (GIL) in CPython, the standard Python implementation, threads do not achieve true parallelism for CPU-bound tasks. The GIL is a mutex that protects access to Python objects, preventing multiple native threads from executing Python bytecode at once.
This lock exists because CPython's memory management is not thread-safe. Therefore, when using threads in Python, only one thread executes Python code at a time in a single process.
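A rough way to observe this is to time a pure Python loop run twice in sequence and then in two threads. This is only a sketch: the count_down function and the value of N are arbitrary, and the timings depend on the machine and the interpreter build.

```python
import threading
import time


def count_down(n: int) -> None:
    # Pure Python loop: CPU-bound work that needs the GIL to run.
    while n > 0:
        n -= 1


N = 2_000_000

# Run the work twice, one call after the other.
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same work in two threads.
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f} s, two threads: {threaded:.2f} s")
```

On a standard CPython build the two timings are usually similar, since the threads take turns holding the GIL rather than running at the same time.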
To achieve true parallelism, we need to use multiprocessing. This can be done using the multiprocessing library, which bypasses the GIL by using separate processes, each with its own Python interpreter and memory space.
Example of Multiprocessing
Here is how you can achieve true parallelism using multiprocessing in Python:
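from multiprocessing import Process

def worker():
    print("Worker process is running")
    for _ in range(5):
        print("Working...")

if __name__ == "__main__":
    # The guard keeps child processes from re-running this block
    # on platforms that start processes by importing the main module.
    processes = []
    for i in range(3):
        process = Process(target=worker)
        processes.append(process)
        process.start()

    for process in processes:
        process.join()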
In this example:
The worker function prints a start message and runs a loop to print "Working..." five times.
We create three Process objects targeting the worker function, start them, and add them to a list to manage their execution.
Each process.join() call waits for one process to complete, so the final loop makes the main program wait until all three have finished.
This example shows how to achieve true parallelism using separate processes.
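For CPU-bound work that returns results, a process pool is often more convenient than managing Process objects by hand. The sketch below uses concurrent.futures.ProcessPoolExecutor; the count_primes function and the chosen limits are illustrative only.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """Count primes below limit with simple trial division (deliberately CPU-heavy)."""
    count = 0
    for candidate in range(2, limit):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def main() -> None:
    limits = [20_000, 30_000, 40_000]
    # Each call runs in a separate worker process with its own interpreter and GIL.
    with ProcessPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(count_primes, limits))
    for limit, primes in zip(limits, results):
        print(f"{primes} primes below {limit}")


if __name__ == "__main__":
    main()
```

Arguments and return values are pickled and sent between processes, so this approach pays off when the computation is heavy compared with the data being passed around.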
When to Use True Parallelism
When deciding between threads and multiprocessing in Python, consider the following questions:
Do you need to bypass the Global Interpreter Lock (GIL) to achieve true parallelism?
If yes, use multiprocessing.
If not, threads can be used for simpler concurrency needs.
Is the task I/O-bound, and can it be managed with lightweight concurrency?
If yes, use threads.
If not, and you need true parallelism for CPU-bound tasks, use multiprocessing.
With a better understanding of when to use async/await, threads, or multiprocessing, you can now make informed decisions based on the specific requirements of your application.
Conclusion
In this article, we learned when to use async/await, threads, and multiprocessing in Python. Each concurrency model has its use cases depending on the type of tasks you are dealing with. Async/await is ideal for I/O-bound tasks, threads are suitable for blocking I/O and background tasks, and multiprocessing is used for CPU-bound tasks that need true parallelism, since it bypasses the GIL.
Every article can be made better, so your suggestions and questions are welcome in the comment section.