Computer scienceProgramming languagesPythonCode qualityCode performance

Profiling in Python

11 minutes read

Today, you've decided to finish an elaborate program. The code is almost finished; you've started the initial testing, but the program doesn't perform as fast as you've expected. "Why does it take this long and occupy so many resources?" you wonder. The answer is simple if you start looking in the right direction.

To improve the performance of your code, you need to analyze and measure the metrics of interest. This analysis is also called profiling. Profiling a Python program means a dynamic analysis that measures the execution time of each function. This may give you some insights into possible optimization.

At first glance, it may seem obvious where your code is struggling, but if you don’t profile, it could be difficult to pinpoint the bottlenecks. Lucky for us, Python provides many great modules to measure the program statistics.

The time module

To profile your code, you need to know how to clock the time. You can use a timer for this:

from time import time

start = time()
# your script here
end = time()
print('It took', end - start, 'seconds!')

Timers are easy to implement. However, using a timer on any function shows us the run time of that specific function only. If you want to use this approach for finding the bottlenecks, you need more information. You need to account for the following points to carry out efficient profiling:

We should know the total run time of the program to have a better picture of the relative function run time. For example, if a function takes three minutes to execute, does that represent 5%, 15%, or 60% of the total run time?
We should understand what we're trying to achieve to label a piece of code as a bottleneck. Even if a function takes 10 minutes to run, we should worry about its inefficiency only, provided we are confident that other parts are OK.

A profiler package like cProfile can help us find the bottlenecks in our code.

Profiling with cProfile.run()

cProfile is a deterministic profiler for Python and is recommended for most users, as the official documentation states. In general terms, it creates a set of statistics that displays the total time spent in certain parts of the code and how often the portion of the code was called.

cProfile is written in C as a Python extension and a built-in module. It doesn’t affect the amount of time much. The cProfile module provides a simple run() function that is sufficient for most cases. All you need to do is to pass what you want to profile as a string statement to run(). Let's take a look at an example of profiling the recursive algorithm for the Fibonacci sequence and gather some information on the algorithm performance:

import cProfile

def fib_recursive(n):
    if n < 2:
        return n

    return fib_recursive(n - 1) + fib_recursive(n - 2)

cProfile.run("fib_recursive(30)")

         2692540 function calls (4 primitive calls) in 1.322 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.321    1.321 <string>:1(<module>)
2692537/1    1.321    0.000    1.321    1.321 test1.py:4(fib_recursive)
        1    0.000    0.000    1.322    1.322 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

As you can see, the profiler has recorded information on our function. Let us see how we can interpret it:

Headers	Description
`ncalls`	The number of calls. We should try to optimize the functions that have a lot of calls or take too much time per call.
`tottime`	The total time spent in the function, excluding the time taken to call sub-functions. This column is crucial for us. We can see that the `fib_recursive` function is called 2692537 times, with a total time of 1.321 sec.
`cumtime`	The cumulative time. In other words, it is the total time spent in the function plus the time spent in all sub-functions. As an example, imagine that our `fib_recursive` function is appended to a list; every number there is passed in as an argument. The time spent calling the `append` function would be stated here, not in `tottime`.
`percall`	There are two “per call” metrics. The first one is the total time per call, and the second one is the cumulative time per call.
`filename: lineno (function)`	Provides data about the filename and the line number of each function.

Now, let us see another approach of the same Fibonacci sequence, but this time we use a list algorithm:

def fib_list(n):
    if n < 2:
        return n

    sequence = [0, 1]

    for i in range(2, n + 1):
        sequence.append(sequence[i - 1] + sequence[i - 2])

    return sequence[n]

cProfile.run("fib_list(30)")

         33 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 test1.py:4(fib_list)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
       29    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

We can see a huge improvement in the number of calls per function, and the algorithm gets executed in no time. The recursive function in Python for Fibonacci numbers seems so easy until you try it for yourself and look at the profiler.

The algorithms above are simple examples. Imagine a real-world scenario, where you need to create an algorithm that searches through an enormous amount of data. In this case, it is really important to have a good-performing algorithm.

Profile class in cProfile

cProfile.run() is sufficient in most cases, but if you need more control over profiling, use the Profile class in cProfile. Let's take a look at a for loop inside the fib_list function:

import cProfile

profiler = cProfile.Profile()

def fib_list(n):
    if n < 2:
        return n
    sequence = [0, 1]
    profiler.enable()
    for i in range(2, n + 1):
        sequence.append(sequence[i - 1] + sequence[i - 2])
    profiler.disable()
    return sequence[n]

fib_list(300)

profiler.print_stats()

First, create an instance of the Profile class called profiler and collect the profiling data by calling the enable method. When you want to stop collecting the profiling data, call the disable method. After calling the function, we can simply print the results to the standard output by calling the print_stats() method. The resulting table will look similar to the one from the first example, but it won't contain unnecessary profiling information from other parts of our code.

         300 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      299    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Since it’s hard to tell where your program spends the time, you should start by profiling your entire program using the cProfile.run() function before narrowing it down to a single section.

We can also format the collected statistics using the pstats.Stats class constructor to create an instance of the statistics object called stats and use a text stream to save the results. Don't forget to take a look at the official documentation.

import cProfile, pstats, io
from pstats import SortKey

profiler = cProfile.Profile()

def fib_list(n):
    if n < 2:
        return n
    sequence = [0, 1]
    profiler.enable()
    for i in range(2, n + 1):
        sequence.append(sequence[i - 1] + sequence[i - 2])
    profiler.disable()
    return sequence[n]

fib_list(300)

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats = stats.sort_stats(SortKey.CUMULATIVE)
stats.print_stats()
print(stream.getvalue())

The Stats class creates a statistics object from a profile object and prints the output to the stream that is passed to it, in our case, io.StringIO() is a text stream. The Stats class also has a sort_stats method that sorts the results based on the provided criteria. In this case, the criterion is SortKey.CUMULATIVE. As described in the official documentation, the sorting criteria can be in the form of a SortKey enum (added in Python 3.7) or a string (using cumulative instead of SortKey.CUMULATIVE is valid). After creating the results and printing them to the stream using the print_stats() method, you can print them to the standard output by calling the getvalue() method from our text stream.

Conclusion

In this topic, we've discussed the importance of code profiling and explained different approaches to it in Python:

Profiling with the time module is an easy way to measure the execution time of any part of a program;
cProfile.run() provides more detailed information like the total number of calls for each function or the total time spent in the function;
The profile class from cProfile is fit for more precise control over profiling.

There are a lot of other things to cover (such as external modules for profiling), but we're going to do it in other topics. For now, let's turn to practice!

61 learners liked this piece of theory. 1 didn't like it. What about you?

Report a typo

Profiling in Python

The time module

Profiling with cProfile.run()

Profile class in cProfile

Conclusion

Related topics