Why is my program running so slow? To make your programs faster, avoid memory leaks and get rid of garbage as soon as possible. Let's see how to do that in this topic.
Memory leaks
Memory leaks present a significant issue in software development. They occur when memory allocated for variables, references, and objects is not released correctly. The problem is most visible in programs that run indefinitely, such as daemons and servers, because unreleased memory keeps accumulating.
Luckily, CPython, the reference implementation of Python, handles memory cleanup pretty well. It cleans up memory automatically, as many high-level languages do. This is convenient for the developer because you don't have to remember to free memory when writing code. You can also trigger cleanup manually with special tools, such as the gc module, which controls the garbage collector. But first, let's figure out what memory-cleaning methods exist in Python.
Reference counting
There are two methods for memory management in Python: reference counting and generational garbage collection. Reference counting is an automatic memory management system designed to keep track of the references to objects to determine when an object is no longer needed and can be safely deallocated from memory.
In Python, every object contains a reference count, the number of references pointing to that object. When a new reference to an object is created by assigning it to a variable or passing it as an argument to a function, it increases the reference count. Similarly, the reference count is decremented when a reference is deleted or goes out of scope.
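The effect described above can be observed with sys.getrefcount(). The sketch below is only an illustration on CPython; the names data, alias, and holder are made up for this example, and it compares differences between counts rather than absolute values, since those can vary between interpreter versions.

```python
import sys

data = [1, 2, 3]
before = sys.getrefcount(data)

alias = data              # a second name bound to the same list
holder = {"key": data}    # a container that stores another reference
after_adding = sys.getrefcount(data)

del alias                 # remove one name
holder.clear()            # drop the reference held by the container
after_removing = sys.getrefcount(data)

print(after_adding - before)    # 2
print(after_removing - before)  # 0
```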
Python's reference counting mechanism is efficient and allows immediate memory reclamation. When an object's reference count reaches zero, no references to it remain, so it is no longer needed. In CPython, the object is deallocated right at that point, without waiting for the garbage collector.
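To see that reclamation happens as soon as the count reaches zero, here is a minimal sketch assuming CPython. Resource is a hypothetical placeholder class, and weakref.finalize is used only to record the moment the object is destroyed.

```python
import weakref


class Resource:
    """Hypothetical placeholder class used only for this demonstration."""


events = []
res = Resource()
# weakref.finalize registers a callback that runs when the object is destroyed.
weakref.finalize(res, events.append, "Resource freed")

print(events)  # []
del res        # the last reference is gone, so the count drops to zero
print(events)  # ['Resource freed'] on CPython
```

Other Python implementations may free the object later, so code should not rely on this timing.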
You can find the number of references using the sys module's getrefcount() function. However, note that the count returned by getrefcount() includes temporary references created by the function itself.
import sys
x = [1, 2, 3]
x.append(4)
ref_count = sys.getrefcount(x)
print(ref_count) # 2: one reference from x, one temporary reference from the call itself
While reference counting is efficient for managing memory in many cases, it does have limitations. It cannot handle circular references, which occur when two or more objects reference each other, forming a loop in which their reference counts never reach zero. To address this issue, Python employs an additional technique, cyclic garbage collection, which identifies and collects cyclically referenced objects.
Overall, reference counting in Python provides an automatic and efficient approach to memory management, ensuring that objects outside reference cycles are deallocated as soon as they are no longer needed. This contributes to the ease of programming in Python and helps prevent memory leaks.
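A small sketch of such a loop, assuming CPython: Node is a hypothetical class invented for this example, and automatic collection is paused so the result does not depend on when the collector happens to run.

```python
import gc
import weakref


class Node:
    """Hypothetical class for the demo; each node can point to a partner."""

    def __init__(self):
        self.partner = None


gc.disable()  # pause automatic collection to keep the demo deterministic
try:
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # a and b now reference each other
    probe = weakref.ref(a)       # observes the object without keeping it alive

    del a, b                     # no names left, but the cycle keeps both counts above zero
    alive_before = probe() is not None
    gc.collect()                 # the cyclic collector finds and frees the pair
    alive_after = probe() is not None
finally:
    gc.enable()                  # assumes collection was enabled beforehand

print(alive_before, alive_after)  # True False
```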
Generational garbage collection
Generational garbage collection is an advanced memory management technique to improve the efficiency of garbage collection. It is based on the observation that most objects in a program tend to have a relatively short lifespan, meaning they are created and then become garbage relatively quickly.
Python's generational garbage collection divides objects into different generations based on age. The basic idea behind generational garbage collection is that "younger" objects are more likely to become garbage than "older" objects. Therefore, the garbage collector focuses primarily on the younger generations, performing garbage collection more frequently on these objects.
Python's generational garbage collector organizes objects into three generations: 0, 1, and 2. Generation 0 contains the youngest objects, while generation 2 contains the oldest ones. As objects survive garbage collection cycles, they are promoted to higher generations. The assumption is that if an object survives multiple collection cycles, it will likely persist for longer.
The generational garbage collector in CPython tracks only container objects (such as lists, dictionaries, and class instances), since only they can form cycles. Rather than walking the whole object graph from a set of roots, as a classic mark-and-sweep collector does, it takes the objects of the generation being collected and, for each one, subtracts the references that come from other objects in the same set from a copy of its reference count. Objects whose adjusted count stays above zero are referenced from outside the set and are therefore live, as is everything reachable from them. The remaining objects are reachable only through each other, so they are garbage and can be safely deallocated.
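The idea of telling references held from outside apart from references inside a group of objects can be sketched as a toy model in pure Python. This is not CPython's actual implementation: the function find_unreachable, the refs mapping (object name to the names it references), and the external mapping (references held from outside, such as variables) are all assumptions made up for this illustration.

```python
def find_unreachable(refs, external):
    """Return the names of objects that are kept alive only by each other."""
    # Total reference count: external references plus references from tracked objects.
    counts = {name: external.get(name, 0) for name in refs}
    for targets in refs.values():
        for target in targets:
            counts[target] += 1

    # Subtract references that originate inside the tracked set.
    adjusted = dict(counts)
    for targets in refs.values():
        for target in targets:
            adjusted[target] -= 1

    # Objects still above zero are referenced from outside; keep everything they reach.
    reachable = set()
    stack = [name for name, count in adjusted.items() if count > 0]
    while stack:
        name = stack.pop()
        if name not in reachable:
            reachable.add(name)
            stack.extend(refs[name])

    return set(refs) - reachable


# "a" and "b" form an isolated cycle; "c" is held by a variable and references "d".
refs = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}
garbage = find_unreachable(refs, external={"c": 1})
print(sorted(garbage))  # ['a', 'b']
```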
Generational garbage collection provides several benefits. Since most objects become garbage quickly, the majority of the garbage collection effort is focused on the younger generations, reducing the overall time spent on garbage collection. This improves the performance of memory management in Python programs.
Additionally, generational garbage collection helps identify long-lived objects more efficiently. As objects get promoted to higher generations, the garbage collector performs garbage collection less frequently on those objects, reducing the overhead.
Python's generational garbage collection is critical to its automatic memory management system. It ensures efficient memory usage by targeting garbage collection efforts where they are most likely effective, ultimately contributing to the overall performance of Python programs.
Garbage collector
In Python, the built-in gc module is the interface to the garbage collector. It lets you inspect and control the collector, which finds objects that are no longer reachable, including those caught in reference cycles, and reclaims their memory.
The main function of the gc module is gc.collect(), which performs garbage collection immediately and returns the number of unreachable objects it found. The gc.collect() function has an optional generation argument, an integer indicating which generation to collect (between 0 and 2).
import gc
# collect garbage in generation 0 only
gc.collect(0) # e.g. 32 (the number of unreachable objects found)
# collect garbage in all generations
gc.collect() # e.g. 147
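You can find out the current collection thresholds with the gc.get_threshold() function. This function returns a tuple of three numbers, the threshold values for generations 0, 1, and 2, respectively. The first is the number by which tracked object allocations must exceed deallocations to trigger a collection of generation 0; the second and third are how many collections of a younger generation must happen before the next older generation is examined. In CPython the defaults have long been 700, 10, and 10, though they may differ between versions. You can change these values using the gc.set_threshold() function. Let's see how to do it below.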
print(gc.get_threshold()) # (700, 10, 10)
# set new threshold
gc.set_threshold(200, 5, 5)
print(gc.get_threshold()) # (200, 5, 5)
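Since thresholds are process-wide settings, it can be safer to restore them once a tuned section is finished. The following is one possible pattern, not an official recipe; the values 200, 5, 5 are simply the ones used above.

```python
import gc

original = gc.get_threshold()    # remember the current settings
gc.set_threshold(200, 5, 5)      # collect the youngest generation more often
try:
    tuned = gc.get_threshold()
    # ... run the memory-sensitive part of the program here ...
finally:
    gc.set_threshold(*original)  # always put the previous values back

print(tuned)
print(gc.get_threshold() == original)  # True
```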
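Garbage is collected automatically once the thresholds are exceeded. You can find out the current collection counts using the gc.get_count() function. It returns a tuple (count0, count1, count2): count0 is the number of tracked allocations minus deallocations since the last collection, while count1 and count2 are the numbers of collections of generations 0 and 1 since generations 1 and 2, respectively, were last collected.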
gc.get_count() # (128, 3, 1)
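As a rough illustration, the counts can be read next to the thresholds to see how close generation 0 is to its next automatic collection. The numbers differ from run to run and between Python versions, so no output is shown; the variable names are made up for this sketch.

```python
import gc

counts = gc.get_count()
thresholds = gc.get_threshold()

count0, count1, count2 = counts
print("counts:", counts)
print("thresholds:", thresholds)
print("generation 0 count vs. threshold:", count0, "/", thresholds[0])

gc.collect()                      # a full collection resets the counters
after_collect = gc.get_count()
print("counts after a full collection:", after_collect)
```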
It's important to note that the garbage collector is typically enabled by default, and Python's memory management is generally efficient and automatic. In most cases, you don't need to interact with the garbage collector directly. However, there may be situations where fine-tuning or understanding the garbage collection behavior can be helpful, such as when dealing with large or long-running programs.
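One fine-tuning technique is to pause automatic collection around a burst of allocations and then resume it. The helper below, gc_paused, is a hypothetical name and not part of the gc module. It is a minimal sketch built on gc.isenabled(), gc.disable(), and gc.enable(); whether it actually helps depends on the workload, so measure before relying on it.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Temporarily disable automatic garbage collection (hypothetical helper)."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:  # restore only if collection was on before
            gc.enable()


with gc_paused():
    inside = gc.isenabled()
    items = [[i] for i in range(100_000)]  # many container allocations

print(inside)  # False
```

Reference counting still works while the collector is paused; only the detection of reference cycles is postponed.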
If you're working with specific memory management requirements or have concerns about memory usage, the gc module can provide some control and insights into Python's memory management process.
Conclusion
In this topic, you have learned:
about two ways to clear memory: reference counting and generational garbage collection;
how to clean memory using the garbage collector module.
Let's try to collect garbage in practice!