Python memory management: how much data types take and in what cases

SHARE

Memory management in Python is very important for software developers to work effectively with any programming language. We cannot lose sight of the importance of memory management when implementing large amounts of data. Improper memory management slows down the application and server components.

In this article, we will analyze how memory allocation for objects in Python works. Then we’ll briefly mention how works clearing memory from unused objects. And, finally, we’ll discuss the difference in occupied memory by the example of list, dict and tuple types.

Memory allocation

Memory is not allocated directly from the code. All work on memory allocation is shifted to memory managers. There is one general manager of the “top” level, which is responsible for allocating a large block from the memory allocated to the program – the “arena“. It occupies 256Kb.

Further, the arena is divided into “pools” of 4Kb. Each pool can contain only blocks of a size predetermined for this pool – from 8 to 512 bytes. The arena can contain pools of different sizes, but the blocks in the same pool are always the same size. And at the block level, the memory managers work for each specific data type.

When a manager of a certain type requests memory for an object, it knows the size in advance and can immediately access the pool with the desired block size, placing the object in the first free block. If there are no free blocks or there are no pools of the required size, the top manager issues a new pool from the most filled arena. If all arenas are occupied, a new arena is requested.

It is interesting that it is impossible to partially return the memory allocated for the arena as long as it has at least one non-empty pool in which there is at least one non-empty block. We’ll discuss in the next section how blocks become empty.

Memory deallocation

In Python, there is no need for manual memory cleanup. If the object is no longer used, all this is passed on to the interpreter itself and two mechanisms: the reference counter and the garbage collector.

When a variable is assigned a value, in fact, an object is first created in a suitable free block (how memory allocation works was figured out in the last section) and then a reference to this object is put into the variable.

Reference counter

Each created object has a special field – the reference count. It stores the number of objects referring to it. It increments its value, for example, when an assignment operator is used; or when an object becomes part of a list. When a variable is deleted, or when del is used, the reference count is decremented by 1, for example, at the end of the function where this variable was declared.

Let’s look at an example of a small piece of code:


a = 1000
b = a
print(sys.getrefcount(1000))
# 5
del b
print(sys.getrefcount(1000))
# 4

In this case, 1000 is an immutable object, one for the entire program. After the two variables are initialized, the reference count is 5.

Why 5? Because object 1000 is referenced not only by these two variables, but by all variables with value 1000 in all used modules. Next, we delete the variable b and the reference counter changes its value to 4. As soon as the counter reaches 0, the object releases the block.

Garbage collector

The reference counter has a big drawback – the inability to catch cyclic links. If an object or multiple objects refer to each other, the counter will never drop below 1.

The gc module was created specifically to handle such cases. Its job is to periodically scan objects of the container type and detect the presence of circular references.

Speaking about the garbage collector, an important concept is the generation of objects – this is a set of objects that the collector monitors and subsequently scans. There are three generations, each scanned at a different frequency. Objects of the first generation are scanned more often, because new objects get there. As a rule, such objects have a short lifespan and are temporary. Therefore, it is advisable to check them with more attention. If the objects have been scanned by the garbage collector and have not been deleted, then they move to the next generation.

 

The first generation scan starts when the number of created container-type objects exceeds the number of deleted ones by a given threshold. For example, a second generation scan will start when the number of first generation scans exceeds a given threshold. By default, the trigger thresholds are 700, 10, and 10, respectively.

You can view them via gc.get_threshold() and change via gc.set_threshold().


print(gc.get_threshold())
# (700, 10, 10)

All this is relevant for list, dict, tuple and also for classes. Not for simple types.

What about data types

List

Let’s start the comparisons with list. Since it is variable, it will be interesting to see how it behaves when the number of elements changes. The list is an array of non-fixed size, with the possibility of arbitrary insertion or removal of any element. The list element can be any data type. In terms of storing the list in memory, it consists of two blocks. One block is of fixed size and stores information about the list. The other one stores references to elements and can move from block to block if the number of elements changes.

The size of the memory block occupied by an object can be viewed using sys.getsizeof().


empty_list = []
print(sys.getsizeof(empty_list))
# 64

As we can see, the empty list object already takes 64 bytes.


a = [1, 2, 3]
print(sys.getsizeof(a))
# 88

In the usual creation of a list through the enumeration of elements, it will always take: 64 bytes of an empty list + 8 bytes for each element, because the list is an array of object references.


b = [i for i in range(3)]
print(sys.getsizeof(b))
# 96

When created through a list comprehension, the size will be already 96 bytes that is more than the size of an empty list + 8 bytes per element. The operation of this mechanism comes down to calling the append method on the list object being created. Append works like this: depending on the number of elements already present in the list, it pre-allocates more memory when an element is added. The extra amount of memory allocated does not always double the list. Space can be allocated for just a few elements. For example, if one more item is added to a list of 8 items, space will be reserved for another 8 items. And when added to a list of 16 items, space will be reserved for only 9 items. This avoids the cost of resizing the list with frequent append requests. In this case, unused, but already allocated, memory is not available for reading.


c_tuple = (1, 2, 3)
c = list(c_tuple)
print(sys.getsizeof(c))
# 112

When creating a list based on Tuple, the resulting list will take up 112 bytes. In this case, space is reserved in advance for 3 more list elements, in addition to those already present in the tuple.

In total, we get 64 bytes, + 8 * 3 are elements from Tuple, + 8 * 3 is the reserved space for new elements.

Tuple

Tuple is an array of fixed length specified when the object was created. The elements of a tuple can also be objects of any type. Unlike a list, tuple is represented in memory by a single object as there is no changeable part that needs to be moved between blocks. Yes, and tuple also does not have methods for changing elements. But if the element itself is of a mutable type, it can still be changed.


empty_tuple = ()
print(sys.getsizeof(empty_tuple))
# 48

An empty tuple occupies a block of 48 bytes. Remember that tuples are immutable. If there is already the same tuple in memory, then a new object will not be created. Instead, the variable is assigned a reference to an existing object.


empty_tuple = ()
empty_tuple_2 = ()
print(id(empty_tuple) == id(empty_tuple_2))
# True

The example shows that the memory addresses of the two tuples are the same. A similar comparison for lists would return False.


a = (1, 2, 3, 4)
print(sys.getsizeof(a))
# 80

In the case of a non-empty memory tuple, everything is just as simple. The block size will be 48 bytes + 8 bytes for each element.


b_list = [1, 2, 3, 4]
b = tuple(b_list)
print(sys.getsizeof(b))
# 80

With the creation of tuple from a list or other collection, there are also no such surprises as with a list as there’s no need to lay a place for changing the number of elements.

Dict

A dict is an array of keys and an array of values, where each key is associated with one value. The key is subject to a uniqueness constraint within the dict. Therefore, only objects of immutable types can be keys. The value can be an object of any type.

Like lists, dicts are stored as two objects. The first contains information about the dict itself and always stays in the same block. The second one stores key-value pairs and can move between blocks when resized. But at the same time, an empty dict takes up much more space.


empty_dict = {}
print(sys.getsizeof(empty_dict))
# 240

As you can see in the example, the dict is 240 bytes. When creating a dict, space is allocated for several elements, and not only after adding an element, as was with the list.

If you call the clear method, not only all elements are cleared. The initially reserved memory will also be freed.


empty_dict = {}
print(sys.getsizeof(empty_dict))
# 240
empty_dict.clear()
print(sys.getsizeof(empty_dict))
# 72

As a result, the dict began to occupy only 72 bytes – much less than when created.

As with a list, for each new key-value pair, the entire object is not moved to a new block. To avoid frequent relocation costs, a new block is taken with a margin of several elements.

Conclusion

Knowing how much memory an object will take up is useful. For example, the terms of a project or task specify hard memory limits. Or a piece of code is called very often, you need to understand whether allocating / freeing memory will be a bottleneck in your system.

Therefore, I recommend that you estimate the approximate memory costs, even at the time of writing the code. And no one cancels the reuse of existing objects.

If you are interested in this topic and decide to delve into it, then pay attention to the following materials:

— Memory management in Python. It describes in detail the mechanism of operation of memory managers and the organization of blocks.

— Memory Management.Don’t forget about the official documentation.

— Garbage Collection for Python. Here you’ll find the in-detail info about the algorithm of the garbage collector.

— The Garbage Collector. One more article about the Garbage Collector