Diagnosing and fixing memory leaks in Python
7. März 2017
0 Min. LesezeitEditor's note
This blog originally appeared on fugue.co. Fugue joined Snyk in 2022 and is a key component of Snyk IaC.
Fugue uses Python extensively throughout our cloud security SaaS product and in our support tools, due to its ease-of-use, python security, extensive package library, and powerful language tools. One thing we've learned from building complex software for the cloud is that a language is only as good as its debugging and profiling tools. Logic errors, CPU spikes, and memory leaks are inevitable, but a good debugger, CPU profiler, and memory profiler can make finding these errors significantly easier and faster, letting our developers get back to creating Fugue’s dynamic cloud orchestration and enforcement system. Let’s look at a case in point.
In the fall, our metrics reported that a Python component of Fugue called the reflector was experiencing random restarts and instability after a few days of uptime. Looking at memory usage showed that the reflector's memory footprint increased monotonically and continuously, indicating a memory leak. tracemalloc
, a powerful memory tracking tool in the Python standard library, made it possible to quickly diagnose and fix the leak. We discovered that the memory leak was related to our use of requests
, a popular third-party Python HTTP library. Rewriting the component to use urllib
from the Python standard library eliminated the memory leak. In this blog, we'll explore the details.
Memory Allocation in Python
In most scenarios, there's no need to understand memory management in Python beyond knowing that the interpreter manages memory for you. However, when writing large, complex Python programs with high stability requirements, it’s useful to peek behind the curtain to understand how to write code that interacts well with Python's memory management algorithms. Notably, Python uses reference counting and garbage collection to free memory blocks, and only frees memory to the system when certain internal requirements are met. A pure Python script will never have direct control over memory allocation in the interpreter. If direct control over memory allocation is desired, the interpreter's memory allocation can be bypassed by writing or using an extension. For example, numpy
manages memory for large data arrays using its own memory allocator.
Fundamentally, Python is a garbage-collected language that uses reference counting. The interpreter automatically allocates memory for objects as they are created and tracks the number of references to those objects in a data structure associated with the object itself. This memory will be freed when the reference count for those objects reaches zero. In addition, garbage collection will detect cycles and remove objects that are only referenced in cycles. Every byte of memory allocated within the Python interpreter is able to be freed between these two mechanisms, but no claims can be made about memory allocated in extensions.
Python manages its own heap, separate from the system heap. Memory is allocated in the Python interpreter by different methods according to the type of the object to be created. Scalar types, such as integers and floats, use different memory allocation methods than composite types, such as lists, tuples, and dictionaries. In general, memory is allocated on the Python heap in fixed-size blocks, depending on the type. These blocks are organized into pools, which are further organized into arenas. Memory is pre-allocated using arenas, pools, and blocks, which are then used to store data as needed over the course of program’s execution. Since these blocks, pools, and arenas are kept in Python's own heap, freeing a memory block merely marks it as available for future use in the interpreter. Freeing memory in Python does not immediately free the memory at the system level. When an entire arena is marked as free, its memory is released by the Python interpreter and returned to the system. However, this may occur infrequently due to memory fragmentation.
Due to these abstractions, memory usage in Python often exhibits high-water-mark behavior, where peak memory usage determines the memory usage for the remainder of execution, regardless of whether that memory is actively being used. Furthermore, the relationship between memory being "freed" in code and being returned to the system is vague and difficult to predict. These behaviors make completely understanding the memory usage of complex Python programs notoriously difficult.
Memory Profiling Using tracemalloc
tracemalloc
is a package included in the Python standard library (as of version 3.4). It provides detailed, block-level traces of memory allocation, including the full traceback to the line where the memory allocation occurred, and statistics for the overall memory behavior of a program. The documentation is available here and provides a good introduction to its capabilities. The original Python Enhancement Proposal (PEP) introducing it also has some insight on its design.
tracemalloc
can be used to locate high-memory-usage areas of code in two ways:
looking at cumulative statistics on memory use to identify which object allocations are using the most memory, and
tracing execution frames to identify where those objects are allocated in the code.
Module-level Memory Usage
We start by tracing the memory usage of the entire program, so we can identify, at a high level, which objects are using the most memory. This will hopefully provide us with enough insight to know where and how to look more deeply. The following wrapper starts tracing and prints statistics when Ctrl-C is hit:
import tracemalloctracemalloc.start(10)
try:
run_reflector()
except:
snapshot = tracemalloc.take_snapshot()
top_n(25, snapshot, trace_type='filename')
tracemalloc.start(10)
starts memory tracing, while saving 10 frames of traceback for each entry. The default is 1, but saving more traceback frames is useful if you plan on using tracebacks to locate memory leaks, which will be discussed later. tracemalloc.take_snapshot()
takes a snapshot of currently allocated memory in the Python heap. It stores the number of allocated blocks, their size, and tracebacks to identify which lines of code allocated which blocks of memory. Once a snapshot is created, we can compute statistics on memory use, compare snapshots, or save them to analyze later. top_n
is a helper function I wrote to pretty print the output from tracemalloc
. Here, I ask for the top 25 memory allocations in the snapshot, grouped by filename. After running for a few minutes, the output looks like this:
[ Top 25 with filename tracebacks ]
197618 blocks 17.02311134338379 MB/Users/mike/.pyenv/versions/3.4.2/lib/python3.4/collections/__init__.py:0: size=17.0 MiB,
count=197618,
average=90 B105364 blocks 11.34091567993164 MB frozen importlib._bootstrap:0:
size=11.3 MiB,
count=105364,
average=113 B60339 blocks 9.233230590820312 MB/Users/mike/.pyenv/versions/3.4.2/lib/python3.4/json/decoder.py:0:
size=9455 KiB,
count=60339,
average=160 B...
This shows the cumulative amount of memory allocated by the component over the entire runtime, grouped by filename. At this level of granularity, it's hard to make sense of the results. For instance, the first line shows us that 17 MB of collections
objects are created, but this view doesn't provide enough detail for us to know which objects, or where they're being used. A different approach is needed to isolate the problem.
Understanding tracemalloc Output
tracemalloc
shows the net memory usage at the time a memory snapshot is taken. When comparing two snapshots, it shows the net memory usage between the two snapshots. If memory is allocated and freed between snapshots, it won't be shown in the output. Therefore, if snapshots are created at the same point in a loop, any memory allocations visible in the differences between two snapshots are contributing to the long-term total amount of memory used, rather than being a temporary allocation made in the course of execution.
In the case of reference cycles that require garbage collection, uncollected cycles are recorded in the output, while collected cycles are not. Any blocks freed by the garbage collector in the time covered by a snapshot will be recorded as freed memory. Therefore, forcing garbage collection with gc.collect()
before taking a snapshot will reduce noise in the output.
Per-Iteration Memory Usage
Since we're looking for a memory leak, it's useful to understand how the memory usage of our program changes over time. We can instrument the main loop of the component, to see how much memory is allocated in each iteration, by calling the following method from the main loop:
def collect_stats(self):
self.snapshots.append(tracemalloc.take_snapshot())
if len(self.snapshots) 1:
stats = self.snapshots[-1].filter_traces(filters).compare_to(self.snapshots[-2], 'filename')
for stat in stats[:10]:
print("{} new KiB {} total KiB {} new {} total memory blocks: ".format(stat.size_diff/1024, stat.size / 1024, stat.count_diff ,stat.count))
for line in stat.traceback.format():
print(line)
This code takes a memory snapshot and saves it, then uses snapshot.compare_to(other_snapshot, group_by='filename')
to compare the newest snapshot with the previous snapshot, with results grouped by filename. After a few iterations to warm up memory, the output looks like this:
[ Top 5 with filename tracebacks ]190.7421875
new KiB 1356.5634765625 total KiB 1930
new 13574 total memory blocks:
(1) File "/Users/mike/.pyenv/versions/3.4.2/lib/python3.4/linecache.py",
line 02.1328125
new KiB 12.375 total KiB 32
new 86 total memory blocks:
(2) File "/Users/mike/.pyenv/versions/3.4.2/lib/python3.4/tracemalloc.py",
line 01.859375
new KiB 18.7001953125 total KiB 3
new 53 total memory blocks:
(3) File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/packages/urllib3/connection.py",
line 0-1.71875
new KiB 34.5224609375 total KiB -2
new 91 total memory blocks: File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py",
line 01.66015625 new KiB 61.662109375 total KiB 18 new 260 total memory blocks:
File "/Users/mike/.pyenv/versions/3.4.2/lib/python3.4/urllib/parse.py", line 0
The linecache
(1) and tracemalloc
(2) allocations are part of the instrumentation, but we can also see some memory allocations made by the requests
HTTP package (3) that warrant further investigation. Recall that tracemalloc tracks net memory usage, so these memory allocations are accumulating on each iteration. Although the individual allocations are small and don't jump out as problematic, the memory leak only becomes apparent over the course of a few days, so it's likely to be a case of small losses adding up.
Filtering Snapshots
Now that we have an idea of where to look, we can use tracemalloc's filtering capabilities to show only memory allocations related to the requests package:
from tracemalloc
import Filter
filters = [Filter(inclusive=True, filename_pattern="*requests*")]
filtered_stats = snapshot.filter_traces(filters).compare_to(old_snapshot.filter_traces(filters), 'traceback')
for stat in stats[:10]:
print("{}
new KiB {}
total KiB {}
new {}
total memory blocks: ".format(stat.size_diff/1024, stat.size / 1024, stat.count_diff ,stat.count))
for line in stat.traceback.format():
print(line)
snapshot.filter_traces()
takes a list of Filters
to apply to the snapshot. Here, we create a Filter
in inclusive
mode, so it includes only traces that match the filename_pattern
. When inclusive
is False
, the filter excludes traces that match the filename_pattern
. The filename_pattern
uses UNIX-style wildcards to match filenames in the traceback. In this example, the wildcards in "requests" match occurrences of "requests" in the middle of a path, such as "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/sessions.py"
.
We then use compare_to() to compare the results to the previous snapshot. The filtered output is below:
48.7890625
new KiB 373.974609375 total KiB 4
new 1440 total memory blocks:
(4) File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/structures.py",
line 01.46875
new KiB 16.2939453125 total KiB 2
new 49 total memory blocks:
File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests_unixsocket/__init__.py",
line 0 -1.4453125
new KiB 34.2802734375 total KiB -2
new 96 total memory blocks:
(5) File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/sessions.py",
line 0-0.859375
new KiB 31.8505859375 total KiB -1
new 85 total memory blocks:
File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py",
line 00.6484375
new KiB 20.8330078125 total KiB 1
new 56 total memory blocks:
File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/packages/urllib3/connection.py", line 0
With the Filter
in place, we can clearly see how requests
is using memory. Line (4) shows that roughly 50 KiB of memory is lost in requests
on each iteration of the main loop. Note that negative memory allocations, such as (5), are visible in this output. These allocations are freeing memory allocated in previous loop iterations.
Tracking Down Memory Allocations
To determine which uses of requests
are leaking memory, we can take a detailed look at where problematic memory allocations occur by calling compare_to()
with traceback
instead of filename
, while using a Filter
to narrow down the output:
stats = snapshot.filter_traces(filters).compare_to(old_snapshot.filter_traces(filters), 'traceback')
This prints 10 frames of traceback (since we started tracing with tracemalloc.start(10)
) for each entry in the output, a truncated example of which is below:
5 memory blocks: 4.4921875 KiB
File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/sessions.py",
line 585
r = adapter.send(request, **kwargs)
File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/sessions.py",
line 475
resp = self.send(prep, **send_kwargs)
File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests_unixsocket/__init__.py",
line 46
return session.request(method=method, url=url, **kwargs)
File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests_unixsocket/__init__.py",
line 60
return request('post', url, data=data, json=json, **kwargs)
The full traceback gives us the ability to trace backwards from memory allocations to the lines in our project code that generate them. In the case of this component, our uses of requests
came from an internal storage library that used an HTTP API. Rewriting the library to use urllib
directly eliminated the memory leak.
Memory Profiling: Art or Science?
tracemalloc
is a powerful tool for understanding the memory usage of Python programs. It helped us understand module-level memory usage, find out which objects are being allocated the most, and it demonstrated how the reflector’s memory usage changed on a per-iteration basis. It comes with useful filtering tools and gives us the ability to see the full traceback for any memory allocation. Despite all of its features, however, finding memory leaks in Python can still feel like more of an art than a science. Memory profilers give us the ability to see how memory is being used, but oftentimes it’s difficult to find the exact memory allocation that is causing problems. It’s up to us to synthesize the information we get from our tools into a conclusion about the memory behavior of the program, then make a decision about what actions to take from there.
We use virtually every available Python tool (test frameworks, cProfile, etc.) to make Fugue’s system reliable, performant, and easy to maintain. The broker and reflector both take advantage of Python's introspection to make judgments about dynamic calls to the AWS API, which allows us to focus on logic rather than coding exhaustive cases. Fugue leverages the strengths of Python where it makes sense in the system, which ultimately means more product stability and extensibility for end-users.
IaC-Sicherheit mit Developer-Mindset
Snyk sichert Infrastructure-as-Code ab, vom SDLC bis zur Runtime in der Cloud. Zentral über eine Engine für Policy as Code, mit der Ihre Teams die Cloud durchgängig sicher machen – von der Entwicklung der Infrastruktur bis hin zu ihrem Deployment und Betrieb.