Welcome to the third post in our series on Python at scale at Instagram! As we mentioned in the first post in the series, Instagram Server is a several-million-line Python monolith, and it moves quickly: hundreds of commits each day, deployed to production every few minutes.
We’ve run into a few pain points working with Python at that scale and speed. This article takes a look at a few that we imagine might impact others as well.
Consider this innocuous-looking sample module:
```python
import re

from mywebframework import db, route

VALID_NAME_RE = re.compile("^[a-zA-Z0-9]+$")

@route('/')
def home():
    return "Hello World!"

class Person(db.Model):
    name: str
```
When someone imports this module, what code will run?
- We’ll run a bunch of regex code to compile that string to a pattern object.
- We’ll run the `@route` decorator. Based on what we see here, we can assume that it’s probably registering this view in some URL mapping. This means that just by importing this module, we’re mutating global state somewhere else.
- We’re going to run all the code inside the body of the `Person` class, which can include arbitrary code. And the `Model` base class might have a metaclass or an `__init_subclass__` method, which is still more arbitrary code we might be running at import.

The only line of code in this module that (probably) doesn’t run on import is `return "Hello World!"`, but we can’t even say that for sure! So just by importing this simple eight-line module (not even doing anything with it yet!), we are probably running hundreds, if not thousands, of lines of Python code, not to mention modifying a global URL mapping somewhere else in our program.
So what? This is part of what it means for Python to be a dynamic, interpreted language. This lets us do all kinds of useful meta-programming. What’s wrong with that?
Nothing is wrong with it, when you’re working with relatively small codebases and teams, and you can guarantee some level of discipline in how you use these features. But some aspects of this dynamism can become a concern when you have millions of lines of code worked on by hundreds of developers, many of whom are new to Python.
For example, one of the great things about Python is how fast you can iterate with it: make a change and see the result, no compile needed! But with a few million lines of code (and a messy dependency graph), that advantage starts to turn sour.
Our server startup takes over 20s, and sometimes regresses to more like a minute if we aren’t paying attention to keeping it optimized. That means 20-60 seconds between a developer making a change and being able to see the results of that change in their browser, or even in a unit test. This, unfortunately, is the perfect amount of time to get distracted by something shiny and forget what you were doing. Most of that time is spent literally just importing modules, creating function and class objects.
In some ways, that’s no different from waiting for another language to compile. But typically compilation can be incremental: you can just recompile the stuff you changed and things that directly depend on it, so many smaller changes can compile quickly. But in Python, because imports can have arbitrary side effects, there is no safe way to incrementally reload our server. No matter how small the change, we have to start from scratch every time, importing all those modules, re-creating all those classes and functions, re-compiling all of those regular expressions, etc. Usually 99% of the code hasn’t changed since last time we reloaded the server, but we have to re-do all that slow work anyway.
In addition to slowing down developers, this is a significant amount of wasted compute in production, too, since we continuously deploy and are thus reloading the site on production servers constantly all day long.
So that’s our first pain point: slow server startup and reload due to lots of wasted repeat work at import time.
Here’s another thing we often find developers doing at import time: fetching configuration from a network configuration source.
```python
MY_CONFIG = get_config_from_network_service()
```
In addition to slowing down server startup even further, this is dangerous, too. If the network service is not available, we won’t just get a runtime error failing certain requests; our server will fail to start up at all.
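One way to avoid the import-time fetch (a sketch, with a stub standing in for the real network call) is to defer the call until first use and cache the result:

```python
import functools

def get_config_from_network_service():
    # Stub standing in for the real network call in this sketch.
    return {"feature_flags": {"new_ui": True}}

@functools.lru_cache(maxsize=1)
def get_config():
    # The fetch now happens on first use, not at import time, and the
    # cached result is reused by every later caller in the process.
    return get_config_from_network_service()
```

Importing this module runs no network code; if the service is down, the failure surfaces as a runtime error in whichever request first calls `get_config()`, not as a server that won’t boot.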
Let’s make this a bit worse, and imagine that someone has added some import-time code in another module that does some critical initialization of the network service. They don’t know where to put this code, so they stick it in some module that happens to get imported pretty early on. Everything works, so they move on.
But then someone else comes along, adds an innocuous import in some other part of the codebase, and through an import chain twelve modules deep, it causes the config-fetching module to now be imported before the one that does the initialization.
Now we’re trying to use the service before it’s initialized, so it blows up. In the best case, where the interaction is fully deterministic, this could still result in a developer tearing their hair out for an hour or two trying to understand why their innocent change is causing something unrelated to break. In a more complex case where it’s not fully deterministic, this could bring down production. And there’s no obvious way to generically lint against or prevent this category of issue.
The root of the problem here is two factors that interact badly:
1) Python allows modules to have arbitrary and unsafe import side effects, and
2) the order of imports is not explicitly determined or controlled, it’s an emergent property of the imports present in all modules in the entire system (and can also vary based on the entry point to the system).
Let’s look at one more category of common errors.
```python
def myview(request):
    SomeClass.id = request.GET.get("id")
```
Here we’re in a view function, and we’re attaching an attribute to some class based on data from the request. Likely you’ve already spotted the problem: classes are global singletons, so we’re putting per-request state onto a long-lived object, and in a long-lived web server process, that has the potential to pollute every future request in that process.
The same thing can easily happen in tests, if people try to monkeypatch without a context manager like `mock.patch`. The effect here is pollution of all future tests run in that process, rather than pollution of all future requests. This is a huge cause of flakiness in our test suite. It’s so bad, and so hard to thoroughly prevent, that we have basically given up and are moving to one-test-per-process isolation instead.
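For illustration, here is the difference between bare monkeypatching and a context manager, using a throwaway `SomeClass`:

```python
from unittest import mock

class SomeClass:
    id = None

# Bare assignment (SomeClass.id = "123") would leak into every test
# that runs later in this process. mock.patch.object undoes itself:
with mock.patch.object(SomeClass, "id", "123"):
    assert SomeClass.id == "123"  # patched inside the block

assert SomeClass.id is None       # restored once the block exits
```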
So that’s a third pain point for us. Mutable global state is not merely available in Python, it’s underfoot everywhere you look: every module, every class, every list or dictionary or set attached to a module or class, every singleton object created at module level. It requires discipline and some Python expertise to avoid accidentally polluting global state at runtime of your program.
One reasonable take might be that we’re stretching Python beyond what it was intended for. It works great for smaller teams on smaller codebases that can maintain good discipline around how to use it, and we should switch to a less dynamic language.
But we’re past the point of codebase size where a rewrite is even feasible. And more importantly, despite these pain points, there’s a lot more that we like about Python, and overall our developers enjoy working in Python. So it’s up to us to figure out how we can make Python work at this scale, and continue to work as we grow.
We have an idea: strict modules.
Strict modules are a new Python module type marked with `__strict__ = True` at the top of the module, and implemented by leveraging many of the low-level extensibility mechanisms already provided by Python. A custom module loader parses the code using the `ast` module, performs abstract interpretation on the loaded code to analyze it, applies various transformations to the AST, and then compiles the modified AST back into Python bytecode using the built-in `compile` function.
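The `ast`-parse and `compile` round trip at the core of such a loader looks roughly like this (the analysis and transformation passes are elided, and the source string is a made-up example):

```python
import ast

source = "GREETING = 'Hello ' + 'World!'"

# Parse the module source into an AST...
tree = ast.parse(source)

# ...a strict-modules loader would analyze and transform `tree` here...

# ...then compile the (possibly modified) AST back to a code object
# and execute it to populate the module namespace.
code = compile(tree, filename="<strict_module>", mode="exec")
namespace = {}
exec(code, namespace)
```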
Strict modules place some limitations on what can happen at module top-level. All module-level code, including decorators and functions/initializers called at module level, must be pure (side-effect free, no I/O). This is verified statically at compile time via the abstract interpreter.
This means that strict modules are side-effect-free on import: bad interactions of import-time side effects are no longer possible! Because we verify this with abstract interpretation that is able to understand a large subset of Python, we avoid over-restricting Python’s expressiveness: many types of dynamic code without side effects are still fine at module level, including many kinds of decorators, defining module-level constants via list or dictionary comprehensions, etc.
Let’s make that a bit more concrete with an example. This is a valid strict module:
```python
"""Module docstring."""

__strict__ = True

from utils import log_to_network

MY_LIST = [1, 2, 3]
MY_DICT = {x: x + 1 for x in MY_LIST}

def log_calls(func):
    def _wrapped(*args, **kwargs):
        log_to_network(f"{func.__name__} called!")
        return func(*args, **kwargs)
    return _wrapped

@log_calls
def hello_world():
    log_to_network("Hello World!")
```
We can still use Python normally, including dynamic code such as a dictionary comprehension and a decorator used at module level. It’s no problem that we talk to the network within the `_wrapped` function or within `hello_world`, because they are not called at module level. But if we moved the `log_to_network` call out into the outer `log_calls` function, or we tried to use a side-effecting decorator like the earlier `@route` example, or added a `hello_world()` call at module level, this would no longer compile as a strict module.
How do we know that the `log_to_network` or `route` functions are not safe to call at module level? We assume that anything imported from a non-strict module is unsafe, except for certain standard library functions that are known to be safe. If the `utils` module is strict, then we’d rely on the analysis of that module to tell us in turn whether `log_to_network` is safe.
In addition to improving reliability, side-effect-free imports also remove a major barrier to safe incremental reload, as well as unlocking other avenues to explore speeding up imports. If module-level code is side-effect-free, we can safely execute individual statements in a module lazily on-demand when module attributes are accessed, instead of eagerly all at once. And given that the shape of all classes in a strict module are fully understood at compile time, in the future we could even try persisting module metadata (classes, functions, constants) resulting from module execution in order to provide a fast-path import for unchanged modules that doesn’t require re-executing the module-level byte-code from scratch.
Strict modules and classes defined in them are immutable after creation. The modules are made immutable by internally transforming the module body into a function with all of the global variables accessed as closure variables. These changes greatly reduce the surface area for accidental mutation of global state, though mutable global state is still available if you opt-in via module-level mutable containers.
Classes defined in strict modules must also have all members defined in `__init__`, and are automatically given `__slots__` by the module loader’s AST transformation, so it’s not possible to tack on additional ad-hoc instance attributes later. For example, in this class:
```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
```
The strict-modules AST transformation will observe the assignments to the attributes `name` and `age` in `__init__` and add an implicit `__slots__ = ('name', 'age')` to the class, preventing assignment of any other attributes to instances of the class. (If you are using type annotations, we will also pick up class-level attribute type declarations such as `name: str` and add them to the slots list as well.)
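The effect of that implicit `__slots__` can be seen in plain Python by writing out by hand what the transformation would produce:

```python
class Person:
    # Hand-written equivalent of what the AST transform adds implicitly.
    __slots__ = ('name', 'age')

    def __init__(self, name, age):
        self.name = name
        self.age = age

p = Person("Ada", 36)
p.age = 37  # assigning to a declared slot still works normally

try:
    p.nickname = "al"  # any undeclared attribute is rejected
except AttributeError:
    rejected = True
```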
These restrictions don’t just make the code more reliable, they help it run faster as well. Automatically transforming classes to add `__slots__` makes them more memory-efficient and eliminates per-instance dictionary lookups, speeding up attribute access. Transforming the module body to make it immutable also eliminates dictionary lookups for accessing top-level variables. And we can optimize these patterns within the Python runtime for still further benefit.
Strict modules are still experimental. We have a working prototype and are in the early stages of rolling it out in production. We hope to follow up on this blog post in the future, with a report on our experience and a more detailed review of the implementation. If you’ve run into similar problems and have thoughts on this approach, we’d love to hear them!