Lazy imports for Python
Starting a Python application typically results in a flurry of imports as modules from various locations (and the modules they import) get added into the application process. All of that occurs before the application even gets started doing whatever it is the user actually launched it for; that delay can be significant—and annoying. Beyond that, many of those imports may not be necessary at all for the code path being followed, so eagerly doing the import is purely wasted time. A proposal back in May would add a way for applications to choose lazy imports, where the import is deferred until the module is actually used.
PEP 690
The lazy imports proposal was posted to the Python discussion forum by one of its authors, Germán Méndez Bravo. He noted that the feature is in use in the Cinder CPython fork at Meta, where it has demonstrated "startup time improvements up to 70% and memory-use reductions up to 40%" on real-world Python command-line tools. So he and Carl Meyer teamed up on PEP 690 ("Lazy Imports") to propose the feature for CPython itself; since neither of them is a core developer, Barry Warsaw stepped up as the PEP's sponsor. The PEP has changed somewhat since it was posted, based on feedback in the discussion, some of which is covered below; the May 3 version can be found at GitHub, along with other historical versions.
The core of the idea is the concept of a lazy reference that is not visible to Python programs; it is purely a construct in the C code of the interpreter. When run with lazy imports enabled, a statement like import foo will simply add the name foo to the global namespace (i.e. globals()) as a lazy reference; any access to that name will cause the import to be executed, so the lazy reference acts like a thunk. Similarly, from foo import bar will add bar to the namespace, such that when it is used it will be resolved as foo.bar, which will import foo at that time.
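The lazy reference itself lives in the interpreter's C code, but its thunk-like behavior can be approximated in pure Python. The sketch below is an analogy only, not PEP 690's mechanism: the hypothetical LazyModuleRef class records a module name and calls importlib.import_module() the first time an attribute is looked up on it. It covers only the import foo case, and unlike the PEP's lazy reference it is an ordinary object that Python code can see.

```python
import importlib


class LazyModuleRef:
    """Hypothetical stand-in for a lazy reference: a thunk for one module."""

    def __init__(self, name):
        # Record only the module name; nothing is imported yet.
        self._name = name
        self._module = None

    def _resolve(self):
        # Run the real import on first use and cache the result.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Called only for names not found on the proxy itself, so every
        # module attribute lookup goes through _resolve().
        return getattr(self._resolve(), attr)


# Roughly what "import colorsys" would bind under lazy imports.
colorsys = LazyModuleRef("colorsys")
# The first attribute access triggers the actual import.
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))  # (0.0, 1.0, 1.0)
```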
The original proposal enabled lazy imports by way of a command-line flag (-L) to the interpreter or via an environment variable. But Inada Naoki pointed out that it would be better to have an API to enable lazy imports. For an application like Mercurial, which already uses a form of lazy importing but might want to switch to the new mechanism, setting an environment variable just for the tool is not sensible and adding a command-line argument to the "#!/usr/bin/env python" shebang (or hashbang) line of a Python script is not possible on Linux. Meyer agreed that an API (e.g. importlib.set_lazy_imports()) should be added.
The PEP specifically targets application developers as the ones who should choose lazy imports and test their applications to ensure that it works. The PEP says:
Since lazy imports are a potentially-breaking semantic change, they should be enabled only by the author or maintainer of a Python application, who is prepared to thoroughly test the application under the new semantics, ensure it behaves as expected, and opt-out any specific imports as needed (see below). Lazy imports should not be enabled speculatively by the end user of a Python application with any expectation of success.
It is the responsibility of the application developer enabling lazy imports for their application to opt-out any library imports that turn out to need to be eager for their application to work correctly; it is not the responsibility of library authors to ensure that their library behaves exactly the same under lazy imports.
The environment variable for enabling the feature fell by the wayside, but the -L option remains and an explicit API was added. There are also ways for developers to opt out of lazy imports; the PEP proposes a few different mechanisms. To start with, there are several types of imports that will never be processed lazily, regardless of any settings. For example:
import importlib  # not lazy (aka "eager")
# make subsequent imports lazy
importlib.set_lazy_imports()
import foo  # lazy
from bar import baz  # lazy
from xyz import *  # star imports are always eager
try:
    import abc  # always eager inside a try/except block
except ImportError:
    import def_  # eager
with A() as a:
    import ghi  # always eager inside a with block
Imports that are not at the top level of a module (i.e. those inside a class or function definition) are also always eager. If a developer knows that an import needs to be done eagerly, presumably because the side effects from importing it need to happen before the rest of the code is executed (or perhaps because the module does not work correctly when lazily imported), the import can be done in a try block or the proposed new context manager can be used:
from importlib import eager_imports
with eager_imports():
    import foo  # eager
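The rules above are simple enough to check statically. As a rough illustration (a hypothetical helper, not something PEP 690 provides), the following uses the standard ast module to label each import in a piece of source code as lazy or eager according to the rules described in this article: star imports, and imports inside a try, with, class, or function body, are eager. Treating other compound statements, such as a module-level if, as transparent is an assumption of this sketch.

```python
import ast

# Enclosing statements that, per the rules described above, make any
# import inside them eager.
EAGER_CONTEXTS = {
    ast.FunctionDef: "inside a function",
    ast.AsyncFunctionDef: "inside a function",
    ast.ClassDef: "inside a class",
    ast.Try: "inside try/except",
    ast.With: "inside a with block",
    ast.AsyncWith: "inside a with block",
}
if hasattr(ast, "TryStar"):  # "except*" blocks, Python 3.11+
    EAGER_CONTEXTS[ast.TryStar] = "inside try/except"


def classify_imports(source):
    """Return (line, module, verdict) for every import in source."""
    results = []

    def visit(node, context):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.Import, ast.ImportFrom)):
                if isinstance(child, ast.Import):
                    module = ", ".join(alias.name for alias in child.names)
                else:
                    module = "." * child.level + (child.module or "")
                if context is not None:
                    verdict = "eager: " + context
                elif isinstance(child, ast.ImportFrom) and child.names[0].name == "*":
                    verdict = "eager: star import"
                else:
                    verdict = "lazy"
                results.append((child.lineno, module, verdict))
            else:
                # The outermost eager context wins; otherwise check this node.
                visit(child, context or EAGER_CONTEXTS.get(type(child)))

    visit(ast.parse(source), None)
    return results


SAMPLE = """\
import foo
from bar import baz
from xyz import *
try:
    import abc
except ImportError:
    import def_
with A() as a:
    import ghi
def f():
    import jkl
"""

for line, module, verdict in classify_imports(SAMPLE):
    print(line, module, verdict)
```

The sample is only parsed, never executed, so the modules it names do not need to exist.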
For third-party code that cannot (or should not) be modified, there is an
exclude list available, which will force modules on the list to be
eagerly imported when they are encountered:
from importlib import set_lazy_imports
set_lazy_imports(excluding=['foo', 'bar.baz'])
In that example, foo and bar.baz will be eagerly
imported, though bar is still lazily imported, as are all of the
imports contained in foo and bar.baz.
Libraries
Several library authors expressed concerns that they would effectively be forced to support (and test) lazy imports of their library, which is an added burden for maintainers. Thomas Kluyver put it this way:
Realistically, we won't get to tell everyone that if they want to use our library they can't use this new lazy import thing that Python just added. Especially as it's meant to make startup faster, and performance tricks always get cargo-culted to people who don't want to think about what they mean (one weird trick to make your Python scripts start 70% faster!). Within a year or so of releasing a version of Python with this option, we'll probably have to ensure our libraries and examples work with and without it. I'm sure we'd manage, but please remember that opt-in features for application developers aren't really optional for library developers.
Marc-Andre Lemburg said that it makes more sense to explicitly choose which imports are done lazily. Enabling lazy imports globally for a large code base is potentially dangerous; instead, something like "lazy import foo" should be used for imports that are only accessed from some subset of the program. He acknowledged that this can already be done, by placing the import where the functionality is being used, but thought that explicitly calling out the lazy imports was a better approach. Gregory P. Smith disagreed: "The startup time benefit of lazy imports only comes from enabling them broadly, not by requiring all code in your application's entire transitive dependencies to be modified to declare most of their imports as lazy."
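The existing technique Lemburg refers to, deferring an import by moving it into the code that needs it, looks something like the following. The function name is hypothetical and csv merely stands in for an expensive dependency; because Python caches imported modules in sys.modules, only the first call pays the cost of the import.

```python
import io


def render_report(rows):
    """Hypothetical helper that formats rows as CSV text."""
    # Deferred import: csv is loaded the first time a report is rendered,
    # not when the module containing this function is imported.
    import csv

    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


print(repr(render_report([["a", 1], ["b", 2]])))  # 'a,1\r\nb,2\r\n'
```

Each such import has to be moved by hand, which is the cost Smith is pointing at when he says the benefit only comes from enabling laziness broadly.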
Meyer wondered about real-world examples of the kinds of problems that library authors might encounter. There is a persistent idea in the discussion about libraries opting into being imported lazily, but he does not think that makes sense. However, library authors may not really want to determine whether their library can be imported that way, Paul Moore said:
My concern is more that as a library developer I have no intention of even thinking about whether my code is "lazy import safe". I just write "normal" Python code, and if my test suite passes, I'm done. I don't particularly want to run my test suite twice (with and without lazy imports) and even if I did, what am I supposed to do if something fails under lazy imports? The fact that it works under "normal" imports means it's correct Python, so why should [I] make my life harder by avoiding constructs just to satisfy an entirely theoretical possibility that someone might want to import my code lazily?
Meyer said that was a reasonable position for a library developer to take, but also recognized that the maintainer "might get user complaints about it, and this is a significant cost of the PEP". He also pointed out that most of the concerns being raised also apply to the existing importlib.util.LazyLoader class, which provides a more limited kind of lazy imports. Beyond that, there is no real way to decide that a module is "safe" for lazy import:
What I think the discussions of "library opt-out" are missing is that "safe for lazy imports" is fundamentally not even a meaningful or coherent property of a single module or library in isolation. It is only meaningful in the context of an actual application codebase. This is because no single module or library can ever control the ordering of imports or how the import-time code path flows: it is an emergent property of the interaction of all modules in the codebase and their imports. [...] I think the nature of the opt-out in PEP 690 is not well understood. It is not an exercise in categorizing modules into neatly-defined objective categories of "safe for lazy import" and "not safe for lazy import." (If it were, the only possible answer would be that no module is ever fully lazy import safe.) Rather, it is a way for an application developer to say "in the context of my specific entire application and how it actually works, I need to force these particular imports to be eager in order for the effects I actually need to happen in time."
Warsaw agreed with that; library authors "can't declare their modules safe for lazy import because they have no idea how their libraries are consumed or in what order they will be imported". On the other hand, application authors are in a position to work all of that out:
As an application author though, I know everything I need to know about what modules I consume, how they are imported, and whether they are safe or not. At least theoretically. Nobody is in a more advantageous position to understand the behavior of my application, and to make declarations about what modules can and cannot be safely lazily imported. And nobody else is in a position to actually test that assumption.
To me, the PEP gives the application author end-consumer the tools they need to build a lazy-friendly application.
As one data point on the real-world prevalence of lazy-import problems, Meyer said: "the Instagram Server codebase is multiple million lines of code, uses lazy imports applied globally, and has precisely five modules opted out". Méndez Bravo published a lengthy blog post that described the process of converting that code base to use lazy imports. For the most part, the problems encountered were not due to importing libraries; even third-party and standard-library modules largely just worked when lazy imports were enabled globally.
Toward the end of June, the discussion picked back up when Matplotlib developer Thomas A Caswell reiterated the concerns about the feature's impact on libraries and their authors, though he is "still enthusiastic about this proposal". Matplotlib and other SciPy libraries have lengthy import times and have tried various ways of deferring imports but "at every step discovered a subtle way that a user was relying on a side-effect". He expects that PEP 690 will "produce a stream of really interesting bugs across the whole ecosystem", though he would be happy to be wrong about that.
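The kind of side-effect reliance Caswell describes can be reproduced in miniature with the lazy-loading mechanism that already exists in the standard library, importlib.util.LazyLoader (not PEP 690's mechanism). In this sketch, the module name demo_plugin and the environment variable it sets are hypothetical; the variable stands in for any import-time registration that other code might quietly depend on. Loading the module lazily defers that side effect until something first touches one of the module's attributes.

```python
import importlib.util
import os
import sys
import tempfile

# Hypothetical plugin whose import has a side effect: it "registers"
# itself by setting a process-wide environment variable.
PLUGIN_SOURCE = '''
import os
os.environ["DEMO_PLUGIN_REGISTERED"] = "yes"

def hello():
    return "hello from plugin"
'''


def load_lazily(name, path):
    """Import the source file at path through importlib.util.LazyLoader."""
    spec = importlib.util.spec_from_file_location(name, path)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up laziness; the body does not run yet
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_plugin.py")
    with open(path, "w") as f:
        f.write(PLUGIN_SOURCE)

    os.environ.pop("DEMO_PLUGIN_REGISTERED", None)
    plugin = load_lazily("demo_plugin", path)
    # The module object exists, but its import-time code has not run ...
    registered_before = "DEMO_PLUGIN_REGISTERED" in os.environ
    # ... until the first attribute access forces the real load.
    greeting = plugin.hello()
    registered_after = "DEMO_PLUGIN_REGISTERED" in os.environ

sys.modules.pop("demo_plugin", None)
print(registered_before, registered_after, greeting)  # False True hello from plugin
```

Any code that checked for the registration before touching the module would get a different answer depending on whether the import was eager or lazy, which is the sort of subtle breakage Caswell anticipates.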
David Lord, who helps maintain Flask, Jinja, and other libraries, focused on the push from users to support lazy imports in libraries. He said that other features added to Python over the years (asyncio and typing) had created a lot of extra work when users clamored for them to be supported. "I really hope this doesn't add a third huge workload to my list of things to juggle as a maintainer."
Moore is worried
that users will perceive lazy imports as a magic button they can press for
better performance:
My fear is that most users will get the impression that "enable lazy imports" is essentially a "go_faster=True" setting, and will enable them and have little or no ability to handle the (inevitable) problems. They will therefore push the issue back to library maintainers.
He is in favor of improving startup time and reducing the cost of imports but would prefer to see it done with some form of opt-in for library authors. Meyer reminded everyone that a form of the feature already exists in the language "in a very similar global opt-in way" with LazyLoader. The PEP makes the feature "more usable and more effective and faster", however, which may make it more popular, so library developers may see more user requests. Furthermore, the existence of LazyLoader has not led to the problems envisioned: "The Python ecosystem doesn't seem to have been overwhelmed by people trying it 'just to see if it makes things faster.'" But it may be that LazyLoader is not all that well-known, so it has not been (ab)used much.
The discussion has wound down at this point, though in early August, Mark Shannon argued that the PEP "just feels too magical" because it does not use an explicit mechanism to mark lazy imports (e.g. lazy import foo). He said that "explicit approaches have been rejected with insufficient justification". Warsaw disagreed and thought that the PEP did justify its choices; he encouraged those who want to see an explicit approach to create a competing PEP.
Méndez Bravo recounted the process he went through when converting the Instagram code, which started out with an explicit approach. As he worked through it, he realized that nearly all of the imports could be done lazily, so he switched to the global approach. All in all, it worked well:
There are many different types of uses of Python and some communities have different patterns, but all the evidence we do have is that the percentage of modules we tried and worked without any issues out of the box with lazy imports enabled was high, and that just enabling lazy imports in a few modules doesn't yield many benefits at all. The true power comes when you enable laziness in whole systems. We've saved terabytes of memory in some systems and reduced start times from minutes to just a few seconds, just by making things lazy.
Opinions are split on PEP 690, but it seems clear that it provides a useful tool for some. Python creator Guido van Rossum is in favor: "I am eager to have this available even if there are potential problems", but others are less enthusiastic even though the underlying problem is widely acknowledged. The PEP is targeted at the 3.12 release, which does not have a feature freeze until next May, so there is still plenty of time. One might guess that the next step is to ask the steering council to decide on the PEP. The outcome of that is not obvious, though if more people start using lazy imports in Cinder without major problems, it might help sway the decision. Time will tell.
