From 23c09f51ae8071b712ba243ba80ae0795153c036 Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Mon, 8 Apr 2024 08:53:57 -0700 Subject: [PATCH 01/14] Everything except Rejected Ideas and Open Issues --- peps/pep-0744.rst | 532 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 532 insertions(+) create mode 100644 peps/pep-0744.rst diff --git a/peps/pep-0744.rst b/peps/pep-0744.rst new file mode 100644 index 00000000000..97a85e778ad --- /dev/null +++ b/peps/pep-0744.rst @@ -0,0 +1,532 @@ +PEP: 744 +Title: JIT Compilation +Author: Brandt Bucher +Discussions-To: +Status: Draft +Type: Informational +Created: 09-APR-2024 +Python-Version: 3.13 +Post-History: + +Abstract +======== + +.. A short (~200 word) description of the technical issue being addressed. + +Earlier this year, an `experimental "just-in-time" compiler +`_ was merged into CPython's +``main`` development branch. While recent CPython releases have included other +substantial internal changes, this addition represents a particularly +significant departure from the way CPython has traditionally executed Python +code (and thus, deserves wider discussion). + +This PEP aims to summarize the design decisions behind this addition, the +current state of the implementation, and future plans for making the JIT a +permanent, non-experimental part of CPython. It does *not* seek to provide a +comprehensive overview of *how* the JIT works, instead focusing on the +particular advantages and disadvantages of the chosen approach, as well as +answering many questions that have been asked about the JIT since its +introduction. + +Readers interested in learning more about the new JIT are encouraged to consult +the following resources: + +- The `presentation `_ which first introduced the + JIT at the 2023 CPython Core Developer Sprint. It includes relevant + background, a light technical introduction to the "copy-and-patch" technique + used, and an open discussion of its design amongst the core developers + present. +- The `paper `_ originally describing + copy-and-patch. +- The `blog post `_ by the + paper's author detailing the implementation of a copy-and-patch JIT compiler + for Lua. (While this is a great low-level explanation of the approach, it also + incorporates other techniques and makes implementation decisions that are not + relevant to CPython's JIT.) +- The `implementation <#reference-implementation>`_ itself. + +Motivation +========== + +.. Clearly explain why the existing language specification is inadequate to + address the problem that the PEP solves. + +Until this point, CPython has always executed Python code by compiling it to +bytecode, which is interpreted at runtime. This bytecode is a more-or-less +direct translation of the source code itself: it is untyped, and largely +unoptimized. + +Since the Python 3.11 release, CPython has used a "specializing adaptive +interpreter" (:pep:`659`), which `rewrites these bytecode instructions in-place +`_ with type-specialized versions as they run. +This new interpreter delivers significant performance improvements, despite the +fact that its optimization potential is limited by the boundaries of individual +bytecode instructions. It also collects a wealth of new profiling information: +the types flowing though a program, the memory layout of the data being operated +upon, and what paths through the program are being executed the most. In other +words, *what* to optimize, and *how* to optimize it. 
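+
+To illustrate the general idea of in-place specialization (using a toy
+instruction set invented for this example, *not* CPython's actual bytecode or
+data structures), consider the following self-contained C sketch. A generic
+``ADD`` instruction overwrites itself with a specialized ``ADD_INT`` form the
+first time it executes, so later executions skip the generic path entirely:
+
+.. code-block:: c
+
+   #include <stdint.h>
+   #include <stdio.h>
+
+   enum { LOAD, ADD, ADD_INT, PRINT, HALT };
+
+   typedef struct {
+       uint8_t op;
+       int64_t arg;
+   } Instr;
+
+   static void
+   run(Instr *code)
+   {
+       int64_t stack[8], *sp = stack;
+       for (Instr *pc = code;; pc++) {
+           switch (pc->op) {
+           case LOAD:
+               *sp++ = pc->arg;
+               break;
+           case ADD:
+               /* The "adaptive" form: this is where a real implementation
+                * would check operand types and update profiling counters
+                * before committing to a specialization. */
+               pc->op = ADD_INT;  /* rewrite the instruction in place */
+               /* fall through */
+           case ADD_INT:
+               sp -= 1;
+               sp[-1] += sp[0];
+               break;
+           case PRINT:
+               printf("%lld\n", (long long)sp[-1]);
+               break;
+           case HALT:
+               return;
+           }
+       }
+   }
+
+   int
+   main(void)
+   {
+       Instr program[] = {{LOAD, 3}, {LOAD, 4}, {ADD, 0}, {PRINT, 0}, {HALT, 0}};
+       run(program);  /* the first run specializes ADD in place... */
+       run(program);  /* ...the second run executes ADD_INT directly */
+       return 0;
+   }
+
+CPython's real specializations are, of course, far richer: they guard on actual
+type versions, store inline caches alongside the instructions, and fall back to
+the generic form when their assumptions stop holding.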
+ +Since the Python 3.12 release, CPython has generated this interpreter from a +`C-like domain-specific language +`_. In addition +to taming some of the complexity of the the new adaptive interpreter, the DSL +also allows CPython's maintainers to avoid hand-writing tedious boilerplate code +in many parts of the interpreter, compiler, and standard library that must be +kept in sync with the instruction definitions. This ability to generate large +amounts of runtime infrastructure from a single source of truth is not only +convenient for maintenance; it also unlocks many possibilities for expanding +CPython's execution in new ways. For instance, it makes it feasible to +automatically generate tables for translating a sequence of instructions into an +equivalent sequence of smaller "micro-ops", generate an optimizer for sequences +of these micro-ops, and even generate an entire second interpreter for executing +them. + +In fact, since early in the Python 3.13 release cycle, all CPython builds have +included this exact micro-op translation, optimization, and execution machinery. +However, it is disabled by default; the overhead of interpreting even optimized +traces of micro-ops is just too large for most code. Heavier optimization +probably won't improve the situation much either, since any efficiency gains +made by new optimizations will likely be offset by the interpretive overhead of +even smaller, more complex micro-ops. + +The most obvious strategy to overcome this new bottleneck is to statically +compile these optimized traces. This presents opportunities to avoid several +sources of indirection and overhead introduced by interpretation. In particular, +it allows the removal of dispatch overhead between micro-ops (by replacing a +generic interpreter with a straight-line sequence of hot code), instruction +decoding overhead for individual micro-ops (by "burning" the values or addresses +of arguments, constants, and cached values directly into machine instructions), +and memory traffic (by moving data off of heap-allocated Python frames and into +physical hardware registers). + +Since much of this data varies between runs of a program (even functionally +identical ones) and the existing optimization pipeline makes heavy use of +runtime profiling information, it doesn't make much sense to compile these +traces ahead of time. As has been demonstrated for many other dynamic languages +(`even Python itself `_), the most promising approach is +to compile the optimized micro-ops "just in time" for execution. + +Rationale +========= + +.. Describe why particular design decisions were made. + +Despite their reputation, JIT compilers are not magic "go faster" machines. +Developing and maintaining any sort of optimizing compiler for even a single +platform is an incredibly complicated, expensive task. Using an existing +compiler framework like LLVM can make this task simpler, but only at the cost of +introducing heavy runtime dependencies and significantly higher JIT compilation +overhead. + +It's clear that successfully compiling Python code at runtime requires not only +high-quality Python-specific optimizations for the code being run, *but also* +quick generation of efficient machine code for the optimized program. The Python +core development team has the necessary skills and experience for the former (a +middle-end tightly coupled to the interpreter itself), and copy-and-patch +compilation provides an attractive solution for the latter. 
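+
+Before describing how the templates are produced, it helps to see how little
+the runtime half of copy-and-patch actually does: each micro-op gets a
+pre-compiled machine-code "stencil" containing "holes" where runtime values
+belong, and emitting code for a trace is little more than copying stencils and
+filling in their holes. The sketch below uses invented names and elides details
+such as relocation kinds and instruction-level patching (the real data
+structures live in the generated ``jit_stencils.h`` and ``Python/jit.c``):
+
+.. code-block:: c
+
+   #include <stdint.h>
+   #include <string.h>
+
+   typedef struct {
+       size_t offset;  /* where in the stencil's body the hole lives */
+   } Hole;
+
+   typedef struct {
+       const unsigned char *body;  /* pre-compiled code for one micro-op */
+       size_t body_size;
+       const Hole *holes;
+       size_t hole_count;
+   } Stencil;
+
+   /* "Copy" the stencil into the output buffer, then "patch" each hole with
+    * a runtime value (an operand, a cached pointer, the address of the next
+    * stencil, and so on). */
+   unsigned char *
+   emit(unsigned char *out, const Stencil *stencil, const uint64_t *patches)
+   {
+       memcpy(out, stencil->body, stencil->body_size);
+       for (size_t i = 0; i < stencil->hole_count; i++) {
+           memcpy(out + stencil->holes[i].offset, &patches[i], sizeof(uint64_t));
+       }
+       return out + stencil->body_size;  /* the next micro-op goes here */
+   }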
+
+In a nutshell, copy-and-patch allows a high-quality template JIT compiler to be
+generated from the same DSL used to generate the rest of the interpreter. For a
+widely-used, volunteer-driven project like CPython, this benefit cannot be
+overstated: CPython's maintainers, by merely editing the bytecode definitions,
+will also get the JIT backend updated "for free", for *all* platforms, at once.
+This is equally true whether adding new instructions, removing old ones, or
+fixing bugs in existing ones.
+
+Like the rest of the interpreter, the JIT compiler is generated at build time,
+and has no runtime dependencies. It supports a wide range of platforms, and has
+comparatively low maintenance burden. In all, the current implementation is made
+up of about 900 lines of build-time Python code and 500 lines of runtime C code.
+
+Specification
+=============
+
+The JIT will become non-experimental once all of the following conditions are
+met:
+
+#. It provides a meaningful performance improvement for at least one popular
+   platform (realistically, on the order of 5%).
+
+#. It can be built, distributed, and deployed with minimal disruption.
+
+#. The Steering Council, upon request, has determined that it would provide more
+   value to the community if enabled than if disabled (considering tradeoffs
+   such as maintenance burden, memory usage, or the feasibility of alternate
+   designs).
+
+These criteria should be considered a starting point, and may be expanded over
+time. For example, discussion of this PEP may reveal that additional
+requirements (such as multiple committed maintainers, a security audit,
+documentation in the devguide, support for out-of-process debugging, or a
+runtime option to disable the JIT) should be added to this list.
+
+Until the JIT is non-experimental, it should *not* be used in production, and
+may be broken or removed at any time without warning.
+
+Of course, at any point, it is also within the Steering Council's power to ask
+for the JIT to be removed entirely if they feel it is necessary to do so.
+
+Once the JIT is no longer experimental, it should be treated in much the same
+way as other build options (such as ``--enable-optimizations`` or
+``--with-lto``). It may be a recommended (or even default) option for some
+platforms, and release managers *may* choose to enable it in official releases.
+
+Support
+-------
+
+The JIT has been developed for all of :pep:`11`'s current tier one platforms,
+most of its tier two platforms, and one of its tier three platforms.
+Specifically, CPython's ``main`` branch has `CI
+`_
+building and testing the JIT for both release and debug builds ons:
+
+- ``aarch64-apple-darwin/clang``
+
+- ``aarch64-pc-windows/msvc`` [#untested]_
+
+- ``aarch64-unknown-linux-gnu/clang`` [#emulated]_
+
+- ``aarch64-unknown-linux-gnu/gcc`` [#emulated]_
+
+- ``i686-pc-windows-msvc/msvc``
+
+- ``x86_64-apple-darwin/clang``
+
+- ``x86_64-pc-windows-msvc/msvc``
+
+- ``x86_64-unknown-linux-gnu/clang``
+
+- ``x86_64-unknown-linux-gnu/gcc``
+
+It's worth noting that some platforms, even future tier one platforms, may never
+gain JIT support. This can be for a variety of reasons, including insufficient
+LLVM support (``powerpc64le-unknown-linux-gnu/gcc``), inherent limitations in
+the platform itself (``wasm32-unknown-wasi/clang``), or lack of developer
+interest (``x86_64-unknown-freebsd/clang``).
+
+Once JIT support for a platform is added (meaning, the JIT builds successfully
+without displaying warnings to the user), it should be treated in much the same
+way as :pep:`11` prescribes: it should have reliable CI/buildbots, and JIT
+failures on tier one and tier two platforms should block releases. Though it's
+not necessary to update :pep:`11` to specify JIT support, it may be helpful to
+do so anyways.
+
+Since it should always be possible to build CPython without the JIT, removing
+JIT support for a platform should *not* be considered a backwards-incompatible
+change. However, if it is reasonable to do so, the normal deprecation process
+should be followed as outlined in :pep:`387`.
+
+The JIT's dependencies may be changed between releases (within reason).
+
+Backwards Compatibility
+=======================
+
+.. Describe potential impact and severity on pre-existing code.
+
+Due to the fact that the current interpreter and the JIT backend are both
+generated from the same specification, the behavior of Python code should be
+completely unchanged. In practice, observable differences that have arisen (and
+been fixed) during testing have more often been bugs in the micro-ops and the
+way they are optimized, rather than bugs in the JIT backend itself.
+
+Debugging
+---------
+
+Tools that profile and debug Python code will continue to work fine.
+
+Currently, it appears that C profilers and debuggers are unable to trace back
+*through* JIT frames. Working with leaf frames is possible (this is how the JIT
+itself is debugged), though it is of limited utility due to the absence of
+proper debugging information for JIT frames.
+
+Since the code templates emitted by the JIT are compiled by Clang (and it's
+straightforward to pass normal compiler flags as part of the build step), it
+*may* be possible to allow JIT frames to be traced through by simply modifying
+the flags to use frame pointers more carefully. It may also be possible to
+harvest and emit the debugging information produced by Clang. Neither of these
+ideas has been explored very deeply.
+
+While this is an issue that *should* be fixed, fixing it is not a particularly
+high priority at this time. This is probably a problem best explored by somebody
+with more domain expertise *in collaboration with* those maintaining the JIT
+itself (who have little experience with the inner workings of these tools).
+
+Security Implications
+=====================
+
+.. How could a malicious user take advantage of this new feature?
+
+This JIT, like any JIT, produces large amounts of executable data at runtime.
+This introduces a potential new attack surface to CPython, since a malicious
+actor capable of influencing the contents of this data is therefore capable of
+executing arbitrary code. This is a `well-known vulnerability
+`_ of JIT
+compilers.
+
+In order to mitigate this risk, the JIT has been written with best practices in
+mind. In particular, the data in question is not exposed by the JIT compiler to
+other parts of the program while it remains writeable, and at *no* point is the
+data both |wx|_.
+
+.. Apparently this is how you hack together a formatted link:
+
+.. |wx| replace:: writable *and* executable
+.. _wx: https://en.wikipedia.org/wiki/W%5EX
+
+The nature of template-based JITs also seriously limits the kinds of code that
+can be generated, further reducing the likelihood of a successful exploit.
As an +additional precaution, the templates themselves (including all of their +metadata) are stored in static, read-only memory. + +However, it would be naive to assume that no possible vulnerabilities exist in +the JIT, especially at this early stage. The author is not a security expert, +but is available to join or work closely with the Python Security Response Team +to triage and fix security issues as they arise. + +Apple Silicon +-------------- + +Though difficult to test without actually signing and packaging a macOS release, +it *appears* that macOS releases should `enable the JIT Entitlement for the +Hardened Runtime +`_. + +How to Teach This +================= + +.. How to teach users, new and experienced, how to apply the PEP to their work. + +Choose the sections that best describe you: + +- **If you are a Python programmer or end user...** + + - ...nothing changes for you. Nobody should be distributing JIT-enabled + CPython interpreters to you while it is still an experimental feature. Once + it is non-experimental, you will probably notice slightly better performance + and slightly higher memory usage. You shouldn't be able to observe any other + changes. + +- **If you maintain third-party packages...** + + - ...nothing changes for you. There are no API or ABI changes, and the JIT is + not exposed to third-party code. You shouldn't need to change your CI + matrix, and you shouldn't be able to observe differences in the way your + packages work when the JIT is enabled. + +- **If you profile or debug Python code...** + + - ...nothing changes for you. All Python profiling and tracing functionality + remains. + +- **If you profile or debug C code...** + + - ...currently, the ability to trace *through* JIT frames is limited. This may + cause issues if you need to observe the entire C call stack, rather than + just "leaf" frames. See the `Debugging`_ section above for more information. + +- **If you compile your own Python interpreter....** + + - ...if you don't wish to build the JIT, you can simply ignore it. Otherwise, + you will need to install a compatible version of LLVM, and pass the + appropriate build flag to the build scripts. Your build may take up to a + minute longer. Note that the JIT should *not* be distributed to end users or + used in production while it is still in the experimental phase. + +- **If you're a maintainer of CPython (or a fork of CPython)...** + + - **...and you change the bytecode definitions or the main interpreter + loop...** + + - ..in general, the JIT shouldn't be much of an inconvenience to you + (depending on what you're trying to do). The micro-op interpreter is still + around, and isn't going anywhere, so your day-to-day development will + probably with the JIT itself disabled (as it is now). There is moderate + likelihood that larger changes to the interpreter itself (such as adding + new local variables, changing error handlers and deoptimization points, + changing the micro-op format, etc.) will require changes to the C template + used to generate the JIT, which is meant to mimic the main interpreter + loop. You may also occasionally just get unlucky, and break JIT code + generation, which will require you either modify the Python build scripts + yourself, or solicit the help of somebody more familiar with them (below). + + - **...and you work on the JIT itself...** + + - ...you hopefully already have a decent idea of what you're getting + yourself into. 
You will be regularly modifying the Python build scripts, + the C template used to generate the JIT, and the C code that actually + makes up the runtime portion of the JIT. You'll be regularly dealing with + all sorts of crashes, stepping over machine code in a debugger, staring at + COFF/ELF/Mach-O dumps, developing on a wide range of platforms, and + generally being the point of contact for the people changing the bytecode + when CI starts failing on their PRs (above). Ideally, you're at least + *familiar* with assembly, have taken a couple of courses with "compilers" + in their name, and have read a blog post or two about linkers. + + - **...and you maintain other parts of CPython...** + + - ...nothing changes for you. You shouldn't need to develop locally with JIT + builds. If you choose to do so (for example, to help reproduce and triage + JIT issues), your builds may take up to a minute longer; however, the + built JIT will be cached for subsequent runs (provided that the input + files are unmodified). + + +Reference Implementation +======================== + +.. Link to any existing implementation and details about its state, e.g. + proof-of-concept. + +Key parts of the implementation include: + +- |readme|_: Instructions for how to build the JIT. +- |jit|_: The entire runtime portion of the JIT compiler. +- |jit_stencils|_: An example of the JIT's generated templates. +- |template|_: The code which is compiled to produce the JIT's templates. +- |targets|_: The code to compile and parse the templates at build time. + +.. |readme| replace:: ``Tools/jit/README.md`` +.. _readme: https://github.com/python/cpython/blob/main/Tools/jit/README.md +.. |jit| replace:: ``Python/jit.c`` +.. _jit: https://github.com/python/cpython/blob/main/Python/jit.c +.. |jit_stencils| replace:: ``jit_stencils.h`` +.. _jit_stencils: https://gist.github.com/brandtbucher/9d3cc396dcb15d13f7e971175e987f3a +.. |template| replace:: ``Tools/jit/template.c`` +.. _template: https://github.com/python/cpython/blob/main/Tools/jit/template.c +.. |targets| replace:: ``Tools/jit/_targets.py`` +.. _targets: https://github.com/python/cpython/blob/main/Tools/jit/_targets.py + +Rejected Ideas +============== + +.. Why certain ideas that were brought while discussing this PEP were not + ultimately pursued. + +Maintain it outside of CPython +------------------------------ + +.. Q: Do you want to merge this into CPython, or maintain it separately? + +.. Q: Is the implementation under your GitHub account? + +.. Q: Is it possible to maintain it outside of CPython? + +Turn it on by default +--------------------- + +.. Q: Shouldn’t we merge this as soon as possible? + +.. Q: Why was it merged now, without a PEP or wide discussion among Core + Developers, especially since the immediate performance benefit is pretty much + non-existent? + +.. Q: How much follow-up work relies on this being merged, now? + +.. Q: How difficult would it be to revert now and merge again later? + +.. XXX: ...for now, this is a good compromise between always turning it on and + not having it in at all. + +Support multiple compiler toolchains +------------------------------------ + +Clang is specifically needed because it's the only C compiler with support for +guaranteed tail calls (|musttail|_), which are required by CPython's +`continuation-passing-style +`_ approach +to JIT compilation. 
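+
+For a rough idea of why this matters, here is a sketch of what a single
+micro-op template looks like (the ``_HOLE_*`` symbols are invented for this
+example; the real templates are generated from ``Tools/jit/template.c``). Each
+template ends by tail-calling the next one, and the deliberately-unresolved
+external symbols become the "holes" that are recorded at build time and patched
+at runtime:
+
+.. code-block:: c
+
+   #include <stdint.h>
+
+   extern uint64_t _HOLE_OPERAND;                       /* patched with a value */
+   extern int _HOLE_CONTINUE(intptr_t *stack_pointer);  /* the next template */
+
+   int
+   sample_template(intptr_t *stack_pointer)
+   {
+       /* The relocation for this symbol's address is left unresolved in the
+        * object file and later patched, "burning" the operand into the
+        * emitted machine code. */
+       *stack_pointer++ = (intptr_t)&_HOLE_OPERAND;
+       /* Guaranteed to compile to a plain jump into the next micro-op's
+        * code, so a long trace never grows the C stack. */
+       __attribute__((musttail)) return _HOLE_CONTINUE(stack_pointer);
+   }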
+ +Since LLVM also includes other functionalities required by the JIT build process +(namely, utilities for object file parsing and disassembly), it's convenient to +only support one toolchain at this time. + +.. |musttail| replace:: ``musttail`` +.. _musttail: https://clang.llvm.org/docs/AttributeReference.html#musttail + +Use the JIT to compile "tier one" code +-------------------------------------- + +Most of the prior art for copy-and-patch uses it as a fast baseline JIT, whereas +CPython's JIT is using the technique for optimized "tier two" traces. + +This is because CPython uses the "tier one" specializing adaptive interpreter to +collect runtime profiling information, and uses that data to detect and optimize +"hot" paths through the code. This uses self-modifying code, a technique which +is much more difficult to implement with using a JIT compiler. + +In theory, it should be possible to compile tier one bytecode using +copy-and-patch (in fact, early prototypes predated the tier two interpreter and +did exactly this). In practice, the JIT sits somewhere between the "baseline" +and "optimizing" compiler tiers of other dynamic language runtimes. + +Add GPU support +--------------- + +The JIT is currently CPU-only. It does not, for example, offload NumPy array +computations to CUDA GPUs, as JITs like `Numba +`_ do. + +There is already a rich ecosystem of tools for accelerating these sorts of +specialized tasks, and CPython's JIT is not intended to replace them. Instead, +it is meant to improve the performance of general-purpose Python code, which is +less likely to benefit from deeper GPU integration. + +Open Issues +=========== + +.. Any points that are still being decided/discussed. + +Speed +----- + +.. XXX: ... + +Memory +------ + +.. XXX: Because it emits ... + +Earlier versions of the JIT had a more complicated memory allocation scheme +which imposed a number of fragile limitations on the size and layout of the +emitted code, and significantly bloated the memory footprint of Python +executable itself. These issues are no longer present in the current design. + +Dependencies +------------ + +.. Q: Could we put the build-time dependencies in a container? + +.. Q: Could JITs for every platform be generated on Linux? + +.. Q: Will the generated header files be tracked by Git? + +.. Q: Is the JIT generated at “generate files time” or “build time”? + +Building the JIT adds between 3 and 60 seconds to the build process, depending +on platform. It is only rebuilt whenever the generated files become out-of-date, +so only those who are actively developing the main interpreter loop (or the JIT +itself) will be rebuilding it frequently. + +.. XXX: Unlike many other generated files in CPython, the JIT's generated files + are not tracked by Git. This is because... + +Footnotes +========= + +.. A collection of footnotes cited in the PEP, and a place to list non-inline + hyperlink targets. + +.. [#untested] Due to lack of available hardware, the JIT is built, but not + tested, for this platform. + +.. [#emulated] Due to lack of available hardware, the JIT is built using + cross-compilation and tested using hardware emulation for this platform. Some + tests are skipped because emulation (not the JIT) causes them to fail. + However, the JIT has been successfully built and tested for this platform + locally. + +Copyright +========= + +This document is placed in the public domain or under the CC0-1.0-Universal +license, whichever is more permissive. 
From 905c46acefa0cf3c75edf578b973fa190a0484be Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Thu, 11 Apr 2024 09:01:45 -0700 Subject: [PATCH 02/14] Apply feedback and fill in Rejected Ideas --- peps/pep-0744.rst | 228 +++++++++++++++++++++++++--------------------- 1 file changed, 123 insertions(+), 105 deletions(-) diff --git a/peps/pep-0744.rst b/peps/pep-0744.rst index 97a85e778ad..44f4d0929ea 100644 --- a/peps/pep-0744.rst +++ b/peps/pep-0744.rst @@ -4,15 +4,13 @@ Author: Brandt Bucher Discussions-To: Status: Draft Type: Informational -Created: 09-APR-2024 +Created: 12-APR-2024 Python-Version: 3.13 Post-History: Abstract ======== -.. A short (~200 word) description of the technical issue being addressed. - Earlier this year, an `experimental "just-in-time" compiler `_ was merged into CPython's ``main`` development branch. While recent CPython releases have included other @@ -36,25 +34,24 @@ the following resources: background, a light technical introduction to the "copy-and-patch" technique used, and an open discussion of its design amongst the core developers present. -- The `paper `_ originally describing - copy-and-patch. + +- The `open access paper `_ originally + describing copy-and-patch. + - The `blog post `_ by the paper's author detailing the implementation of a copy-and-patch JIT compiler for Lua. (While this is a great low-level explanation of the approach, it also incorporates other techniques and makes implementation decisions that are not relevant to CPython's JIT.) + - The `implementation <#reference-implementation>`_ itself. Motivation ========== -.. Clearly explain why the existing language specification is inadequate to - address the problem that the PEP solves. - Until this point, CPython has always executed Python code by compiling it to bytecode, which is interpreted at runtime. This bytecode is a more-or-less -direct translation of the source code itself: it is untyped, and largely -unoptimized. +direct translation of the source code: it is untyped, and largely unoptimized. Since the Python 3.11 release, CPython has used a "specializing adaptive interpreter" (:pep:`659`), which `rewrites these bytecode instructions in-place @@ -62,9 +59,9 @@ interpreter" (:pep:`659`), which `rewrites these bytecode instructions in-place This new interpreter delivers significant performance improvements, despite the fact that its optimization potential is limited by the boundaries of individual bytecode instructions. It also collects a wealth of new profiling information: -the types flowing though a program, the memory layout of the data being operated -upon, and what paths through the program are being executed the most. In other -words, *what* to optimize, and *how* to optimize it. +the types flowing though a program, the memory layout of particular objects, and +what paths through the program are being executed the most. In other words, +*what* to optimize, and *how* to optimize it. Since the Python 3.12 release, CPython has generated this interpreter from a `C-like domain-specific language @@ -103,40 +100,39 @@ Since much of this data varies between runs of a program (even functionally identical ones) and the existing optimization pipeline makes heavy use of runtime profiling information, it doesn't make much sense to compile these traces ahead of time. As has been demonstrated for many other dynamic languages -(`even Python itself `_), the most promising approach is -to compile the optimized micro-ops "just in time" for execution. 
+(`and even Python itself `_), the most promising approach +is to compile the optimized micro-ops "just in time" for execution. Rationale ========= -.. Describe why particular design decisions were made. - Despite their reputation, JIT compilers are not magic "go faster" machines. Developing and maintaining any sort of optimizing compiler for even a single -platform is an incredibly complicated, expensive task. Using an existing -compiler framework like LLVM can make this task simpler, but only at the cost of -introducing heavy runtime dependencies and significantly higher JIT compilation -overhead. +platform (let alone all of CPython's most popular supported platforms) is an +incredibly complicated, expensive task. Using an existing compiler framework +like LLVM can make this task simpler, but only at the cost of introducing heavy +runtime dependencies and significantly higher JIT compilation overhead. It's clear that successfully compiling Python code at runtime requires not only high-quality Python-specific optimizations for the code being run, *but also* quick generation of efficient machine code for the optimized program. The Python core development team has the necessary skills and experience for the former (a -middle-end tightly coupled to the interpreter itself), and copy-and-patch -compilation provides an attractive solution for the latter. +middle-end tightly coupled to the interpreter), and copy-and-patch compilation +provides an attractive solution for the latter. In a nutshell, copy-and-patch allows a high-quality template JIT compiler to be generated from the same DSL used to generate the rest of the interpreter. For a widely-used, volunteer-driven project like CPython, this benefit cannot be overstated: CPython's maintainers, by merely editing the bytecode definitions, -will also get the JIT backend updated "for free", for *all* platforms, at once. -This is equally true whether adding new instructions, removing old ones, or -fixing bugs in existing ones. +will also get the JIT backend updated "for free", for *all* JIT-supported +platforms, at once. This is equally true whether adding new instructions, +removing old ones, or fixing bugs in existing ones. Like the rest of the interpreter, the JIT compiler is generated at build time, -and has no runtime dependencies. It supports a wide range of platforms, and has -comparatively low maintenance burden. In all, the current implementation is made -up of about 900 lines of build-time Python code and 500 lines of runtime C code. +and has no runtime dependencies. It supports a wide range of platforms (see the +`Support`_ section below), and has comparatively low maintenance burden. In all, +the current implementation is made up of about 900 lines of build-time Python +code and 500 lines of runtime C code. Specification ============= @@ -163,9 +159,6 @@ runtime option to disable the JIT) should be added to this list. Until the JIT is non-experimental, it should *not* be used in production, and may be broken or removed at any time without warning. -Of course, at any point, it is also within the Steering Council's power to ask -for the JIT to be removed entirely if they feel it is necessary to do so. - Once the JIT is no longer experimental, it should be treated in much the same way as other build options (such as ``--enable-optimizations`` or ``--with-lto``). 
It may be a recommended (or even default) option for some @@ -178,7 +171,7 @@ The JIT has been developed for all of :pep:`11`'s current tier one platforms, most of its tier two platforms, and one of its tier three platforms. Specifically, CPython's ``main`` branch has `CI `_ -building and testing the JIT for both release and debug builds ons: +building and testing the JIT for both release and debug builds on: - ``aarch64-apple-darwin/clang`` @@ -200,9 +193,9 @@ building and testing the JIT for both release and debug builds ons: It's worth noting that some platforms, even future tier one platforms, may never gain JIT support. This can be for a variety of reasons, including insufficient -LLVM support (``powerpc64le-unknown-linux-gnu/gcc``), inherent limitations in -the platform itself (``wasm32-unknown-wasi/clang``), or lack of developer -interest (``x86_64-unknown-freebsd/clang``). +LLVM support (``powerpc64le-unknown-linux-gnu/gcc``), inherent limitations of +the platform (``wasm32-unknown-wasi/clang``), or lack of developer interest +(``x86_64-unknown-freebsd/clang``). Once JIT support for a platform is added (meaning, the JIT builds successfully without displaying warnings to the user), it should be treated in much the same @@ -216,18 +209,18 @@ JIT support for a platform should *not* be considered a backwards-incompatible change. However, if it is reasonable to do so, the normal deprecation process should be followed as outlined in :pep:`387`. -The JIT's dependencies may be changed between releases (within reason). +The JIT's build-time dependencies may be changed between releases (within +reason). Backwards Compatibility ======================= -.. Describe potential impact and severity on pre-existing code. - Due to the fact that the current intepreter and the JIT backend are both generated from the same specification, the behavior of Python code should be completely unchanged. In practice, observable differences that have arisen (and -been fixed) during testing have more often been bugs in the micro-ops and the -way they are optimized, rather than bugs in the JIT backend itself. +been fixed) during testing have tended to be bugs in the existing micro-op +translation and optimization stages, rather than bugs in the copy-and-patch +step. Debugging --------- @@ -239,23 +232,20 @@ Currently, it appears that C profilers and debuggers are unable to trace back itself is debugged), though it is of limited utility due to the absense of proper debugging information for JIT frames. -Since the code templates emitted by the JIT are compiled by Clang (and it's -straightforward to pass normal compiler flags as part of the build step), it -*may* be possible to allow JIT frames to be traced through by simply modifying -the flags to use frame pointers more carefully. It may also be possible to +Since the code templates emitted by the JIT are compiled by Clang, it *may* be +possible to allow JIT frames to be traced through by simply modifying the +compiler flags to use frame pointers more carefully. It may also be possible to harvest and emit the debugging information produced by Clang. Neither of these ideas have been explored very deeply. While this is an issue that *should* be fixed, fixing it is not a particularly high priority at this time. This is probably a problem best explored by somebody -with more domain expertise *in collaboration with* those maintaining the JIT -itself (who have little experience with the inner workings of these tools). 
+with more domain expertise in collaboration with those maintaining the JIT, who +have little experience with the inner workings of these tools. Security Implications ===================== -.. How could a malicious user take advantage of this new feature? - This JIT, like any JIT, produces large amounts of executable data at runtime. This introduces a potential new attack surface to CPython, since a malicious actor capable of influencing the contents of this data is therefore capable of @@ -291,11 +281,12 @@ it *appears* that macOS releases should `enable the JIT Entitlement for the Hardened Runtime `_. +This shouldn't make *installing* Python any harder, but may add additional steps +for release managers to perform. + How to Teach This ================= -.. How to teach users, new and experienced, how to apply the PEP to their work. - Choose the sections that best describe you: - **If you are a Python programmer or end user...** @@ -327,8 +318,9 @@ Choose the sections that best describe you: - **If you compile your own Python interpreter....** - ...if you don't wish to build the JIT, you can simply ignore it. Otherwise, - you will need to install a compatible version of LLVM, and pass the - appropriate build flag to the build scripts. Your build may take up to a + you will need to `install a compatible version of LLVM + `_, and + pass the appropriate flag to the build scripts. Your build may take up to a minute longer. Note that the JIT should *not* be distributed to end users or used in production while it is still in the experimental phase. @@ -337,24 +329,24 @@ Choose the sections that best describe you: - **...and you change the bytecode definitions or the main interpreter loop...** - - ..in general, the JIT shouldn't be much of an inconvenience to you - (depending on what you're trying to do). The micro-op interpreter is still - around, and isn't going anywhere, so your day-to-day development will - probably with the JIT itself disabled (as it is now). There is moderate - likelihood that larger changes to the interpreter itself (such as adding - new local variables, changing error handlers and deoptimization points, - changing the micro-op format, etc.) will require changes to the C template - used to generate the JIT, which is meant to mimic the main interpreter - loop. You may also occasionally just get unlucky, and break JIT code - generation, which will require you either modify the Python build scripts - yourself, or solicit the help of somebody more familiar with them (below). + - ...in general, the JIT shouldn't be much of an inconvenience to you + (depending on what you're trying to do). The micro-op interpreter isn't + going anywhere, and still offers a debugging experience similer to what + the main bytecode interpreter provides today. There is moderate likelihood + that larger changes to the interpreter (such as adding new local + variables, changing error handling and deoptimization logic, changing the + micro-op format, etc.) will require changes to the C template used to + generate the JIT, which is meant to mimic the main interpreter loop. You + may also occasionally just get unlucky and break JIT code generation, + which will require you to either modify the Python build scripts yourself, + or solicit the help of somebody more familiar with them (below). - **...and you work on the JIT itself...** - ...you hopefully already have a decent idea of what you're getting yourself into. 
You will be regularly modifying the Python build scripts, the C template used to generate the JIT, and the C code that actually - makes up the runtime portion of the JIT. You'll be regularly dealing with + makes up the runtime portion of the JIT. You will also be dealing with all sorts of crashes, stepping over machine code in a debugger, staring at COFF/ELF/Mach-O dumps, developing on a wide range of platforms, and generally being the point of contact for the people changing the bytecode @@ -366,66 +358,85 @@ Choose the sections that best describe you: - ...nothing changes for you. You shouldn't need to develop locally with JIT builds. If you choose to do so (for example, to help reproduce and triage - JIT issues), your builds may take up to a minute longer; however, the - built JIT will be cached for subsequent runs (provided that the input - files are unmodified). + JIT issues), your builds may take up to a minute longer each time the + relevant files are modified. Reference Implementation ======================== -.. Link to any existing implementation and details about its state, e.g. - proof-of-concept. - Key parts of the implementation include: - |readme|_: Instructions for how to build the JIT. + - |jit|_: The entire runtime portion of the JIT compiler. + - |jit_stencils|_: An example of the JIT's generated templates. + - |template|_: The code which is compiled to produce the JIT's templates. + - |targets|_: The code to compile and parse the templates at build time. .. |readme| replace:: ``Tools/jit/README.md`` .. _readme: https://github.com/python/cpython/blob/main/Tools/jit/README.md + .. |jit| replace:: ``Python/jit.c`` .. _jit: https://github.com/python/cpython/blob/main/Python/jit.c + .. |jit_stencils| replace:: ``jit_stencils.h`` .. _jit_stencils: https://gist.github.com/brandtbucher/9d3cc396dcb15d13f7e971175e987f3a + .. |template| replace:: ``Tools/jit/template.c`` .. _template: https://github.com/python/cpython/blob/main/Tools/jit/template.c + .. |targets| replace:: ``Tools/jit/_targets.py`` .. _targets: https://github.com/python/cpython/blob/main/Tools/jit/_targets.py Rejected Ideas ============== -.. Why certain ideas that were brought while discussing this PEP were not - ultimately pursued. - Maintain it outside of CPython ------------------------------ -.. Q: Do you want to merge this into CPython, or maintain it separately? - -.. Q: Is the implementation under your GitHub account? - -.. Q: Is it possible to maintain it outside of CPython? +While it is *probably* possible to maintain the JIT outside of CPython, its +implementation is tied tightly enough to the rest of the interpreter that +keeping it up-to-date would probably be more difficult than actually developing +the JIT itself. Additionally, contributors working on the existing micro-op +definitions and optimizations would need to modify and build two separate +projects to measure the effects of their changes under the JIT (whereas today, +infrastructure exists to do this automatically for any proposed change). + +Releases of the separate "JIT" project would probably also need to correspond to +specific CPython pre-releases and patch releases, depending on exactly what +changes are present. Individual CPython commits between releases likely wouldn't +have corresponding JIT releases at all, further complicating debugging efforts +(such as bisection to find breaking changes upstream). 
+ +Since the JIT is already quite stable, and the ultimate goal is for it to be a +non-experimental part of CPython, keeping it in ``main`` seems to be the best +path forward. With that said, the relevant code is organized in such a way that +the JIT can be easily "deleted" if it does not end up meeting its goals. Turn it on by default --------------------- -.. Q: Shouldn’t we merge this as soon as possible? +On the other hand, some have suggested that the JIT should be enabled by default +in its current form. -.. Q: Why was it merged now, without a PEP or wide discussion among Core - Developers, especially since the immediate performance benefit is pretty much - non-existent? +Again, it is important to remember that a JIT is not a magic "go faster" +machine; currently, the JIT is about as fast as the existing specializing +interpreter. This may sound underwhelming, but it is actually a fairly +significant achievement, and it's the main reason why this approach was +considered viable enough to be merged into ``main`` for further development. -.. Q: How much follow-up work relies on this being merged, now? +While the JIT provides significant gains over the existing micro-op interpreter, +it isn't yet a clear win when always enabled (especially considering its +increased memory consumption and additional build-time dependencies). That's the +purpose of this PEP: to clarify expectations about the objective criteria that +should be met in order to "flip the switch". -.. Q: How difficult would it be to revert now and merge again later? - -.. XXX: ...for now, this is a good compromise between always turning it on and - not having it in at all. +At least for now, having this in ``main``, but off by default, seems to be a +good compromise between always turning it on and not having it available at all. Support multiple compiler toolchains ------------------------------------ @@ -434,30 +445,37 @@ Clang is specifically needed because it's the only C compiler with support for guaranteed tail calls (|musttail|_), which are required by CPython's `continuation-passing-style `_ approach -to JIT compilation. - -Since LLVM also includes other functionalities required by the JIT build process -(namely, utilities for object file parsing and disassembly), it's convenient to -only support one toolchain at this time. +to JIT compilation. Without it, the tail-recursive calls between templates could +result in unbounded C stack growth (and eventual overflow). .. |musttail| replace:: ``musttail`` .. _musttail: https://clang.llvm.org/docs/AttributeReference.html#musttail -Use the JIT to compile "tier one" code --------------------------------------- +Since LLVM also includes other functionalities required by the JIT build process +(namely, utilities for object file parsing and disassembly), and additional +toolchains introduce additional testing and maintenance burden, it's convenient +to only support one major version of one toolchain at this time. + +Compile the base interpreter's bytecode +--------------------------------------- + +.. XXX: Rework this section. Most of the prior art for copy-and-patch uses it as a fast baseline JIT, whereas -CPython's JIT is using the technique for optimized "tier two" traces. +CPython's JIT is using the technique to compile optimized micro-op traces. + +In practice, the JIT currently sits somewhere between the "baseline" and +"optimizing" compiler tiers of other dynamic language runtimes. 
This is because +CPython uses the specializing adaptive interpreter to collect runtime profiling +information, which is used to detect and optimize "hot" paths through the code. +This employs self-modifying code, a technique which is much more difficult to +implement with a JIT compiler. -This is because CPython uses the "tier one" specializing adaptive interpreter to -collect runtime profiling information, and uses that data to detect and optimize -"hot" paths through the code. This uses self-modifying code, a technique which -is much more difficult to implement with using a JIT compiler. +While it's *possible* to compile tier one bytecode using copy-and-patch (in +fact, early prototypes predated the tier two interpreter and did exactly this), +it just doesn't provide enough optimization potential as the more granular +micro-op format. -In theory, it should be possible to compile tier one bytecode using -copy-and-patch (in fact, early prototypes predated the tier two interpreter and -did exactly this). In practice, the JIT sits somewhere between the "baseline" -and "optimizing" compiler tiers of other dynamic language runtimes. Add GPU support --------------- @@ -489,7 +507,7 @@ Memory Earlier versions of the JIT had a more complicated memory allocation scheme which imposed a number of fragile limitations on the size and layout of the emitted code, and significantly bloated the memory footprint of Python -executable itself. These issues are no longer present in the current design. +executable. These issues are no longer present in the current design. Dependencies ------------ From 048dc43050a47ab4304c724cc14340daf9d431c6 Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Thu, 11 Apr 2024 11:36:17 -0700 Subject: [PATCH 03/14] Finish Open Issues --- peps/pep-0744.rst | 128 ++++++++++++++++++++++++++-------------------- 1 file changed, 72 insertions(+), 56 deletions(-) diff --git a/peps/pep-0744.rst b/peps/pep-0744.rst index 44f4d0929ea..d35577e7fb7 100644 --- a/peps/pep-0744.rst +++ b/peps/pep-0744.rst @@ -4,7 +4,7 @@ Author: Brandt Bucher Discussions-To: Status: Draft Type: Informational -Created: 12-APR-2024 +Created: 11-APR-2024 Python-Version: 3.13 Post-History: @@ -16,7 +16,7 @@ Earlier this year, an `experimental "just-in-time" compiler ``main`` development branch. While recent CPython releases have included other substantial internal changes, this addition represents a particularly significant departure from the way CPython has traditionally executed Python -code (and thus, deserves wider discussion). +code. As such, it deserves wider discussion. This PEP aims to summarize the design decisions behind this addition, the current state of the implementation, and future plans for making the JIT a @@ -40,9 +40,9 @@ the following resources: - The `blog post `_ by the paper's author detailing the implementation of a copy-and-patch JIT compiler - for Lua. (While this is a great low-level explanation of the approach, it also - incorporates other techniques and makes implementation decisions that are not - relevant to CPython's JIT.) + for Lua. While this is a great low-level explanation of the approach, note + that it also incorporates other techniques and makes implementation decisions + that are not particularly relevant to CPython's JIT. - The `implementation <#reference-implementation>`_ itself. 
@@ -96,19 +96,19 @@ of arguments, constants, and cached values directly into machine instructions), and memory traffic (by moving data off of heap-allocated Python frames and into physical hardware registers). -Since much of this data varies between runs of a program (even functionally -identical ones) and the existing optimization pipeline makes heavy use of -runtime profiling information, it doesn't make much sense to compile these -traces ahead of time. As has been demonstrated for many other dynamic languages -(`and even Python itself `_), the most promising approach -is to compile the optimized micro-ops "just in time" for execution. +Since much of this data varies even between identical runs of a program and the +existing optimization pipeline makes heavy use of runtime profiling information, +it doesn't make much sense to compile these traces ahead of time. As has been +demonstrated for many other dynamic languages (`and even Python itself +`_), the most promising approach is to compile the +optimized micro-ops "just in time" for execution. Rationale ========= Despite their reputation, JIT compilers are not magic "go faster" machines. Developing and maintaining any sort of optimizing compiler for even a single -platform (let alone all of CPython's most popular supported platforms) is an +platform, let alone all of CPython's most popular supported platforms, is an incredibly complicated, expensive task. Using an existing compiler framework like LLVM can make this task simpler, but only at the cost of introducing heavy runtime dependencies and significantly higher JIT compilation overhead. @@ -160,9 +160,9 @@ Until the JIT is non-experimental, it should *not* be used in production, and may be broken or removed at any time without warning. Once the JIT is no longer experimental, it should be treated in much the same -way as other build options (such as ``--enable-optimizations`` or -``--with-lto``). It may be a recommended (or even default) option for some -platforms, and release managers *may* choose to enable it in official releases. +way as other build options such as ``--enable-optimizations`` or ``--with-lto``. +It may be a recommended (or even default) option for some platforms, and release +managers *may* choose to enable it in official releases. Support ------- @@ -209,18 +209,17 @@ JIT support for a platform should *not* be considered a backwards-incompatible change. However, if it is reasonable to do so, the normal deprecation process should be followed as outlined in :pep:`387`. -The JIT's build-time dependencies may be changed between releases (within -reason). +The JIT's build-time dependencies may be changed between releases, within +reason. Backwards Compatibility ======================= Due to the fact that the current intepreter and the JIT backend are both generated from the same specification, the behavior of Python code should be -completely unchanged. In practice, observable differences that have arisen (and -been fixed) during testing have tended to be bugs in the existing micro-op -translation and optimization stages, rather than bugs in the copy-and-patch -step. +completely unchanged. In practice, observable differences that have found and +fixed during testing have tended to be bugs in the existing micro-op translation +and optimization stages, rather than bugs in the copy-and-patch step. Debugging --------- @@ -265,8 +264,8 @@ data both |wx|_. 
The nature of template-based JITs also seriously limits the kinds of code that can be generated, further reducing the likelihood of a successful exploit. As an -additional precaution, the templates themselves (including all of their -metadata) are stored in static, read-only memory. +additional precaution, the templates themselves are stored in static, read-only +memory. However, it would be naive to assume that no possible vulnerabilities exist in the JIT, especially at this early stage. The author is not a security expert, @@ -334,8 +333,8 @@ Choose the sections that best describe you: going anywhere, and still offers a debugging experience similer to what the main bytecode interpreter provides today. There is moderate likelihood that larger changes to the interpreter (such as adding new local - variables, changing error handling and deoptimization logic, changing the - micro-op format, etc.) will require changes to the C template used to + variables, changing error handling and deoptimization logic, or changing + the micro-op format) will require changes to the C template used to generate the JIT, which is meant to mimic the main interpreter loop. You may also occasionally just get unlucky and break JIT code generation, which will require you to either modify the Python build scripts yourself, @@ -459,21 +458,19 @@ to only support one major version of one toolchain at this time. Compile the base interpreter's bytecode --------------------------------------- -.. XXX: Rework this section. - Most of the prior art for copy-and-patch uses it as a fast baseline JIT, whereas CPython's JIT is using the technique to compile optimized micro-op traces. -In practice, the JIT currently sits somewhere between the "baseline" and +In practice, the new JIT currently sits somewhere between the "baseline" and "optimizing" compiler tiers of other dynamic language runtimes. This is because -CPython uses the specializing adaptive interpreter to collect runtime profiling +CPython uses its specializing adaptive interpreter to collect runtime profiling information, which is used to detect and optimize "hot" paths through the code. -This employs self-modifying code, a technique which is much more difficult to -implement with a JIT compiler. +This step is carried out using self-modifying code, a technique which is much +more difficult to implement with a JIT compiler. -While it's *possible* to compile tier one bytecode using copy-and-patch (in -fact, early prototypes predated the tier two interpreter and did exactly this), -it just doesn't provide enough optimization potential as the more granular +While it's *possible* to compile normal bytecode using copy-and-patch (in fact, +early prototypes predated the micro-op interpreter and did exactly this), it +just doesn't seem to provide enough optimization potential as the more granular micro-op format. @@ -492,17 +489,38 @@ less likely to benefit from deeper GPU integration. Open Issues =========== -.. Any points that are still being decided/discussed. - Speed ----- -.. XXX: ... +Currently, the JIT is `about as fast as the existing specializing interpreter +`_ +on most platforms. Improving this is obviously a top priority at this point, +since providing a significant performance gain is the entire motivation for +having a JIT at all. A number of proposed improvements are already underway, and +this ongoing work is being tracked in `GH-115802 +`_. Memory ------ -.. XXX: Because it emits ... 
+Because it allocates additional memory for executable machine code, the JIT does +use more memory than the existing interpreter at runtime. According to the +official benchmarks, the JIT currently uses about `10-20% more memory than the +base interpreter +`_. +The upper end of this range is due to ``aarch64-apple-darwin``, which has larger +page sizes (and thus, a larger minimum allocation granularity). + +However, these numbers should be taken with a grain of salt, as the benchmarks +themselves don't actually have a very high baseline of memory usage. Since they +have a higher ratio of code to data, the JIT's memory overhead is more +pronounced than it would be in a typical workload where memory pressure is more +likely to be a real concern. + +Not much effort has been put into optimizing the JIT's memory usage yet, so +these numbers likely represent a maximum that will be reduced over time. +Improving this is a medium priority, and is being tracked in `GH-116017 +`_. Earlier versions of the JIT had a more complicated memory allocation scheme which imposed a number of fragile limitations on the size and layout of the @@ -512,36 +530,34 @@ executable. These issues are no longer present in the current design. Dependencies ------------ -.. Q: Could we put the build-time dependencies in a container? - -.. Q: Could JITs for every platform be generated on Linux? - -.. Q: Will the generated header files be tracked by Git? - -.. Q: Is the JIT generated at “generate files time” or “build time”? - Building the JIT adds between 3 and 60 seconds to the build process, depending on platform. It is only rebuilt whenever the generated files become out-of-date, -so only those who are actively developing the main interpreter loop (or the JIT -itself) will be rebuilding it frequently. +so only those who are actively developing the main interpreter loop will be +rebuilding it with any frequency. -.. XXX: Unlike many other generated files in CPython, the JIT's generated files - are not tracked by Git. This is because... +Unlike many other generated files in CPython, the JIT's generated files are not +tracked by Git. This is because they contain compiled binary code templates +specific to not only the host platform, but also the current build configuration +for that platform. As such, hosting them would require a significant engineering +effort in order to build and host dozens of large binary files for each commit +that changes the generated code. While perhaps feasible, this is not a priority, +since downloading the required tools is not difficult for most users, and the +build step is not particularly time-consuming. + +Since some still remain interested in this possibility, discussion is being +tracked in `GH-115869 `_. Footnotes ========= -.. A collection of footnotes cited in the PEP, and a place to list non-inline - hyperlink targets. - .. [#untested] Due to lack of available hardware, the JIT is built, but not tested, for this platform. .. [#emulated] Due to lack of available hardware, the JIT is built using cross-compilation and tested using hardware emulation for this platform. Some - tests are skipped because emulation (not the JIT) causes them to fail. - However, the JIT has been successfully built and tested for this platform - locally. + tests are skipped because emulation causes them to fail. However, the JIT has + been successfully built and tested for this platform on non-emulated + hardware. 
Copyright ========= From 82577f8d69071794238dc15f93e8eb452d6286d0 Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Thu, 11 Apr 2024 11:40:17 -0700 Subject: [PATCH 04/14] fixup --- peps/pep-0744.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/peps/pep-0744.rst b/peps/pep-0744.rst index d35577e7fb7..d48f039bc11 100644 --- a/peps/pep-0744.rst +++ b/peps/pep-0744.rst @@ -100,7 +100,7 @@ Since much of this data varies even between identical runs of a program and the existing optimization pipeline makes heavy use of runtime profiling information, it doesn't make much sense to compile these traces ahead of time. As has been demonstrated for many other dynamic languages (`and even Python itself -`_), the most promising approach is to compile the +`_), the most promising approach is to compile the optimized micro-ops "just in time" for execution. Rationale From 67f2013c17496954b6d644ce60e3c3369e3ec87e Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Thu, 11 Apr 2024 11:42:28 -0700 Subject: [PATCH 05/14] Own the code --- .github/CODEOWNERS | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index aa116678b19..5c010e8027f 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -622,6 +622,7 @@ peps/pep-0740.rst @dstufft peps/pep-0741.rst @vstinner peps/pep-0742.rst @JelleZijlstra peps/pep-0743.rst @vstinner +peps/pep-0744.rst @brandtbucher # ... # peps/pep-0754.rst # ... From 5fe80420cf6d47d0b5bd9dadcedf9dcebe2d4de1 Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Thu, 11 Apr 2024 11:48:15 -0700 Subject: [PATCH 06/14] Remove empty header sections --- peps/pep-0744.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/peps/pep-0744.rst b/peps/pep-0744.rst index d48f039bc11..abc0f6f8b91 100644 --- a/peps/pep-0744.rst +++ b/peps/pep-0744.rst @@ -1,12 +1,10 @@ PEP: 744 Title: JIT Compilation Author: Brandt Bucher -Discussions-To: Status: Draft Type: Informational Created: 11-APR-2024 Python-Version: 3.13 -Post-History: Abstract ======== From 1af3127b19b67d36f63a8189b9d3d02aa20da209 Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Thu, 11 Apr 2024 11:51:29 -0700 Subject: [PATCH 07/14] APR -> Apr --- peps/pep-0744.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/peps/pep-0744.rst b/peps/pep-0744.rst index abc0f6f8b91..8a91e2ca165 100644 --- a/peps/pep-0744.rst +++ b/peps/pep-0744.rst @@ -3,7 +3,7 @@ Title: JIT Compilation Author: Brandt Bucher Status: Draft Type: Informational -Created: 11-APR-2024 +Created: 11-Apr-2024 Python-Version: 3.13 Abstract From 69bc7c16f167981c92ccc6850feb38efc686653b Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Thu, 11 Apr 2024 12:10:49 -0700 Subject: [PATCH 08/14] the the -> the (and reflow) --- peps/pep-0744.rst | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/peps/pep-0744.rst b/peps/pep-0744.rst index 8a91e2ca165..c459ffb5043 100644 --- a/peps/pep-0744.rst +++ b/peps/pep-0744.rst @@ -64,17 +64,16 @@ what paths through the program are being executed the most. In other words, Since the Python 3.12 release, CPython has generated this interpreter from a `C-like domain-specific language `_. In addition -to taming some of the complexity of the the new adaptive interpreter, the DSL -also allows CPython's maintainers to avoid hand-writing tedious boilerplate code -in many parts of the interpreter, compiler, and standard library that must be -kept in sync with the instruction definitions. 
This ability to generate large -amounts of runtime infrastructure from a single source of truth is not only -convenient for maintenance; it also unlocks many possibilities for expanding -CPython's execution in new ways. For instance, it makes it feasible to -automatically generate tables for translating a sequence of instructions into an -equivalent sequence of smaller "micro-ops", generate an optimizer for sequences -of these micro-ops, and even generate an entire second interpreter for executing -them. +to taming some of the complexity of the new adaptive interpreter, the DSL also +allows CPython's maintainers to avoid hand-writing tedious boilerplate code in +many parts of the interpreter, compiler, and standard library that must be kept +in sync with the instruction definitions. This ability to generate large amounts +of runtime infrastructure from a single source of truth is not only convenient +for maintenance; it also unlocks many possibilities for expanding CPython's +execution in new ways. For instance, it makes it feasible to automatically +generate tables for translating a sequence of instructions into an equivalent +sequence of smaller "micro-ops", generate an optimizer for sequences of these +micro-ops, and even generate an entire second interpreter for executing them. In fact, since early in the Python 3.13 release cycle, all CPython builds have included this exact micro-op translation, optimization, and execution machinery. From 93eef2c12d6914e7dfc11a55d69adafe322a8eb4 Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Thu, 11 Apr 2024 12:13:46 -0700 Subject: [PATCH 09/14] have -> have been (and reflow) --- peps/pep-0744.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/peps/pep-0744.rst b/peps/pep-0744.rst index c459ffb5043..eb089c7ed01 100644 --- a/peps/pep-0744.rst +++ b/peps/pep-0744.rst @@ -214,9 +214,10 @@ Backwards Compatibility Due to the fact that the current intepreter and the JIT backend are both generated from the same specification, the behavior of Python code should be -completely unchanged. In practice, observable differences that have found and -fixed during testing have tended to be bugs in the existing micro-op translation -and optimization stages, rather than bugs in the copy-and-patch step. +completely unchanged. In practice, observable differences that have been found +and fixed during testing have tended to be bugs in the existing micro-op +translation and optimization stages, rather than bugs in the copy-and-patch +step. Debugging --------- From da66f5c26ea1b2d84a3a1e81601df430643a98f0 Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Thu, 11 Apr 2024 12:22:04 -0700 Subject: [PATCH 10/14] Clarify where we should advertise support --- peps/pep-0744.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/peps/pep-0744.rst b/peps/pep-0744.rst index eb089c7ed01..5c939fa22ed 100644 --- a/peps/pep-0744.rst +++ b/peps/pep-0744.rst @@ -199,7 +199,9 @@ without displaying warnings to the user), it should be treated in much the same way as :pep:`11` prescribes: it should have reliable CI/buildbots, and JIT failures on tier one and tier two platforms should block releases. Though it's not necessary to update :pep:`11` to specify JIT support, it may be helpful to -do so anyways. +do so anyways. Otherwise, a list of supported platforms should be maintained in +`the JIT's README +`_. 
Since it should always be possible to build CPython without the JIT, removing JIT support for a platform should *not* be considered a backwards-incompatible From 43b4f7f5c5207b840fa9cd28e3de802e0cddb701 Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Thu, 11 Apr 2024 12:35:12 -0700 Subject: [PATCH 11/14] Typos --- peps/pep-0744.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/peps/pep-0744.rst b/peps/pep-0744.rst index 5c939fa22ed..3d97d6374f4 100644 --- a/peps/pep-0744.rst +++ b/peps/pep-0744.rst @@ -147,7 +147,7 @@ met: such as maintenance burden, memory usage, or the feasibility of alternate designs). -These critera should be considered a starting point, and may be expanded over +These criteria should be considered a starting point, and may be expanded over time. For example, discussion of this PEP may reveal that additional requirements (such as multiple committed maintainers, a security audit, documentation in the devguide, support for out-of-process debugging, or a @@ -214,7 +214,7 @@ reason. Backwards Compatibility ======================= -Due to the fact that the current intepreter and the JIT backend are both +Due to the fact that the current interpreter and the JIT backend are both generated from the same specification, the behavior of Python code should be completely unchanged. In practice, observable differences that have been found and fixed during testing have tended to be bugs in the existing micro-op @@ -228,7 +228,7 @@ Tools that profile and debug Python code will continue to work fine. Currently, it appears that C profilers and debuggers are unable to trace back *through* JIT frames. Working with leaf frames is possible (this is how the JIT -itself is debugged), though it is of limited utility due to the absense of +itself is debugged), though it is of limited utility due to the absence of proper debugging information for JIT frames. Since the code templates emitted by the JIT are compiled by Clang, it *may* be From 796a9bd6f329fe2312f565836a652a85dd77f069 Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Thu, 11 Apr 2024 15:19:58 -0700 Subject: [PATCH 12/14] Apply suggestions and reflow --- peps/pep-0744.rst | 54 ++++++++++++++++++++++++----------------------- 1 file changed, 28 insertions(+), 26 deletions(-) diff --git a/peps/pep-0744.rst b/peps/pep-0744.rst index 3d97d6374f4..05b4456d569 100644 --- a/peps/pep-0744.rst +++ b/peps/pep-0744.rst @@ -63,17 +63,18 @@ what paths through the program are being executed the most. In other words, Since the Python 3.12 release, CPython has generated this interpreter from a `C-like domain-specific language -`_. In addition -to taming some of the complexity of the new adaptive interpreter, the DSL also -allows CPython's maintainers to avoid hand-writing tedious boilerplate code in -many parts of the interpreter, compiler, and standard library that must be kept -in sync with the instruction definitions. This ability to generate large amounts -of runtime infrastructure from a single source of truth is not only convenient -for maintenance; it also unlocks many possibilities for expanding CPython's -execution in new ways. For instance, it makes it feasible to automatically -generate tables for translating a sequence of instructions into an equivalent -sequence of smaller "micro-ops", generate an optimizer for sequences of these -micro-ops, and even generate an entire second interpreter for executing them. +`_ (DSL). 
In +addition to taming some of the complexity of the new adaptive interpreter, the +DSL also allows CPython's maintainers to avoid hand-writing tedious boilerplate +code in many parts of the interpreter, compiler, and standard library that must +be kept in sync with the instruction definitions. This ability to generate large +amounts of runtime infrastructure from a single source of truth is not only +convenient for maintenance; it also unlocks many possibilities for expanding +CPython's execution in new ways. For instance, it makes it feasible to +automatically generate tables for translating a sequence of instructions into an +equivalent sequence of smaller "micro-ops", generate an optimizer for sequences +of these micro-ops, and even generate an entire second interpreter for executing +them. In fact, since early in the Python 3.13 release cycle, all CPython builds have included this exact micro-op translation, optimization, and execution machinery. @@ -122,8 +123,8 @@ generated from the same DSL used to generate the rest of the interpreter. For a widely-used, volunteer-driven project like CPython, this benefit cannot be overstated: CPython's maintainers, by merely editing the bytecode definitions, will also get the JIT backend updated "for free", for *all* JIT-supported -platforms, at once. This is equally true whether adding new instructions, -removing old ones, or fixing bugs in existing ones. +platforms, at once. This is equally true whether instructions are being added, +modified, or removed. Like the rest of the interpreter, the JIT compiler is generated at build time, and has no runtime dependencies. It supports a wide range of platforms (see the @@ -199,7 +200,7 @@ without displaying warnings to the user), it should be treated in much the same way as :pep:`11` prescribes: it should have reliable CI/buildbots, and JIT failures on tier one and tier two platforms should block releases. Though it's not necessary to update :pep:`11` to specify JIT support, it may be helpful to -do so anyways. Otherwise, a list of supported platforms should be maintained in +do so anyway. Otherwise, a list of supported platforms should be maintained in `the JIT's README `_. @@ -224,12 +225,15 @@ step. Debugging --------- -Tools that profile and debug Python code will continue to work fine. +Tools that profile and debug Python code will continue to work fine. This +includes in-process tools that use Python-provided functionality (like +``sys.monitoring``, ``sys.settrace``, or ``sys.setprofile``), as well as +out-of-process tools that walk Python frames from the interpreter state. -Currently, it appears that C profilers and debuggers are unable to trace back -*through* JIT frames. Working with leaf frames is possible (this is how the JIT -itself is debugged), though it is of limited utility due to the absence of -proper debugging information for JIT frames. +However, it appears that profilers and debuggers *for C code* are currently +unable to trace back through JIT frames. Working with leaf frames is possible +(this is how the JIT itself is debugged), though it is of limited utility due to +the absence of proper debugging information for JIT frames. Since the code templates emitted by the JIT are compiled by Clang, it *may* be possible to allow JIT frames to be traced through by simply modifying the @@ -254,7 +258,7 @@ compilers. In order to mitigate this risk, the JIT has been written with best practices in mind. 
In particular, the data in question is not exposed by the JIT compiler to -other parts of the program while it remains writeable, and at *no* point is the +other parts of the program while it remains writable, and at *no* point is the data both |wx|_. .. Apparently this how you hack together a formatted link: @@ -303,12 +307,12 @@ Choose the sections that best describe you: matrix, and you shouldn't be able to observe differences in the way your packages work when the JIT is enabled. -- **If you profile or debug Python code...** +- **If you profile or debug *Python* code...** - ...nothing changes for you. All Python profiling and tracing functionality remains. -- **If you profile or debug C code...** +- **If you profile or debug *C* code...** - ...currently, the ability to trace *through* JIT frames is limited. This may cause issues if you need to observe the entire C call stack, rather than @@ -338,7 +342,7 @@ Choose the sections that best describe you: generate the JIT, which is meant to mimic the main interpreter loop. You may also occasionally just get unlucky and break JIT code generation, which will require you to either modify the Python build scripts yourself, - or solicit the help of somebody more familiar with them (below). + or solicit the help of somebody more familiar with them (see below). - **...and you work on the JIT itself...** @@ -349,7 +353,7 @@ Choose the sections that best describe you: all sorts of crashes, stepping over machine code in a debugger, staring at COFF/ELF/Mach-O dumps, developing on a wide range of platforms, and generally being the point of contact for the people changing the bytecode - when CI starts failing on their PRs (above). Ideally, you're at least + when CI starts failing on their PRs (see above). Ideally, you're at least *familiar* with assembly, have taken a couple of courses with "compilers" in their name, and have read a blog post or two about linkers. @@ -360,7 +364,6 @@ Choose the sections that best describe you: JIT issues), your builds may take up to a minute longer each time the relevant files are modified. - Reference Implementation ======================== @@ -473,7 +476,6 @@ early prototypes predated the micro-op interpreter and did exactly this), it just doesn't seem to provide enough optimization potential as the more granular micro-op format. - Add GPU support --------------- From 658242958ecdf5192de295272b5a89f406d48c5e Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Thu, 11 Apr 2024 16:58:03 -0700 Subject: [PATCH 13/14] Fix formatting --- peps/pep-0744.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/peps/pep-0744.rst b/peps/pep-0744.rst index 05b4456d569..505a8d831e0 100644 --- a/peps/pep-0744.rst +++ b/peps/pep-0744.rst @@ -307,12 +307,12 @@ Choose the sections that best describe you: matrix, and you shouldn't be able to observe differences in the way your packages work when the JIT is enabled. -- **If you profile or debug *Python* code...** +- **If you profile or debug Python code...** - ...nothing changes for you. All Python profiling and tracing functionality remains. -- **If you profile or debug *C* code...** +- **If you profile or debug C code...** - ...currently, the ability to trace *through* JIT frames is limited. 
This may cause issues if you need to observe the entire C call stack, rather than From 6959c9f71db5b1027a781e008c48e3ed325b269a Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Thu, 11 Apr 2024 17:05:16 -0700 Subject: [PATCH 14/14] One last tweak --- peps/pep-0744.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/peps/pep-0744.rst b/peps/pep-0744.rst index 505a8d831e0..3ea5ffb308d 100644 --- a/peps/pep-0744.rst +++ b/peps/pep-0744.rst @@ -543,8 +543,8 @@ specific to not only the host platform, but also the current build configuration for that platform. As such, hosting them would require a significant engineering effort in order to build and host dozens of large binary files for each commit that changes the generated code. While perhaps feasible, this is not a priority, -since downloading the required tools is not difficult for most users, and the -build step is not particularly time-consuming. +since installing the required tools is not prohibitively difficult for most +people building CPython, and the build step is not particularly time-consuming. Since some still remain interested in this possibility, discussion is being tracked in `GH-115869 `_.