From 60b16891aa9c00f4412494a35df890e5991226dc Mon Sep 17 00:00:00 2001 From: Kirill Podoprigora Date: Tue, 24 Sep 2024 19:38:56 +0300 Subject: [PATCH 01/11] Move files to InternalDocs --- {Python => InternalDocs}/tier2_engine.md | 0 {Python => InternalDocs}/vm-state.md | 14 ++------------ 2 files changed, 2 insertions(+), 12 deletions(-) rename {Python => InternalDocs}/tier2_engine.md (100%) rename {Python => InternalDocs}/vm-state.md (80%) diff --git a/Python/tier2_engine.md b/InternalDocs/tier2_engine.md similarity index 100% rename from Python/tier2_engine.md rename to InternalDocs/tier2_engine.md diff --git a/Python/vm-state.md b/InternalDocs/vm-state.md similarity index 80% rename from Python/vm-state.md rename to InternalDocs/vm-state.md index b3246557dbeea3..0e68e9ff559862 100644 --- a/Python/vm-state.md +++ b/InternalDocs/vm-state.md @@ -5,22 +5,12 @@ - **Tier 1** is the classic Python bytecode interpreter. This includes the specializing adaptive interpreter described in [PEP 659](https://peps.python.org/pep-0659/) and introduced in Python 3.11. - **Tier 2**, also known as the micro-instruction ("uop") interpreter, is a new interpreter with a different instruction format. - It will be introduced in Python 3.13, and also forms the basis for a JIT using copy-and-patch technology that is likely to be introduced at the same time (but, unlike the Tier 2 interpreter, hasn't landed in the main branch yet). + It introduced in Python 3.13, and also forms the basis for a JIT using copy-and-patch technology. See [Tier 2](tier2_engine.md) for more information. # Frame state Almost all interpreter state is nominally stored in the frame structure. -A pointer to the current frame is held in `frame`. It contains: - -- **local variables** (a.k.a. 
"fast locals") -- **evaluation stack** (tacked onto the end of the locals) -- **stack top** (an integer giving the top of the evaluation stack) -- **instruction pointer** -- **code object**, which holds things like the array of instructions, lists of constants and names referenced by certain instructions, the exception handling table, and the table that translates instruction offsets to line numbers -- **return offset**, only relevant during calls, telling the interpreter where to return - -There are some other fields in the frame structure of less importance; notably frames are linked together in a singly-linked list via the `previous` pointer, pointing from callee to caller. -The frame also holds a pointer to the current function, globals, builtins, and the locals converted to dict (used to support the `locals()` built-in). +A pointer to the current frame is held in `frame`, for more information about what `frame` contains see [Frames](frames.md): ## Fast locals and evaluation stack From fbf47e45c9946e9e68ba91391d6ce5b8f89953dd Mon Sep 17 00:00:00 2001 From: Kirill Podoprigora Date: Tue, 24 Sep 2024 20:33:43 +0300 Subject: [PATCH 02/11] Fix grammar --- InternalDocs/vm-state.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/InternalDocs/vm-state.md b/InternalDocs/vm-state.md index 0e68e9ff559862..8511f41ec980ca 100644 --- a/InternalDocs/vm-state.md +++ b/InternalDocs/vm-state.md @@ -5,7 +5,7 @@ - **Tier 1** is the classic Python bytecode interpreter. This includes the specializing adaptive interpreter described in [PEP 659](https://peps.python.org/pep-0659/) and introduced in Python 3.11. - **Tier 2**, also known as the micro-instruction ("uop") interpreter, is a new interpreter with a different instruction format. - It introduced in Python 3.13, and also forms the basis for a JIT using copy-and-patch technology. See [Tier 2](tier2_engine.md) for more information. 
+ It was introduced in Python 3.13, and also forms the basis for a JIT using copy-and-patch technology. See [Tier 2](tier2_engine.md) for more information. # Frame state From 3aaa70c3ed7393aaf2597fd95f7112eda647e497 Mon Sep 17 00:00:00 2001 From: Kirill Podoprigora Date: Tue, 24 Sep 2024 22:02:17 +0300 Subject: [PATCH 03/11] Add link to VM in the README --- InternalDocs/README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/InternalDocs/README.md b/InternalDocs/README.md index 95181a420f1dfb..4a96be00e3ea24 100644 --- a/InternalDocs/README.md +++ b/InternalDocs/README.md @@ -21,3 +21,5 @@ it is not, please report that through the [The Source Code Locations Table](locations.md) [Exception Handling](exception_handling.md) + +[Virtual Machine](vm-state.md) From 1a6df69e3e46347f5a78c58392390caf93f61648 Mon Sep 17 00:00:00 2001 From: Kirill Podoprigora Date: Fri, 27 Sep 2024 20:46:45 +0300 Subject: [PATCH 04/11] Apply suggestions from code review Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com> --- InternalDocs/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/InternalDocs/README.md b/InternalDocs/README.md index 4a96be00e3ea24..34acfc382cff57 100644 --- a/InternalDocs/README.md +++ b/InternalDocs/README.md @@ -22,4 +22,4 @@ it is not, please report that through the [Exception Handling](exception_handling.md) -[Virtual Machine](vm-state.md) +[The Virtual Machine](vm-state.md) From 944d1ab1b817260b1f312fa2682bee3a60155b70 Mon Sep 17 00:00:00 2001 From: Kirill Podoprigora Date: Fri, 27 Sep 2024 20:48:52 +0300 Subject: [PATCH 05/11] Remove reference to PEP 695; instead add reference to adaptive.md --- InternalDocs/vm-state.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/InternalDocs/vm-state.md b/InternalDocs/vm-state.md index 8511f41ec980ca..b6a22a31871b97 100644 --- a/InternalDocs/vm-state.md +++ b/InternalDocs/vm-state.md @@ -3,7 +3,7 @@ ## Definition of Tiers - **Tier 1** is the classic 
Python bytecode interpreter. - This includes the specializing adaptive interpreter described in [PEP 659](https://peps.python.org/pep-0659/) and introduced in Python 3.11. + This includes the specializing [Adaptive Interpreter](adaptive.md). - **Tier 2**, also known as the micro-instruction ("uop") interpreter, is a new interpreter with a different instruction format. It was introduced in Python 3.13, and also forms the basis for a JIT using copy-and-patch technology. See [Tier 2](tier2_engine.md) for more information. From 0c2489fa63eed972a64086b9cdede37dd87e8100 Mon Sep 17 00:00:00 2001 From: Kirill Podoprigora Date: Fri, 27 Sep 2024 20:58:06 +0300 Subject: [PATCH 06/11] Add reference to exception_handling.md --- InternalDocs/vm-state.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/InternalDocs/vm-state.md b/InternalDocs/vm-state.md index b6a22a31871b97..2182ada46afabf 100644 --- a/InternalDocs/vm-state.md +++ b/InternalDocs/vm-state.md @@ -3,7 +3,7 @@ ## Definition of Tiers - **Tier 1** is the classic Python bytecode interpreter. - This includes the specializing [Adaptive Interpreter](adaptive.md). + This includes the specializing [adaptive interpreter](adaptive.md). - **Tier 2**, also known as the micro-instruction ("uop") interpreter, is a new interpreter with a different instruction format. It was introduced in Python 3.13, and also forms the basis for a JIT using copy-and-patch technology. See [Tier 2](tier2_engine.md) for more information. @@ -37,7 +37,7 @@ The Tier 2 instruction pointer is strictly internal to the Tier 2 interpreter, s ## Unwinding -Unwinding uses exception tables to find the next point at which normal execution can occur, or fail if there are no exception handlers. +Unwinding uses exception tables to find the next point at which normal execution can occur, or fail if there are no exception handlers. For more information on what exception tables are, see [exception handling](exception_handling.md). 
During unwinding both the stack and the instruction pointer should be in their canonical, in-memory representation. ## Jumps in bytecode From c7cdfb5f6b071d3818953f7b440289995d9f23ad Mon Sep 17 00:00:00 2001 From: Kirill Podoprigora Date: Fri, 27 Sep 2024 21:00:29 +0300 Subject: [PATCH 07/11] Move thread state section right after the frame state section --- InternalDocs/vm-state.md | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/InternalDocs/vm-state.md b/InternalDocs/vm-state.md index 2182ada46afabf..572159dd851742 100644 --- a/InternalDocs/vm-state.md +++ b/InternalDocs/vm-state.md @@ -7,11 +7,22 @@ - **Tier 2**, also known as the micro-instruction ("uop") interpreter, is a new interpreter with a different instruction format. It was introduced in Python 3.13, and also forms the basis for a JIT using copy-and-patch technology. See [Tier 2](tier2_engine.md) for more information. + # Frame state Almost all interpreter state is nominally stored in the frame structure. A pointer to the current frame is held in `frame`, for more information about what `frame` contains see [Frames](frames.md): +# Thread state and interpreter state + +Another important piece of VM state is the **thread state**, held in `tstate`. +The current frame pointer, `frame`, is always equal to `tstate->current_frame`. +The thread state also holds the exception state (`tstate->exc_info`) and the recursion counters (`tstate->c_recursion_remaining` and `tstate->py_recursion_remaining`). + +The thread state is also used to access the **interpreter state** (`tstate->interp`), which is important since the "eval breaker" flags are stored there (`tstate->interp->ceval.eval_breaker`, an "atomic" variable), as well as the "PEP 523 function" (`tstate->interp->eval_frame`). +The interpreter state also holds the optimizer state (`optimizer` and some counters). +Note that the eval breaker may be moved to the thread state soon as part of the multicore (PEP 703) work. 
+ ## Fast locals and evaluation stack The frame contains a single array of object pointers, `localsplus`, which contains both the fast locals and the stack. @@ -59,16 +70,6 @@ It will be more complex in the JIT. (We might also consider deoptimizations as a separate jump type.) -# Thread state and interpreter state - -Another important piece of VM state is the **thread state**, held in `tstate`. -The current frame pointer, `frame`, is always equal to `tstate->current_frame`. -The thread state also holds the exception state (`tstate->exc_info`) and the recursion counters (`tstate->c_recursion_remaining` and `tstate->py_recursion_remaining`). - -The thread state is also used to access the **interpreter state** (`tstate->interp`), which is important since the "eval breaker" flags are stored there (`tstate->interp->ceval.eval_breaker`, an "atomic" variable), as well as the "PEP 523 function" (`tstate->interp->eval_frame`). -The interpreter state also holds the optimizer state (`optimizer` and some counters). -Note that the eval breaker may be moved to the thread state soon as part of the multicore (PEP 703) work. - # Tier 2 IR format The tier 2 IR (Internal Representation) format is also the basis for the Tier 2 interpreter (though the two formats may eventually differ). This format is also used as the input to the machine code generator (the JIT compiler). From be38623947e73086a2e575b27f97206c33b07c88 Mon Sep 17 00:00:00 2001 From: Kirill Podoprigora Date: Fri, 27 Sep 2024 21:01:00 +0300 Subject: [PATCH 08/11] Remove Tier2 IR section from vm-state --- InternalDocs/vm-state.md | 10 ---------- 1 file changed, 10 deletions(-) diff --git a/InternalDocs/vm-state.md b/InternalDocs/vm-state.md index 572159dd851742..8c42f3fc8eccf9 100644 --- a/InternalDocs/vm-state.md +++ b/InternalDocs/vm-state.md @@ -69,13 +69,3 @@ Patching exits should be fairly straightforward in the interpreter. It will be more complex in the JIT. 
(We might also consider deoptimizations as a separate jump type.) - -# Tier 2 IR format - -The tier 2 IR (Internal Representation) format is also the basis for the Tier 2 interpreter (though the two formats may eventually differ). This format is also used as the input to the machine code generator (the JIT compiler). - -Tier 2 IR entries are all the same size; there is no equivalent to `EXTENDED_ARG` or trailing inline cache entries. Each instruction is a struct with the following fields (all integers of varying sizes): - -- **opcode**: Sometimes the same as a Tier 1 opcode, sometimes a separate micro opcode. Tier 2 opcodes are 9 bits (as opposed to Tier 1 opcodes, which fit in 8 bits). By convention, Tier 2 opcode names start with `_`. -- **oparg**: The argument. Usually the same as the Tier 1 oparg after expansion of `EXTENDED_ARG` prefixes. Up to 32 bits. -- **operand**: An additional argument, Typically the value of *one* cache item from the Tier 1 inline cache, up to 64 bits. From 26d55595970fea72bd596a2e424453e2b789ca2d Mon Sep 17 00:00:00 2001 From: Kirill Podoprigora Date: Mon, 13 Jan 2025 14:11:53 +0200 Subject: [PATCH 09/11] Address review --- InternalDocs/README.md | 2 -- InternalDocs/frames.md | 16 ++++++++++++++++ {InternalDocs => Python}/tier2_engine.md | 10 ++++++++++ {InternalDocs => Python}/vm-state.md | 22 ++-------------------- 4 files changed, 28 insertions(+), 22 deletions(-) rename {InternalDocs => Python}/tier2_engine.md (85%) rename {InternalDocs => Python}/vm-state.md (69%) diff --git a/InternalDocs/README.md b/InternalDocs/README.md index 34acfc382cff57..95181a420f1dfb 100644 --- a/InternalDocs/README.md +++ b/InternalDocs/README.md @@ -21,5 +21,3 @@ it is not, please report that through the [The Source Code Locations Table](locations.md) [Exception Handling](exception_handling.md) - -[The Virtual Machine](vm-state.md) diff --git a/InternalDocs/frames.md b/InternalDocs/frames.md index 34682adb1b422e..19b7c8a665425b 100644 --- 
a/InternalDocs/frames.md +++ b/InternalDocs/frames.md @@ -36,6 +36,20 @@ This seems to provide the best performance without excessive complexity. The specials have a fixed size, so the offset of the locals is know. The interpreter needs to hold two pointers, a frame pointer and a stack pointer. +### Fast locals and evaluation stack + +The frame contains a single array of object pointers, `localsplus`, +which contains both the fast locals and the stack. The top of the +stack, including the locals, is indicated by `stacktop`. +For example, in a function with three locals, if the stack contains +one value, `frame->stacktop == 4`. + +The interpreters share an implementation which uses the same memory +but caches the depth (as a pointer) in a C local, `stack_pointer`. +We aren't sure yet exactly how the JIT will implement the stack; +likely some of the values near the top of the stack will be held in registers. + + #### Alternative layout An alternative layout that was used for part of 3.11 alpha was: @@ -124,6 +138,8 @@ if the frame were to resume. After `frame.f_lineno` is set, `instr_ptr` points t the next instruction to be executed. During a call to a python function, `instr_ptr` points to the call instruction, because this is what we would expect to see in an exception traceback. +Dispatching on `instr_ptr` would be very inefficient, so in Tier 1 we cache the +upcoming value of `instr_ptr` in the C local `next_instr`. The `return_offset` field determines where a `RETURN` should go in the caller, relative to `instr_ptr`. It is only meaningful to the callee, so it needs to diff --git a/InternalDocs/tier2_engine.md b/Python/tier2_engine.md similarity index 85% rename from InternalDocs/tier2_engine.md rename to Python/tier2_engine.md index 5ceda8e806045d..5974fd156dab1b 100644 --- a/InternalDocs/tier2_engine.md +++ b/Python/tier2_engine.md @@ -148,3 +148,13 @@ TO DO. The implementation will change soon, so there is no point in documenting it until then. 
+ +# Tier 2 IR format + +The tier 2 IR (Internal Representation) format is also the basis for the Tier 2 interpreter (though the two formats may eventually differ). This format is also used as the input to the machine code generator (the JIT compiler). + +Tier 2 IR entries are all the same size; there is no equivalent to `EXTENDED_ARG` or trailing inline cache entries. Each instruction is a struct with the following fields (all integers of varying sizes): + +- **opcode**: Sometimes the same as a Tier 1 opcode, sometimes a separate micro opcode. Tier 2 opcodes are 9 bits (as opposed to Tier 1 opcodes, which fit in 8 bits). By convention, Tier 2 opcode names start with `_`. +- **oparg**: The argument. Usually the same as the Tier 1 oparg after expansion of `EXTENDED_ARG` prefixes. Up to 32 bits. +- **operand**: An additional argument, Typically the value of *one* cache item from the Tier 1 inline cache, up to 64 bits. diff --git a/InternalDocs/vm-state.md b/Python/vm-state.md similarity index 69% rename from InternalDocs/vm-state.md rename to Python/vm-state.md index 8c42f3fc8eccf9..3df621695abc9d 100644 --- a/InternalDocs/vm-state.md +++ b/Python/vm-state.md @@ -3,15 +3,11 @@ ## Definition of Tiers - **Tier 1** is the classic Python bytecode interpreter. - This includes the specializing [adaptive interpreter](adaptive.md). -- **Tier 2**, also known as the micro-instruction ("uop") interpreter, is a new interpreter with a different instruction format. + This includes the specializing [adaptive interpreter](../InternalDocs/adaptive.md). +- **Tier 2**, also known as the micro-instruction ("uop") interpreter, is a new execution engine. It was introduced in Python 3.13, and also forms the basis for a JIT using copy-and-patch technology. See [Tier 2](tier2_engine.md) for more information. -# Frame state - -Almost all interpreter state is nominally stored in the frame structure. 
-A pointer to the current frame is held in `frame`, for more information about what `frame` contains see [Frames](frames.md): # Thread state and interpreter state @@ -23,20 +19,6 @@ The thread state is also used to access the **interpreter state** (`tstate->inte The interpreter state also holds the optimizer state (`optimizer` and some counters). Note that the eval breaker may be moved to the thread state soon as part of the multicore (PEP 703) work. -## Fast locals and evaluation stack - -The frame contains a single array of object pointers, `localsplus`, which contains both the fast locals and the stack. -The top of the stack, including the locals, is indicated by `stacktop`. -For example, in a function with three locals, if the stack contains one value, `frame->stacktop == 4`. - -The interpreters share an implementation which uses the same memory but caches the depth (as a pointer) in a C local, `stack_pointer`. -We aren't sure yet exactly how the JIT will implement the stack; likely some of the values near the top of the stack will be held in registers. - -## Instruction pointer - -The canonical, in-memory, representation of the instruction pointer is `frame->instr_ptr`. -It always points to an instruction in the bytecode array of the frame's code object. -Dispatching on `frame->instr_ptr` would be very inefficient, so in Tier 1 we cache the upcoming value of `frame->instr_ptr` in the C local `next_instr`. 
## Tier 2 From 52225ebb6ad61ce67d1243efd1bb27ab2915eeb0 Mon Sep 17 00:00:00 2001 From: Kirill Podoprigora Date: Mon, 13 Jan 2025 14:20:19 +0200 Subject: [PATCH 10/11] Move unwinding to the exception_handling --- InternalDocs/exception_handling.md | 2 ++ Python/vm-state.md | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/InternalDocs/exception_handling.md b/InternalDocs/exception_handling.md index ec09e0769929fa..fd3f72953adde6 100644 --- a/InternalDocs/exception_handling.md +++ b/InternalDocs/exception_handling.md @@ -79,6 +79,8 @@ If no handler is found, the program terminates. During unwinding, the traceback is constructed as each frame is added to it by ``PyTraceBack_Here()``, which is in [Python/traceback.c](https://github.com/python/cpython/blob/main/Python/traceback.c). +Unwinding uses exception tables to find the next point at which normal execution can +occur, or fail if there are no exception handlers. Along with the location of an exception handler, each entry of the exception table also contains the stack depth of the `try` instruction diff --git a/Python/vm-state.md b/Python/vm-state.md index 3df621695abc9d..b9a2e120ebf06e 100644 --- a/Python/vm-state.md +++ b/Python/vm-state.md @@ -11,7 +11,7 @@ # Thread state and interpreter state -Another important piece of VM state is the **thread state**, held in `tstate`. +An important piece of VM state is the **thread state**, held in `tstate`. The current frame pointer, `frame`, is always equal to `tstate->current_frame`. The thread state also holds the exception state (`tstate->exc_info`) and the recursion counters (`tstate->c_recursion_remaining` and `tstate->py_recursion_remaining`). 
From ba8f39ae8de6a9f7bd3bbd16f0414ace636efff9 Mon Sep 17 00:00:00 2001 From: Kirill Podoprigora Date: Mon, 13 Jan 2025 14:21:39 +0200 Subject: [PATCH 11/11] Unwinding is already covered; Reverting --- InternalDocs/exception_handling.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/InternalDocs/exception_handling.md b/InternalDocs/exception_handling.md index fd3f72953adde6..ec09e0769929fa 100644 --- a/InternalDocs/exception_handling.md +++ b/InternalDocs/exception_handling.md @@ -79,8 +79,6 @@ If no handler is found, the program terminates. During unwinding, the traceback is constructed as each frame is added to it by ``PyTraceBack_Here()``, which is in [Python/traceback.c](https://github.com/python/cpython/blob/main/Python/traceback.c). -Unwinding uses exception tables to find the next point at which normal execution can -occur, or fail if there are no exception handlers. Along with the location of an exception handler, each entry of the exception table also contains the stack depth of the `try` instruction
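
Reviewer note on the `localsplus` paragraph that patch 09/11 moves into `InternalDocs/frames.md`: the worked example there (three locals, one stack value, `frame->stacktop == 4`) can be checked with a small toy model. This is only a sketch in Python. `ToyFrame`, `nlocals`, `stack_size`, `push` and `pop` are hypothetical names chosen for illustration and are not CPython's actual frame structure or API. The only parts taken from the text are the single `localsplus` array and the `stacktop` index that counts the locals as well as the stack entries.

```python
# Toy model only: ToyFrame is a hypothetical stand-in for illustration,
# not CPython's real frame structure.
from dataclasses import dataclass, field


@dataclass
class ToyFrame:
    nlocals: int       # number of fast locals (assumed parameter)
    stack_size: int    # maximum evaluation stack depth (assumed parameter)
    localsplus: list = field(init=False)
    stacktop: int = field(init=False)

    def __post_init__(self):
        # One array holds the fast locals followed by the evaluation stack.
        self.localsplus = [None] * (self.nlocals + self.stack_size)
        # With an empty stack, stacktop sits just past the locals.
        self.stacktop = self.nlocals

    def push(self, value):
        if self.stacktop >= len(self.localsplus):
            raise OverflowError("evaluation stack overflow")
        self.localsplus[self.stacktop] = value
        self.stacktop += 1

    def pop(self):
        if self.stacktop <= self.nlocals:
            raise IndexError("evaluation stack underflow")
        self.stacktop -= 1
        value = self.localsplus[self.stacktop]
        self.localsplus[self.stacktop] = None
        return value


# Three locals and one value on the stack, as in the documented example.
frame = ToyFrame(nlocals=3, stack_size=2)
frame.push("value")
```

In this model `frame.stacktop` is 4 after the single push, which matches the figure quoted in the moved paragraph. The real interpreters cache the depth as a pointer in the C local `stack_pointer` rather than as an index, which the toy deliberately does not try to reproduce.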