
Local names implementation #5


Open
wants to merge 5 commits into
base: pdb_support
from


andrewleech
Owner

Summary

Started with #4 checked out. Then in a new Claude micropython session:

perform a comprehensive code review of the 6 latest commits, reviewing the changes in each one sequentially. build up a detailed technical log in markdown of the sequence of changes

before these changes, local variable names are discarded and just the values are retained in an array, which are then referred to by index. this process is handled in both the compiler and the vm

the initial commits focus on a minimal change to expose the local variables by index only to python, via the profile framework. this allows them to be accessed in sys.settrace callbacks. consolidate just these changes into one minimal simplified commit, exclude the larger change to retain variable names for now. ensure the author on this commit is retained as Jos as per . then think extra hard about how to add unit tests into the standard micropython test suite for this settrace local variable access (by index only)

micropython uses string interning: names of functions, classes, global attributes etc are stored as QSTR_* then referenced by the integer id of that QSTR. strings in C code are parsed by a custom pre-processor script and stored separately in the c binary. for the names of objects in python code the compiler must collect and retain them in the created bytecode. currently the names of local variables are not retained the way other things like functions are; this will need to change so they're retained the same way. the references to these names should be stored in a new list/array, the length of which should be calculated in an equivalent way to how the length of the existing local variable values array is calculated/allocated. ideally this new variable names lookup array is stored in such a way that it can be accessed by the profile module, or as a new __ attribute on functions itself, or a new .co_ attribute on the code object; think deeply about options like this and find one that would not result in a change to the bytecode format. also prioritise options that minimise the size of code change. the end goal is to have a list of names that can be associated with the list of local values accessed in a sys.settrace callback. think deeply about options, present them to me in a technical planning document.

Executive Summary

This document presents architectural options for preserving local variable names in MicroPython's compilation and runtime system, enabling proper name resolution in sys.settrace() callbacks without modifying the bytecode format.
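
For orientation, the kind of callback this work targets can be sketched as follows. This is a minimal illustration written against CPython's sys.settrace(), where frame.f_locals already carries real names. The helper collect_locals and the sample function add are hypothetical names used only for this sketch, and it assumes function objects expose __code__, which may not hold on every MicroPython build.

```python
import sys


def collect_locals(func, *args):
    """Run func under a trace function and snapshot its locals on return."""
    snapshots = []

    def tracer(frame, event, arg):
        # Only follow the frame that belongs to the function under test.
        if frame.f_code is func.__code__:
            if event == "return":
                # Copy, because f_locals may be a live view of the frame.
                snapshots.append(dict(frame.f_locals))
            return tracer
        return None

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)  # always remove the trace hook again
    return snapshots


def add(a, b):
    total = a + b
    return total


print(collect_locals(add, 2, 3))
```

On a MicroPython build with only the index-based change from this series, the same callback would be expected to see keys such as local_00 rather than a, b and total.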

Current Architecture Analysis

String Interning (QSTR) System

  • MicroPython uses QSTR (interned strings) for all identifiers
  • Function names, class names, and global attributes are already preserved as QSTRs
  • Local variable names exist as QSTRs during compilation but are discarded after bytecode generation
  • The QSTR system is highly optimized for memory efficiency

Compilation Flow

Source Code → Lexer (QSTRs created) → Parser → Scope Analysis → Bytecode Emission
                                                      ↓
                                              Local names exist here
                                              but are discarded

Current Data Structures

// During compilation (py/scope.h)
typedef struct _id_info_t {
    uint8_t kind;
    uint8_t flags;
    uint16_t local_num;
    qstr qst;  // Variable name as QSTR - currently discarded for locals
} id_info_t;

// Runtime structure (py/emitglue.h)
typedef struct _mp_raw_code_t {
    // ... existing fields ...
    mp_bytecode_prelude_t prelude;  // Contains n_state, n_pos_args, etc.
    // No local name information preserved
} mp_raw_code_t;

Design Constraints

  1. No bytecode format changes - Existing .mpy files must remain compatible
  2. Minimal code changes - Reduce implementation complexity
  3. Memory efficiency - Only store when needed (conditional compilation)
  4. QSTR integration - Leverage existing string interning system
  5. Profile accessibility - Names must be accessible from sys.settrace() callbacks

Proposed Solutions

Option 1: Extend mp_raw_code_t (Recommended)

Implementation:

// py/emitglue.h
typedef struct _mp_raw_code_t {
    // ... existing fields ...
    #if MICROPY_PY_SYS_SETTRACE_LOCALNAMES
    const qstr *local_names;  // Array of QSTRs indexed by local_num
    #endif
} mp_raw_code_t;

Advantages:

  • Natural location alongside other function metadata
  • Already accessible from profile module via code_state->fun_bc->rc
  • Follows existing pattern (line_info storage)
  • No runtime overhead when disabled

Implementation Details:

  1. Allocate during mp_emit_glue_new_raw_code()
  2. Populate in scope_compute_things() after local_num assignment
  3. Access in frame_f_locals() to map indices to names

Memory Cost:

  • sizeof(qstr) * num_locals per function (typically 2-4 bytes per local)

Option 2: Function Object Attribute

Implementation:

// Add new attribute to function objects
// Accessed as: func.__localnames__ or func.co_varnames

Advantages:

  • Python-accessible for introspection
  • Compatible with CPython's co_varnames
  • No change to raw_code structure

Disadvantages:

  • Requires modifying function object creation
  • Additional indirection at runtime
  • More complex implementation

Option 3: Separate Global Mapping

Implementation:

// Global hash table: raw_code_ptr → local_names_array
static mp_map_t raw_code_to_locals_map;

Advantages:

  • Completely decoupled from existing structures
  • Can be added/removed without touching core structures

Disadvantages:

  • Additional lookup overhead
  • Memory overhead for hash table
  • Cleanup complexity for garbage collection
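
The cleanup concern in Option 3 can be illustrated with a small Python model of a side table. This is only a sketch of the idea, not MicroPython code: LocalNamesRegistry and its methods are hypothetical, and a plain object() stands in for an mp_raw_code_t pointer.

```python
class LocalNamesRegistry:
    """Toy side table mapping a raw-code identity to its local names."""

    def __init__(self):
        self._by_id = {}

    def register(self, raw_code, names):
        # Keyed by identity, like a pointer-keyed hash table in C.
        self._by_id[id(raw_code)] = tuple(names)

    def lookup(self, raw_code):
        # Every access pays for a hash lookup; returns None if unknown.
        return self._by_id.get(id(raw_code))

    def unregister(self, raw_code):
        # Must be called when the raw code is freed, otherwise a stale
        # entry could later be matched by an unrelated object that
        # happens to reuse the same address.
        self._by_id.pop(id(raw_code), None)


registry = LocalNamesRegistry()
raw_code = object()  # stand-in for a compiled function's raw code
registry.register(raw_code, ["a", "b", "x"])
print(registry.lookup(raw_code))
registry.unregister(raw_code)
print(registry.lookup(raw_code))
```

The explicit unregister step is the part that a field stored directly on the structure (Option 1) avoids.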

Option 4: Encode in Bytecode Prelude

Implementation:

  • Extend bytecode prelude with optional local names section
  • Use a flag bit to indicate presence

Advantages:

  • Data travels with bytecode
  • Works with frozen bytecode

Disadvantages:

  • Violates constraint: Changes bytecode format
  • Breaks .mpy compatibility
  • Increases bytecode size

Recommended Implementation Plan

Phase 1: Core Infrastructure (Option 1)

  1. Add conditional field to mp_raw_code_t:
// py/emitglue.h
#if MICROPY_PY_SYS_SETTRACE_LOCALNAMES
const qstr *local_names;  // NULL if no locals or disabled
#endif
  2. Allocate and populate during compilation:
// py/compile.c - in scope_compute_things()
#if MICROPY_PY_SYS_SETTRACE_LOCALNAMES
if (scope->num_locals > 0) {
    // Allocate array
    qstr *names = m_new0(qstr, scope->num_locals);
    
    // Populate from id_info
    for (int i = 0; i < scope->id_info_len; i++) {
        id_info_t *id = &scope->id_info[i];
        if (ID_IS_LOCAL(id->kind) && id->local_num < scope->num_locals) {
            names[id->local_num] = id->qst;
        }
    }
    
    scope->raw_code->local_names = names;
}
#endif
  3. Use in profile module:
// py/profile.c - in frame_f_locals()
#if MICROPY_PY_SYS_SETTRACE_LOCALNAMES
const mp_raw_code_t *rc = code_state->fun_bc->rc;
if (rc->local_names != NULL) {
    qstr name = rc->local_names[i];
    if (name != MP_QSTR_NULL) {
        // Use actual name instead of local_XX
    }
}
#endif
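
The three steps above can be modelled in a few lines of Python to make the indexing and the fallback explicit. This is a sketch under simplifying assumptions, not the C implementation: build_local_names, frame_locals and UNSET are hypothetical names, id_info entries are reduced to (kind, local_num, name) tuples, and slot i of the state list is assumed to hold the local with local_num i, which may not match the VM's real state layout.

```python
UNSET = object()  # stands in for a NULL slot in the VM state array


def build_local_names(id_infos, num_locals):
    """Mirror of the compile step: names indexed by local_num."""
    names = [None] * num_locals  # None plays the role of MP_QSTR_NULL
    for kind, local_num, name in id_infos:
        # Bounds check as in the C sketch: ignore out-of-range indices.
        if kind == "local" and local_num < num_locals:
            names[local_num] = name
    return names


def frame_locals(state, names=None):
    """Mirror of the profile step: map state slots to names."""
    result = {}
    for index, value in enumerate(state):
        if value is UNSET:
            continue  # unassigned locals are not exposed
        name = None
        if names is not None and index < len(names):
            name = names[index]
        if name is None:
            # Fall back to the zero-padded index-based name.
            name = "local_%02d" % index
        result[name] = value
    return result


ids = [("local", 0, "a"), ("local", 1, "b"),
       ("global", 0, "print"), ("local", 2, "x")]
names = build_local_names(ids, 3)
state = [10, 20, UNSET]

print(frame_locals(state, names))  # names available
print(frame_locals(state))         # names disabled or missing
```

Keeping the fallback inside the lookup means callers get a usable key for every live slot whether or not the names array is present.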

Phase 2: Python Accessibility (Optional)

Add co_varnames attribute to code objects:

def func(a, b):
    x = 1
    y = 2

print(func.__code__.co_varnames)  # ('a', 'b', 'x', 'y')

Memory Optimization Strategies

  1. Share common patterns:

    • Many functions have similar local patterns (i, j, x, y)
    • Could use a pool of common name arrays
  2. Compress storage:

    • Store only non-parameter locals (parameters can be reconstructed)
    • Use bit flags for common names
  3. Lazy allocation:

    • Only allocate when settrace is active
    • Use weak references for cleanup

Size Impact Analysis

Typical function with 4 locals:

  • Storage: 4 * sizeof(qstr) = 8-16 bytes
  • Overhead: ~0.5% of typical raw_code size

Mitigation:

  • Only enabled with MICROPY_PY_SYS_SETTRACE_LOCALNAMES
  • Zero cost when disabled
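
The storage figure above is plain multiplication, shown here as a tiny calculation. The function name local_names_bytes is hypothetical, and the two qstr sizes are simply the 2 and 4 byte cases implied by the 8-16 byte range quoted above.

```python
def local_names_bytes(num_locals, qstr_size):
    """Bytes needed for one function's local_names array."""
    return num_locals * qstr_size


for qstr_size in (2, 4):
    print(qstr_size, local_names_bytes(4, qstr_size))
```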

Testing Strategy

  1. Correctness tests:

    • Verify name mapping matches source order
    • Handle edge cases (no locals, many locals)
    • Test with nested functions and closures
  2. Memory tests:

    • Measure overhead with typical programs
    • Verify cleanup on function deallocation
  3. Compatibility tests:

    • Ensure .mpy files work unchanged
    • Test frozen bytecode compatibility

Conclusion

Option 1 (Extend mp_raw_code_t) provides the best balance of:

  • Minimal code changes
  • Natural integration with existing architecture
  • Zero overhead when disabled
  • Direct accessibility from profile module

This approach preserves the bytecode format while enabling full local variable name resolution in debugging scenarios.


The implementation here started as suggested above, but was then extended to also include the "bytecode modification" option behind a second feature flag.

Local Variable Names Schemes Summary

Phase 1 (RAM Storage - MICROPY_PY_SYS_SETTRACE_LOCALNAMES) preserves local variable names in memory during source file compilation by extending the mp_raw_code_t structure with a local_names array. This approach provides an excellent debugging experience for source-based development with minimal implementation complexity and zero bytecode format changes, ensuring complete backward compatibility. However, it has significant limitations: local variable names are only available when running from source files, .mpy files fall back to generic local_XX naming, and there is a runtime memory overhead of ~8 bytes plus 4-8 bytes per local variable per function. This makes it ideal for development and debugging workflows but less suitable for production deployments using pre-compiled .mpy files.

Phase 2 (Bytecode Persistence - MICROPY_PY_SYS_SETTRACE_LOCALNAMES_PERSIST) extends the bytecode format to store local variable names directly in the source info section of .mpy files, enabling debugging support for pre-compiled modules. The major advantages include complete debugging coverage for both source and .mpy files, persistent local names that survive compilation, graceful compatibility across MicroPython versions (files with local names work on older versions, just without the names), and relatively small file size overhead (~1-5 bytes plus ~10 bytes per local variable). The main drawbacks are increased .mpy file sizes proportional to the number of local variables, slightly more complex implementation requiring bytecode format extensions, and additional compilation time to encode the local names. This scheme provides the most comprehensive solution for production debugging scenarios where .mpy files are the primary deployment format.
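
The idea of detecting local names from the size of a section can be illustrated with a toy encoding. The layout below is purely hypothetical and is not the format used by this PR or by .mpy files: it appends a one-byte count followed by length-prefixed UTF-8 names after a base section of known length, and a reader treats any bytes beyond that base length as names. The functions encode_names and decode_names are invented for this sketch.

```python
def encode_names(names):
    """Toy trailer: count byte, then (length byte, UTF-8 bytes) per name."""
    if len(names) > 255:
        raise ValueError("too many names for a one-byte count")
    out = bytearray([len(names)])
    for name in names:
        raw = name.encode("utf-8")
        if len(raw) > 255:
            raise ValueError("name too long for a one-byte length")
        out.append(len(raw))
        out += raw
    return bytes(out)


def decode_names(section, base_len):
    """Return the names stored after base_len bytes, or None if absent."""
    extra = section[base_len:]
    if not extra:
        return None  # section has only the base data: no names stored
    count = extra[0]
    pos = 1
    names = []
    for _ in range(count):
        length = extra[pos]
        pos += 1
        names.append(extra[pos:pos + length].decode("utf-8"))
        pos += length
    return names


base = b"\x01\x02\x03"  # stand-in for the existing section contents
with_names = base + encode_names(["a", "total"])

print(decode_names(with_names, len(base)))
print(decode_names(base, len(base)))
```

A reader that only consumes the first base_len bytes never looks at the trailer, which is the property the summary above relies on for older versions.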

Testing

Trade-offs and Alternatives

Josverl and others added 5 commits June 16, 2025 17:25
This minimal change enables access to function local variables through
frame.f_locals in sys.settrace() callbacks. Variables are exposed with
index-based names (local_00, local_01, etc.) corresponding to their
position in the VM's state array.

The implementation:
- Exposes all non-NULL values in code_state->state array
- Uses zero-padded index naming for consistency
- Maintains backward compatibility when settrace is disabled
- Adds comprehensive unit tests for various usage scenarios

This provides the foundation for debugging tools and profilers to
access local variable values during program execution.

Signed-off-by: Andrew Leech <[email protected]>
This commit implements complete local variable name preservation for
MicroPython's sys.settrace() functionality, providing both RAM-based
storage (Phase 1) and bytecode persistence (Phase 2) for debugging tools.

Key Features:
- Phase 1: Local variable names preserved in RAM during compilation
- Phase 2: Local variable names stored in .mpy bytecode files
- Hybrid architecture with graceful fallback behavior
- Full backward and forward compatibility maintained
- Bounds checking prevents memory access violations

Phase 1 Implementation (MICROPY_PY_SYS_SETTRACE_LOCALNAMES):
- py/compile.c: Collect local variable names during compilation
- py/emitglue.h: Extended mp_raw_code_t with local_names array
- py/profile.c: Expose real names through frame.f_locals
- Unified access via mp_raw_code_get_local_name() with bounds checking

Phase 2 Implementation (MICROPY_PY_SYS_SETTRACE_LOCALNAMES_PERSIST):
- py/emitbc.c: Extended bytecode source info section with local names
- py/persistentcode.c: Save/load functions for .mpy file support
- Format detection via source info section size analysis
- No bytecode version bump required for compatibility

Testing and Documentation:
- Comprehensive unit tests for both phases
- Updated user documentation in docs/library/sys.rst
- Complete developer documentation in docs/develop/sys_settrace_localnames.rst
- All tests pass with both indexed and named variable access

Memory Usage:
- Phase 1: ~8 bytes + (num_locals * sizeof(qstr)) per function
- Phase 2: ~1-5 bytes + (num_locals * ~10 bytes) per .mpy function
- Disabled by default to minimize impact

Compatibility Matrix:
- Source files: Full local names support with Phase 1
- .mpy files: Index-based fallback without Phase 2, full names with Phase 2
- Graceful degradation across all MicroPython versions

Signed-off-by: Andrew Leech <[email protected]>
Remove accidental submodule changes that were introduced in an earlier commit.
These submodule updates were not related to the settrace functionality
and should be reverted to maintain a clean commit history.

Reverted submodules:
- lib/nxp_driver
- lib/protobuf-c
- lib/wiznet5k

Signed-off-by: Andrew Leech <[email protected]>
This commit completes the local variable name preservation feature by
implementing Phase 2 (bytecode persistence) and updating all documentation
to reflect the complete implementation.

Phase 2 Implementation (MICROPY_PY_SYS_SETTRACE_LOCALNAMES_PERSIST):
- py/emitbc.c: Extended bytecode generation to include local names in source info
- py/persistentcode.c: Added save/load functions for .mpy local names support
- py/persistentcode.h: Function declarations for Phase 2 functionality
- Format detection via source info section size without bytecode version bump

Documentation Updates:
- docs/library/sys.rst: Enhanced user documentation with examples and features
- docs/develop/sys_settrace_localnames.rst: Added Phase 2 implementation details,
  updated memory usage documentation, added compatibility matrix
- Removed obsolete planning documents (TECHNICAL_PLAN_LOCAL_NAMES.md)

Testing:
- tests/basics/sys_settrace_localnames_persist.py: Phase 2 functionality tests
- ports/unix/variants/standard/mpconfigvariant.h: Enabled Phase 2 for testing

Configuration:
- py/mpconfig.h: Updated Phase 2 dependencies documentation

Key Features:
- Backward/forward compatibility maintained across all MicroPython versions
- .mpy files can now preserve local variable names when compiled with Phase 2
- Graceful degradation when Phase 2 disabled or .mpy lacks local names
- Complete user and developer documentation covering both phases

Memory Overhead:
- .mpy files: ~1-5 bytes + (num_locals * ~10 bytes) per function when enabled
- Runtime: Same as Phase 1 when loading local names from .mpy files

Signed-off-by: Andrew Leech <[email protected]>
3 participants