Intern string representation of operators and some other symbolic literals #136757

@abebus

Feature or enhancement

Proposal:

Python currently interns certain strings, such as keywords and some ASCII/Unicode characters, as well as module-specific strings. I propose extending this interning mechanism to the string representations of operators (e.g., "+=", "==", "|=").

Rationale:
Interning these strings could improve performance, particularly in code parsing workflows, by:

  • Reducing memory overhead for repeated operator strings.
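  • Accelerating string comparisons (e.g., during AST construction or bytecode generation): two interned strings with equal contents are the same object, so an identity check can replace a character-by-character comparison.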
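To illustrate the comparison point above, here is a minimal sketch using `sys.intern` from the standard library. The helper `make_op` is hypothetical and exists only to build the operator string at runtime, so the compiler cannot merge the two values into one constant. This shows the general effect of interning and is not the mechanism used by the proposed change.

```python
import sys


def make_op(parts):
    # Hypothetical helper: join at runtime so each call creates a new str object.
    return "".join(parts)


a = make_op(["+", "="])
b = make_op(["+", "="])

# Equal contents, but not necessarily the same object before interning.
equal = a == b

# sys.intern returns the canonical copy for a given string value,
# so both results refer to one shared object.
ia = sys.intern(a)
ib = sys.intern(b)
same_after = ia is ib

print(equal, same_after)
```

Whether `a is b` holds before the `sys.intern` calls is an implementation detail, so the sketch does not rely on it.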

Target Symbols:
The following multi-character syntactic literals (with len() > 1) are candidates for interning:

# Syntax literals  
'...', '->' 

# Operators  
'**', '//', '==', '!=', '>=', '<=', ':=',  
'+=', '-=', '*=', '/=', '//=', '%=', '**=',  
'<<', '>>', '<<=', '>>=', '&=', '|=', '^='  

# And maybe the character sequence used as the REPL prompt?
'>>>'
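As a rough cross-check of the list above, the standard library's `token.EXACT_TOKEN_TYPES` mapping can be filtered for multi-character entries. This is only a sketch: the function name `multi_char_tokens` is hypothetical, the exact contents of the mapping vary between Python versions, and nothing here is taken from the proof of concept.

```python
import token


def multi_char_tokens():
    # EXACT_TOKEN_TYPES maps operator/delimiter strings to token type numbers.
    # Keep only the strings longer than one character.
    return sorted(s for s in token.EXACT_TOKEN_TYPES if len(s) > 1)


ops = multi_char_tokens()
print(ops)
```

The result may contain tokens not named in the list above, and it will not contain `'>>>'`, since the REPL prompt is not a tokenizer symbol.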

Proof of Concept:
A preliminary implementation is available here, demonstrating the feasibility of this change.

Considerations:

  • The change would be low-risk, as it targets immutable, statically known strings.
  • The impact on startup time and memory usage should be negligible, given the small set of operators.

Would this be a worthwhile optimization for CPython? I’d appreciate feedback on the idea and the PoC.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Metadata

    Labels

    interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage), type-feature (A feature request or enhancement)
