Fix parsing of integer literals with base prefix #106

wnienhaus · 2025-06-19T20:24:57Z

MicroPython 1.25.0 introduced a breaking change, aligning the behaviour of the int() function more closely with CPython. Roughly: strings are assumed to represent a decimal number unless a base is specified, and if a base of 0 is specified, the base is inferred from the string.

This broke our parsing logic, which relied on the previous behaviour of the int() function to automatically determine the base of the string literal, based on a base prefix present in the string. Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.
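To illustrate why neither the plain call nor base 0 matches GNU as, here is a small sketch of how int() behaves in CPython 3. The helper name try_int is hypothetical and exists only for this illustration; MicroPython's behaviour may differ in detail, so treat this as a description of CPython only.

```python
def try_int(text, base=10):
    """Return int(text, base), or None if the literal is rejected.

    Hypothetical helper for illustration; not part of this PR.
    """
    try:
        return int(text, base)
    except ValueError:
        return None


# Without a base, the string must be plain decimal: prefixes are rejected.
print(try_int("0x10"))      # None
# A leading zero is silently ignored, so "0100" is read as decimal 100.
print(try_int("0100"))      # 100
# With base 0 the prefix is honoured...
print(try_int("0x10", 0))   # 16
# ...but C-style octal such as "0100" is rejected outright in CPython,
# whereas GNU as reads it as octal (64).
print(try_int("0100", 0))   # None
```

So base 0 handles 0x and 0b, but it cannot produce the GNU as reading of 0100.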

Additionally, we never actually parsed octal in the format 0100 correctly - even before this PR; that number would have been interpreted as 100 rather than 64.

So, to fix this, and to ensure our parsing matches the GNU assembler, this PR implements a custom parse_int() function, using the base prefix in a string to determine the correct base to pass to int(). The following are supported:

0x -> treated as hex
0b -> treated as binary
0... -> treated as octal
0o -> treated as octal
anything else parsed as decimal

The parse_int function also supports a leading minus sign for all of the above cases.
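The prefix rules above could be sketched roughly as follows. This is not the code from this PR: the name parse_int_sketch, the case-insensitive prefix matching, and the whitespace stripping are assumptions made for illustration.

```python
def parse_int_sketch(literal):
    """Parse an integer literal using GNU-as-style base prefixes.

    Illustrative sketch only; the PR's real parse_int() may differ.
    """
    text = literal.strip()

    # Handle an optional leading minus sign, then parse the magnitude.
    negative = text.startswith("-")
    if negative:
        text = text[1:]

    lowered = text.lower()  # assumption: prefixes are case-insensitive
    if lowered.startswith("0x"):
        value = int(text[2:], 16)   # hexadecimal
    elif lowered.startswith("0b"):
        value = int(text[2:], 2)    # binary
    elif lowered.startswith("0o"):
        value = int(text[2:], 8)    # Python-style octal (not accepted by GNU as)
    elif len(text) > 1 and text.startswith("0"):
        value = int(text[1:], 8)    # C-style octal, e.g. 0100 -> 64
    else:
        value = int(text, 10)       # plain decimal

    return -value if negative else value


print(parse_int_sketch("0100"))   # 64
print(parse_int_sketch("-0x10"))  # -16
```

Checking the two-character prefixes before the bare leading zero matters, since 0x10 and 0b101 also start with 0.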

This change also ensures that the .int, .long and .word directives correctly handle the formats listed above. This fixes the issue described in #104.

Note: GNU as does not actually accept the octal prefix 0o..., but we accept it as a convenience, since it is valid in Python code. This means that our assembler accepts some code which GNU as rejects. In the other direction, we still accept all code that GNU as accepts, which was one of our goals.

wnienhaus · 2025-06-19T20:45:10Z

After merging #107 the tests now pass.

dpgeorge

> Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.

I guess the simplest fix here would be to just replace int(x) with int(x, 0). That should restore the existing behaviour. But it looks like you want to improve things further, which is great!

dpgeorge · 2025-06-20T01:57:38Z

esp32_ulp/opcodes.py

    parts = "".join(parts)
    if not validate_expression(parts):
        raise ValueError('Unsupported expression: %s' % parts)
    return eval(parts)


+def parse_int(literal):


I'm not familiar with this code base, but would it make sense to factor this function out into a separate file, so it can be reused in opcodes_s2.py?

Similarly, could have a single unit test for this function in a separate testing file.

(Just a suggestion 😄 )

Sounds good, also from a code (de-)duplication aspect.

wnienhaus self-assigned this Jun 19, 2025

wnienhaus requested a review from ThomasWaldmann June 19, 2025 20:25

wnienhaus removed their assignment Jun 19, 2025

wnienhaus mentioned this pull request Jun 19, 2025

Update builder image to ubuntu-22.04 #107

Merged

fix parsing of integer literals with base prefix

23f8ab4

wnienhaus force-pushed the fix-int-parsing-with-base-prefix branch from 9452423 to 23f8ab4 Compare June 19, 2025 20:42

dpgeorge reviewed Jun 20, 2025

ThomasWaldmann Jun 20, 2025