Skip to content

Fix parsing of integer literals with base prefix #106

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

wnienhaus
Copy link
Collaborator

MicroPython 1.25.0 introduced a breaking change, aligning the behaviour of the int() function closer to the behaviour of CPython (something along the lines of: strings are assumed to represent a decimal number, unless a base is specified. if a base of 0 is specified, is the base is inferred from the string)

This broke our parsing logic, which relied on the previous behaviour of the int() function to automatically determine the base of the string literal, based on a base prefix present in the string. Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.

Additionally, we never actually parsed octal in the format 0100 correctly - even before this PR; that number would have been interpreted as 100 rather than 64.

So, to fix this, and to ensure our parsing matches the GNU assembler, this PR implements a custom parse_int() function, using the base prefix in a string to determine the correct base to pass to int(). The following are supported:

  • 0x -> treated as hex
  • 0b -> treated as binary
  • 0... -> treated as octal
  • 0o -> treated as octal
  • anything else parsed as decimal

The parse_int method also supports the negative prefix operator for all of the above cases.

This change also ensures .int, .long, .word directives correctly handle the above mentioned formats. This fixes the issue described in #104.

Note: GNU as does not actually accept the octal prefix 0o..., but we accept it as a convenience, as this is accepted in Python code. This means however, that our assembler accepts code which GNU as does not accept. But the other way around, we still accept all code that GNU as accepts, which was one of our goals.

@wnienhaus wnienhaus self-assigned this Jun 19, 2025
@wnienhaus wnienhaus requested a review from ThomasWaldmann June 19, 2025 20:25
@wnienhaus wnienhaus removed their assignment Jun 19, 2025
@wnienhaus wnienhaus force-pushed the fix-int-parsing-with-base-prefix branch from 9452423 to 23f8ab4 Compare June 19, 2025 20:42
@wnienhaus
Copy link
Collaborator Author

After merging #107 the tests now pass.

Copy link
Member

@dpgeorge dpgeorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.

I guess the simplest fix here would be to just replace int(x) with int(x, 0). That should restore the existing behaviour. But it looks like you want to improve things further, which is great!

parts = "".join(parts)
if not validate_expression(parts):
raise ValueError('Unsupported expression: %s' % parts)
return eval(parts)


def parse_int(literal):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with this code base, but would it make sense to factor this function out into a separate file, so it can be reused in opcodes_s2.py?

Similarly, could have a single unit test for this function in a separate testing file.

(Just a suggestion 😄 )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, also from a code (de-)duplication aspect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants