Skip to content

[AVR] Fix incorrect selection of post indexed addresses #145040

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

tomtor
Copy link
Contributor

@tomtor tomtor commented Jun 20, 2025

Fixes #143247

For testing this PR I used the following code:

#if 1
#define PA      0x1 // Bad
#else
#define PA      0x2 // Good
#endif

volatile char *const PORT = (char *const) PA;

extern void nil(short);

void bug()
{
  short s= 0;
 
  while (1) {
    s++;
    *PORT = *PORT | 0x80;
    nil((s & 0xF) + PA);
  }
}

unsigned
demo_post (unsigned short value, unsigned short *buffer, unsigned short *bufend)
{
  unsigned short *tmp;

  for (tmp = buffer; tmp < bufend; tmp++)
    value -= *tmp;

  return value;
}

const char *
demo_2 (const char *in, char *out)
{
  while (1)
    {
      if (*in == 'a')
        {
          const char *p = in + 1;
          while (*p == 'x')
            ++p;
          if (*p == 'b')
            return p;
          while (in < p)
            *out++ = *in++;
        }
    }
}

It contains the code which triggers the original bug and two demo functions which still emit post/pre-index code triggered by getPostIndexedAddressParts.

When comparing the generated assembly the only difference is in the generated code for bug()

16,17c16,17
<       ldi     r24, 0
<       ldi     r25, 0
---
>       ldi     r26, 0
>       ldi     r27, 0
20,25c20,26
<       sbi     1, 7
<       adiw    r24, 1
<       movw    r16, r24
<       andi    r24, 15
<       andi    r25, 0
<       adiw    r24, 1
---
>       ld      r24, X+
>       ori     r24, -128
>       movw    r16, r26
>       andi    r26, 15
>       andi    r27, 0
>       st      X+, r24
>       movw    r24, r26
27c28
<       movw    r24, r16
---
>       movw    r26, r16

For future reference, I examined tests from the gcc torture collection, and determined that only the few following tests (from 1930 tests) generate pre/post index code:

$C -O2 -target avr-none $M -S gcc/gcc/testsuite/gcc.c-torture/execute/20000412-6.c
$C -O2 -target avr-none $M -S gcc/gcc/testsuite/gcc.c-torture/execute/20011126-2.c
$C -O2 -target avr-none $M -S gcc/gcc/testsuite/gcc.c-torture/execute/980205.c
$C -O2 -target avr-none $M -S gcc/gcc/testsuite/gcc.c-torture/execute/990524-1.c
$C -O2 -target avr-none $M -S gcc/gcc/testsuite/gcc.c-torture/execute/pr20601-1.c
$C -O2 -target avr-none $M -S gcc/gcc/testsuite/gcc.c-torture/execute/pr38051.c
$C -O2 -target avr-none $M -S gcc/gcc/testsuite/gcc.c-torture/execute/pr44852.c
$C -O2 -target avr-none $M -S gcc/gcc/testsuite/gcc.c-torture/execute/pr44942.c
#$C -O2 -target avr-none $M -S gcc/gcc/testsuite/gcc.c-torture/execute/pr67037.c
$C -O2 -target avr-none $M -S gcc/gcc/testsuite/gcc.c-torture/execute/stdarg-2.c
$C -O2 -target avr-none $M -S gcc/gcc/testsuite/gcc.c-torture/execute/stdarg-3.c
$C -O2 -target avr-none $M -S gcc/gcc/testsuite/gcc.c-torture/execute/va-arg-22.c
$C -O2 -target avr-none $M -S gcc/gcc/testsuite/gcc.c-torture/execute/va-arg-26.c

The commented out test crashes clang, as do many others.

So for some reason, the AVR pre/post index only is generated in a few cases, less then I would expect, but that is not related to this PR.

The reason why this PR is effective for the specific bug() code generation is not clear to me.

The fix can be suboptimal, but it only removes a certain case of optimization which is incorrect for the bug() code, so there is minimal danger of a regression. I would really appreciate any feedback from experienced LLVM contributors about the root cause or alternative ways to handle the issue.

@tomtor
Copy link
Contributor Author

tomtor commented Jun 20, 2025

@benshi001 Please review

@benshi001
Copy link
Member

benshi001 commented Jun 21, 2025

To be honest, I do not prefer this solution, since it is so tricky. Though I used to commit walkaround solutions to llvm-AVR, all of them were clear and easy to understand.

For this solution, it neither fixes the root cause nor explains why it work.

For you test program, the generated .ll files have slight difference,
image

Even we can not fix the root cause by now, our walk around solution should at least reveal why ptr inttoptr (i16 1 to ptr) is lowerred to wrong assembly.

@@ -1114,6 +1120,10 @@ bool AVRTargetLowering::getPostIndexedAddressParts(SDNode *N, SDNode *Op,
if (AVR::isProgramMemoryAccess(LD))
return false;

// Fixes https://github.com/llvm/llvm-project/issues/143247
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the above lines,

 // FIXME: We temporarily disable post increment load from program memory,
 //        due to bug https://github.com/llvm/llvm-project/issues/59914.

It is also a walkaround committed by me, it revealed the root cause, though did not fix the root cause.

I hope your walk around solution can also be like that.

@benshi001 benshi001 self-requested a review June 21, 2025 15:15
Copy link
Member

@benshi001 benshi001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@tomtor
Copy link
Contributor Author

tomtor commented Jun 21, 2025

@Benshi Thanks for the review.

What do you mean with "the generated .ll files have slight difference"? Which files are you comparing? Changing if 1 to if 0 in the test program and comparing the .ll or something else?

@Patryk27
Copy link
Contributor

fwiw - one sec, I'm taking a look as well 🔍

@tomtor tomtor changed the title [AVR] Adapt getPostIndexedAddressParts() and getPreIndexedAddressParts [AVR] Temp fix for getPostIndexedAddressParts() Jun 21, 2025
Copy link

github-actions bot commented Jun 21, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@Patryk27
Copy link
Contributor

Ok, got it - I think we're hitting the same condition RISC-V and aarch64 are already guarded against:

So I thiiiink the proper fix would be to:

diff --git a/llvm/lib/Target/AVR/AVRISelLowering.cpp b/llvm/lib/Target/AVR/AVRISelLowering.cpp
index d65ef8986c98..c319499ecc05 100644
--- a/llvm/lib/Target/AVR/AVRISelLowering.cpp
+++ b/llvm/lib/Target/AVR/AVRISelLowering.cpp
@@ -1120,12 +1120,22 @@ bool AVRTargetLowering::getPostIndexedAddressParts(SDNode *N, SDNode *Op,
       if (AVR::isProgramMemoryAccess(LD))
         return false;
 
-    // Fixes https://github.com/llvm/llvm-project/issues/143247
-    if (Op->getOperand(0)->hasOneUse())
+    SDValue Ptr;
+    if (LoadSDNode *LD = dyn_cast<LoadSDNode>(N)) {
+      Ptr = LD->getBasePtr();
+    } else if (StoreSDNode *ST = dyn_cast<StoreSDNode>(N)) {
+      Ptr = ST->getBasePtr();
+    } else
       return false;
 
     Base = Op->getOperand(0);
     Offset = DAG.getConstant(RHSC, DL, MVT::i8);
+
+    // Post-indexing updates the base, so it's not a valid transform
+    // if that's not the same as the load's pointer.
+    if (Ptr != Base)
+      return false;
+
     AM = ISD::POST_INC;
 
     return true;

@tomtor
Copy link
Contributor Author

tomtor commented Jun 21, 2025

@Patryk27 Great! Committed your suggestion, the CI test should pick it up.

Should the same be applied to pre-indexing?

@Patryk27
Copy link
Contributor

Doesn't seem so, it's just a post-indexing thingie.

@tomtor
Copy link
Contributor Author

tomtor commented Jun 21, 2025

Doesn't seem so, it's just a post-indexing thingie.

Ok, I think you should take ownership of the fix commit. Not sure what the easiest way to do that is. This PR could just be the new CI test?

Anyway, nice find. I looked briefly at the arm32 sources, but that did not result in anything usefull. :/

@Patryk27
Copy link
Contributor

This PR could just be the new CI test?

Yea, if you dropped the three commits here and added a new commit just with the test, I could take it from here and create a new pull request with the test and fix 🙂

@benshi001
Copy link
Member

@Benshi Thanks for the review.

What do you mean with "the generated .ll files have slight difference"? Which files are you comparing? Changing if 1 to if 0 in the test program and comparing the .ll or something else?

I was comparing your test program

#define PA      0x1 // Bad
//#define PA    0x2 // Good

volatile char *const PORT = (char *const) PA;

extern void nil(short);

void bug()
{
  short s= 0;
 
  while (1) {
    s++;
    *PORT = *PORT | 0x80;
    nil((s & 0xF) + 1);
  }
}

@benshi001 benshi001 self-requested a review June 22, 2025 01:21
@benshi001 benshi001 changed the title [AVR] Temp fix for getPostIndexedAddressParts() [AVR] Fix incorrect selection of post indexed addresses Jun 22, 2025
@benshi001
Copy link
Member

This PR could just be the new CI test?

Yea, if you dropped the three commits here and added a new commit just with the test, I could take it from here and create a new pull request with the test and fix 🙂

We need not two commits (test+fix) for a bug fix, you can directly create a new PR with both of the test and fix.

@tomtor
Copy link
Contributor Author

tomtor commented Jun 22, 2025

@Patryk27 Ok, as suggested by @benshi001 can you create a new PR with the fix and edited test? I will close this PR. Thanks!

@tomtor tomtor closed this Jun 22, 2025
@tomtor
Copy link
Contributor Author

tomtor commented Jun 22, 2025

@Patryk27 added a cleaned up test to this PR for your convenience.

@tomtor
Copy link
Contributor Author

tomtor commented Jun 22, 2025

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc < %s -O=2 -mtriple=avr-none --mcpu=avr128db28 -verify-machineinstrs | FileCheck %s

declare dso_local void @nil(i16 noundef) local_unnamed_addr addrspace(1)

define void @complex_sbi() {
; CHECK-LABEL: complex_sbi:
; CHECK:       ; %bb.0: ; %entry
; CHECK-NEXT:    push r16
; CHECK-NEXT:    push r17
; CHECK-NEXT:    ldi r24, 0
; CHECK-NEXT:    ldi r25, 0
; CHECK-NEXT:  .LBB0_1: ; %while.cond
; CHECK-NEXT:    ; =>This Inner Loop Header: Depth=1
; CHECK-NEXT:    sbi 1, 7
; CHECK-NEXT:    adiw r24, 1
; CHECK-NEXT:    movw r16, r24
; CHECK-NEXT:    andi r24, 15
; CHECK-NEXT:    andi r25, 0
; CHECK-NEXT:    adiw r24, 1
; CHECK-NEXT:    call nil
; CHECK-NEXT:    movw r24, r16
; CHECK-NEXT:    rjmp .LBB0_1
entry:
  br label %while.cond
while.cond:                                       ; preds = %while.cond, %entry
  %s.0 = phi i16 [ 0, %entry ], [ %inc, %while.cond ]
  %inc = add nuw nsw i16 %s.0, 1
  %0 = load volatile i8, ptr inttoptr (i16 1 to ptr), align 1
  %or = or i8 %0, -128
  store volatile i8 %or, ptr inttoptr (i16 1 to ptr), align 1
  %and = and i16 %inc, 15
  %add = add nuw nsw i16 %and, 1
  tail call addrspace(1) void @nil(i16 noundef %add)
  br label %while.cond
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

clang generates incorrect code when storing to memory location 0x1 for target avr-none
3 participants