Skip to content

[SIFoldOperands] Folding immediate into a copy invalidates candidates in the fold list #148187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

jmmartinez
Copy link
Contributor

When we fold a constant operand of a binary operator into a copy, both operands of the operator may be in the fold-list.

This change works around this by removing the folds of the use converted into copy from the list.

The pattern that triggers the issue is:

%22:sreg_32 = S_MOV_B32 0
%23:sreg_64 = REG_SEQUENCE %22, %subreg.sub0, %22, %subreg.sub1
%2:sreg_64 = COPY %23
%26:sreg_32 = COPY %2.sub0
%27:sreg_32 = COPY %2.sub1
%28:sreg_32 = S_OR_B32 killed %26, killed %27, implicit-def dead $scc

%28:sreg_32 = S_OR_B32 is folded into a constant, and the fold attempt for the second operand triggers a segfault.

Things missing:

  • I'm working on reducing the test case into MIR, but for some reason I cannot trigger it with the MIR obtained with --stop-before=si-fold-operands.
  • The final generated code misses some other optimizations. We have s_mov_b32 s0, 0; s_cmp_lg_u32 s0, 0; and the result of this is still unused.

Fixes SWDEV-542372

@jmmartinez jmmartinez requested a review from arsenm July 11, 2025 10:24
@jmmartinez jmmartinez self-assigned this Jul 11, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Juan Manuel Martinez Caamaño (jmmartinez)

Changes

When we fold a constant operand of a binary operator into a copy, both operands of the operator may be in the fold-list.

This change works around this by removing the folds of the use converted into copy from the list.

The pattern that triggers the issue is:

%22:sreg_32 = S_MOV_B32 0
%23:sreg_64 = REG_SEQUENCE %22, %subreg.sub0, %22, %subreg.sub1
%2:sreg_64 = COPY %23
%26:sreg_32 = COPY %2.sub0
%27:sreg_32 = COPY %2.sub1
%28:sreg_32 = S_OR_B32 killed %26, killed %27, implicit-def dead $scc

%28:sreg_32 = S_OR_B32 is folded into a constant, and the fold attempt for the second operand triggers a segfault.

Things missing:

  • I'm working on reducing the test case into MIR, but for some reason I cannot trigger it with the MIR obtained with --stop-before=si-fold-operands.
  • The final generated code misses some other optimizations. We have s_mov_b32 s0, 0; s_cmp_lg_u32 s0, 0; and the result of this is still unused.

Fixes SWDEV-542372


Full diff: https://github.com/llvm/llvm-project/pull/148187.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+8-2)
  • (added) llvm/test/CodeGen/AMDGPU/si-fold-operands-swdev-542372.ll (+60)
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 0ed06c37507af..2675e4e9d02e4 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -1761,7 +1761,9 @@ bool SIFoldOperandsImpl::foldInstOperand(MachineInstr &MI,
   for (MachineInstr *Copy : CopiesToReplace)
     Copy->addImplicitDefUseOperands(*MF);
 
-  for (FoldCandidate &Fold : FoldList) {
+  for (auto FoldIt = FoldList.begin(), End = FoldList.end(); FoldIt != End;
+       ++FoldIt) {
+    FoldCandidate &Fold = *FoldIt;
     assert(!Fold.isReg() || Fold.Def.OpToFold);
     if (Fold.isReg() && Fold.getReg().isVirtual()) {
       Register Reg = Fold.getReg();
@@ -1785,9 +1787,13 @@ bool SIFoldOperandsImpl::foldInstOperand(MachineInstr &MI,
 
       if (Fold.isImm() && tryConstantFoldOp(Fold.UseMI)) {
         LLVM_DEBUG(dbgs() << "Constant folded " << *Fold.UseMI);
+        // The instruction was folded into a copy, we have to skip any other
+        // occurence of UseMI in the fold list
+        End = std::remove_if(FoldIt + 1, End, [&](const FoldCandidate &F) {
+          return F.UseMI == Fold.UseMI;
+        });
         Changed = true;
       }
-
     } else if (Fold.Commuted) {
       // Restoring instruction's original operand order if fold has failed.
       TII->commuteInstruction(*Fold.UseMI, false);
diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-operands-swdev-542372.ll b/llvm/test/CodeGen/AMDGPU/si-fold-operands-swdev-542372.ll
new file mode 100644
index 0000000000000..2b6f6ba6a4de8
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-fold-operands-swdev-542372.ll
@@ -0,0 +1,60 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -O3 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a %s -o - | FileCheck %s
+
+define amdgpu_kernel void @kernel() {
+; CHECK-LABEL: kernel:
+; CHECK:       ; %bb.0: ; %entry
+; CHECK-NEXT:    v_mov_b32_e32 v0, 0
+; CHECK-NEXT:    ds_read_b64 v[2:3], v0
+; CHECK-NEXT:    ds_write_b32 v0, v0
+; CHECK-NEXT:    s_waitcnt lgkmcnt(1)
+; CHECK-NEXT:    v_cmp_ne_u32_e32 vcc, 2, v2
+; CHECK-NEXT:    s_cbranch_vccnz .LBB0_3
+; CHECK-NEXT:  ; %bb.1: ; %land.rhs49
+; CHECK-NEXT:    ds_read_b64 v[0:1], v0
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:    v_cmp_ne_u32_e32 vcc, 1, v0
+; CHECK-NEXT:    s_cbranch_vccnz .LBB0_3
+; CHECK-NEXT:  ; %bb.2: ; %land.rhs57
+; CHECK-NEXT:    s_mov_b32 s0, 0
+; CHECK-NEXT:    s_mov_b32 s1, s0
+; CHECK-NEXT:    v_mov_b32_e32 v2, 0
+; CHECK-NEXT:    v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
+; CHECK-NEXT:    s_cmp_lg_u32 s0, 0
+; CHECK-NEXT:    ds_write_b64 v2, v[0:1]
+; CHECK-NEXT:  .LBB0_3: ; %land.end59
+; CHECK-NEXT:    s_endpgm
+entry:
+  %0 = load <2 x i32>, ptr addrspace(3) null, align 8
+  %vecext.i55 = extractelement <2 x i32> %0, i64 0
+  %cmp3.i57 = icmp eq i32 %vecext.i55, 2
+  store i32 0, ptr addrspace(3) null, align 8
+  br i1 %cmp3.i57, label %land.rhs49, label %land.end59
+
+land.rhs49:                                       ; preds = %entry
+  %1 = load <2 x i32>, ptr addrspace(3) null, align 8
+  %vecext.i67 = extractelement <2 x i32> %1, i64 0
+  %cmp3.i69 = icmp eq i32 %vecext.i67, 1
+  br i1 %cmp3.i69, label %land.rhs57, label %land.end59
+
+land.rhs57:                                       ; preds = %land.rhs49
+  %rem.i.i.i = srem <2 x i32> %0, %1
+  %ref.tmp.sroa.0.0.vec.extract.i.i = extractelement <2 x i32> %rem.i.i.i, i64 0
+  store i32 %ref.tmp.sroa.0.0.vec.extract.i.i, ptr addrspace(3) null, align 8
+  store i32 %ref.tmp.sroa.0.0.vec.extract.i.i, ptr addrspace(3) getelementptr inbounds nuw (i8, ptr addrspace(3) null, i32 4), align 4
+  %2 = load <2 x i32>, ptr addrspace(3) null, align 8
+  %vecext.i.i.i = extractelement <2 x i32> %2, i64 0
+  %cmp3.i.i.i = icmp ne i32 %vecext.i.i.i, 0
+  %vecext.1.i.i.i = extractelement <2 x i32> %2, i64 1
+  %cmp3.1.i.i.i = icmp ne i32 %vecext.1.i.i.i, 0
+  %.not.i.i = select i1 %cmp3.i.i.i, i1 true, i1 %cmp3.1.i.i.i
+  br i1 %.not.i.i, label %land.end59, label %if.end.i
+
+if.end.i:                                         ; preds = %land.rhs57
+  %and.i.i.i = and <2 x i32> %2, splat (i32 1)
+  %ref.tmp.sroa.0.0.vec.extract.i20.i = extractelement <2 x i32> %and.i.i.i, i64 0
+  br label %land.end59
+
+land.end59:                                       ; preds = %if.end.i, %land.rhs57, %land.rhs49, %entry
+  ret void
+}

@jmmartinez jmmartinez changed the title [SIFoldOperands] Folding immediate into a copy invalidates candidates in the fold list [WIP][SIFoldOperands] Folding immediate into a copy invalidates candidates in the fold list Jul 11, 2025
@jmmartinez
Copy link
Contributor Author

I'm not convinced about this change. I'm mostly looking for advice. There seems to be an issue with the iteration order of this pass and how/which instructions get modified.

jmmartinez added a commit to jmmartinez/llvm-project that referenced this pull request Jul 11, 2025
This reverts commit 80064b6.

The patch triggers a crash when the folded use can have 2 operands in
the fold list.

See llvm#148187 for more info.

SWDEV-542372
@jmmartinez jmmartinez requested a review from shiltian July 11, 2025 10:37
@jmmartinez jmmartinez changed the title [WIP][SIFoldOperands] Folding immediate into a copy invalidates candidates in the fold list [SIFoldOperands] Folding immediate into a copy invalidates candidates in the fold list Jul 11, 2025
@@ -0,0 +1,60 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -O3 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a %s -o - | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really need -O3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems. I cannot trigger the bug with O0/1/2. And I was not able on the mir.

@@ -1785,9 +1787,13 @@ bool SIFoldOperandsImpl::foldInstOperand(MachineInstr &MI,

if (Fold.isImm() && tryConstantFoldOp(Fold.UseMI)) {
LLVM_DEBUG(dbgs() << "Constant folded " << *Fold.UseMI);
// The instruction was folded into a copy, we have to skip any other
// occurence of UseMI in the fold list
End = std::remove_if(FoldIt + 1, End, [&](const FoldCandidate &F) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need to rewrite this pass. The fold list is garbage. It would be a lot simpler to just track a map of foldable sources like peephole opt does

jmmartinez added a commit to jmmartinez/llvm-project that referenced this pull request Jul 11, 2025
This reverts commit 80064b6.

The patch triggers a crash when the folded use can have 2 operands in
the fold list.

See llvm#148187 for more info.

SWDEV-542372
This reverts commit 80064b6.

The patch triggers a crash when the folded use can have 2 operands in
the fold list.

See llvm#148187 for more info.

SWDEV-542372
@jmmartinez jmmartinez requested a review from arsenm July 11, 2025 13:26
@ronlieb ronlieb self-requested a review July 11, 2025 16:14
@jmmartinez
Copy link
Contributor Author

Not needed anymore once #148205 lands

@jmmartinez jmmartinez closed this Jul 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants