
[AggressiveInstCombine] Support store merge with non-consecutive parts #149807


Merged: 1 commit into llvm:main from store-merge-parts on Jul 22, 2025

Conversation

nikic (Contributor) commented on Jul 21, 2025

This is a minor extension of #147540, resolving one of the FIXMEs. If the collected parts contain some non-consecutive elements, we can still handle smaller ranges that are consecutive.

This is not common in practice and mostly shows up when the same value is stored at two different offsets.

llvm-opt-benchmark results: dtcxzyw/llvm-opt-benchmark#2593
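
For illustration, a minimal before/after sketch of the new behavior (the function and value names here are made up; the tests added in this patch cover the same patterns). The input stores i8 parts of %x at offsets 0, 1 and 3; the consecutive range at offsets 0-1 is merged into one i16 store, while the non-adjacent byte at offset 3 is left alone:

define void @gap_sketch(i32 %x, ptr %p) {
  ; Offsets 0-1 merged into a single i16 store.
  %lo = trunc i32 %x to i16
  store i16 %lo, ptr %p, align 1
  ; Offset 3 is not adjacent, so it stays a separate i8 store.
  %shr = lshr i32 %x, 24
  %b3 = trunc i32 %shr to i8
  %gep = getelementptr i8, ptr %p, i64 3
  store i8 %b3, ptr %gep, align 1
  ret void
}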

@nikic nikic requested review from fhahn and dtcxzyw July 21, 2025 12:43
llvmbot (Member) commented on Jul 21, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Nikita Popov (nikic)



Full diff: https://github.com/llvm/llvm-project/pull/149807.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp (+39-21)
  • (modified) llvm/test/Transforms/AggressiveInstCombine/X86/store-merge.ll (+99)
diff --git a/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp b/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
index a3a0e31f887ab..7fa6e6c5161cf 100644
--- a/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
+++ b/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
@@ -886,35 +886,20 @@ static std::optional<PartStore> matchPartStore(Instruction &I,
   return {{PtrBase, PtrOffset, Val, ValOffset, ValWidth, Store}};
 }
 
-static bool mergePartStores(SmallVectorImpl<PartStore> &Parts,
-                            const DataLayout &DL, TargetTransformInfo &TTI) {
+static bool mergeConsecutivePartStores(ArrayRef<PartStore> Parts,
+                                       unsigned Width, const DataLayout &DL,
+                                       TargetTransformInfo &TTI) {
   if (Parts.size() < 2)
     return false;
 
-  // We now have multiple parts of the same value stored to the same pointer.
-  // Sort the parts by pointer offset, and make sure they are consistent with
-  // the value offsets. Also check that the value is fully covered without
-  // overlaps.
-  // FIXME: We could support merging stores for only part of the value here.
-  llvm::sort(Parts);
-  int64_t LastEndOffsetFromFirst = 0;
-  const PartStore &First = Parts[0];
-  for (const PartStore &Part : Parts) {
-    APInt PtrOffsetFromFirst = Part.PtrOffset - First.PtrOffset;
-    int64_t ValOffsetFromFirst = Part.ValOffset - First.ValOffset;
-    if (PtrOffsetFromFirst * 8 != ValOffsetFromFirst ||
-        LastEndOffsetFromFirst != ValOffsetFromFirst)
-      return false;
-    LastEndOffsetFromFirst = ValOffsetFromFirst + Part.ValWidth;
-  }
-
   // Check whether combining the stores is profitable.
   // FIXME: We could generate smaller stores if we can't produce a large one.
+  const PartStore &First = Parts.front();
   LLVMContext &Ctx = First.Store->getContext();
-  Type *NewTy = Type::getIntNTy(Ctx, LastEndOffsetFromFirst);
+  Type *NewTy = Type::getIntNTy(Ctx, Width);
   unsigned Fast = 0;
   if (!TTI.isTypeLegal(NewTy) ||
-      !TTI.allowsMisalignedMemoryAccesses(Ctx, LastEndOffsetFromFirst,
+      !TTI.allowsMisalignedMemoryAccesses(Ctx, Width,
                                           First.Store->getPointerAddressSpace(),
                                           First.Store->getAlign(), &Fast) ||
       !Fast)
@@ -941,6 +926,39 @@ static bool mergePartStores(SmallVectorImpl<PartStore> &Parts,
   return true;
 }
 
+static bool mergePartStores(SmallVectorImpl<PartStore> &Parts,
+                            const DataLayout &DL, TargetTransformInfo &TTI) {
+  if (Parts.size() < 2)
+    return false;
+
+  // We now have multiple parts of the same value stored to the same pointer.
+  // Sort the parts by pointer offset, and make sure they are consistent with
+  // the value offsets. Also check that the value is fully covered without
+  // overlaps.
+  bool Changed = false;
+  llvm::sort(Parts);
+  int64_t LastEndOffsetFromFirst = 0;
+  const PartStore *First = &Parts[0];
+  for (const PartStore &Part : Parts) {
+    APInt PtrOffsetFromFirst = Part.PtrOffset - First->PtrOffset;
+    int64_t ValOffsetFromFirst = Part.ValOffset - First->ValOffset;
+    if (PtrOffsetFromFirst * 8 != ValOffsetFromFirst ||
+        LastEndOffsetFromFirst != ValOffsetFromFirst) {
+      Changed |= mergeConsecutivePartStores(ArrayRef(First, &Part),
+                                            LastEndOffsetFromFirst, DL, TTI);
+      First = &Part;
+      LastEndOffsetFromFirst = Part.ValWidth;
+      continue;
+    }
+
+    LastEndOffsetFromFirst = ValOffsetFromFirst + Part.ValWidth;
+  }
+
+  Changed |= mergeConsecutivePartStores(ArrayRef(First, Parts.end()),
+                                        LastEndOffsetFromFirst, DL, TTI);
+  return Changed;
+}
+
 static bool foldConsecutiveStores(BasicBlock &BB, const DataLayout &DL,
                                   TargetTransformInfo &TTI, AliasAnalysis &AA) {
   // FIXME: Add big endian support.
diff --git a/llvm/test/Transforms/AggressiveInstCombine/X86/store-merge.ll b/llvm/test/Transforms/AggressiveInstCombine/X86/store-merge.ll
index 38a55e1566a77..4ab8d18eb69b5 100644
--- a/llvm/test/Transforms/AggressiveInstCombine/X86/store-merge.ll
+++ b/llvm/test/Transforms/AggressiveInstCombine/X86/store-merge.ll
@@ -792,6 +792,105 @@ define void @test_i32_tbaa(i32 %x, ptr %p) {
   ret void
 }
 
+define void @test_multiple_parts_with_gap1(i32 %x, ptr %p) {
+; CHECK-LABEL: define void @test_multiple_parts_with_gap1(
+; CHECK-SAME: i32 [[X:%.*]], ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = trunc i32 [[X]] to i16
+; CHECK-NEXT:    store i16 [[TMP1]], ptr [[P]], align 1
+; CHECK-NEXT:    [[SHR_3:%.*]] = lshr i32 [[X]], 24
+; CHECK-NEXT:    [[X_3:%.*]] = trunc i32 [[SHR_3]] to i8
+; CHECK-NEXT:    [[GEP_3:%.*]] = getelementptr i8, ptr [[P]], i64 3
+; CHECK-NEXT:    store i8 [[X_3]], ptr [[GEP_3]], align 1
+; CHECK-NEXT:    ret void
+;
+  %x.0 = trunc i32 %x to i8
+  store i8 %x.0, ptr %p
+  %shr.1 = lshr i32 %x, 8
+  %x.1 = trunc i32 %shr.1 to i8
+  %gep.1 = getelementptr i8, ptr %p, i64 1
+  store i8 %x.1, ptr %gep.1
+  %shr.3 = lshr i32 %x, 24
+  %x.3 = trunc i32 %shr.3 to i8
+  %gep.3 = getelementptr i8, ptr %p, i64 3
+  store i8 %x.3, ptr %gep.3
+  ret void
+}
+
+define void @test_multiple_parts_with_gap2(i32 %x, ptr %p) {
+; CHECK-LABEL: define void @test_multiple_parts_with_gap2(
+; CHECK-SAME: i32 [[X:%.*]], ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[X_0:%.*]] = trunc i32 [[X]] to i8
+; CHECK-NEXT:    store i8 [[X_0]], ptr [[P]], align 1
+; CHECK-NEXT:    [[GEP_2:%.*]] = getelementptr i8, ptr [[P]], i64 1
+; CHECK-NEXT:    [[TMP1:%.*]] = lshr i32 [[X]], 16
+; CHECK-NEXT:    [[TMP2:%.*]] = trunc i32 [[TMP1]] to i16
+; CHECK-NEXT:    store i16 [[TMP2]], ptr [[GEP_2]], align 1
+; CHECK-NEXT:    ret void
+;
+  %x.0 = trunc i32 %x to i8
+  store i8 %x.0, ptr %p
+  %shr.2 = lshr i32 %x, 16
+  %x.2 = trunc i32 %shr.2 to i8
+  %gep.2 = getelementptr i8, ptr %p, i64 1
+  store i8 %x.2, ptr %gep.2
+  %shr.3 = lshr i32 %x, 24
+  %x.3 = trunc i32 %shr.3 to i8
+  %gep.3 = getelementptr i8, ptr %p, i64 2
+  store i8 %x.3, ptr %gep.3
+  ret void
+}
+
+define void @test_multiple_parts_with_gap3(i64 %x, ptr %p) {
+; CHECK-LABEL: define void @test_multiple_parts_with_gap3(
+; CHECK-SAME: i64 [[X:%.*]], ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = trunc i64 [[X]] to i16
+; CHECK-NEXT:    store i16 [[TMP1]], ptr [[P]], align 1
+; CHECK-NEXT:    [[GEP_3:%.*]] = getelementptr i8, ptr [[P]], i64 3
+; CHECK-NEXT:    [[TMP2:%.*]] = lshr i64 [[X]], 24
+; CHECK-NEXT:    [[TMP3:%.*]] = trunc i64 [[TMP2]] to i16
+; CHECK-NEXT:    store i16 [[TMP3]], ptr [[GEP_3]], align 1
+; CHECK-NEXT:    ret void
+;
+  %x.0 = trunc i64 %x to i8
+  store i8 %x.0, ptr %p
+  %shr.1 = lshr i64 %x, 8
+  %x.1 = trunc i64 %shr.1 to i8
+  %gep.1 = getelementptr i8, ptr %p, i64 1
+  store i8 %x.1, ptr %gep.1
+  %shr.3 = lshr i64 %x, 24
+  %x.3 = trunc i64 %shr.3 to i8
+  %gep.3 = getelementptr i8, ptr %p, i64 3
+  store i8 %x.3, ptr %gep.3
+  %shr.4 = lshr i64 %x, 32
+  %x.4 = trunc i64 %shr.4 to i8
+  %gep.4 = getelementptr i8, ptr %p, i64 4
+  store i8 %x.4, ptr %gep.4
+  ret void
+}
+
+define void @test_store_same_parts_twice(i32 %x, ptr %p) {
+; CHECK-LABEL: define void @test_store_same_parts_twice(
+; CHECK-SAME: i32 [[X:%.*]], ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = trunc i32 [[X]] to i16
+; CHECK-NEXT:    store i16 [[TMP1]], ptr [[P]], align 1
+; CHECK-NEXT:    [[GEP_2:%.*]] = getelementptr i8, ptr [[P]], i64 2
+; CHECK-NEXT:    [[TMP2:%.*]] = trunc i32 [[X]] to i16
+; CHECK-NEXT:    store i16 [[TMP2]], ptr [[GEP_2]], align 1
+; CHECK-NEXT:    ret void
+;
+  %x.0 = trunc i32 %x to i8
+  store i8 %x.0, ptr %p
+  %shr.1 = lshr i32 %x, 8
+  %x.1 = trunc i32 %shr.1 to i8
+  %gep.1 = getelementptr i8, ptr %p, i64 1
+  store i8 %x.1, ptr %gep.1
+  %gep.2 = getelementptr i8, ptr %p, i64 2
+  store i8 %x.0, ptr %gep.2
+  %gep.3 = getelementptr i8, ptr %p, i64 3
+  store i8 %x.1, ptr %gep.3
+  ret void
+}
+
 !0 = !{!1}
 !1 = !{!1, !2}
 !2 = !{!2}

dtcxzyw (Member) left a comment

LG.

However, I have a question about the remaining FIXME:

> FIXME: We could generate smaller stores if we can't produce a large one.

Imagine we have three consecutive stores:

store 0 | store 1 | store 2
        ^
        |
      aligned to 2 bytes

Fast unaligned memory access may not be as fast as aligned access, because the former may access multiple cache lines. In this case, store i8 + store i16 is better than store i16 + store i8, so I don't think a greedy strategy like the one in this patch works well.
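
Concretely (a hypothetical sketch with made-up names; assume %p sits at an odd address, so %p + 1 is 2-byte aligned):

define void @aligned_split(i32 %x, ptr %p) {
  ; Greedy front-to-back merging would emit store i16 at %p (align 1,
  ; possibly straddling a cache line) followed by store i8 at %p + 2.
  ; The preferred split instead keeps the wide store on the aligned boundary:
  %b0 = trunc i32 %x to i8
  store i8 %b0, ptr %p, align 1
  %shr = lshr i32 %x, 8
  %b12 = trunc i32 %shr to i16
  %gep = getelementptr i8, ptr %p, i64 1
  store i16 %b12, ptr %gep, align 2
  ret void
}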

nikic (Contributor, Author) commented on Jul 22, 2025

> However, I have a question about the remaining FIXME:
>
> > FIXME: We could generate smaller stores if we can't produce a large one.
>
> Imagine we have three consecutive stores:
>
>     store 0 | store 1 | store 2
>             ^
>             |
>           aligned to 2 bytes
>
> Fast unaligned memory access may not be as fast as aligned access, because the former may access multiple cache lines. In this case, store i8 + store i16 is better than store i16 + store i8, so I don't think a greedy strategy like the one in this patch works well.

I think this can be mostly addressed by only combining stores, but not trying to split them up in new ways. So if we have stores i8, i16, we would not try to rewrite them to i16, i8. This does mean we'd also exclude rewriting i8, i16, i8 to i16, i16 (for the hypothetical case where an unaligned i32 store is not supported).
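
As a sketch of that rule (hypothetical names): the partition below is i8, i16, i8. Under a combine-only strategy it may be merged into a single i32 where that is legal and fast, but it would never be re-split into i16, i16:

define void @no_resplit(i32 %x, ptr %p) {
  ; Existing partition: i8 at offset 0, i16 at offsets 1-2, i8 at offset 3.
  %b0 = trunc i32 %x to i8
  store i8 %b0, ptr %p, align 1
  %shr1 = lshr i32 %x, 8
  %b12 = trunc i32 %shr1 to i16
  %gep1 = getelementptr i8, ptr %p, i64 1
  store i16 %b12, ptr %gep1, align 1
  %shr3 = lshr i32 %x, 24
  %b3 = trunc i32 %shr3 to i8
  %gep3 = getelementptr i8, ptr %p, i64 3
  store i8 %b3, ptr %gep3, align 1
  ret void
}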

nikic merged commit 0752759 into llvm:main on Jul 22, 2025. 11 checks passed.
nikic deleted the store-merge-parts branch on Jul 22, 2025 at 08:15.