[LV][NFC] Clean up tail-folding check for early-exit loops #133931

arcbbb · 2025-04-01T16:15:24Z

This patch moves the check for a single latch exit from computeMaxVF() to LoopVectorizationLegality::canFoldTailByMasking(). The check in computeMaxVF() duplicated the fallback logic that already runs when foldTailByMasking() returns false.

It also updates the NoScalarEpilogueNeeded logic to return false for loops that are neither single-latch-exit nor early-exit. This avoids applying tail-folding in unsupported cases and prevents triggering assertions during analysis.
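For readers unfamiliar with the terminology, here is a minimal source-level sketch of the two loop shapes the description distinguishes. The function names `sumAll` and `findFirst` are hypothetical and not part of the patch, and how a given loop is shaped in IR after canonicalization is up to the compiler, so treat this only as an illustration.

```cpp
#include <cstddef>

// Single latch exit: the only way out of the loop is the bottom test,
// so the trip count is known before the loop starts.
int sumAll(const int *A, std::size_t N) {
  int Sum = 0;
  for (std::size_t I = 0; I < N; ++I)
    Sum += A[I];
  return Sum;
}

// Early exit: in addition to the counted exit there is a data-dependent
// exit from the loop body, so the number of iterations actually executed
// cannot be computed up front (an "uncountable" exit).
long findFirst(const int *A, std::size_t N, int Key) {
  for (std::size_t I = 0; I < N; ++I)
    if (A[I] == Key)
      return static_cast<long>(I); // leaves the loop from the body
  return -1;
}
```

A loop with several exits that is neither of these forms is the case for which NoScalarEpilogueNeeded now returns false.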

arcbbb · 2025-04-01T16:21:00Z

This is inspired by #130918, where early-exit loops with different tail-folding styles may have varying requirements.
This patch enables early-exit loops to proceed through the tail-folding setup when applicable.

github-actions · 2025-04-01T16:22:07Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvmbot · 2025-04-02T01:40:14Z

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Shih-Po Hung (arcbbb)

Changes

This patch moves the check for a single latch exit from computeMaxVF() to LoopVectorizationLegality::canFoldTailByMasking(), as it duplicates the logic when foldTailByMasking() returns false.

It also introduces HasSingleLatchExit to prevent early-exit loops from entering code paths that assume non-predicated loops.

Full diff: https://github.com/llvm/llvm-project/pull/133931.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp (+10)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+5-18)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index 3ec6850d6f685..0763a255b3afa 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -1924,6 +1924,16 @@ bool LoopVectorizationLegality::canFoldTailByMasking() const {
     }
   }
 
+  // The only loops we can vectorize without a scalar epilogue, are loops with
+  // a bottom-test and a single exiting block. We'd have to handle the fact
+  // that not every instruction executes on the last iteration.  This will
+  // require a lane mask which varies through the vector loop body.  (TODO)
+  if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
+    LLVM_DEBUG(
+        dbgs()
+        << "LV: Cannot fold tail by masking. Requires a single latch exit\n");
+    return false;
+  }
   LLVM_DEBUG(dbgs() << "LV: can fold tail by masking.\n");
 
   return true;
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 55cc801e91452..a010f5c52e9a7 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -3987,22 +3987,6 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
     break;
   }
 
-  // The only loops we can vectorize without a scalar epilogue, are loops with
-  // a bottom-test and a single exiting block. We'd have to handle the fact
-  // that not every instruction executes on the last iteration.  This will
-  // require a lane mask which varies through the vector loop body.  (TODO)
-  if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
-    // If there was a tail-folding hint/switch, but we can't fold the tail by
-    // masking, fallback to a vectorization with a scalar epilogue.
-    if (ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate) {
-      LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking: vectorize with a "
-                           "scalar epilogue instead.\n");
-      ScalarEpilogueStatus = CM_ScalarEpilogueAllowed;
-      return computeFeasibleMaxVF(MaxTC, UserVF, false);
-    }
-    return FixedScalableVFPair::getNone();
-  }
-
   // Now try the tail folding
 
   // Invalidate interleave groups that require an epilogue if we can't mask
@@ -4049,7 +4033,9 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
     return Rem->isZero();
   };
 
-  if (MaxPowerOf2RuntimeVF > 0u) {
+  bool HasSingleLatchExit =
+      TheLoop->getExitingBlock() == TheLoop->getLoopLatch();
+  if (HasSingleLatchExit && MaxPowerOf2RuntimeVF > 0u) {
     assert((UserVF.isNonZero() || isPowerOf2_32(*MaxPowerOf2RuntimeVF)) &&
            "MaxFixedVF must be a power of 2");
     if (NoScalarEpilogueNeeded(*MaxPowerOf2RuntimeVF)) {
@@ -4060,7 +4046,8 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
   }
 
   auto ExpectedTC = getSmallBestKnownTC(PSE, TheLoop);
-  if (ExpectedTC && ExpectedTC <= TTI.getMinTripCountTailFoldingThreshold()) {
+  if (HasSingleLatchExit && ExpectedTC &&
+      ExpectedTC <= TTI.getMinTripCountTailFoldingThreshold()) {
     if (MaxPowerOf2RuntimeVF > 0u) {
       // If we have a low-trip-count, and the fixed-width VF is known to divide
       // the trip count but the scalable factor does not, use the fixed-width

david-arm · 2025-04-03T10:54:38Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -4049,7 +4033,9 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
    return Rem->isZero();
  };

-  if (MaxPowerOf2RuntimeVF > 0u) {
+  bool HasSingleLatchExit =


I understand what you're trying to do here, but I think we should remove the extra HasSingleLatchExit check and instead update NoScalarEpilogueNeeded for the case when a scalar epilogue is genuinely required. For loops with uncountable early exits we don't actually require a scalar epilogue at the moment, so we can still benefit from:

1. If we have a power-of-2 runtime VF then there is no point tail-folding.
2. If we have a very low trip count we should fall back on a runtime power-of-2 fixed-width VF if possible.

In NoScalarEpilogueNeeded I think you can then add an extra check like this:

  auto NoScalarEpilogueNeeded = [this, &UserIC](unsigned MaxVF) {
    if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch() &&
        !Legal->hasUncountableEarlyExit())
      return false;

What do you think?

arcbbb · 2025-04-03

Thanks! It does streamline the code.
I had been thinking that enabling the EVL transform would require going through setTailFoldingStyles(). But you're right: for loops where the trip count is known to be a multiple of the VF, using a non-predicated form should work well.
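The decision discussed in this thread can be modelled with plain integers. This is a sketch only: `LoopInfoModel` and `noScalarEpilogueNeeded` are hypothetical stand-ins for what the real lambda queries through `TheLoop`, `Legal` and ScalarEvolution, and the divisibility step is an assumption about the rest of the lambda, suggested by the captured `UserIC` and the `Rem->isZero()` line in the diff context.

```cpp
#include <cstdint>

// Hypothetical stand-in for the loop properties consulted by the lambda.
struct LoopInfoModel {
  bool HasSingleLatchExit;      // getExitingBlock() == getLoopLatch()
  bool HasUncountableEarlyExit; // Legal->hasUncountableEarlyExit()
  std::uint64_t TripCount;      // assumed to be a known constant here
};

// Returns true when, under the simplifications above, the vector loop
// leaves no remainder iterations for a scalar epilogue or a folded tail.
bool noScalarEpilogueNeeded(const LoopInfoModel &L, unsigned MaxVF,
                            unsigned UserIC) {
  // A loop that exits from somewhere other than the latch needs a scalar
  // epilogue, unless it is the supported uncountable-early-exit form.
  if (!L.HasSingleLatchExit && !L.HasUncountableEarlyExit)
    return false;
  // Assumed remainder check: the trip count must be a multiple of VF * IC.
  std::uint64_t Step =
      static_cast<std::uint64_t>(MaxVF) * (UserIC ? UserIC : 1u);
  return Step != 0 && L.TripCount % Step == 0;
}
```

With a trip count of 16, VF 4 and IC 2 the model reports that no epilogue is needed for a single-latch-exit loop or an early-exit loop, while a multi-exit loop that is neither is rejected regardless of divisibility.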

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

david-arm

LGTM!

fhahn

Could you update the description of the PR? It currently seems slightly out of sync with the code.

fhahn · 2025-04-07T14:11:57Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+    if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch() &&
+        !Legal->hasUncountableEarlyExit())
+      return false;


Could you document what this is checking? It's not entirely obvious from just looking at the condition.

arcbbb · 2025-04-08

Updated. Thanks!

fhahn · 2025-04-07T14:12:22Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+    // Calling getSymbolicMaxBackedgeTakenCount enables support for loops
+    // with uncountable exits. For countable loops, the symbolic maximum must
+    // remain identical to the known back-edge taken count.
    const SCEV *BackedgeTakenCount = PSE.getSymbolicMaxBackedgeTakenCount();
-    assert(BackedgeTakenCount == PSE.getBackedgeTakenCount() &&
+    assert((Legal->hasUncountableEarlyExit() ||
+            BackedgeTakenCount == PSE.getBackedgeTakenCount()) &&


Are those changes still needed?

arcbbb · 2025-04-08

Yes, the changes are still required to prevent early-exit loops from triggering the assertion under -prefer-predicate-over-epilogue=predicate-dont-vectorize.
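To see why the assertion had to be relaxed, here is a small model of the condition. It is a sketch under assumptions: `BackedgeCounts` and `countsAreConsistent` are hypothetical names, and the missing exact count for early-exit loops is represented with `std::optional` rather than with the SCEV objects the real code compares.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of the two counts compared by the assertion.
struct BackedgeCounts {
  std::optional<std::uint64_t> Exact; // stands in for getBackedgeTakenCount()
  std::uint64_t SymbolicMax; // stands in for getSymbolicMaxBackedgeTakenCount()
};

// Mirrors the shape of the relaxed condition: loops with an uncountable
// early exit are exempt, every other loop must have matching counts.
bool countsAreConsistent(const BackedgeCounts &C,
                         bool HasUncountableEarlyExit) {
  return HasUncountableEarlyExit || (C.Exact && *C.Exact == C.SymbolicMax);
}
```

In this model an early-exit loop has no exact count, so the original equality-only check would fail for it, while the relaxed form accepts it.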

arcbbb · 2025-04-15T06:24:01Z

Just checking in — let me know if there are any remaining concerns. Otherwise, I’m planning to merge this later this week. Thanks!

arcbbb requested review from fhahn, alexey-bataev and david-arm April 1, 2025 16:15

Style update

45541a4

llvmbot added vectorizers llvm:transforms labels Apr 2, 2025

david-arm reviewed Apr 3, 2025

Moves check to NoScalarEpilogueNeeded

e9df81f

david-arm approved these changes Apr 7, 2025

fhahn reviewed Apr 7, 2025

Clarify check in NoScalarEpilogueNeeded

15f48ee

arcbbb merged commit e5263e3 into llvm:main Apr 18, 2025
11 checks passed

arcbbb deleted the nfc-mv-exit-check branch April 18, 2025 02:57
