Skip to content

[AArch64] Align 0-cycle reg-mov model of GPR64, GPR32 reg classes #146051

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

tomershafir
Copy link
Contributor

Aligns 0-cycle register MOV model of GPR64 and GPR32 register classes to that of FPR64 and FPR32 resolved in: #144152.

  • Splits FeatureZCRegMove into FeatureZCRegMoveGPR64 and FeatureZCRegMove32 and fix Apple processors and AArch64InstrInfo accordingly
  • Aligns the test arm64-zero-cycle-regmov-gpr.ll to the FPR one

The target feature name change is effectively a breaking change. The absolute most of users shouldn't use -mzcm directly, so I think it should be ok to make an immediate switch, unless this doesn't align with the conventions in this project. The patch adds a release note for that.

@tomershafir tomershafir marked this pull request as ready for review June 27, 2025 10:32
@llvmbot
Copy link
Member

llvmbot commented Jun 27, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Tomer Shafir (tomershafir)

Changes

Aligns 0-cycle register MOV model of GPR64 and GPR32 register classes to that of FPR64 and FPR32 resolved in: #144152.

  • Splits FeatureZCRegMove into FeatureZCRegMoveGPR64 and FeatureZCRegMove32 and fix Apple processors and AArch64InstrInfo accordingly
  • Aligns the test arm64-zero-cycle-regmov-gpr.ll to the FPR one

The target feature name change is effectively a breaking change. The absolute most of users shouldn't use -mzcm directly, so I think it should be ok to make an immediate switch, unless this doesn't align with the conventions in this project. The patch adds a release note for that.


Full diff: https://github.com/llvm/llvm-project/pull/146051.diff

6 Files Affected:

  • (modified) llvm/docs/ReleaseNotes.md (+2)
  • (modified) llvm/lib/Target/AArch64/AArch64Features.td (+5-2)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+4-4)
  • (modified) llvm/lib/Target/AArch64/AArch64Processors.td (+10-10)
  • (added) llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-gpr.ll (+54)
  • (removed) llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-gpr32.ll (-45)
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 73ae2ee599640..f63054b7920f2 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -107,6 +107,8 @@ Changes to the AArch64 Backend
   [#132196](https://github.com/llvm/llvm-project/pull/132196),
   [#133084](https://github.com/llvm/llvm-project/pull/133084))
 
+* **Breaking Change:** Change target feature name from `zcm` to `zcm-gpr64`.
+
 Changes to the AMDGPU Backend
 -----------------------------
 
diff --git a/llvm/lib/Target/AArch64/AArch64Features.td b/llvm/lib/Target/AArch64/AArch64Features.td
index 24fbe207c4969..05562516e5198 100644
--- a/llvm/lib/Target/AArch64/AArch64Features.td
+++ b/llvm/lib/Target/AArch64/AArch64Features.td
@@ -612,8 +612,11 @@ def FeatureExperimentalZeroingPseudos
 def FeatureNoSVEFPLD1R : SubtargetFeature<"no-sve-fp-ld1r",
   "NoSVEFPLD1R", "true", "Avoid using LD1RX instructions for FP">;
 
-def FeatureZCRegMove : SubtargetFeature<"zcm", "HasZeroCycleRegMove", "true",
-                                        "Has zero-cycle register moves">;
+def FeatureZCRegMoveGPR64 : SubtargetFeature<"zcm-gpr64", "HasZeroCycleRegMoveGPR64", "true",
+                                        "Has zero-cycle register moves for GPR64 registers">;
+
+def FeatureZCRegMoveGPR32 : SubtargetFeature<"zcm-gpr32", "HasZeroCycleRegMoveGPR32", "true",
+                                        "Has zero-cycle register moves for GPR32 registers">;
 
 def FeatureZCRegMoveFPR64 : SubtargetFeature<"zcm-fpr64", "HasZeroCycleRegMoveFPR64", "true",
                                         "Has zero-cycle register moves for FPR64 registers">;
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index c3837cfe73d28..2bd630bf4c76a 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -5037,8 +5037,8 @@ void AArch64InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
 
     if (DestReg == AArch64::WSP || SrcReg == AArch64::WSP) {
       // If either operand is WSP, expand to ADD #0.
-      if (Subtarget.hasZeroCycleRegMove()) {
-        // Cyclone recognizes "ADD Xd, Xn, #0" as a zero-cycle register move.
+      if (Subtarget.hasZeroCycleRegMoveGPR64() &&
+          !Subtarget.hasZeroCycleRegMoveGPR32()) {
         MCRegister DestRegX = TRI->getMatchingSuperReg(
             DestReg, AArch64::sub_32, &AArch64::GPR64spRegClass);
         MCRegister SrcRegX = TRI->getMatchingSuperReg(
@@ -5063,8 +5063,8 @@ void AArch64InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
           .addImm(0)
           .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0));
     } else {
-      if (Subtarget.hasZeroCycleRegMove()) {
-        // Cyclone recognizes "ORR Xd, XZR, Xm" as a zero-cycle register move.
+      if (Subtarget.hasZeroCycleRegMoveGPR64() &&
+          !Subtarget.hasZeroCycleRegMoveGPR32()) {
         MCRegister DestRegX = TRI->getMatchingSuperReg(
             DestReg, AArch64::sub_32, &AArch64::GPR64spRegClass);
         MCRegister SrcRegX = TRI->getMatchingSuperReg(
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index 4a5682475d107..dcccde4a4d666 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -311,7 +311,7 @@ def TuneAppleA7  : SubtargetFeature<"apple-a7", "ARMProcFamily", "AppleA7",
                                     FeatureDisableLatencySchedHeuristic,
                                     FeatureFuseAES, FeatureFuseCryptoEOR,
                                     FeatureStorePairSuppress,
-                                    FeatureZCRegMove,
+                                    FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing,
                                     FeatureZCZeroingFPWorkaround]>;
@@ -325,7 +325,7 @@ def TuneAppleA10 : SubtargetFeature<"apple-a10", "ARMProcFamily", "AppleA10",
                                     FeatureFuseAES,
                                     FeatureFuseCryptoEOR,
                                     FeatureStorePairSuppress,
-                                    FeatureZCRegMove,
+                                    FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
@@ -338,7 +338,7 @@ def TuneAppleA11 : SubtargetFeature<"apple-a11", "ARMProcFamily", "AppleA11",
                                     FeatureFuseAES,
                                     FeatureFuseCryptoEOR,
                                     FeatureStorePairSuppress,
-                                    FeatureZCRegMove,
+                                    FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
@@ -351,7 +351,7 @@ def TuneAppleA12 : SubtargetFeature<"apple-a12", "ARMProcFamily", "AppleA12",
                                     FeatureFuseAES,
                                     FeatureFuseCryptoEOR,
                                     FeatureStorePairSuppress,
-                                    FeatureZCRegMove,
+                                    FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
@@ -364,7 +364,7 @@ def TuneAppleA13 : SubtargetFeature<"apple-a13", "ARMProcFamily", "AppleA13",
                                     FeatureFuseAES,
                                     FeatureFuseCryptoEOR,
                                     FeatureStorePairSuppress,
-                                    FeatureZCRegMove,
+                                    FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
@@ -382,7 +382,7 @@ def TuneAppleA14 : SubtargetFeature<"apple-a14", "ARMProcFamily", "AppleA14",
                                     FeatureFuseCryptoEOR,
                                     FeatureFuseLiterals,
                                     FeatureStorePairSuppress,
-                                    FeatureZCRegMove,
+                                    FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
@@ -400,7 +400,7 @@ def TuneAppleA15 : SubtargetFeature<"apple-a15", "ARMProcFamily", "AppleA15",
                                     FeatureFuseCryptoEOR,
                                     FeatureFuseLiterals,
                                     FeatureStorePairSuppress,
-                                    FeatureZCRegMove,
+                                    FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
@@ -418,7 +418,7 @@ def TuneAppleA16 : SubtargetFeature<"apple-a16", "ARMProcFamily", "AppleA16",
                                     FeatureFuseCryptoEOR,
                                     FeatureFuseLiterals,
                                     FeatureStorePairSuppress,
-                                    FeatureZCRegMove,
+                                    FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
@@ -436,7 +436,7 @@ def TuneAppleA17 : SubtargetFeature<"apple-a17", "ARMProcFamily", "AppleA17",
                                     FeatureFuseCryptoEOR,
                                     FeatureFuseLiterals,
                                     FeatureStorePairSuppress,
-                                    FeatureZCRegMove,
+                                    FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
                                     FeatureZCZeroing]>;
 
@@ -453,7 +453,7 @@ def TuneAppleM4 : SubtargetFeature<"apple-m4", "ARMProcFamily", "AppleM4",
                                      FeatureFuseCCSelect,
                                      FeatureFuseCryptoEOR,
                                      FeatureFuseLiterals,
-                                     FeatureZCRegMove,
+                                     FeatureZCRegMoveGPR64,
                                      FeatureZCRegMoveFPR64,
                                      FeatureZCZeroing
                                      ]>;
diff --git a/llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-gpr.ll b/llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-gpr.ll
new file mode 100644
index 0000000000000..e14e69b5e6a2a
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-gpr.ll
@@ -0,0 +1,54 @@
+; RUN: llc < %s -mtriple=arm64-linux-gnu | FileCheck %s -check-prefixes=NOTCPU-LINUX --match-full-lines
+; RUN: llc < %s -mtriple=arm64-apple-macosx -mcpu=generic | FileCheck %s -check-prefixes=NOTCPU-APPLE --match-full-lines
+; RUN: llc < %s -mtriple=arm64-apple-macosx -mcpu=apple-m1 | FileCheck %s -check-prefixes=CPU --match-full-lines
+; RUN: llc < %s -mtriple=arm64-apple-macosx -mcpu=apple-m1 -mattr=-zcm-gpr64 | FileCheck %s -check-prefixes=NOTATTR --match-full-lines
+; RUN: llc < %s -mtriple=arm64-apple-macosx -mattr=+zcm-gpr64 | FileCheck %s -check-prefixes=ATTR --match-full-lines
+
+define void @zero_cycle_regmov_GPR32(i32 %a, i32 %b, i32 %c, i32 %d) {
+entry:
+; CHECK-LABEL: t:
+; NOTCPU-LINUX: mov w0, w2
+; NOTCPU-LINUX: mov w1, w3
+; NOTCPU-LINUX: mov [[REG2:w[0-9]+]], w3
+; NOTCPU-LINUX: mov [[REG1:w[0-9]+]], w2
+; NOTCPU-LINUX-NEXT: bl {{_?foo_i32}}
+; NOTCPU-LINUX: mov w0, [[REG1]]
+; NOTCPU-LINUX: mov w1, [[REG2]]
+
+; NOTCPU-APPLE: mov w0, w2
+; NOTCPU-APPLE: mov w1, w3
+; NOTCPU-APPLE: mov [[REG2:w[0-9]+]], w3
+; NOTCPU-APPLE: mov [[REG1:w[0-9]+]], w2
+; NOTCPU-APPLE-NEXT: bl {{_?foo_i32}}
+; NOTCPU-APPLE: mov w0, [[REG1]]
+; NOTCPU-APPLE: mov w1, [[REG2]]
+
+; CPU: mov [[REG2:x[0-9]+]], x3
+; CPU: mov [[REG1:x[0-9]+]], x2
+; CPU: mov x0, x2
+; CPU: mov x1, x3
+; CPU-NEXT: bl {{_?foo_i32}}
+; CPU: mov x0, [[REG1]]
+; CPU: mov x1, [[REG2]]
+
+; NOTATTR: mov [[REG2:w[0-9]+]], w3
+; NOTATTR: mov [[REG1:w[0-9]+]], w2
+; NOTATTR: mov w0, w2
+; NOTATTR: mov w1, w3
+; NOTATTR-NEXT: bl {{_?foo_i32}}
+; NOTATTR: mov w0, [[REG1]]
+; NOTATTR: mov w1, [[REG2]]
+
+; ATTR: mov x0, x2
+; ATTR: mov x1, x3
+; ATTR: mov [[REG2:x[0-9]+]], x3
+; ATTR: mov [[REG1:x[0-9]+]], x2
+; ATTR-NEXT: bl {{_?foo_i32}}
+; ATTR: mov x0, [[REG1]]
+; ATTR: mov x1, [[REG2]]
+  %call = call i32 @foo_i32(i32 %c, i32 %d)
+  %call1 = call i32 @foo_i32(i32 %c, i32 %d)
+  unreachable
+}
+
+declare i32 @foo_i32(i32, i32)
diff --git a/llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-gpr32.ll b/llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-gpr32.ll
deleted file mode 100644
index 5ef6d3e84805a..0000000000000
--- a/llvm/test/CodeGen/AArch64/arm64-zero-cycle-regmov-gpr32.ll
+++ /dev/null
@@ -1,45 +0,0 @@
-; RUN: llc < %s -march=arm64 | FileCheck %s -check-prefixes=NOTCPU --match-full-lines
-; RUN: llc < %s -mtriple=arm64-apple-macosx -mcpu=apple-m1 | FileCheck %s -check-prefixes=CPU --match-full-lines
-; RUN: llc < %s -mtriple=arm64-apple-macosx -mcpu=apple-m1 -mattr=-zcm | FileCheck %s -check-prefixes=NOTATTR --match-full-lines
-; RUN: llc < %s -mtriple=arm64-apple-macosx -mattr=+zcm | FileCheck %s -check-prefixes=ATTR --match-full-lines
-
-define void @t(i32 %a, i32 %b, i32 %c, i32 %d) {
-entry:
-; CHECK-LABEL: t:
-; NOTCPU: mov w0, w2
-; NOTCPU: mov w1, w3
-; NOTCPU: mov [[REG2:w[0-9]+]], w3
-; NOTCPU: mov [[REG1:w[0-9]+]], w2
-; NOTCPU-NEXT: bl {{_?foo}}
-; NOTCPU: mov w0, [[REG1]]
-; NOTCPU: mov w1, [[REG2]]
-
-; CPU: mov [[REG2:x[0-9]+]], x3
-; CPU: mov [[REG1:x[0-9]+]], x2
-; CPU: mov x0, x2
-; CPU: mov x1, x3
-; CPU-NEXT: bl {{_?foo}}
-; CPU: mov x0, [[REG1]]
-; CPU: mov x1, [[REG2]]
-
-; NOTATTR: mov [[REG2:w[0-9]+]], w3
-; NOTATTR: mov [[REG1:w[0-9]+]], w2
-; NOTATTR: mov w0, w2
-; NOTATTR: mov w1, w3
-; NOTATTR-NEXT: bl {{_?foo}}
-; NOTATTR: mov w0, [[REG1]]
-; NOTATTR: mov w1, [[REG2]]
-
-; ATTR: mov x0, x2
-; ATTR: mov x1, x3
-; ATTR: mov [[REG2:x[0-9]+]], x3
-; ATTR: mov [[REG1:x[0-9]+]], x2
-; ATTR-NEXT: bl {{_?foo}}
-; ATTR: mov x0, [[REG1]]
-; ATTR: mov x1, [[REG2]]
-  %call = call i32 @foo(i32 %c, i32 %d)
-  %call1 = call i32 @foo(i32 %c, i32 %d)
-  unreachable
-}
-
-declare i32 @foo(i32, i32)

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some suggestions, thanks.

@@ -5063,8 +5063,8 @@ void AArch64InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
.addImm(0)
.addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0));
} else {
if (Subtarget.hasZeroCycleRegMove()) {
// Cyclone recognizes "ORR Xd, XZR, Xm" as a zero-cycle register move.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments would be good to keep, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will restore them

@@ -107,6 +107,8 @@ Changes to the AArch64 Backend
[#132196](https://github.com/llvm/llvm-project/pull/132196),
[#133084](https://github.com/llvm/llvm-project/pull/133084))

* **Breaking Change:** Change target feature name from `zcm` to `zcm-gpr64`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As they are only tuning features I would not expect any users to use them. (And -mzcm isn't a valid option AFAIU). I would remove the release note, or at least remove the "Breaking Change" part. Maybe it is good to call these kinds of things out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea sorry about the wrong option in the release note and commit msg. Will fix the commit msg and remove the release note.

Aligns 0-cycle register MOV model of GPR64 and GPR32 register classes to that of FPR64 and FPR32 resolved in: llvm#144152.

- Splits `FeatureZCRegMove` into `FeatureZCRegMoveGPR64` and `FeatureZCRegMove32` and fix Apple processors and `AArch64InstrInfo` accordingly
- Aligns the test `arm64-zero-cycle-regmov-gpr.ll` to the FPR one

The target feature name change is effectively a breaking change. The absolute most of users shouldn't use `-mattr=zcm` directly, so I think it should be ok to make an immediate switch without a release note.
@tomershafir tomershafir force-pushed the aarch64-align-zero-cycle-regmov-gpr branch from 308e542 to 13275b3 Compare June 27, 2025 15:17
Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@tomershafir tomershafir merged commit b6515ae into llvm:main Jun 28, 2025
7 checks passed
@tomershafir tomershafir deleted the aarch64-align-zero-cycle-regmov-gpr branch June 28, 2025 09:36
@@ -612,8 +612,11 @@ def FeatureExperimentalZeroingPseudos
def FeatureNoSVEFPLD1R : SubtargetFeature<"no-sve-fp-ld1r",
"NoSVEFPLD1R", "true", "Avoid using LD1RX instructions for FP">;

def FeatureZCRegMove : SubtargetFeature<"zcm", "HasZeroCycleRegMove", "true",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible there's more places that need to be updated (like TargetParser)? I'm seeing a bunch of '+zcm' is not a recognized feature for this target (ignoring feature) warnings when building anything with Clang on macOS now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will push a fix

tomershafir added a commit to tomershafir/llvm-project that referenced this pull request Jun 29, 2025
Replaces all the uses of `+zcm` with `+zcm-gpr64`. A fix for: llvm#146051
tomershafir added a commit that referenced this pull request Jun 29, 2025
Replaces all the uses of `+zcm` with `+zcm-gpr64`. A fix for:
#146051
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 29, 2025
…6260)

Replaces all the uses of `+zcm` with `+zcm-gpr64`. A fix for:
llvm/llvm-project#146051
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
…vm#146051)

Aligns 0-cycle register MOV model of GPR64 and GPR32 register classes to
that of FPR64 and FPR32 resolved in:
llvm#144152.

- Splits `FeatureZCRegMove` into `FeatureZCRegMoveGPR64` and
`FeatureZCRegMove32` and fix Apple processors and `AArch64InstrInfo`
accordingly
- Aligns the test `arm64-zero-cycle-regmov-gpr.ll` to the FPR one

The target feature name change is effectively a breaking change. The
absolute most of users shouldn't use `-mzcm` directly, so I think it
should be ok to make an immediate switch, unless this doesn't align with
the conventions in this project. The patch adds a release note for that.
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
Replaces all the uses of `+zcm` with `+zcm-gpr64`. A fix for:
llvm#146051
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
…vm#146051)

Aligns 0-cycle register MOV model of GPR64 and GPR32 register classes to
that of FPR64 and FPR32 resolved in:
llvm#144152.

- Splits `FeatureZCRegMove` into `FeatureZCRegMoveGPR64` and
`FeatureZCRegMove32` and fix Apple processors and `AArch64InstrInfo`
accordingly
- Aligns the test `arm64-zero-cycle-regmov-gpr.ll` to the FPR one

The target feature name change is effectively a breaking change. The
absolute most of users shouldn't use `-mzcm` directly, so I think it
should be ok to make an immediate switch, unless this doesn't align with
the conventions in this project. The patch adds a release note for that.
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
Replaces all the uses of `+zcm` with `+zcm-gpr64`. A fix for:
llvm#146051
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants