Skip to content

Commit 8e5275a

Browse files
authored
[SYCL] Add support for -ftarget-register-alloc-mode (#11430)
We finally are using a new enough IGC driver in CI so we can implement this. The option was approved a while back by the options group. This change adds a new option `-ftarget-register-alloc-mode`. It is specified as follows: `-ftarget-register-alloc-mode=DeviceName0:Mode0[,DeviceName1:Mode1...]` Currently the only valid device name is `pvc`, but I developed this change to be extendable in the future. The valid modes are: auto -> request the backend to use heuristics to pick a register alloc mode small -> use small grf mode large -> use large grf mode default -> provide no specification to the backend on what register alloc mode to use. The default value if not specified is `pvc:default`, so provide no specification to the backend in any case for any hardware. I will begin an internal discussion after this is merged to see if we should change the default to be `pvc:auto`. The driver owns the mapping between the option mode and the backend flag. It converts the user specified mode to the backend flag and passes it either to ocloc for AOT or clang-offload-wrapper for JIT. The runtime will extract the option stored in the image by clang-offload-wrapper and pass it to the device backend if the device matches. For AOT, we start making use of a new ocloc feature, `-device_options <device> <options>`, ex `-device_options pvc "-options -ze-opt-large-register-file"`. This means "if the device is pvc, use large GRF mode. If the device is not pvc, ignore the option." For JIT, we do some argument processing in the runtime. The driver splits the options in the case multiple are passed, and converts the mode to the backend argument name to make the runtime processing easier. For example, if the user passes `-ftarget-target-alloc-mode=pvc:auto,pvc:large`, the options stored in the image that the runtime sees will be `-ftarget-target-alloc-mode=pvc:-ze-opt-large-register-file -ftarget-target-alloc-mode=pvc:-ze-intel-enable-auto-large-GRF-mode` The runtime loops over all the options, and for each instance of `-ftarget-register-alloc-mode`: Check if the device matches the device in the argument (currently only pvc supported) If so, replace the `-ftarget-register-alloc-mode=...` option with the backend opt provided by the driver. If not, remove the `-ftarget-register-alloc-mode=...` as to not pass it to the backend. --------- Signed-off-by: Sarnie, Nick <[email protected]>
1 parent 819a94a commit 8e5275a

File tree

7 files changed

+229
-0
lines changed

7 files changed

+229
-0
lines changed

clang/include/clang/Driver/Options.td

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3840,6 +3840,10 @@ def ftarget_export_symbols : Flag<["-"], "ftarget-export-symbols">,
38403840
"target library to allow for visibilty to other modules.">;
38413841
def fno_target_export_symbols : Flag<["-"], "fno-target-export-symbols">,
38423842
Visibility<[ClangOption, CLOption, DXCOption]>;
3843+
def ftarget_register_alloc_mode_EQ : Joined<["-"], "ftarget-register-alloc-mode=">,
3844+
Visibility<[ClangOption, CLOption, DXCOption]>,
3845+
HelpText<"Specify a register allocation mode for specific hardware for use by supported "
3846+
"target backends.">;
38433847
def : Flag<["-"], "fsycl-rdc">, Visibility<[ClangOption, CLOption, DXCOption]>, Alias<fgpu_rdc>;
38443848
def : Flag<["-"], "fno-sycl-rdc">, Visibility<[ClangOption, CLOption, DXCOption]>, Alias<fno_gpu_rdc>;
38453849
def fsycl_optimize_non_user_code : Flag<["-"], "fsycl-optimize-non-user-code">,

clang/lib/Driver/ToolChains/SYCL.cpp

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -763,6 +763,14 @@ void SYCL::fpga::BackendCompiler::ConstructJob(
763763
C.addCommand(std::move(Cmd));
764764
}
765765

766+
StringRef SYCL::gen::getGenGRFFlag(StringRef GRFMode) {
767+
return llvm::StringSwitch<StringRef>(GRFMode)
768+
.Case("auto", "-ze-intel-enable-auto-large-GRF-mode")
769+
.Case("small", "-ze-intel-128-GRF-per-thread")
770+
.Case("large", "-ze-opt-large-register-file")
771+
.Default("");
772+
}
773+
766774
void SYCL::gen::BackendCompiler::ConstructJob(Compilation &C,
767775
const JobAction &JA,
768776
const InputInfo &Output,
@@ -1132,6 +1140,9 @@ void SYCLToolChain::AddImpliedTargetArgs(const llvm::Triple &Triple,
11321140
// GEN: -options "-g -O0"
11331141
// CPU: "--bo=-g -cl-opt-disable"
11341142
llvm::opt::ArgStringList BeArgs;
1143+
// Per-device argument vector storing the device name and the backend argument
1144+
// string
1145+
llvm::SmallVector<std::pair<StringRef, StringRef>, 16> PerDeviceArgs;
11351146
bool IsGen = Triple.getSubArch() == llvm::Triple::SPIRSubArch_gen;
11361147
if (Arg *A = Args.getLastArg(options::OPT_g_Group, options::OPT__SLASH_Z7))
11371148
if (!A->getOption().matches(options::OPT_g0))
@@ -1142,6 +1153,40 @@ void SYCLToolChain::AddImpliedTargetArgs(const llvm::Triple &Triple,
11421153
if (Arg *A = Args.getLastArg(options::OPT_O_Group))
11431154
if (A->getOption().matches(options::OPT_O0))
11441155
BeArgs.push_back("-cl-opt-disable");
1156+
StringRef RegAllocModeOptName = "-ftarget-register-alloc-mode=";
1157+
if (Arg *A = Args.getLastArg(options::OPT_ftarget_register_alloc_mode_EQ)) {
1158+
StringRef RegAllocModeVal = A->getValue(0);
1159+
auto ProcessElement = [&](StringRef Ele) {
1160+
auto [DeviceName, RegAllocMode] = Ele.split(':');
1161+
StringRef BackendOptName = SYCL::gen::getGenGRFFlag(RegAllocMode);
1162+
bool IsDefault = RegAllocMode.equals("default");
1163+
if (RegAllocMode.empty() || !DeviceName.equals("pvc") ||
1164+
(BackendOptName.empty() && !IsDefault)) {
1165+
getDriver().Diag(diag::err_drv_unsupported_option_argument)
1166+
<< A->getSpelling() << Ele;
1167+
}
1168+
// "default" means "provide no specification to the backend", so
1169+
// we don't need to do anything here.
1170+
if (IsDefault)
1171+
return;
1172+
if (IsGen) {
1173+
// For AOT, Use ocloc's per-device options flag with the correct ocloc
1174+
// option to honor the user's specification.
1175+
PerDeviceArgs.push_back(
1176+
{DeviceName, Args.MakeArgString("-options " + BackendOptName)});
1177+
} else if (Triple.isSPIR() &&
1178+
Triple.getSubArch() == llvm::Triple::NoSubArch) {
1179+
// For JIT, pass -ftarget-register-alloc-mode=Device:BackendOpt to
1180+
// clang-offload-wrapper to be processed by the runtime.
1181+
BeArgs.push_back(Args.MakeArgString(RegAllocModeOptName + DeviceName +
1182+
":" + BackendOptName));
1183+
}
1184+
};
1185+
llvm::SmallVector<StringRef, 16> RegAllocModeArgs;
1186+
RegAllocModeVal.split(RegAllocModeArgs, ',');
1187+
for (StringRef Elem : RegAllocModeArgs)
1188+
ProcessElement(Elem);
1189+
}
11451190
if (IsGen) {
11461191
// For GEN (spir64_gen) we have implied -device settings given usage
11471192
// of intel_gpu_ as a target. Handle those here, and also check that no
@@ -1181,6 +1226,13 @@ void SYCLToolChain::AddImpliedTargetArgs(const llvm::Triple &Triple,
11811226
// -ftarget-compile-fast JIT
11821227
Args.AddLastArg(BeArgs, options::OPT_ftarget_compile_fast);
11831228
}
1229+
if (IsGen) {
1230+
for (auto [DeviceName, BackendArgStr] : PerDeviceArgs) {
1231+
CmdArgs.push_back("-device_options");
1232+
CmdArgs.push_back(Args.MakeArgString(DeviceName));
1233+
CmdArgs.push_back(Args.MakeArgString(BackendArgStr));
1234+
}
1235+
}
11841236
if (BeArgs.empty())
11851237
return;
11861238
if (Triple.getSubArch() == llvm::Triple::NoSubArch ||

clang/lib/Driver/ToolChains/SYCL.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -115,6 +115,7 @@ class LLVM_LIBRARY_VISIBILITY BackendCompiler : public Tool {
115115

116116
StringRef resolveGenDevice(StringRef DeviceName);
117117
SmallString<64> getGenDeviceMacro(StringRef DeviceName);
118+
StringRef getGenGRFFlag(StringRef GRFMode);
118119

119120
// // Prefix for GPU specific targets used for -fsycl-targets
120121
constexpr char IntelGPU[] = "intel_gpu_";
Lines changed: 92 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,92 @@
1+
// Test SYCL -ftarget-register-alloc-mode
2+
3+
// RUN: %clang -### -fsycl \
4+
// RUN: -fsycl-targets=spir64_gen -ftarget-register-alloc-mode=pvc:auto %s 2>&1 \
5+
// RUN: | FileCheck -check-prefix=AUTO_AOT %s
6+
7+
// RUN: %clang -### -fsycl \
8+
// RUN: -fsycl-targets=spir64_gen -ftarget-register-alloc-mode=pvc:large %s 2>&1 \
9+
// RUN: | FileCheck -check-prefix=LARGE_AOT %s
10+
11+
// RUN: %clang -### -fsycl \
12+
// RUN: -fsycl-targets=spir64_gen -ftarget-register-alloc-mode=pvc:small %s 2>&1 \
13+
// RUN: | FileCheck -check-prefix=SMALL_AOT %s
14+
15+
// RUN: %clang -### -fsycl \
16+
// RUN: -fsycl-targets=spir64_gen -ftarget-register-alloc-mode=pvc:default %s 2>&1 \
17+
// RUN: | FileCheck -check-prefix=DEFAULT_AOT %s
18+
19+
// RUN: %clang -### -fsycl \
20+
// RUN: -fsycl-targets=spir64_gen -ftarget-register-alloc-mode=pvc:small,pvc:large %s 2>&1 \
21+
// RUN: | FileCheck -check-prefix=MULTIPLE_ARGS_AOT %s
22+
23+
// RUN: %clang -### -fsycl \
24+
// RUN: -ftarget-register-alloc-mode=pvc:auto %s 2>&1 \
25+
// RUN: | FileCheck -check-prefix=AUTO_JIT %s
26+
27+
// RUN: %clang -### -fsycl \
28+
// RUN: -ftarget-register-alloc-mode=pvc:large %s 2>&1 \
29+
// RUN: | FileCheck -check-prefix=LARGE_JIT %s
30+
31+
// RUN: %clang -### -fsycl \
32+
// RUN: -ftarget-register-alloc-mode=pvc:small %s 2>&1 \
33+
// RUN: | FileCheck -check-prefix=SMALL_JIT %s
34+
35+
// RUN: %clang -### -fsycl \
36+
// RUN: -ftarget-register-alloc-mode=pvc:default %s 2>&1 \
37+
// RUN: | FileCheck -check-prefix=DEFAULT_JIT %s
38+
39+
// RUN: %clang -### -fsycl \
40+
// RUN: -ftarget-register-alloc-mode=pvc:small,pvc:large %s 2>&1 \
41+
// RUN: | FileCheck -check-prefix=MULTIPLE_ARGS_JIT %s
42+
43+
// RUN: not %clang -### -fsycl \
44+
// RUN: -fsycl-targets=spir64_gen -ftarget-register-alloc-mode=dg2:auto %s 2>&1 \
45+
// RUN: | FileCheck -check-prefix=BAD_DEVICE %s
46+
47+
// RUN: not %clang -### -fsycl \
48+
// RUN: -fsycl-targets=spir64_gen -ftarget-register-alloc-mode=pvc:superlarge %s 2>&1 \
49+
// RUN: | FileCheck -check-prefix=BAD_MODE %s
50+
51+
// RUN: not %clang -### -fsycl \
52+
// RUN: -fsycl-targets=spir64_gen -ftarget-register-alloc-mode=dg2:superlarge %s 2>&1 \
53+
// RUN: | FileCheck -check-prefix=BAD_BOTH %s
54+
55+
// AUTO_AOT: ocloc{{.*}} "-output"
56+
// AUTO_AOT: -device_options
57+
// AUTO_AOT: pvc
58+
// AUTO_AOT: "-options -ze-intel-enable-auto-large-GRF-mode"
59+
60+
// LARGE_AOT: ocloc{{.*}} "-output"
61+
// LARGE_AOT: -device_options
62+
// LARGE_AOT: pvc
63+
// LARGE_AOT: "-options -ze-opt-large-register-file"
64+
65+
// SMALL_AOT: ocloc{{.*}} "-output"
66+
// SMALL_AOT: -device_options
67+
// SMALL_AOT: pvc
68+
// SMALL_AOT: "-options -ze-intel-128-GRF-per-thread"
69+
70+
// DEFAULT_AOT-NOT: -device_options
71+
72+
// MULTIPLE_ARGS_AOT: ocloc{{.*}} "-output"
73+
// MULTIPLE_ARGS_AOT: -device_options
74+
// MULTIPLE_ARGS_AOT: pvc
75+
// MULTIPLE_ARGS_AOT: "-options -ze-intel-128-GRF-per-thread"
76+
// MULTIPLE_ARGS_AOT: -device_options
77+
// MULTIPLE_ARGS_AOT: pvc
78+
// MULTIPLE_ARGS_AOT: "-options -ze-opt-large-register-file"
79+
80+
// AUTO_JIT: clang-offload-wrapper{{.*}} "-compile-opts=-ftarget-register-alloc-mode=pvc:-ze-intel-enable-auto-large-GRF-mode"
81+
82+
// LARGE_JIT: clang-offload-wrapper{{.*}} "-compile-opts=-ftarget-register-alloc-mode=pvc:-ze-opt-large-register-file"
83+
84+
// SMALL_JIT: clang-offload-wrapper{{.*}} "-compile-opts=-ftarget-register-alloc-mode=pvc:-ze-intel-128-GRF-per-thread"
85+
86+
// DEFAULT_JIT-NOT: -ftarget-register-alloc-mode=
87+
88+
// MULTIPLE_ARGS_JIT: clang-offload-wrapper{{.*}} "-compile-opts=-ftarget-register-alloc-mode=pvc:-ze-intel-128-GRF-per-thread -ftarget-register-alloc-mode=pvc:-ze-opt-large-register-file"
89+
90+
// BAD_DEVICE: unsupported argument 'dg2:auto' to option '-ftarget-register-alloc-mode='
91+
// BAD_MODE: unsupported argument 'pvc:superlarge' to option '-ftarget-register-alloc-mode='
92+
// BAD_BOTH: unsupported argument 'dg2:superlarge' to option '-ftarget-register-alloc-mode='

sycl/doc/UsersManual.md

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -426,6 +426,7 @@ and not recommended to use in production environment.
426426
"stateless" memory accesses.
427427

428428
**`-ftarget-compile-fast`** [EXPERIMENTAL]
429+
429430
Instructs the target backend to reduce compilation time, potentially
430431
at the cost of runtime performance. Currently only supported on Intel GPUs.
431432

@@ -436,6 +437,13 @@ and not recommended to use in production environment.
436437

437438
NOTE: This flag is only supported for spir64_gen AOT targets.
438439

440+
**`-ftarget-register-alloc-mode=<arg>`**
441+
442+
Specify a register allocation mode for specific hardware for use by supported
443+
target backends. The format of the argument is "Device0:Mode0[,Device1:Mode1...]".
444+
Currently the only supported Device is "pvc". The supported modes are
445+
"default","small","large", and "auto".
446+
439447
# Example: SYCL device code compilation
440448

441449
To invoke SYCL device compiler set `-fsycl-device-only` flag.

sycl/source/detail/program_manager/program_manager.cpp

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -494,6 +494,37 @@ static void appendCompileOptionsFromImage(std::string &CompileOpts,
494494
else
495495
CompileOpts.erase(Pos, OptLen);
496496
}
497+
static const std::string TargetRegisterAllocMode =
498+
"-ftarget-register-alloc-mode=";
499+
auto OptPos = CompileOpts.find(TargetRegisterAllocMode);
500+
while (OptPos != std::string::npos) {
501+
auto EndOfOpt = CompileOpts.find(" ", OptPos);
502+
// Extract everything after the equals until the end of the option
503+
auto OptValue = CompileOpts.substr(
504+
OptPos + TargetRegisterAllocMode.size(),
505+
EndOfOpt - OptPos - TargetRegisterAllocMode.size());
506+
auto ColonPos = OptValue.find(":");
507+
auto Device = OptValue.substr(0, ColonPos);
508+
std::string BackendStrToAdd;
509+
bool IsPVC =
510+
std::all_of(Devs.begin(), Devs.end(), [&](const device &Dev) {
511+
return IsIntelGPU &&
512+
(Dev.get_info<ext::intel::info::device::device_id>() &
513+
0xFF00) == 0x0B00;
514+
});
515+
// Currently 'pvc' is the only supported device.
516+
if (Device == "pvc" && IsPVC)
517+
BackendStrToAdd = " " + OptValue.substr(ColonPos + 1) + " ";
518+
519+
// Extract everything before this option
520+
std::string NewCompileOpts =
521+
CompileOpts.substr(0, OptPos) + BackendStrToAdd;
522+
// Extract everything after this option and add it to the above.
523+
if (EndOfOpt != std::string::npos)
524+
NewCompileOpts += CompileOpts.substr(EndOfOpt);
525+
CompileOpts = NewCompileOpts;
526+
OptPos = CompileOpts.find(TargetRegisterAllocMode);
527+
}
497528
}
498529
}
499530

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,41 @@
1+
// REQUIRES: gpu-intel-pvc
2+
3+
// RUN: %{build} -ftarget-register-alloc-mode=pvc:auto -o %t_with.out
4+
// RUN: %{build} -o %t_without.out
5+
6+
// RUN: env SYCL_PI_TRACE=-1 %{run} %t_with.out 2>&1 | FileCheck --check-prefix=CHECK-WITH %s
7+
// RUN: env SYCL_PI_TRACE=-1 %{run} %t_without.out 2>&1 | FileCheck --implicit-check-not=-ze-intel-enable-auto-large-GRF-mode %s
8+
9+
// CHECK-WITH: ---> piProgramBuild(
10+
// CHECK-WITH: -ze-intel-enable-auto-large-GRF-mode
11+
12+
#include <sycl/sycl.hpp>
13+
14+
int main() {
15+
sycl::buffer<size_t, 1> Buffer(4);
16+
17+
sycl::queue Queue;
18+
19+
sycl::range<1> NumOfWorkItems{Buffer.size()};
20+
21+
Queue.submit([&](sycl::handler &cgh) {
22+
sycl::accessor Accessor{Buffer, cgh, sycl::write_only};
23+
cgh.parallel_for<class FillBuffer>(NumOfWorkItems, [=](sycl::id<1> WIid) {
24+
Accessor[WIid] = WIid.get(0);
25+
});
26+
});
27+
28+
sycl::host_accessor HostAccessor{Buffer, sycl::read_only};
29+
30+
bool MismatchFound = false;
31+
for (size_t I = 0; I < Buffer.size(); ++I) {
32+
if (HostAccessor[I] != I) {
33+
std::cout << "The result is incorrect for element: " << I
34+
<< " , expected: " << I << " , got: " << HostAccessor[I]
35+
<< std::endl;
36+
MismatchFound = true;
37+
}
38+
}
39+
40+
return MismatchFound;
41+
}

0 commit comments

Comments
 (0)