Skip to content

[Clang][HIP] Deprecate the AMDGCN_WAVEFRONT_SIZE macros #112849

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 8, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions clang/docs/AMDGPUSupport.rst
Original file line number Diff line number Diff line change
Expand Up @@ -50,9 +50,9 @@ Predefined Macros
* - ``__AMDGCN_UNSAFE_FP_ATOMICS__``
- Defined if unsafe floating-point atomics are allowed.
* - ``__AMDGCN_WAVEFRONT_SIZE__``
- Defines the wavefront size. Allowed values are 32 and 64.
- Defines the wavefront size. Allowed values are 32 and 64 (deprecated).
* - ``__AMDGCN_WAVEFRONT_SIZE``
- Alias to ``__AMDGCN_WAVEFRONT_SIZE__``. To be deprecated.
- Alias to ``__AMDGCN_WAVEFRONT_SIZE__`` (deprecated).
* - ``__HAS_FMAF__``
- Defined if FMAF instruction is available (deprecated).
* - ``__HAS_LDEXPF__``
Expand Down
2 changes: 1 addition & 1 deletion clang/docs/HIPSupport.rst
Original file line number Diff line number Diff line change
Expand Up @@ -178,7 +178,7 @@ Predefined Macros

Note that some architecture specific AMDGPU macros will have default values when
used from the HIP host compilation. Other :doc:`AMDGPU macros <AMDGPUSupport>`
like ``__AMDGCN_WAVEFRONT_SIZE__`` will default to 64 for example.
like ``__AMDGCN_WAVEFRONT_SIZE__`` (deprecated) will default to 64 for example.

Compilation Modes
=================
Expand Down
8 changes: 7 additions & 1 deletion clang/include/clang/Basic/MacroBuilder.h
Original file line number Diff line number Diff line change
Expand Up @@ -26,8 +26,14 @@ class MacroBuilder {
MacroBuilder(raw_ostream &Output) : Out(Output) {}

/// Append a \#define line for macro of the form "\#define Name Value\n".
void defineMacro(const Twine &Name, const Twine &Value = "1") {
/// If DeprecationMsg is provided, also append a pragma to deprecate the
/// defined macro.
void defineMacro(const Twine &Name, const Twine &Value = "1",
Twine DeprecationMsg = "") {
Out << "#define " << Name << ' ' << Value << '\n';
if (!DeprecationMsg.isTriviallyEmpty())
Out << "#pragma clang deprecated(" << Name << ", \"" << DeprecationMsg
<< "\")\n";
}

/// Append a \#undef line for Name. Name should be of the form XXX
Expand Down
9 changes: 6 additions & 3 deletions clang/lib/Basic/Targets/AMDGPU.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -337,9 +337,12 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts,
if (hasFastFMA())
Builder.defineMacro("FP_FAST_FMA");

Builder.defineMacro("__AMDGCN_WAVEFRONT_SIZE__", Twine(WavefrontSize));
// ToDo: deprecate this macro for naming consistency.
Builder.defineMacro("__AMDGCN_WAVEFRONT_SIZE", Twine(WavefrontSize));
Builder.defineMacro("__AMDGCN_WAVEFRONT_SIZE__", Twine(WavefrontSize),
"compile-time-constant access to the wavefront size will "
"be removed in a future release");
Builder.defineMacro("__AMDGCN_WAVEFRONT_SIZE", Twine(WavefrontSize),
"compile-time-constant access to the wavefront size will "
"be removed in a future release");
Comment on lines +340 to +345
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is unhelpful to deprecate without a suggested replacement.

I also disagree with removing this for the device side compile. There still are uses for it, and there are still non-offloading languages

Copy link
Contributor

@AlexVlx AlexVlx Oct 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are removing an antipattern, the replacement is “don't do this”? Not all things that exist are valid. We should absolutely remove this, at least for HIP, where it creates needless divergence from CUDA and is a bug farm as a consequence.

Orthogonally, what are the uses for it that you have in mind, and which non-offloading languages are you thinking of? The only scenario I can envision here is if one does an end-to-end compile (so not the weird 1-per-target thing we do with offload, and not playing games around single AST that gets forked at CodeGen time). In that case, yes, that might be useful, but delivering it as a macro is fundamentally a bad design IMHO, so even there we should not have this. Furthermore, at the moment, that sort of use is incredibly underspecified / governed by happens to work by accident rather than by design, so building some early technical debt seems problematic, at least to me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This never should've been defined on the host side for sure. We do this in the OpenMP device runtime.

return llvm::omp::getAMDGPUGridValues<__AMDGCN_WAVEFRONT_SIZE>();

That could probably be replaced with

return __builtin_amdgcn_wavefrontsize();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of lower level uses. i.e. not user facing code we would want in headers or people using directly. e.g. cases where implementing cross lane operations relies on the operations that only work on half the lanes, or don't work at all when executed in wave32

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use we currently have in our project is to detect, via CMake's try_compile, whether we have Wave64 or Wave32 targets (or both) to later choose which data structure flavors to instantiate. With the macro, we can easily do #if+#warning inside a kernel, and then parse the build log.

Without it, the only solution we found was to parse the list of targets, which feels more brittle. __builtin_amdgcn_wavefrontsize() is not a compile-time constant expression, so cannot be used for such checks. Am I missing something? What is the planned timeline for the removal of this macro?

Copy link
Contributor

@jhuber6 jhuber6 Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that it's not that simple, since the user can easily change the wavefront size by compiling with -mwavefrontsize64. I ran into these types of issues myself while working on the RPC interface for the libc . There I pretty much just take the wavefront size as an argument to the indexing / allocation functions. Alternatively you use a template and do runtime dispatch and assume that optimizations will DCE the other code.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that it's not that simple, since the user can easily change the wavefront size by compiling with -mwavefrontsize64.

I assume you are highlighting a problem with parsing the target list? That's a good point.

I ran into these types of issues myself while working on the RPC interface for the rpc interface.

Ah, the new meaning of "running DOOM on GPUs" 😄 👍

Alternatively you use a template and do runtime dispatch and assume that optimizations will DCE the other code.

That's a nice solution and we are moving into that direction (with AMD's help). Unfortunately, we already template on a lot of things, and doubling the compile time in the most common case (compiling for a single device architecture) would be quite noticeable. So, having a compile-time thing to only instantiate the data structures / functions for the wavefront size we need would still be desirable.

Not saying that the whole SPIR-V / JIT thing is bad (having portable binaries would be great, especially the ones using standardized IR), but it's not all upsides.

Copy link
Contributor

@jhuber6 jhuber6 Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think these macros are fine for the case where the user manually passed an -mcpu= argument to the compilation job, and we're not targeting SPIR-V. As far as I know, the main motivation is because for HIP SPIR-V it's an incorrect value since it will JIT compile, but if we're not doing that then it's fine. So it's mostly for compatibility between the two. You could probably make CMake define a custom macro depending on what rocminfo states if nothing else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect, the change was not motivated by anything SPIR-V related, the discussion predates it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The big problem previously was that it was defined on the host for some reason.

Builder.defineMacro("__AMDGCN_CUMODE__", Twine(CUMode));
}

Expand Down
111 changes: 111 additions & 0 deletions clang/test/Driver/hip-wavefront-size-deprecation-diagnostics.hip
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
// REQUIRES: amdgpu-registered-target
// RUN: %clang -xhip --offload-arch=gfx1030 --offload-host-only -pedantic -nogpuinc -nogpulib -nobuiltininc -fsyntax-only -Xclang -verify %s
// RUN: %clang -xhip --offload-arch=gfx1030 --offload-device-only -pedantic -nogpuinc -nogpulib -nobuiltininc -fsyntax-only -Xclang -verify %s

// Test that deprecation warnings for the wavefront size macro are emitted properly.

#include <type_traits>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Clang's tests must be hermetic and can't include system headers
  2. This fails on Mac: http://45.33.8.238/macm1/95266/step_6.txt

Please take a look and revert for now if it takes a while to fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nico Thanks for the report and sorry for the inconvenience! I reverted the patch with #115499 and opened PR #115507 with a fix. I'd appreciate it if you could take a look and see if that fixes the issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Since the test is now hermetic, I'm guessing it'll work now. I'll check my bot after you've relanded your patch.


#define WRAPPED __AMDGCN_WAVEFRONT_SIZE__

#define DOUBLE_WRAPPED (WRAPPED)

__attribute__((host, device)) void use(int, const char*);

template<int N> __attribute__((host, device)) int templatify(int x) {
return x + N;
}

__attribute__((device)) const int GlobalConst = __AMDGCN_WAVEFRONT_SIZE__; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
constexpr int GlobalConstExpr = __AMDGCN_WAVEFRONT_SIZE__; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}

#if defined(__HIP_DEVICE_COMPILE__) && (__AMDGCN_WAVEFRONT_SIZE__ == 64) // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
int foo(void);
#endif

__attribute__((device)) int device_var = __AMDGCN_WAVEFRONT_SIZE__; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}

__attribute__((device))
void device_fun() {
use(__AMDGCN_WAVEFRONT_SIZE, "device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE' has been marked as deprecated}}
use(__AMDGCN_WAVEFRONT_SIZE__, "device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
use(WRAPPED, "device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
use(DOUBLE_WRAPPED, "device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
use(GlobalConst, "device function");
use(GlobalConstExpr, "device function");
}

__attribute__((global))
void global_fun() {
// no warnings expected
use(__AMDGCN_WAVEFRONT_SIZE, "global function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE' has been marked as deprecated}}
use(__AMDGCN_WAVEFRONT_SIZE__, "global function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
use(WRAPPED, "global function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
use(DOUBLE_WRAPPED, "global function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "global function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
}

int host_var = __AMDGCN_WAVEFRONT_SIZE__; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
int host_var_alt = __AMDGCN_WAVEFRONT_SIZE; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE' has been marked as deprecated}}
int host_var_wrapped = WRAPPED; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
int host_var_double_wrapped = DOUBLE_WRAPPED; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}

__attribute__((host))
void host_fun() {
use(__AMDGCN_WAVEFRONT_SIZE, "host function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE' has been marked as deprecated}}
use(__AMDGCN_WAVEFRONT_SIZE__, "host function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
use(WRAPPED, "host function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
use(DOUBLE_WRAPPED, "host function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "host function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
use(GlobalConst, "host function");
use(GlobalConstExpr, "host function");
}

__attribute((host, device))
void host_device_fun() {
use(__AMDGCN_WAVEFRONT_SIZE__, "host device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
use(WRAPPED, "host device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
use(DOUBLE_WRAPPED, "host device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "host device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
}

template <unsigned int OuterWarpSize = __AMDGCN_WAVEFRONT_SIZE__> // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
class FunSelector {
public:
template<unsigned int FunWarpSize = OuterWarpSize>
__attribute__((device))
auto fun(void)
-> typename std::enable_if<(FunWarpSize <= __AMDGCN_WAVEFRONT_SIZE__), void>::type // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
{
use(1, "yay!");
}

template<unsigned int FunWarpSize = OuterWarpSize>
__attribute__((device))
auto fun(void)
-> typename std::enable_if<(FunWarpSize > __AMDGCN_WAVEFRONT_SIZE__), void>::type // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
{
use(0, "nay!");
}
};

__attribute__((device))
void device_fun_selector_user() {
FunSelector<> f;
f.fun<>();
f.fun<1>();
f.fun<1000>();

std::enable_if<(1 <= __AMDGCN_WAVEFRONT_SIZE__), int>::type x = 42; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
}

__attribute__((device)) std::enable_if<(1 <= __AMDGCN_WAVEFRONT_SIZE__), int>::type DeviceFunTemplateRet(void) { // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
return 42;
}

__attribute__((device)) int DeviceFunTemplateArg(std::enable_if<(1 <= __AMDGCN_WAVEFRONT_SIZE__), int>::type x) { // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}}
return x;
}

// expected-note@* 0+ {{macro marked 'deprecated' here}}
Loading