-
Notifications
You must be signed in to change notification settings - Fork 14.6k
[Clang][HIP] Deprecate the AMDGCN_WAVEFRONT_SIZE macros #112849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
// REQUIRES: amdgpu-registered-target | ||
// RUN: %clang -xhip --offload-arch=gfx1030 --offload-host-only -pedantic -nogpuinc -nogpulib -nobuiltininc -fsyntax-only -Xclang -verify %s | ||
// RUN: %clang -xhip --offload-arch=gfx1030 --offload-device-only -pedantic -nogpuinc -nogpulib -nobuiltininc -fsyntax-only -Xclang -verify %s | ||
|
||
// Test that deprecation warnings for the wavefront size macro are emitted properly. | ||
|
||
#include <type_traits> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Please take a look and revert for now if it takes a while to fix. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks! Since the test is now hermetic, I'm guessing it'll work now. I'll check my bot after you've relanded your patch. |
||
|
||
#define WRAPPED __AMDGCN_WAVEFRONT_SIZE__ | ||
|
||
#define DOUBLE_WRAPPED (WRAPPED) | ||
|
||
__attribute__((host, device)) void use(int, const char*); | ||
|
||
template<int N> __attribute__((host, device)) int templatify(int x) { | ||
return x + N; | ||
} | ||
|
||
__attribute__((device)) const int GlobalConst = __AMDGCN_WAVEFRONT_SIZE__; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
constexpr int GlobalConstExpr = __AMDGCN_WAVEFRONT_SIZE__; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
|
||
#if defined(__HIP_DEVICE_COMPILE__) && (__AMDGCN_WAVEFRONT_SIZE__ == 64) // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
int foo(void); | ||
#endif | ||
|
||
__attribute__((device)) int device_var = __AMDGCN_WAVEFRONT_SIZE__; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
|
||
__attribute__((device)) | ||
void device_fun() { | ||
use(__AMDGCN_WAVEFRONT_SIZE, "device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE' has been marked as deprecated}} | ||
use(__AMDGCN_WAVEFRONT_SIZE__, "device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
use(WRAPPED, "device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
use(DOUBLE_WRAPPED, "device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
use(GlobalConst, "device function"); | ||
use(GlobalConstExpr, "device function"); | ||
} | ||
|
||
__attribute__((global)) | ||
void global_fun() { | ||
// no warnings expected | ||
use(__AMDGCN_WAVEFRONT_SIZE, "global function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE' has been marked as deprecated}} | ||
use(__AMDGCN_WAVEFRONT_SIZE__, "global function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
use(WRAPPED, "global function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
use(DOUBLE_WRAPPED, "global function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "global function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
} | ||
|
||
int host_var = __AMDGCN_WAVEFRONT_SIZE__; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
int host_var_alt = __AMDGCN_WAVEFRONT_SIZE; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE' has been marked as deprecated}} | ||
int host_var_wrapped = WRAPPED; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
int host_var_double_wrapped = DOUBLE_WRAPPED; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
|
||
__attribute__((host)) | ||
void host_fun() { | ||
use(__AMDGCN_WAVEFRONT_SIZE, "host function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE' has been marked as deprecated}} | ||
use(__AMDGCN_WAVEFRONT_SIZE__, "host function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
use(WRAPPED, "host function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
use(DOUBLE_WRAPPED, "host function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "host function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
use(GlobalConst, "host function"); | ||
use(GlobalConstExpr, "host function"); | ||
} | ||
|
||
__attribute((host, device)) | ||
void host_device_fun() { | ||
use(__AMDGCN_WAVEFRONT_SIZE__, "host device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
use(WRAPPED, "host device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
use(DOUBLE_WRAPPED, "host device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "host device function"); // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
} | ||
|
||
template <unsigned int OuterWarpSize = __AMDGCN_WAVEFRONT_SIZE__> // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
class FunSelector { | ||
public: | ||
template<unsigned int FunWarpSize = OuterWarpSize> | ||
__attribute__((device)) | ||
auto fun(void) | ||
-> typename std::enable_if<(FunWarpSize <= __AMDGCN_WAVEFRONT_SIZE__), void>::type // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
{ | ||
use(1, "yay!"); | ||
} | ||
|
||
template<unsigned int FunWarpSize = OuterWarpSize> | ||
__attribute__((device)) | ||
auto fun(void) | ||
-> typename std::enable_if<(FunWarpSize > __AMDGCN_WAVEFRONT_SIZE__), void>::type // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
{ | ||
use(0, "nay!"); | ||
} | ||
}; | ||
|
||
__attribute__((device)) | ||
void device_fun_selector_user() { | ||
FunSelector<> f; | ||
f.fun<>(); | ||
f.fun<1>(); | ||
f.fun<1000>(); | ||
|
||
std::enable_if<(1 <= __AMDGCN_WAVEFRONT_SIZE__), int>::type x = 42; // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
} | ||
|
||
__attribute__((device)) std::enable_if<(1 <= __AMDGCN_WAVEFRONT_SIZE__), int>::type DeviceFunTemplateRet(void) { // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
return 42; | ||
} | ||
|
||
__attribute__((device)) int DeviceFunTemplateArg(std::enable_if<(1 <= __AMDGCN_WAVEFRONT_SIZE__), int>::type x) { // expected-warning {{macro '__AMDGCN_WAVEFRONT_SIZE__' has been marked as deprecated}} | ||
return x; | ||
} | ||
|
||
// expected-note@* 0+ {{macro marked 'deprecated' here}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is unhelpful to deprecate without a suggested replacement.
I also disagree with removing this for the device side compile. There still are uses for it, and there are still non-offloading languages
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are removing an antipattern, the replacement is “don't do this”? Not all things that exist are valid. We should absolutely remove this, at least for HIP, where it creates needless divergence from CUDA and is a bug farm as a consequence.
Orthogonally, what are the uses for it that you have in mind, and which non-offloading languages are you thinking of? The only scenario I can envision here is if one does an end-to-end compile (so not the weird 1-per-target thing we do with offload, and not playing games around single AST that gets forked at CodeGen time). In that case, yes, that might be useful, but delivering it as a macro is fundamentally a bad design IMHO, so even there we should not have this. Furthermore, at the moment, that sort of use is incredibly underspecified / governed by happens to work by accident rather than by design, so building some early technical debt seems problematic, at least to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This never should've been defined on the host side for sure. We do this in the OpenMP device runtime.
That could probably be replaced with
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of lower level uses. i.e. not user facing code we would want in headers or people using directly. e.g. cases where implementing cross lane operations relies on the operations that only work on half the lanes, or don't work at all when executed in wave32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use we currently have in our project is to detect, via CMake's
try_compile
, whether we have Wave64 or Wave32 targets (or both) to later choose which data structure flavors to instantiate. With the macro, we can easily do#if
+#warning
inside a kernel, and then parse the build log.Without it, the only solution we found was to parse the list of targets, which feels more brittle.
__builtin_amdgcn_wavefrontsize()
is not a compile-time constant expression, so cannot be used for such checks. Am I missing something? What is the planned timeline for the removal of this macro?Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that it's not that simple, since the user can easily change the wavefront size by compiling with
-mwavefrontsize64
. I ran into these types of issues myself while working on the RPC interface for the libc . There I pretty much just take the wavefront size as an argument to the indexing / allocation functions. Alternatively you use a template and do runtime dispatch and assume that optimizations will DCE the other code.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you are highlighting a problem with parsing the target list? That's a good point.
Ah, the new meaning of "running DOOM on GPUs" 😄 👍
That's a nice solution and we are moving into that direction (with AMD's help). Unfortunately, we already template on a lot of things, and doubling the compile time in the most common case (compiling for a single device architecture) would be quite noticeable. So, having a compile-time thing to only instantiate the data structures / functions for the wavefront size we need would still be desirable.
Not saying that the whole SPIR-V / JIT thing is bad (having portable binaries would be great, especially the ones using standardized IR), but it's not all upsides.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I think these macros are fine for the case where the user manually passed an
-mcpu=
argument to the compilation job, and we're not targeting SPIR-V. As far as I know, the main motivation is because for HIP SPIR-V it's an incorrect value since it will JIT compile, but if we're not doing that then it's fine. So it's mostly for compatibility between the two. You could probably make CMake define a custom macro depending on whatrocminfo
states if nothing else.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect, the change was not motivated by anything SPIR-V related, the discussion predates it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The big problem previously was that it was defined on the host for some reason.