-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[libc][math][c23] Enable C23 _Float16 math functions on GPUs #99248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-libc Author: OverMighty (overmighty) ChangesFull diff: https://github.com/llvm/llvm-project/pull/99248.diff 2 Files Affected:
diff --git a/libc/cmake/modules/CheckCompilerFeatures.cmake b/libc/cmake/modules/CheckCompilerFeatures.cmake
index a6d793d495c45..361c1e710b187 100644
--- a/libc/cmake/modules/CheckCompilerFeatures.cmake
+++ b/libc/cmake/modules/CheckCompilerFeatures.cmake
@@ -15,6 +15,12 @@ set(
# Making sure ALL_COMPILER_FEATURES is sorted.
list(SORT ALL_COMPILER_FEATURES)
+# Compiler features that are unavailable on GPU targets with the in-tree Clang.
+set(
+ CPU_ONLY_COMPILER_FEATURES
+ "float128"
+)
+
# Function to check whether the compiler supports the provided set of features.
# Usage:
# compiler_supports(
@@ -65,13 +71,26 @@ foreach(feature IN LISTS ALL_COMPILER_FEATURES)
set(CMAKE_TRY_COMPILE_TARGET_TYPE EXECUTABLE)
endif()
- try_compile(
- has_feature
- ${CMAKE_CURRENT_BINARY_DIR}/compiler_features
- SOURCES ${LIBC_SOURCE_DIR}/cmake/modules/compiler_features/check_${feature}.cpp
- COMPILE_DEFINITIONS -I${LIBC_SOURCE_DIR} ${compile_options}
- LINK_OPTIONS ${link_options}
- )
+ if(LIBC_TARGET_OS_IS_GPU)
+ # CUDA shouldn't be required to build the libc, only to test it, so we can't
+ # try to build CUDA binaries here. Since GPU builds are always compiled with
+ # the in-tree Clang, we just hardcode which compiler features are available
+ # when targeting GPUs.
+ if(feature IN_LIST CPU_ONLY_COMPILER_FEATURES)
+ set(has_feature FALSE)
+ else()
+ set(has_feature TRUE)
+ endif()
+ else()
+ try_compile(
+ has_feature
+ ${CMAKE_CURRENT_BINARY_DIR}/compiler_features
+ SOURCES ${LIBC_SOURCE_DIR}/cmake/modules/compiler_features/check_${feature}.cpp
+ COMPILE_DEFINITIONS -I${LIBC_SOURCE_DIR} ${compile_options}
+ LINK_OPTIONS ${link_options}
+ )
+ endif()
+
if(has_feature)
list(APPEND AVAILABLE_COMPILER_FEATURES ${feature})
if(${feature} STREQUAL "float16")
diff --git a/libc/config/gpu/entrypoints.txt b/libc/config/gpu/entrypoints.txt
index 63228216c85ec..20c97fbbbadc5 100644
--- a/libc/config/gpu/entrypoints.txt
+++ b/libc/config/gpu/entrypoints.txt
@@ -343,6 +343,77 @@ set(TARGET_LIBM_ENTRYPOINTS
libc.src.math.truncf
)
+if(LIBC_TYPES_HAS_FLOAT16)
+ list(APPEND TARGET_LIBM_ENTRYPOINTS
+ # math.h C23 _Float16 entrypoints
+ libc.src.math.canonicalizef16
+ libc.src.math.ceilf16
+ libc.src.math.copysignf16
+ libc.src.math.f16add
+ libc.src.math.f16addf
+ libc.src.math.f16addl
+ libc.src.math.f16div
+ libc.src.math.f16divf
+ libc.src.math.f16divl
+ libc.src.math.f16fma
+ libc.src.math.f16fmaf
+ libc.src.math.f16fmal
+ libc.src.math.f16sqrt
+ libc.src.math.f16sqrtf
+ libc.src.math.f16sqrtl
+ libc.src.math.f16sub
+ libc.src.math.f16subf
+ libc.src.math.f16subl
+ libc.src.math.fabsf16
+ libc.src.math.fdimf16
+ libc.src.math.floorf16
+ libc.src.math.fmaxf16
+ libc.src.math.fmaximum_mag_numf16
+ libc.src.math.fmaximum_magf16
+ libc.src.math.fmaximum_numf16
+ libc.src.math.fmaximumf16
+ libc.src.math.fminf16
+ libc.src.math.fminimum_mag_numf16
+ libc.src.math.fminimum_magf16
+ libc.src.math.fminimum_numf16
+ libc.src.math.fminimumf16
+ libc.src.math.fmodf16
+ libc.src.math.frexpf16
+ libc.src.math.fromfpf16
+ libc.src.math.fromfpxf16
+ libc.src.math.getpayloadf16
+ libc.src.math.ilogbf16
+ libc.src.math.ldexpf16
+ libc.src.math.llogbf16
+ libc.src.math.llrintf16
+ libc.src.math.llroundf16
+ libc.src.math.logbf16
+ libc.src.math.lrintf16
+ libc.src.math.lroundf16
+ libc.src.math.modff16
+ libc.src.math.nanf16
+ libc.src.math.nearbyintf16
+ libc.src.math.nextafterf16
+ libc.src.math.nextdownf16
+ libc.src.math.nexttowardf16
+ libc.src.math.nextupf16
+ libc.src.math.remainderf16
+ libc.src.math.remquof16
+ libc.src.math.rintf16
+ libc.src.math.roundevenf16
+ libc.src.math.roundf16
+ libc.src.math.scalblnf16
+ libc.src.math.scalbnf16
+ libc.src.math.setpayloadf16
+ libc.src.math.setpayloadsigf16
+ libc.src.math.totalorderf16
+ libc.src.math.totalordermagf16
+ libc.src.math.truncf16
+ libc.src.math.ufromfpf16
+ libc.src.math.ufromfpxf16
+ )
+endif()
+
set(TARGET_LLVMLIBC_ENTRYPOINTS
${TARGET_LIBC_ENTRYPOINTS}
${TARGET_LIBM_ENTRYPOINTS}
|
@jhuber6 I built the tests for AMDGPU, but only ran them on NVIDIA. Could you please try to run them on AMD for me, or should I try to run them on my other laptop? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works on both targets, thanks. I'm assuming these implementations will go to the builtins where appropriate?
Oh, they actually don't use builtins. I forgot to add an |
Should we try to run some benchmarks before enabling builtins on GPUs? |
If the builitin is supported it's pretty much always going to be faster than the bit-twidding generic implementation. Maybe there's some exceptions for things like |
sqrt is always correct. The intrinsics should all be lowered correctly. If not, it's a bug that should be addressed in the backend and not worked around in a library. I'm not sure if the sin/cos cases pass conformance, OpenCL only just finally added half tests to the conformance suite so we can check. The f16 cases are easier because it's a lot easier to upcast to float and back |
@@ -111,7 +111,7 @@ add_entrypoint_object( | |||
DEPENDS | |||
libc.src.__support.macros.properties.types | |||
libc.src.__support.FPUtil.nearest_integer_operations | |||
libc.src.__support.macros.properties.architectures | |||
libc.src.__support.macros.properties.cpu_features |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind running the bazel builds to see if these extra dependencies need to be added to bazel overlay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bazel build seems to work:
~/projects/llvm-project/utils/bazel libc-math-float16-gpu ➜ bazel build --config=generic_clang @llvm-project//libc/...
INFO: Analyzed 897 targets (68 packages loaded, 6253 targets configured).
INFO: Found 897 targets...
INFO: Elapsed time: 166.512s, Critical Path: 151.57s
INFO: 3876 processes: 2132 internal, 1744 linux-sandbox.
INFO: Build completed successfully, 3876 total actions
~/projects/llvm-project/utils/bazel libc-math-float16-gpu ➜ bazel test --config=generic_clang @llvm-project//libc/...
INFO: Analyzed 897 targets (0 packages loaded, 0 targets configured).
INFO: Found 659 targets and 238 test targets...
INFO: Elapsed time: 5.366s, Critical Path: 4.26s
INFO: 239 processes: 476 linux-sandbox.
INFO: Build completed successfully, 239 total actions
@llvm-project//libc/test/src/__support:arg_list_test PASSED in 0.0s
...
Executed 238 out of 238 tests: 238 tests pass.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
~/projects/llvm-project/utils/bazel libc-math-float16-gpu ➜
Why are only some math functions listed in https://github.com/llvm/llvm-project/blob/main/utils/bazel/llvm-project-overlay/libc/BUILD.bazel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
B/c I've been lazy / forget to update it, especially since there are quite some work need to be done to add smoke tests to the lists. I planned to update it all at once at some point in late July, early August.
The PR looks good to me. Let's wait for @jhuber6 to do the confirmation / approval. |
#98972 was merged yesterday so maybe I should enable these functions in this PR too while at it, unless you would prefer another PR as this one is already approved. |
Enable ROUND_OPT flag and builtin usage for _Float16 functions on GPUs.
Disable narrowing functions that take `long double` arguments.
Enable f16mul and f16mulf.
a0de91e
to
3c7f767
Compare
Rebased to enable |
@jhuber6 Is this still okay? I just disabled the functions that use |
Yes, so long as there's no long double stuff it should still work. If not we can turn it off, but it worked when I last tested it. |
No description provided.