[libc][math][c23] Enable C23 _Float16 math functions on GPUs #99248

overmighty · 2024-07-16T22:06:37Z

No description provided.

llvmbot · 2024-07-16T22:07:10Z

@llvm/pr-subscribers-libc

Author: OverMighty (overmighty)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/99248.diff

2 Files Affected:

(modified) libc/cmake/modules/CheckCompilerFeatures.cmake (+26-7)
(modified) libc/config/gpu/entrypoints.txt (+71)

diff --git a/libc/cmake/modules/CheckCompilerFeatures.cmake b/libc/cmake/modules/CheckCompilerFeatures.cmake
index a6d793d495c45..361c1e710b187 100644
--- a/libc/cmake/modules/CheckCompilerFeatures.cmake
+++ b/libc/cmake/modules/CheckCompilerFeatures.cmake
@@ -15,6 +15,12 @@ set(
 # Making sure ALL_COMPILER_FEATURES is sorted.
 list(SORT ALL_COMPILER_FEATURES)
 
+# Compiler features that are unavailable on GPU targets with the in-tree Clang.
+set(
+  CPU_ONLY_COMPILER_FEATURES
+    "float128"
+)
+
 # Function to check whether the compiler supports the provided set of features.
 # Usage:
 # compiler_supports(
@@ -65,13 +71,26 @@ foreach(feature IN LISTS ALL_COMPILER_FEATURES)
     set(CMAKE_TRY_COMPILE_TARGET_TYPE EXECUTABLE)
   endif()
 
-  try_compile(
-    has_feature
-    ${CMAKE_CURRENT_BINARY_DIR}/compiler_features
-    SOURCES ${LIBC_SOURCE_DIR}/cmake/modules/compiler_features/check_${feature}.cpp
-    COMPILE_DEFINITIONS -I${LIBC_SOURCE_DIR} ${compile_options}
-    LINK_OPTIONS ${link_options}
-  )
+  if(LIBC_TARGET_OS_IS_GPU)
+    # CUDA shouldn't be required to build the libc, only to test it, so we can't
+    # try to build CUDA binaries here. Since GPU builds are always compiled with
+    # the in-tree Clang, we just hardcode which compiler features are available
+    # when targeting GPUs.
+    if(feature IN_LIST CPU_ONLY_COMPILER_FEATURES)
+      set(has_feature FALSE)
+    else()
+      set(has_feature TRUE)
+    endif()
+  else()
+    try_compile(
+      has_feature
+      ${CMAKE_CURRENT_BINARY_DIR}/compiler_features
+      SOURCES ${LIBC_SOURCE_DIR}/cmake/modules/compiler_features/check_${feature}.cpp
+      COMPILE_DEFINITIONS -I${LIBC_SOURCE_DIR} ${compile_options}
+      LINK_OPTIONS ${link_options}
+    )
+  endif()
+
   if(has_feature)
     list(APPEND AVAILABLE_COMPILER_FEATURES ${feature})
     if(${feature} STREQUAL "float16")
diff --git a/libc/config/gpu/entrypoints.txt b/libc/config/gpu/entrypoints.txt
index 63228216c85ec..20c97fbbbadc5 100644
--- a/libc/config/gpu/entrypoints.txt
+++ b/libc/config/gpu/entrypoints.txt
@@ -343,6 +343,77 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.truncf
 )
 
+if(LIBC_TYPES_HAS_FLOAT16)
+  list(APPEND TARGET_LIBM_ENTRYPOINTS
+    # math.h C23 _Float16 entrypoints
+    libc.src.math.canonicalizef16
+    libc.src.math.ceilf16
+    libc.src.math.copysignf16
+    libc.src.math.f16add
+    libc.src.math.f16addf
+    libc.src.math.f16addl
+    libc.src.math.f16div
+    libc.src.math.f16divf
+    libc.src.math.f16divl
+    libc.src.math.f16fma
+    libc.src.math.f16fmaf
+    libc.src.math.f16fmal
+    libc.src.math.f16sqrt
+    libc.src.math.f16sqrtf
+    libc.src.math.f16sqrtl
+    libc.src.math.f16sub
+    libc.src.math.f16subf
+    libc.src.math.f16subl
+    libc.src.math.fabsf16
+    libc.src.math.fdimf16
+    libc.src.math.floorf16
+    libc.src.math.fmaxf16
+    libc.src.math.fmaximum_mag_numf16
+    libc.src.math.fmaximum_magf16
+    libc.src.math.fmaximum_numf16
+    libc.src.math.fmaximumf16
+    libc.src.math.fminf16
+    libc.src.math.fminimum_mag_numf16
+    libc.src.math.fminimum_magf16
+    libc.src.math.fminimum_numf16
+    libc.src.math.fminimumf16
+    libc.src.math.fmodf16
+    libc.src.math.frexpf16
+    libc.src.math.fromfpf16
+    libc.src.math.fromfpxf16
+    libc.src.math.getpayloadf16
+    libc.src.math.ilogbf16
+    libc.src.math.ldexpf16
+    libc.src.math.llogbf16
+    libc.src.math.llrintf16
+    libc.src.math.llroundf16
+    libc.src.math.logbf16
+    libc.src.math.lrintf16
+    libc.src.math.lroundf16
+    libc.src.math.modff16
+    libc.src.math.nanf16
+    libc.src.math.nearbyintf16
+    libc.src.math.nextafterf16
+    libc.src.math.nextdownf16
+    libc.src.math.nexttowardf16
+    libc.src.math.nextupf16
+    libc.src.math.remainderf16
+    libc.src.math.remquof16
+    libc.src.math.rintf16
+    libc.src.math.roundevenf16
+    libc.src.math.roundf16
+    libc.src.math.scalblnf16
+    libc.src.math.scalbnf16
+    libc.src.math.setpayloadf16
+    libc.src.math.setpayloadsigf16
+    libc.src.math.totalorderf16
+    libc.src.math.totalordermagf16
+    libc.src.math.truncf16
+    libc.src.math.ufromfpf16
+    libc.src.math.ufromfpxf16
+  )
+endif()
+
 set(TARGET_LLVMLIBC_ENTRYPOINTS
   ${TARGET_LIBC_ENTRYPOINTS}
   ${TARGET_LIBM_ENTRYPOINTS}

overmighty · 2024-07-16T22:11:58Z

@jhuber6 I built the tests for AMDGPU, but only ran them on NVIDIA. Could you please try to run them on AMD for me, or should I try to run them on my other laptop?

jhuber6

Works on both targets, thanks. I'm assuming these implementations will go to the builtins where appropriate?

overmighty · 2024-07-16T22:55:02Z

Oh, they actually don't use builtins. I forgot to add an OR LIBC_TARGET_OS_IS_GPU in https://github.com/llvm/llvm-project/blob/main/libc/cmake/modules/LLVMLibCFlagRules.cmake#L280-L281 and replace the && defined(LIBC_TARGET_ARCH_IS_AARCH64) in libc/src/math/generic/*.

overmighty · 2024-07-16T23:13:10Z

Should we try to run some benchmarks before enabling builtins on GPUs?

jhuber6 · 2024-07-16T23:15:09Z

Should we try to run some benchmarks before enabling builtins on GPUs?

If the builitin is supported it's pretty much always going to be faster than the bit-twidding generic implementation. Maybe there's some exceptions for things like sin or sqrt which might have precision issues. @arsenm the f16 builtins are generally supported in hardware, right?

arsenm · 2024-07-17T07:32:18Z

Should we try to run some benchmarks before enabling builtins on GPUs?

If the builitin is supported it's pretty much always going to be faster than the bit-twidding generic implementation. Maybe there's some exceptions for things like sin or sqrt which might have precision issues. @arsenm the f16 builtins are generally supported in hardware, right?

sqrt is always correct. The intrinsics should all be lowered correctly. If not, it's a bug that should be addressed in the backend and not worked around in a library. I'm not sure if the sin/cos cases pass conformance, OpenCL only just finally added half tests to the conformance suite so we can check. The f16 cases are easier because it's a lot easier to upcast to float and back

lntue · 2024-07-18T23:53:28Z

libc/src/math/generic/CMakeLists.txt

@@ -111,7 +111,7 @@ add_entrypoint_object(
  DEPENDS
    libc.src.__support.macros.properties.types
    libc.src.__support.FPUtil.nearest_integer_operations
-    libc.src.__support.macros.properties.architectures
+    libc.src.__support.macros.properties.cpu_features


Do you mind running the bazel builds to see if these extra dependencies need to be added to bazel overlay?

Bazel build seems to work:

~/projects/llvm-project/utils/bazel libc-math-float16-gpu ➜ bazel build --config=generic_clang @llvm-project//libc/... INFO: Analyzed 897 targets (68 packages loaded, 6253 targets configured). INFO: Found 897 targets... INFO: Elapsed time: 166.512s, Critical Path: 151.57s INFO: 3876 processes: 2132 internal, 1744 linux-sandbox. INFO: Build completed successfully, 3876 total actions ~/projects/llvm-project/utils/bazel libc-math-float16-gpu ➜ bazel test --config=generic_clang @llvm-project//libc/... INFO: Analyzed 897 targets (0 packages loaded, 0 targets configured). INFO: Found 659 targets and 238 test targets... INFO: Elapsed time: 5.366s, Critical Path: 4.26s INFO: 239 processes: 476 linux-sandbox. INFO: Build completed successfully, 239 total actions @llvm-project//libc/test/src/__support:arg_list_test PASSED in 0.0s ... Executed 238 out of 238 tests: 238 tests pass. There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are. ~/projects/llvm-project/utils/bazel libc-math-float16-gpu ➜

Why are only some math functions listed in https://github.com/llvm/llvm-project/blob/main/utils/bazel/llvm-project-overlay/libc/BUILD.bazel?

B/c I've been lazy / forget to update it, especially since there are quite some work need to be done to add smoke tests to the lists. I planned to update it all at once at some point in late July, early August.

lntue · 2024-07-19T14:50:32Z

The PR looks good to me. Let's wait for @jhuber6 to do the confirmation / approval.

overmighty · 2024-07-19T15:31:10Z

#98972 was merged yesterday so maybe I should enable these functions in this PR too while at it, unless you would prefer another PR as this one is already approved.

Enable ROUND_OPT flag and builtin usage for _Float16 functions on GPUs.

Disable narrowing functions that take `long double` arguments.

Enable f16mul and f16mulf.

overmighty · 2024-07-19T19:51:13Z

Rebased to enable f16mul and f16mulf.

overmighty · 2024-07-25T19:06:50Z

@jhuber6 Is this still okay? I just disabled the functions that use long double and enabled f16mul and f16mulf.

jhuber6 · 2024-07-25T19:08:14Z

@jhuber6 Is this still okay? I just disabled the functions that use long double and enabled f16mul and f16mulf.

Yes, so long as there's no long double stuff it should still work. If not we can turn it off, but it worked when I last tested it.

overmighty requested review from jhuber6 and lntue July 16, 2024 22:06

llvmbot added the libc label Jul 16, 2024

jhuber6 approved these changes Jul 16, 2024

View reviewed changes

overmighty requested a review from jhuber6 July 17, 2024 09:57

lntue reviewed Jul 18, 2024

View reviewed changes

jhuber6 approved these changes Jul 19, 2024

View reviewed changes

lntue approved these changes Jul 19, 2024

View reviewed changes

overmighty added 4 commits July 19, 2024 21:14

[libc][math][c23] Enable C23 _Float16 math functions on GPUs

f55e449

fixup! [libc][math][c23] Enable C23 _Float16 math functions on GPUs

7cc8e19

Enable ROUND_OPT flag and builtin usage for _Float16 functions on GPUs.

fixup! [libc][math][c23] Enable C23 _Float16 math functions on GPUs

eb17640

Disable narrowing functions that take `long double` arguments.

fixup! [libc][math][c23] Enable C23 _Float16 math functions on GPUs

3c7f767

Enable f16mul and f16mulf.

overmighty force-pushed the libc-math-float16-gpu branch from a0de91e to 3c7f767 Compare July 19, 2024 19:50

overmighty requested a review from jhuber6 July 22, 2024 11:37

jhuber6 approved these changes Jul 25, 2024

View reviewed changes

overmighty merged commit 81ce796 into llvm:main Jul 25, 2024
6 checks passed

[libc][math][c23] Enable C23 _Float16 math functions on GPUs #99248

[libc][math][c23] Enable C23 _Float16 math functions on GPUs #99248

Uh oh!

Conversation

overmighty commented Jul 16, 2024

Uh oh!

llvmbot commented Jul 16, 2024

Uh oh!

overmighty commented Jul 16, 2024

Uh oh!

jhuber6 left a comment

Choose a reason for hiding this comment

Uh oh!

overmighty commented Jul 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

overmighty commented Jul 16, 2024

Uh oh!

jhuber6 commented Jul 16, 2024

Uh oh!

arsenm commented Jul 17, 2024

Uh oh!

lntue Jul 18, 2024

Choose a reason for hiding this comment

Uh oh!

overmighty Jul 19, 2024

Choose a reason for hiding this comment

Uh oh!

lntue Jul 19, 2024

Choose a reason for hiding this comment

Uh oh!

lntue commented Jul 19, 2024

Uh oh!

overmighty commented Jul 19, 2024

Uh oh!

overmighty commented Jul 19, 2024

Uh oh!

overmighty commented Jul 25, 2024

Uh oh!

jhuber6 commented Jul 25, 2024

Uh oh!

Uh oh!

Uh oh!

overmighty commented Jul 16, 2024 •

edited

Loading