Skip to content

[libc][math][c23] Enable C23 _Float16 math functions on GPUs #99248

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 25, 2024

Conversation

overmighty
Copy link
Member

No description provided.

@overmighty overmighty requested review from jhuber6 and lntue July 16, 2024 22:06
@llvmbot llvmbot added the libc label Jul 16, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 16, 2024

@llvm/pr-subscribers-libc

Author: OverMighty (overmighty)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/99248.diff

2 Files Affected:

  • (modified) libc/cmake/modules/CheckCompilerFeatures.cmake (+26-7)
  • (modified) libc/config/gpu/entrypoints.txt (+71)
diff --git a/libc/cmake/modules/CheckCompilerFeatures.cmake b/libc/cmake/modules/CheckCompilerFeatures.cmake
index a6d793d495c45..361c1e710b187 100644
--- a/libc/cmake/modules/CheckCompilerFeatures.cmake
+++ b/libc/cmake/modules/CheckCompilerFeatures.cmake
@@ -15,6 +15,12 @@ set(
 # Making sure ALL_COMPILER_FEATURES is sorted.
 list(SORT ALL_COMPILER_FEATURES)
 
+# Compiler features that are unavailable on GPU targets with the in-tree Clang.
+set(
+  CPU_ONLY_COMPILER_FEATURES
+    "float128"
+)
+
 # Function to check whether the compiler supports the provided set of features.
 # Usage:
 # compiler_supports(
@@ -65,13 +71,26 @@ foreach(feature IN LISTS ALL_COMPILER_FEATURES)
     set(CMAKE_TRY_COMPILE_TARGET_TYPE EXECUTABLE)
   endif()
 
-  try_compile(
-    has_feature
-    ${CMAKE_CURRENT_BINARY_DIR}/compiler_features
-    SOURCES ${LIBC_SOURCE_DIR}/cmake/modules/compiler_features/check_${feature}.cpp
-    COMPILE_DEFINITIONS -I${LIBC_SOURCE_DIR} ${compile_options}
-    LINK_OPTIONS ${link_options}
-  )
+  if(LIBC_TARGET_OS_IS_GPU)
+    # CUDA shouldn't be required to build the libc, only to test it, so we can't
+    # try to build CUDA binaries here. Since GPU builds are always compiled with
+    # the in-tree Clang, we just hardcode which compiler features are available
+    # when targeting GPUs.
+    if(feature IN_LIST CPU_ONLY_COMPILER_FEATURES)
+      set(has_feature FALSE)
+    else()
+      set(has_feature TRUE)
+    endif()
+  else()
+    try_compile(
+      has_feature
+      ${CMAKE_CURRENT_BINARY_DIR}/compiler_features
+      SOURCES ${LIBC_SOURCE_DIR}/cmake/modules/compiler_features/check_${feature}.cpp
+      COMPILE_DEFINITIONS -I${LIBC_SOURCE_DIR} ${compile_options}
+      LINK_OPTIONS ${link_options}
+    )
+  endif()
+
   if(has_feature)
     list(APPEND AVAILABLE_COMPILER_FEATURES ${feature})
     if(${feature} STREQUAL "float16")
diff --git a/libc/config/gpu/entrypoints.txt b/libc/config/gpu/entrypoints.txt
index 63228216c85ec..20c97fbbbadc5 100644
--- a/libc/config/gpu/entrypoints.txt
+++ b/libc/config/gpu/entrypoints.txt
@@ -343,6 +343,77 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.truncf
 )
 
+if(LIBC_TYPES_HAS_FLOAT16)
+  list(APPEND TARGET_LIBM_ENTRYPOINTS
+    # math.h C23 _Float16 entrypoints
+    libc.src.math.canonicalizef16
+    libc.src.math.ceilf16
+    libc.src.math.copysignf16
+    libc.src.math.f16add
+    libc.src.math.f16addf
+    libc.src.math.f16addl
+    libc.src.math.f16div
+    libc.src.math.f16divf
+    libc.src.math.f16divl
+    libc.src.math.f16fma
+    libc.src.math.f16fmaf
+    libc.src.math.f16fmal
+    libc.src.math.f16sqrt
+    libc.src.math.f16sqrtf
+    libc.src.math.f16sqrtl
+    libc.src.math.f16sub
+    libc.src.math.f16subf
+    libc.src.math.f16subl
+    libc.src.math.fabsf16
+    libc.src.math.fdimf16
+    libc.src.math.floorf16
+    libc.src.math.fmaxf16
+    libc.src.math.fmaximum_mag_numf16
+    libc.src.math.fmaximum_magf16
+    libc.src.math.fmaximum_numf16
+    libc.src.math.fmaximumf16
+    libc.src.math.fminf16
+    libc.src.math.fminimum_mag_numf16
+    libc.src.math.fminimum_magf16
+    libc.src.math.fminimum_numf16
+    libc.src.math.fminimumf16
+    libc.src.math.fmodf16
+    libc.src.math.frexpf16
+    libc.src.math.fromfpf16
+    libc.src.math.fromfpxf16
+    libc.src.math.getpayloadf16
+    libc.src.math.ilogbf16
+    libc.src.math.ldexpf16
+    libc.src.math.llogbf16
+    libc.src.math.llrintf16
+    libc.src.math.llroundf16
+    libc.src.math.logbf16
+    libc.src.math.lrintf16
+    libc.src.math.lroundf16
+    libc.src.math.modff16
+    libc.src.math.nanf16
+    libc.src.math.nearbyintf16
+    libc.src.math.nextafterf16
+    libc.src.math.nextdownf16
+    libc.src.math.nexttowardf16
+    libc.src.math.nextupf16
+    libc.src.math.remainderf16
+    libc.src.math.remquof16
+    libc.src.math.rintf16
+    libc.src.math.roundevenf16
+    libc.src.math.roundf16
+    libc.src.math.scalblnf16
+    libc.src.math.scalbnf16
+    libc.src.math.setpayloadf16
+    libc.src.math.setpayloadsigf16
+    libc.src.math.totalorderf16
+    libc.src.math.totalordermagf16
+    libc.src.math.truncf16
+    libc.src.math.ufromfpf16
+    libc.src.math.ufromfpxf16
+  )
+endif()
+
 set(TARGET_LLVMLIBC_ENTRYPOINTS
   ${TARGET_LIBC_ENTRYPOINTS}
   ${TARGET_LIBM_ENTRYPOINTS}

@overmighty
Copy link
Member Author

@jhuber6 I built the tests for AMDGPU, but only ran them on NVIDIA. Could you please try to run them on AMD for me, or should I try to run them on my other laptop?

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works on both targets, thanks. I'm assuming these implementations will go to the builtins where appropriate?

@overmighty
Copy link
Member Author

overmighty commented Jul 16, 2024

Oh, they actually don't use builtins. I forgot to add an OR LIBC_TARGET_OS_IS_GPU in https://github.com/llvm/llvm-project/blob/main/libc/cmake/modules/LLVMLibCFlagRules.cmake#L280-L281 and replace the && defined(LIBC_TARGET_ARCH_IS_AARCH64) in libc/src/math/generic/*.

@overmighty
Copy link
Member Author

Should we try to run some benchmarks before enabling builtins on GPUs?

@jhuber6
Copy link
Contributor

jhuber6 commented Jul 16, 2024

Should we try to run some benchmarks before enabling builtins on GPUs?

If the builitin is supported it's pretty much always going to be faster than the bit-twidding generic implementation. Maybe there's some exceptions for things like sin or sqrt which might have precision issues. @arsenm the f16 builtins are generally supported in hardware, right?

@arsenm
Copy link
Contributor

arsenm commented Jul 17, 2024

Should we try to run some benchmarks before enabling builtins on GPUs?

If the builitin is supported it's pretty much always going to be faster than the bit-twidding generic implementation. Maybe there's some exceptions for things like sin or sqrt which might have precision issues. @arsenm the f16 builtins are generally supported in hardware, right?

sqrt is always correct. The intrinsics should all be lowered correctly. If not, it's a bug that should be addressed in the backend and not worked around in a library. I'm not sure if the sin/cos cases pass conformance, OpenCL only just finally added half tests to the conformance suite so we can check. The f16 cases are easier because it's a lot easier to upcast to float and back

@overmighty overmighty requested a review from jhuber6 July 17, 2024 09:57
@@ -111,7 +111,7 @@ add_entrypoint_object(
DEPENDS
libc.src.__support.macros.properties.types
libc.src.__support.FPUtil.nearest_integer_operations
libc.src.__support.macros.properties.architectures
libc.src.__support.macros.properties.cpu_features
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind running the bazel builds to see if these extra dependencies need to be added to bazel overlay?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bazel build seems to work:

~/projects/llvm-project/utils/bazel libc-math-float16-gpu ➜ bazel build --config=generic_clang @llvm-project//libc/...
INFO: Analyzed 897 targets (68 packages loaded, 6253 targets configured).
INFO: Found 897 targets...
INFO: Elapsed time: 166.512s, Critical Path: 151.57s
INFO: 3876 processes: 2132 internal, 1744 linux-sandbox.
INFO: Build completed successfully, 3876 total actions
~/projects/llvm-project/utils/bazel libc-math-float16-gpu ➜ bazel test --config=generic_clang @llvm-project//libc/...
INFO: Analyzed 897 targets (0 packages loaded, 0 targets configured).
INFO: Found 659 targets and 238 test targets...
INFO: Elapsed time: 5.366s, Critical Path: 4.26s
INFO: 239 processes: 476 linux-sandbox.
INFO: Build completed successfully, 239 total actions
@llvm-project//libc/test/src/__support:arg_list_test                     PASSED in 0.0s
...

Executed 238 out of 238 tests: 238 tests pass.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
~/projects/llvm-project/utils/bazel libc-math-float16-gpu ➜

Why are only some math functions listed in https://github.com/llvm/llvm-project/blob/main/utils/bazel/llvm-project-overlay/libc/BUILD.bazel?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

B/c I've been lazy / forget to update it, especially since there are quite some work need to be done to add smoke tests to the lists. I planned to update it all at once at some point in late July, early August.

@lntue
Copy link
Contributor

lntue commented Jul 19, 2024

The PR looks good to me. Let's wait for @jhuber6 to do the confirmation / approval.

@overmighty
Copy link
Member Author

#98972 was merged yesterday so maybe I should enable these functions in this PR too while at it, unless you would prefer another PR as this one is already approved.

@overmighty overmighty force-pushed the libc-math-float16-gpu branch from a0de91e to 3c7f767 Compare July 19, 2024 19:50
@overmighty
Copy link
Member Author

Rebased to enable f16mul and f16mulf.

@overmighty overmighty requested a review from jhuber6 July 22, 2024 11:37
@overmighty
Copy link
Member Author

@jhuber6 Is this still okay? I just disabled the functions that use long double and enabled f16mul and f16mulf.

@jhuber6
Copy link
Contributor

jhuber6 commented Jul 25, 2024

@jhuber6 Is this still okay? I just disabled the functions that use long double and enabled f16mul and f16mulf.

Yes, so long as there's no long double stuff it should still work. If not we can turn it off, but it worked when I last tested it.

@overmighty overmighty merged commit 81ce796 into llvm:main Jul 25, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants