-
Notifications
You must be signed in to change notification settings - Fork 795
[Matrix][SYCL] Add bfloat16 support for joint_matrix #5566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -11,6 +11,7 @@ | |||
#include <CL/__spirv/spirv_ops.hpp> | |||
#include <CL/sycl/detail/defines_elementary.hpp> | |||
#include <CL/sycl/feature_test.hpp> | |||
#include <sycl/ext/intel/experimental/bfloat16.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Buildfail will happen since
class [[sycl_detail::uses_aspects(ext_intel_bf16_conversion)]] bfloat16
and sycl_detail::uses_aspects is unsupported code in g++ which is used for intel/llvm's build
we are not able to let a end-to-end testcase pass since __spirv_ConvertBF16ToFINTEL is not supported in OCL CPU Backend |
@@ -737,6 +738,160 @@ class wi_element<uint16_t, NumRows, NumCols, Layout, Group> { | |||
} | |||
}; | |||
|
|||
template <size_t NumRows, size_t NumCols, matrix_layout Layout, typename Group> | |||
class wi_element<sycl::ext::intel::experimental::bfloat16, NumRows, NumCols, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: probably alias like
using bfloat16 = sycl::ext::intel::experimental::bfloat16
could improve readability of this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can't use "using" in header file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can't use "using" in header file
What do you mean? I guess you can use them locally if it is in a scope not visible from the end-user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
ping? @intel/llvm-reviewers-runtime |
Do we have tests for this? |
I don't think the failed testcase has relationship with this PR. |
return *res; | ||
} | ||
|
||
unsigned short make_bf16(float x) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need these functions?
There must be some equivalent DPC++/SPIRV ones, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in bfloat16 class, we don't have bf16<->float conversion in host device. so this make_bf16 is needed in line144
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in bfloat16 class, we don't have bf16<->float conversion in host device. so this make_bf16 is needed in line144
I'd suggest to finish #5954 and reuse host version of the conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good.
@YetAnotherCompilerEngineer who are you? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we merge the patch before #5954?
Sure, please merge them in an order you like the most :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
(void)lhs; \ | ||
(void)rhs; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead just remove the names from the op()
declaration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i didn't get your point. actually this code is for host. the param list of friend type operator op should be the same as line555 which is for device code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe what @keryell suggests is to change this to
friend type operator op( \
const wi_element<sycl::ext::oneapi::experimental::bfloat16, NumRows, \
NumCols, Layout, Group> &, \
const sycl::ext::oneapi::experimental::bfloat16 &) { \
throw runtime_error("joint matrix is not supported on host device.", \
PI_INVALID_DEVICE); \
}
which preserves the signature but doesn't have unused arguments as they are unnamed.
@@ -0,0 +1,192 @@ | |||
// RUN: %clangxx -fsycl -O2 %s -o %t.out | |||
#include <CL/sycl.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <CL/sycl.hpp> | |
#include <sycl/sycl.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is "include/sycl/CL/sycl.hpp", how could we change into "#include <sycl/sycl.hpp>"
#if (SYCL_EXT_ONEAPI_MATRIX == 2) | ||
#include <iostream> | ||
|
||
using namespace sycl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to avoid.
If so, why the sycl::
on the 2 next lines? :-)
using namespace sycl::ext::oneapi::experimental::matrix; | ||
using bfloat16 = sycl::ext::oneapi::experimental::bfloat16; | ||
|
||
#define TILE_SZ 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constexpr
everywhere instead of macros.
using namespace sycl::ext::oneapi::experimental::matrix; | ||
using bfloat16 = sycl::ext::oneapi::experimental::bfloat16; | ||
|
||
static constexpr size_t TILE_SZ = 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since your macro was not using any type at the first place, you could just use auto
instead of size_t
.
Because with the macro the numbers we interpreted as 32-bit int
while here you are forcing 64-bit, which might be somewhat less efficient in some case (but not in this test I guess).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
ping? @steffenlarsen @dkhaldi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@steffenlarsen could you help us merge it? |
Unexpectedly Passed Tests (1): This was since fixed with intel/llvm-test-suite#1008 |
i think i know what happen in https://github.com/intel/llvm/runs/6314339242?check_suite_focus=true. #6111 is to fix the unused variable issue. |
Previously we use unsigned short to represent bfloat16, now we give formal support for sycl/s bfloat16 in joint_matrix implementation.