Skip to content

[Matrix][SYCL] Add bfloat16 support for joint_matrix #5566

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 5, 2022

Conversation

yubingex007-a11y
Copy link
Contributor

@yubingex007-a11y yubingex007-a11y commented Feb 14, 2022

Previously we use unsigned short to represent bfloat16, now we give formal support for sycl/s bfloat16 in joint_matrix implementation.

@@ -11,6 +11,7 @@
#include <CL/__spirv/spirv_ops.hpp>
#include <CL/sycl/detail/defines_elementary.hpp>
#include <CL/sycl/feature_test.hpp>
#include <sycl/ext/intel/experimental/bfloat16.hpp>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buildfail will happen since

class [[sycl_detail::uses_aspects(ext_intel_bf16_conversion)]] bfloat16

and sycl_detail::uses_aspects is unsupported code in g++ which is used for intel/llvm's build

@yubingex007-a11y
Copy link
Contributor Author

we are not able to let a end-to-end testcase pass since __spirv_ConvertBF16ToFINTEL is not supported in OCL CPU Backend

@dkhaldi dkhaldi requested a review from AlexeySotkin February 14, 2022 14:23
@@ -737,6 +738,160 @@ class wi_element<uint16_t, NumRows, NumCols, Layout, Group> {
}
};

template <size_t NumRows, size_t NumCols, matrix_layout Layout, typename Group>
class wi_element<sycl::ext::intel::experimental::bfloat16, NumRows, NumCols,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: probably alias like

using bfloat16 = sycl::ext::intel::experimental::bfloat16

could improve readability of this code.

Copy link
Contributor Author

@yubingex007-a11y yubingex007-a11y Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't use "using" in header file

Copy link
Contributor

@keryell keryell Apr 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't use "using" in header file

What do you mean? I guess you can use them locally if it is in a scope not visible from the end-user.

@yubingex007-a11y yubingex007-a11y changed the title [Matrix][SYCL] Add bfloat16 support for wi_slice [Matrix][SYCL] Add bfloat16 support for joint_matrix Apr 11, 2022
@yubingex007-a11y yubingex007-a11y marked this pull request as ready for review April 11, 2022 15:20
@yubingex007-a11y yubingex007-a11y requested a review from a team as a code owner April 11, 2022 15:20
dkhaldi
dkhaldi previously approved these changes Apr 11, 2022
Copy link
Contributor

@dkhaldi dkhaldi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yubingex007-a11y
Copy link
Contributor Author

ping? @intel/llvm-reviewers-runtime

@steffenlarsen
Copy link
Contributor

Do we have tests for this?

@yubingex007-a11y
Copy link
Contributor Author

I don't think the failed testcase has relationship with this PR.

return *res;
}

unsigned short make_bf16(float x) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need these functions?
There must be some equivalent DPC++/SPIRV ones, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in bfloat16 class, we don't have bf16<->float conversion in host device. so this make_bf16 is needed in line144

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in bfloat16 class, we don't have bf16<->float conversion in host device. so this make_bf16 is needed in line144

I'd suggest to finish #5954 and reuse host version of the conversion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good.
@YetAnotherCompilerEngineer who are you? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in bfloat16 class, we don't have bf16<->float conversion in host device. so this make_bf16 is needed in line144

I'd suggest to finish #5954 and reuse host version of the conversion.

Could we merge the patch before #5954? i think we can create another PR for testcase after #5954

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we merge the patch before #5954?

Sure, please merge them in an order you like the most :)

dkhaldi
dkhaldi previously approved these changes Apr 21, 2022
Copy link
Contributor

@dkhaldi dkhaldi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 583 to 584
(void)lhs; \
(void)rhs; \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead just remove the names from the op() declaration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i didn't get your point. actually this code is for host. the param list of friend type operator op should be the same as line555 which is for device code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe what @keryell suggests is to change this to

  friend type operator op(                                                     \
      const wi_element<sycl::ext::oneapi::experimental::bfloat16, NumRows,     \
                       NumCols, Layout, Group> &,                              \
      const sycl::ext::oneapi::experimental::bfloat16 &) {                     \
    throw runtime_error("joint matrix is not supported on host device.",       \
                        PI_INVALID_DEVICE);                                    \
  }  

which preserves the signature but doesn't have unused arguments as they are unnamed.

@@ -0,0 +1,192 @@
// RUN: %clangxx -fsycl -O2 %s -o %t.out
#include <CL/sycl.hpp>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <CL/sycl.hpp>
#include <sycl/sycl.hpp>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is "include/sycl/CL/sycl.hpp", how could we change into "#include <sycl/sycl.hpp>"

#if (SYCL_EXT_ONEAPI_MATRIX == 2)
#include <iostream>

using namespace sycl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to avoid.
If so, why the sycl:: on the 2 next lines? :-)

using namespace sycl::ext::oneapi::experimental::matrix;
using bfloat16 = sycl::ext::oneapi::experimental::bfloat16;

#define TILE_SZ 16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constexpr everywhere instead of macros.

using namespace sycl::ext::oneapi::experimental::matrix;
using bfloat16 = sycl::ext::oneapi::experimental::bfloat16;

static constexpr size_t TILE_SZ = 16;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since your macro was not using any type at the first place, you could just use auto instead of size_t.
Because with the macro the numbers we interpreted as 32-bit int while here you are forcing 64-bit, which might be somewhat less efficient in some case (but not in this test I guess).

Copy link
Contributor

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@yubingex007-a11y
Copy link
Contributor Author

ping? @steffenlarsen @dkhaldi

@yubingex007-a11y yubingex007-a11y requested a review from dkhaldi May 5, 2022 06:51
Copy link
Contributor

@dkhaldi dkhaldi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@yubingex007-a11y
Copy link
Contributor Author

@steffenlarsen could you help us merge it?

@steffenlarsen
Copy link
Contributor

Unexpectedly Passed Tests (1):
SYCL :: api/simd_view_select_2d_int.cpp

This was since fixed with intel/llvm-test-suite#1008

@yubingex007-a11y
Copy link
Contributor Author

yubingex007-a11y commented May 6, 2022

i think i know what happen in https://github.com/intel/llvm/runs/6314339242?check_suite_focus=true.
Since unused arg caused the compfail and abort the build , llvm-spirv was not built so we can find it in bin/*. so tools/spirv-to-ir-wrapper/spirv-to-ir-wrapper.ll failed due to"llvm-spirv: command not found"

#6111 is to fix the unused variable issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants