[SYCL][CUDA] Matrix MMA for double type using nvptx. #553
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: jack.kirk <[email protected]>
@@ -0,0 +1,152 @@
// REQUIRES: gpu, cuda

// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple -Xsycl-target-backend
Can we use a different prefix in the name like "tensorcore" rather than "nvptx"?
Also, can you add a comment that in the tensor core case, JIT compilation is not supported and specifying the arch argument is necessary?
I know this will be added as part of the doc but repetition can be useful here.
Yeah both suggestions sound good to me.
Made the change
item.get_group().get_id()[1]; // column id of current submatrix of
                              // BIG C matrix

joint_matrix<double, matrix_use::accumulator, M, N,
I think a comment is necessary here to highlight the difference with the current matrix interface WRT the use of the "use" argument.
OK
Made the change
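For readers following the snippet above, here is a host-side sketch of how a row and column id select one submatrix of the big C matrix and how the accumulator tile is built up along the K dimension. It is plain C++ rather than the SYCL test code, and the function name `tiled_mma_reference`, the row-major layout, and the 8x8x4 tile shape are assumptions made for illustration, not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tile shape assumed for illustration only (M x N accumulator, depth K).
constexpr std::size_t M = 8, N = 8, K = 4;

// Hypothetical host reference: C += A * B, walked one tile at a time.
// A is big_m x big_k, B is big_k x big_n, C is big_m x big_n, all row-major.
void tiled_mma_reference(const std::vector<double>& A,
                         const std::vector<double>& B,
                         std::vector<double>& C,
                         std::size_t big_m, std::size_t big_n,
                         std::size_t big_k) {
  // The sketch only handles sizes that are whole multiples of the tile shape.
  assert(big_m % M == 0 && big_n % N == 0 && big_k % K == 0);
  assert(A.size() == big_m * big_k && B.size() == big_k * big_n &&
         C.size() == big_m * big_n);

  for (std::size_t tm = 0; tm < big_m / M; ++tm) {      // row id of the C submatrix
    for (std::size_t tn = 0; tn < big_n / N; ++tn) {    // column id of the C submatrix
      for (std::size_t tk = 0; tk < big_k / K; ++tk) {  // step along the shared dimension
        // One M x N x K multiply-accumulate on the current tiles.
        for (std::size_t i = 0; i < M; ++i) {
          for (std::size_t j = 0; j < N; ++j) {
            double acc = C[(tm * M + i) * big_n + tn * N + j];
            for (std::size_t k = 0; k < K; ++k) {
              acc += A[(tm * M + i) * big_k + tk * K + k] *
                     B[(tk * K + k) * big_n + tn * N + j];
            }
            C[(tm * M + i) * big_n + tn * N + j] = acc;
          }
        }
      }
    }
  }
}
```

A loop nest like this could serve as the host-side check that a device result is compared against, with each (tm, tn) pair playing the role of one group's submatrix.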
Signed-off-by: jack.kirk <[email protected]>
I do not appear to have access to the pre-ci-cuda report, but the test may fail because the sm80 architecture is not available. If this is the case, could you advise whether it is possible to provide a flag for the unit test so that it only runs if sm80 is available?
Here is the output of the failing test:
Signed-off-by: jack.kirk <[email protected]>
Thanks, I reverted the clang-format change. Hopefully it works now.
…-suite#553) Signed-off-by: JackAKirk <[email protected]>
Integration test for the new CUDA MMA implementation using double type for the matrix elements.
See intel/llvm#4696 for the implementation.
Signed-off-by: JackAKirk [email protected]