Skip to content

[SYCL][DOC] Update dot_acc extension spec #10113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: sycl
Choose a base branch
from
Draft
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
174 changes: 101 additions & 73 deletions sycl/doc/extensions/supported/sycl_ext_oneapi_dot_accumulate.asciidoc
Original file line number Diff line number Diff line change
@@ -1,74 +1,99 @@
= SYCL_INTEL_dot_accumulate
= sycl_ext_oneapi_dot_accumulate

:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en
:dpcpp: pass:[DPC++]

:blank: pass:[ +]
// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}

// Set the default source code type in this document to C,
// for syntax highlighting purposes.
:language: c

// This is what is needed for C++, since docbook uses c++
// and everything else uses cpp. This doesn't work when
// source blocks are in table cells, though, so don't use
// C++ unless it is required.
//:language: {basebackend@docbook:c++:cpp}
== Notice

[%hardbreaks]
Copyright (C) 2020-2023 Intel Corporation. All rights reserved.

== Introduction
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.

IMPORTANT: This specification is a draft.

NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by permission by Khronos.
== Contact

NOTE: This document is better viewed when rendered as html with asciidoctor. GitHub does not render image icons.
To report problems with this extension, please open a new issue at:

== Name Strings
https://github.com/intel/llvm/issues

`SYCL_ONEAPI_dot_accumulate`

This is a placeholder name.
== Dependencies

== Notice
This extension is written against the SYCL 2020 revision 7 specification. All
references below to the "core SYCL specification" or to section numbers in the
SYCL specification refer to that revision.

Copyright (c) 2020 Intel Corporation. All rights reserved.

== Status

Working Draft
This extension is implemented and fully supported by {dpcpp}.

This is a preview extension specification, intended to provide early access to a feature for review and community feedback. When the feature matures, this specification may be released as a formal extension.

Because the interfaces defined by this specification are not final and are subject to change they are not intended to be used by shipping software products.
== Overview

== Version
This extension adds new SYCL built-in functions that may simplify development
and provide access specialized hardware instructions when a SYCL kernel needs
to perform a dot product of two vectors followed by a scalar accumulation.

Built On: {docdate} +
Revision: 3

== Contact
== Specification

Ben Ashbaugh, Intel (ben 'dot' ashbaugh 'at' intel 'dot' com)
=== Feature test macro

== Dependencies
This extension provides a feature-test macro as described in the core SYCL
specification. An implementation supporting this extension must predefine the
macro `SYCL_EXT_ONEAPI_DOT_ACCUMULATE` to one of the values defined in the table
below. Applications can test for the existence of this macro to determine if
the implementation supports this feature, or applications can test the macro's
value to determine which of the extension's features the implementation
supports.

This extension is written against the SYCL 1.2.1 specification, Revision v1.2.1-6.
[%header,cols="1,5"]
|===
|Value
|Description

== Overview
|1
|Initial version of this extension.
Comment on lines +73 to +74
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to call this version one and the initial version? IMHO this loses history we might want to preserve. If we do call this version one would it be better to add it as a new document and move the existing document to a "deprecated" or "removed" directory so we don't lose track of it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an API version, not a document version. Normally, I would bump the API version since this PR adds new APIs. However, we never implemented any API versioning prior to this PR, and I can't go back in time and add it now. The best I can do is to start versioning at "1" now.

Regarding the loss of history ... this PR doesn't remove any of the old APIs. They are still retained as "deprecated", and this extension document still describes them. (In fact, they are described better now than in the previous document.)

|===

This extension adds new SYCL built-in functions that may simplify development and provide access specialized hardware instructions when a SYCL kernel needs to perform a dot product of two vectors followed by a scalar accumulation.
=== New functions to compute the dot product of vectors

== Enabling the extension
This extension adds the following free functions:

The extension is always enabled. The dot product functionality may be emulated in software or executed using hardware when suitable instructions are available.
[source,c++]
----
namespace sycl::ext::oneapi {

== Modifications of SYCL 1.2.1 specification
int32_t dot_acc(vec<int8_t,4> a, vec<int8_t,4> b, int32_t c);
int32_t dot_acc(vec<int8_t,4> a, vec<uint8_t,4> b, int32_t c);
int32_t dot_acc(vec<uint8_t,4> a, vec<int8_t,4> b, int32_t c);
int32_t dot_acc(vec<uint8_t,4> a, vec<uint8_t,4> b, int32_t c);

=== Add to Section 4.13.6 - Geometric Functions
int32_t doc_acc_4x8packed(int32_t a, int32_t b, int32_t c);
int32_t doc_acc_4x8packed(int32_t a, uint32_t b, int32_t c);
int32_t doc_acc_4x8packed(uint32_t a, int32_t b, int32_t c);
int32_t doc_acc_4x8packed(uint32_t a, uint32_t b, int32_t c);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a terribly strong opinion about either of these, but noting for completeness: If we want to align even closer to the related OpenCL extension we should consider the following changes:

  1. The OpenCL extension only has dot_acc functions that do the dot product and accumulate for the variants that saturate, otherwise the function is simply dot, which does a dot product.
  2. In addition to the return value difference, the type for the packed a and b arguments is unconditionally an unsigned integer in the OpenCL extension. To disambiguate the different types the signedness of the argument is encoded in the function name (the return type is also), so for example if the packed a is signed and b is unsigned the full function is int dot_4x8packed_su_int(uint a, uint b).

Links for reference:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the suggestion to use uint32_t always for the "packed" parameters, so I also adopted the "_ss" (etc.) convention. Done in 84f1ff8.

It didn't make sense to me to adopt the final "_int" / "_uint" signifying the return type for the packed APIs unless we also adopt that convention for the unpacked APIs. I decided to leave it off for both.

I don't know why this extension added dot_acc rather than just dot. Dropping the "accumulate" operation seemed like a bigger change, so I decided not to do that here.


Additionally, the following additional functions are available in the namespace `sycl::intel` on the host and device.
} // namespace sycl::ext::oneapi
----

[cols="4a,4",options="header"]
|====
Expand All @@ -95,56 +120,59 @@ int32_t dot_acc(vec<uint8_t,4> a,
{blank}
The value that is returned is equivalent to +
{blank}
*dot*(_a_, _b_) + _c_
`dot(a, b) + c`

|[source,c]
----
int32_t doc_acc_4x8packed(int32_t a,
int32_t b,
int32_t c)
int32_t doc_acc_4x8packed(int32_t a,
uint32_t b,
int32_t c)
int32_t doc_acc_4x8packed(uint32_t a,
int32_t b,
int32_t c)
int32_t doc_acc_4x8packed(uint32_t a,
uint32_t b,
int32_t c);
----

|Performs a four-component integer dot product accumulate operation, where
`a` and `b` are 32-bit integers that represent a vector of 4 8-bit elements.
When the type of `a` or `b` is `int32_t`, it is interpreted as `vec<int8_t,4>`.
When the type of `a` or `b` is `uint32_t`, it is interpreted as
`vec<uint8_t,4>`. In each case, the least significant byte is element 0, and
the most significant byte is element 3.

|====

== Sample Header
=== Deprecated functions

The following functions are deprecated. They have the same effect as the
`doc_acc_4x8packed` overloads described above.

[source,c++]
----
namespace cl {
namespace sycl {
namespace ext {
namespace oneapi {

int32_t dot_acc(vec<int8_t,4> a, vec<int8_t,4> b, int32_t c);
int32_t dot_acc(vec<int8_t,4> a, vec<uint8_t,4> b, int32_t c);
int32_t dot_acc(vec<uint8_t,4> a, vec<int8_t,4> b, int32_t c);
int32_t dot_acc(vec<uint8_t,4> a, vec<uint8_t,4> b, int32_t c);
namespace sycl::ext::oneapi {

int32_t dot_acc(int32_t a, int32_t b, int32_t c);
int32_t dot_acc(int32_t a, uint32_t b, int32_t c);
int32_t dot_acc(uint32_t a, int32_t b, int32_t c);
int32_t dot_acc(uint32_t a, uint32_t b, int32_t c);

} // oneapi
} // ext
} // sycl
} // cl
} // namespace sycl::ext::oneapi
----


== Issues

None.

== Revision History

[cols="5,15,15,70"]
[grid="rows"]
[options="header"]
|========================================
|Rev|Date|Author|Changes
|1|2019-12-13|Ben Ashbaugh|*Initial draft*
|2|2019-12-18|Ben Ashbaugh|Switched to standard C++ fixed width types.
|3|2020-10-26|Rajiv Deodhar|Added int32 types.
|========================================

//************************************************************************
//Other formatting suggestions:
//
//* Use *bold* text for host APIs, or [source] syntax highlighting.
//* Use `mono` text for device APIs, or [source] syntax highlighting.
//* Use `mono` text for extension names, types, or enum values.
//* Use _italics_ for parameters.
//************************************************************************
* The overloads that take two unsigned vectors have a signed `c` and return a
signed result. This is inconsistent with the SPIR-V primitives and the
OpenCL C APIs, both of which return an unsigned value in this case and expect
an unsigned `c`. I think we could implement the APIs as they are using the
SPIR-V primitives, but the extra unsigned-to-signed conversions might
generate less efficient code (I haven't checked). Is there a compelling
reason to keep these APIs as they are now? If not, we could deprecate them
and introduce overloads that take an unsigned `c` and return an unsigned
value.