
Commit e8482ac

[SYCL][Doc][matrix] Incorporate Greg's review specifically:
- Add changing the default sizes in the query to M, N, K to the todo list
- Add missing layout to the alias matrices in query
- Add comment about the combinations array in the size-only query case
- Add the combinations array to the validation case as well, for consistency
- Add a table that explains each of the query class members and type aliases
- Adjust the order of types and sizes in the combination type
1 parent c1b7a13 commit e8482ac

1 file changed

+76
-25
lines changed

sycl/doc/extensions/Matrix/dpcpp-joint-matrix.asciidoc

Lines changed: 76 additions & 25 deletions
@@ -246,7 +246,37 @@ The query interface proposed here consists of three functionalities:
246246

247247
- Construct the matrices using a default shape if the user does not provide a combination. This corresponds to the case where the user provides the sizes of large `tile` matrices but does not specify the sizes of the corresponding submatrices of the `tiles`. In this case, the query constructs the submatrices of the matrices whose sizes the user provided.
248248

249-
- General query interface for sizes, types, static/dynamic, scope. This is needed to avoid padding by the user, for tuning, and efficient code generation if used by a library. The general query return an array of `combinations` of `combination` type. Each combination includes the sizes and the types for the matrices A, B, and C. Note that for each TPU, the query returns `max_msize, max_nsize, max_ksize` or `msize, nsize, ksize` exclusively depending whether the implementation supports a continuous or discrete number of sizes. For example, Intel AMX implementation supports a continuous number of sizes so the `max_*` variant is applied and only the maximum number is returned. DPAS implementation, on the other hand, supports a discrete list of numbers so the `msize, nsize, ksize` variant is applied.
249+
- General query interface for sizes, types, static/dynamic, scope. This is needed to avoid padding by the user, for tuning, and for efficient code generation if used by a library. The general query returns an array of `combinations` of `combination` type. Each combination includes the sizes and the types for the matrices A, B, and C. Note that for each TPU, the query returns `max_msize, max_nsize, max_ksize` or `msize, nsize, ksize` exclusively, depending on whether the implementation supports a continuous or discrete number of sizes. For example, the Intel AMX implementation supports a continuous number of sizes, so the `max_*` variant is applied and only the maximum number is returned. The DPAS implementation, on the other hand, supports a discrete list of numbers, so the `msize, nsize, ksize` variant is applied.
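
As a rough illustration of the continuous versus discrete distinction, the sketch below checks a requested shape against one entry. It assumes the convention described for `max_msize`, `max_nsize`, `max_ksize` in this document: a non-zero `max_*` member means every size from 1 up to that value is accepted, and a zero value means only the matching `msize`, `nsize`, or `ksize` member applies. The names `size_combination`, `dim_supported`, and `shape_supported` are hypothetical and not part of the extension, and the discrete entry uses made-up values rather than real DPAS sizes.

```cpp
#include <cstdint>

// Hypothetical stand-in for the size members of `combination`;
// the element-type members are omitted to keep the sketch short.
struct size_combination {
  std::uint32_t max_msize, max_nsize, max_ksize; // non-zero: continuous range
  std::uint32_t msize, nsize, ksize;             // consulted when max_* is zero
};

// Checks one dimension: a range test when max_size is non-zero,
// an exact match against the discrete size otherwise.
constexpr bool dim_supported(std::uint32_t max_size,
                             std::uint32_t discrete_size,
                             std::uint32_t size) {
  return max_size != 0 ? (size >= 1 && size <= max_size)
                       : (size == discrete_size);
}

// Checks all three dimensions of a requested M x N x K shape.
constexpr bool shape_supported(const size_combination &c, std::uint32_t m,
                               std::uint32_t n, std::uint32_t k) {
  return dim_supported(c.max_msize, c.msize, m) &&
         dim_supported(c.max_nsize, c.nsize, n) &&
         dim_supported(c.max_ksize, c.ksize, k);
}

// Continuous entry using the 8-bit Intel AMX limits shown in this document.
constexpr size_combination amx_like{16, 16, 64, 0, 0, 0};
// Purely illustrative discrete entry; these are NOT real DPAS values.
constexpr size_combination discrete_like{0, 0, 0, 8, 8, 32};

static_assert(shape_supported(amx_like, 4, 16, 64), "within the AMX-like range");
static_assert(!shape_supported(amx_like, 17, 16, 64), "M exceeds max_msize");
static_assert(shape_supported(discrete_like, 8, 8, 32), "exact discrete match");
static_assert(!shape_supported(discrete_like, 4, 8, 32), "not in the discrete set");
```

Because the helpers are `constexpr`, a library could run the same check at compile time, which fits the stated goal of avoiding user-side padding.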
250+
251+
The table below provides a description for each of the member variables and type aliases in the `tpu_params` class.
252+
253+
[frame="none",options="header"]
254+
|======================
255+
| Member/type alias in `tpu_params` |Description
256+
|`type_a`| type alias for the type of matrix A
257+
|`type_b`| type alias for the type of matrix B
258+
|`type_c`| type alias for the type of matrix C
259+
|`defaultM`| when no sizes are provided by the user, indicates the suggested default size for M; usually this corresponds to the maximum size the implementation supports
260+
|`defaultN`| when no sizes are provided by the user, indicates the suggested default size for N; usually this corresponds to the maximum size the implementation supports
261+
|`defaultK`| when no sizes are provided by the user, indicates the suggested default size for K; usually this corresponds to the maximum size the implementation supports
262+
|`joint_matrix_a`| type alias for `joint_matrix` for matrix A
263+
|`joint_matrix_b`| type alias for `joint_matrix` for matrix B
264+
|`joint_matrix_c`| type alias for `joint_matrix` for matrix C
265+
|`dynamic_p`| a boolean that indicates whether the implementation supports dynamic sizes (true) or not (false)
266+
|`numtiles`| indicates number of tiles in Intel AMX (does not apply to DPAS)
267+
|`scope`| indicates the memory and execution scope supported by the TPU implementation
268+
|`combination` | composes the types and sizes of A, B, C matrices allowed in one combination
269+
|`max_msize`, `max_nsize`, `max_ksize`|When one of these members is non-zero, it indicates that the TPU supports all element sizes in the range from 1 up to the given value. By contrast, a zero value indicates that the TPU implementation supports only a discrete set of element sizes, which are given by the corresponding `msize`, `nsize`, or `ksize` members.
270+
|`msize`, `nsize`, `ksize`| each gives one of the sizes that the TPU implementation supports
271+
|`atype`, `btype`, `ctype`| indicates the types supported in the combination
272+
|`combinations` |Lists the set of supported matrix sizes and types according to the template parameters that are provided. In the "general query" form, the user provides only the TPU type, so the combinations array contains all supported tile sizes and element types for that TPU. In the "default values" form, the user provides the TPU type and element types, so the combinations array contains only those supported matrix sizes and element types that match those element types on that TPU. In the "validation" form, the user provides the TPU type, element types, and element sizes.
273+
|`num_combinations`| indicates the number of combinations supported by the TPU implementation, which corresponds to the size of the `combinations` array
274+
|======================
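
To make the difference between the "general query" and "default values" forms of `combinations` more concrete, here is a minimal standalone sketch. It copies the Intel AMX entries from the general query in this document into a mock table and counts the entries whose element types match a request. The `matrix_type` enum below is a hypothetical stand-in containing only the enumerators used here, and `count_matching` is an illustrative helper that is not part of the extension. The loop inside a `constexpr` function assumes C++14 or later.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the extension's matrix_type enum.
enum class matrix_type { bf16, fp32, sint8, uint8, sint32 };

// Same member order as the `combination` struct in this document.
struct combination {
  std::uint32_t max_msize, max_nsize, max_ksize;
  std::uint32_t msize, nsize, ksize;
  matrix_type atype, btype, ctype;
};

// Mock of the general-query table for Intel AMX, as listed in this document.
constexpr combination amx_combinations[] = {
    {16, 16, 64, 0, 0, 0, matrix_type::sint8, matrix_type::sint8, matrix_type::sint32},
    {16, 16, 64, 0, 0, 0, matrix_type::sint8, matrix_type::uint8, matrix_type::sint32},
    {16, 16, 64, 0, 0, 0, matrix_type::uint8, matrix_type::sint8, matrix_type::sint32},
    {16, 16, 64, 0, 0, 0, matrix_type::uint8, matrix_type::uint8, matrix_type::sint32},
    {16, 16, 32, 0, 0, 0, matrix_type::bf16, matrix_type::bf16, matrix_type::fp32}};

constexpr std::size_t num_amx_combinations =
    sizeof(amx_combinations) / sizeof(combination);

// Counts the table entries whose A, B, and C element types all match.
constexpr std::size_t count_matching(matrix_type a, matrix_type b, matrix_type c) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < num_amx_combinations; ++i) {
    const combination &e = amx_combinations[i];
    if (e.atype == a && e.btype == b && e.ctype == c)
      ++count;
  }
  return count;
}

static_assert(num_amx_combinations == 5, "general query lists every entry");
static_assert(count_matching(matrix_type::bf16, matrix_type::bf16,
                             matrix_type::fp32) == 1,
              "one entry matches the bf16 types");
```

A count of zero for a requested type triple would indicate that the mock table has no entry for it, which is the kind of mismatch the "validation" form is meant to catch.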
275+
276+
277+
278+
279+
250280

251281
```c++
252282
namespace sycl::ext::oneapi::experimental::matrix {
@@ -285,15 +315,31 @@ struct tpu_params<
285315

286316
template <matrix_layout Layout = matrix_layout::row_major, typename Group = sub_group>
287317
using joint_matrix_a = joint_matrix<Ta, defaultM, defaultK, Layout, Group>;
288-
template <typename Group>
318+
template <matrix_layout Layout = matrix_layout::row_major, typename Group = sub_group>
289319
using joint_matrix_b = joint_matrix<Tb, defaultK, defaultN, Layout, Group>;
290-
template <typename Group>
320+
template <matrix_layout Layout = matrix_layout::row_major, typename Group = sub_group>
291321
using joint_matrix_c = joint_matrix<Tc, defaultM, defaultN, Layout, Group>;
292322

293-
bool dynamic_p = false; // should be true in future implementations
323+
static constexpr bool dynamic_p = false; // should be true in future implementations
294324
// because Intel AMX hardware supports dynamic sizes
295-
uint32_t numtiles = 8;
296-
scope_t scope = scope_t::sub_group;
325+
static constexpr uint32_t numtiles = 8;
326+
static constexpr scope_t scope = scope_t::sub_group;
327+
struct combination {
328+
uint32_t max_msize;
329+
uint32_t max_nsize;
330+
uint32_t max_ksize;
331+
uint32_t msize;
332+
uint32_t nsize;
333+
uint32_t ksize;
334+
matrix_type atype;
335+
matrix_type btype;
336+
matrix_type ctype;
337+
};
338+
// In this case, the combinations array contains only the combination that the user provided
339+
static constexpr combination combinations[] = {
340+
{16, 16, (sizeof(Ta) == 1) ? 64 : 32, M, N, K}};
341+
static constexpr int num_combinations =
342+
sizeof(combinations) / sizeof(combination);
297343
};
298344

299345
// Sizes-only query
@@ -319,26 +365,28 @@ struct tpu_params<tpu::amx, Ta, Tb, Tc, 0, 0, 0,
319365

320366
template <matrix_layout Layout = matrix_layout::row_major, typename Group = sub_group>
321367
using joint_matrix_a = joint_matrix<Ta, defaultM, defaultK, Layout, Group>;
322-
template <typename Group>
368+
template <matrix_layout Layout = matrix_layout::row_major, typename Group = sub_group>
323369
using joint_matrix_b = joint_matrix<Tb, defaultK, defaultN, Layout, Group>;
324-
template <typename Group>
370+
template <matrix_layout Layout = matrix_layout::row_major, typename Group = sub_group>
325371
using joint_matrix_c = joint_matrix<Tc, defaultM, defaultN, Layout, Group>;
326372

327-
bool dynamic_p = false; // should be true in future implementations because
373+
static constexpr bool dynamic_p = false; // should be true in future implementations because
328374
// Intel AMX hardware supports dynamic sizes
329-
uint32_t numtiles = 8;
330-
scope_t scope = scope_t::sub_group;
375+
static constexpr uint32_t numtiles = 8;
376+
static constexpr scope_t scope = scope_t::sub_group;
331377
struct combination {
332378
uint32_t max_msize;
333379
uint32_t max_nsize;
334380
uint32_t max_ksize;
335-
matrix_type atype;
336-
matrix_type btype;
337-
matrix_type ctype;
338381
uint32_t msize;
339382
uint32_t nsize;
340383
uint32_t ksize;
384+
matrix_type atype;
385+
matrix_type btype;
386+
matrix_type ctype;
341387
};
388+
// In this case, the combinations array contains only the combinations that correspond to the Ta, Tb, and Tc
389+
// types that the user provided
342390
static constexpr combination combinations[] = {
343391
{16, 16, (sizeof(Ta) == 1) ? 64 : 32}};
344392
static constexpr int num_combinations =
@@ -353,28 +401,28 @@ struct tpu_params<tpu::amx, void, void, void, M, N, K> {
353401
static constexpr std::size_t defaultN = -1;
354402
static constexpr std::size_t defaultK = -1;
355403

356-
bool dynamic_p = false; // should be true in future implementations because
404+
static constexpr bool dynamic_p = false; // should be true in future implementations because
357405
// Intel AMX hardware supports dynamic sizes
358-
uint32_t numtiles = 8;
359-
constscope_t scope = scope_t::sub_group;
406+
static constexpr uint32_t numtiles = 8;
407+
static constexpr scope_t scope = scope_t::sub_group;
360408
struct combination {
361409
uint32_t max_msize;
362410
uint32_t max_nsize;
363411
uint32_t max_ksize;
364-
matrix_type atype;
365-
matrix_type btype;
366-
matrix_type ctype;
367412
uint32_t msize;
368413
uint32_t nsize;
369414
uint32_t ksize;
415+
matrix_type atype;
416+
matrix_type btype;
417+
matrix_type ctype;
370418
};
371419

372420
static constexpr combination combinations[] = {
373-
{16, 16, 64, matrix_type::sint8, matrix_type::sint8, matrix_type::sint32},
374-
{16, 16, 64, matrix_type::sint8, matrix_type::uint8, matrix_type::sint32},
375-
{16, 16, 64, matrix_type::uint8, matrix_type::sint8, matrix_type::sint32},
376-
{16, 16, 64, matrix_type::uint8, matrix_type::uint8, matrix_type::sint32},
377-
{16, 16, 32, matrix_type::bf16, matrix_type::bf16, matrix_type::fp32}};
421+
{16, 16, 64, 0, 0, 0, matrix_type::sint8, matrix_type::sint8, matrix_type::sint32},
422+
{16, 16, 64, 0, 0, 0, matrix_type::sint8, matrix_type::uint8, matrix_type::sint32},
423+
{16, 16, 64, 0, 0, 0, matrix_type::uint8, matrix_type::sint8, matrix_type::sint32},
424+
{16, 16, 64, 0, 0, 0, matrix_type::uint8, matrix_type::uint8, matrix_type::sint32},
425+
{16, 16, 32, 0, 0, 0, matrix_type::bf16, matrix_type::bf16, matrix_type::fp32}};
378426
static constexpr int num_combinations =
379427
sizeof(combinations) / sizeof(combination);
380428
};
@@ -541,6 +589,9 @@ We did not utilize this extension for this matrix API version because sub-group
541589
## TODO List
542590
- Add support for fill matrix and element-wise operations features
543591
- Add 'matrix_use' parameter to the matrix to distinguish between matrix A, B, and matrix accumulator. This is necessary for supporting VNNI and transpose transform
592+
- Change the names of the default sizes in the query from `defaultM`, `defaultN`, `defaultK` to `M`, `N`, `K`
593+
- Change the type of `scope` in the query interface so that it can return more than one value. This will be useful if other scopes, such as work-group, are supported in addition to sub-group
594+
544595

545596
## Revision History
546597
