diff --git a/sycl/doc/design/spirv-extensions/SPV_INTEL_joint_matrix.asciidoc b/sycl/doc/design/spirv-extensions/SPV_INTEL_joint_matrix.asciidoc index 6059d05ece215..aad37186a4abe 100644 --- a/sycl/doc/design/spirv-extensions/SPV_INTEL_joint_matrix.asciidoc +++ b/sycl/doc/design/spirv-extensions/SPV_INTEL_joint_matrix.asciidoc @@ -1,7 +1,19 @@ :extension_name: SPV_INTEL_joint_matrix -:capability_name: JointMatrixINTEL -:capability_token: 6118 -:OpTypeJointMatrixINTEL_token: 6119 +:main_capability_name: JointMatrixINTEL +:main_capability_token: 6118 +:packed_capability_name: PackedJointMatrixINTEL +:packed_capability_token: 6434 +:wi_capability_name: JointMatrixWIInstructionsINTEL +:wi_capability_token: 6435 +:tf32_capability_name: JointMatrixTF32ComponentTypeINTEL +:tf32_capability_token: 6436 +:bf16_capability_name: JointMatrixBF16ComponentTypeINTEL +:bf16_capability_token: 6437 +:packed2_capability_name: JointMatrixPackedInt2ComponentTypeINTEL +:packed2_capability_token: 6438 +:packed4_capability_name: JointMatrixPackedInt4ComponentTypeINTEL +:packed4_capability_token: 6439 +:OpTypeJointMatrixINTEL_token: 6184 :OpJointMatrixLoadINTEL_token: 6120 :OpJointMatrixStoreINTEL_token: 6121 :OpJointMatrixMadINTEL_token: 6122 @@ -9,6 +21,9 @@ :OpJointMatrixUSMadINTEL_token: 6129 :OpJointMatrixUUMadINTEL_token: 6130 :OpJointMatrixWorkItemLengthINTEL_token: 6410 +:OpJointMatrixGetElementCoordINTEL_token: 6440 + +:DPCPP_URL: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_intel_matrix.asciidoc {extension_name} ================ @@ -28,12 +43,16 @@ https://github.com/intel/llvm - Alexey Sotkin, Intel + - Dounia Khaldi, Intel + -- Mateusz, Belicki Intel + +- Mateusz Belicki, Intel + - Dmitry Sidorov, Intel + +- Ben Ashbaugh, Intel + +- Greg Lueck, Intel + +- Victor Mustya, Intel + +- Arvind Sudarsanam, Intel + == Notice -Copyright (c) 2021 Intel Corporation. All rights reserved. +Copyright (c) 2023 Intel Corporation. 
All rights reserved. == Status @@ -53,24 +72,23 @@ please let us know! [width="40%",cols="25,25"] |======================================== -| Last Modified Date | 2022-03-10 -| Revision | 4 +| Last Modified Date | 2023-02-01 +| Revision | 10 |======================================== == Dependencies This extension is written against the SPIR-V Specification, -Version 1.5 Revision 5. +Version 1.6 Revision 2. This extension requires SPIR-V 1.0. == Overview This extension adds a type and instructions for joint matrices. Such matrices -are shared among a group of work-items and is not private to each work-item. +are shared among a group of work-items and are not private to each work-item. The type introduced with this extension allows to specify memory scope and -layout of the matrix, including layouts optimized for particular hardware(AMX) . -New instructions also allow to specify synchronization scope. +location where the joint matrix is used in math operation. == Extension Name @@ -89,44 +107,71 @@ This extension introduces new capabilities: [subs="attributes"] ---- -{capability_name} +{main_capability_name} +{packed_capability_name} +{wi_capability_name} +{tf32_capability_name} +{bf16_capability_name} +{packed2_capability_name} +{packed4_capability_name} ---- == New Instructions -Instructions added under the *{capability_name}* capability: +Instructions added under the *{main_capability_name}* capability: ---- OpTypeJointMatrixINTEL OpJointMatrixLoadINTEL OpJointMatrixStoreINTEL -OpJointMatrixMADINTEL -OpJointMatrixSUMADINTEL -OpJointMatrixUSMADINTEL -OpJointMatrixUUMADINTEL +OpJointMatrixMadINTEL +OpJointMatrixSUMadINTEL +OpJointMatrixUSMadINTEL +OpJointMatrixUUMadINTEL + +---- + +Instructions added under the *{wi_capability_name}* capability: + +---- + OpJointMatrixWorkItemLengthINTEL +OpJointMatrixGetElementCoordINTEL ---- + == Token Number Assignments [width="40%"] [cols="70%,30%"] [grid="rows"] |==== -|*{capability_name}* | {capability_token} 
-|*OpTypeJointMatrixINTEL* | {OpTypeJointMatrixINTEL_token} -|*OpJointMatrixLoadINTEL* | {OpJointMatrixLoadINTEL_token} -|*OpJointMatrixStoreINTEL* | {OpJointMatrixStoreINTEL_token} -|*OpJointMatrixMadINTEL* | {OpJointMatrixMadINTEL_token} -|*OpJointMatrixSUMadINTEL* | {OpJointMatrixSUMadINTEL_token} -|*OpJointMatrixUSMadINTEL* | {OpJointMatrixUSMadINTEL_token} -|*OpJointMatrixUUMadINTEL* | {OpJointMatrixUUMadINTEL_token} -|*OpJointMatrixWorkItemLengthINTEL* | {OpJointMatrixWorkItemLengthINTEL_token} +|*{main_capability_name}* | {main_capability_token} +|*{packed_capability_name}* | {packed_capability_token} +|*{wi_capability_name}* | {wi_capability_token} +|*{tf32_capability_name}* | {tf32_capability_token} +|*{bf16_capability_name}* | {bf16_capability_token} +|*{packed2_capability_name}* | {packed2_capability_token} +|*{packed4_capability_name}* | {packed4_capability_token} +|*OpTypeJointMatrixINTEL* | {OpTypeJointMatrixINTEL_token} +|*OpJointMatrixLoadINTEL* | {OpJointMatrixLoadINTEL_token} +|*OpJointMatrixStoreINTEL* | {OpJointMatrixStoreINTEL_token} +|*OpJointMatrixMadINTEL* | {OpJointMatrixMadINTEL_token} +|*OpJointMatrixSUMadINTEL* | {OpJointMatrixSUMadINTEL_token} +|*OpJointMatrixUSMadINTEL* | {OpJointMatrixUSMadINTEL_token} +|*OpJointMatrixUUMadINTEL* | {OpJointMatrixUUMadINTEL_token} +|*OpJointMatrixWorkItemLengthINTEL* | {OpJointMatrixWorkItemLengthINTEL_token} +|*OpJointMatrixGetElementCoordINTEL* | {OpJointMatrixGetElementCoordINTEL_token} |==== -== Modifications to the SPIR-V Specification, Version 1.5 +== Modifications to the SPIR-V Specification, Version 1.6 + +=== 2.16 Validation Rules + +Joint matrix types (or types containing them) can only be allocated in *Function* +or *Private* <>. === 2.2 Terms Add new terms to section 2.2.2 Types: @@ -136,19 +181,71 @@ is spread across multiple invocations. Add _Joint Matrix_ to the definition of _Composite_. -=== Matrix layout +Add _Joint Matrix_ to the definition of _Concrete Type_. 
+ +=== Joint Matrix Layout -Add section 3.XX, Matrix layout. +Add section 3.XX, Joint Matrix Layout. +'Layout' indicates how the values of joint matrix are arranged in memory. [options="header"] |==== 2+^| Layout ^| Enabling capability -| 0 | *ColumnMajor* | *{capability_name}* -| 1 | *RowMajor* | *{capability_name}* -| 2 | *PackedA* + -Suitable for VNNI instructions | *{capability_name}* -| 3 | *PackedB* + -Suitable for VNNI instructions | *{capability_name}* +| 0 | *RowMajor* | *{main_capability_name}* +| 1 | *ColumnMajor* | *{main_capability_name}* +| 2 | *Packed* + +Suitable for Vector Neural Network Instruction (VNNI) format used in Intel AMX +and Intel XMX. It specifies that the data was prepacked by user before loading +a joint matrix. +More info could be found in {DPCPP_URL}[DPCPP matrix extension spec] | *{packed_capability_name}* +|==== + +=== Joint Matrix Use + +Add section 3.XX, Joint Matrix Use. +'Use' specifies where the joint matrix is used in math operation. + +[options="header"] +|==== +2+^| Use ^| Enabling capability +| 0 | *MatrixA* | *{main_capability_name}* +| 1 | *MatrixB* | *{main_capability_name}* +| 2 | *Accumulator* | *{main_capability_name}* +|==== + +=== Joint Matrix Component Type Interpretation + +Add section 3.XX, Joint Matrix Component Type Interpretation. +To be used by 'Component Type Interpretation' optional parameter of +*TypeJointMatrixINTEL*. + +[options="header"] +|==== +2+^| Interpretation ^| Enabling capability +| 0 | *None* | +| 1 | *TF32* + +'Component Type' must be _float_. Interpret 'Component Type' of joint matrix +as TF32. | *{tf32_capability_name}* +| 2 | *Bfloat16* + +'Component Type' must be 16-bit _integer_. Interpret 'Component Type' of joint +matrix as Bfloat16. | *{bf16_capability_name}* +| 3 | *PackedInt2* + +'Component Type' must be _integer_. Interpret -bit _integer_ 'Component Type' +of joint matrix as a vector of 2-bit integers. 
Number of components of this +vector is limited by enabled SPIR-V capabilities, which brings limitations on +possible width of the _integer_. + +If a joint matrix type that has *ComponentTypeInterpretation* operand with +*PackedInt2* value is used in an arithmetic instruction, then to verify +this instruction's inputs 'Column' and 'Row' of the matrix should be taken with +a factor of a size of this packed vector. | *{packed2_capability_name}* +| 4 | *PackedInt4* + +Interpret -bit _integer_ 'Component Type' of joint matrix as a vector of 4-bit integers. +Number of components of this vector is limited by enabled SPIR-V capabilities, +which brings limitations on possible width of the _integer_. + +If a joint matrix type that has *ComponentTypeInterpretation* operand with +*PackedInt4* value is used in an arithmetic instruction, then to verify +this instruction's inputs 'Column' and 'Row' of the matrix should be taken with +a factor of a size of this packed vector. | *{packed4_capability_name}* |==== === Capabilities @@ -159,44 +256,89 @@ Modify Section 3.31, Capability, adding rows to the Capability table: [options="header"] |==== 2+^| Capability ^| Implicitly Declares -| {capability_token} | *{capability_name}* -| Reserved. + +| {main_capability_token} | *{main_capability_name}* + + + +Uses *TypeJointMatrixINTEL* + +See also extension: *{extension_name}* +| +| {packed_capability_token} | *{packed_capability_name}* + + +Uses *Packed* layout to <>. + See also extension: *{extension_name}* +| *{main_capability_name}* + +| {wi_capability_token} | *{wi_capability_name}* + + + +Uses <> and +<> +instructions. 
+ +See also extension: *{extension_name}* +| *{main_capability_name}* + +| {tf32_capability_token} | *{tf32_capability_name}* + + +Uses *TF32* in 3.XX, Joint Matrix Component Type Interpretation + + +See also extension: *{extension_name}* +| *{main_capability_name}* + +| {bf16_capability_token} | *{bf16_capability_name}* + + +Uses *Bfloat16* in 3.XX, Joint Matrix Component Type Interpretation + + +See also extension: *{extension_name}* +| *{main_capability_name}* + +| {packed2_capability_token} | *{packed2_capability_name}* + + +Uses *PackedInt2* in 3.XX, Joint Matrix Component Type Interpretation + + +See also extension: *{extension_name}* +| *{main_capability_name}* + +| {packed4_capability_token} | *{packed4_capability_name}* + + +Uses *PackedInt4* in 3.XX, Joint Matrix Component Type Interpretation + + +See also extension: *{extension_name}* +| *{main_capability_name}* + |==== -- === Instructions -==== 3.37.6 Type-Declaration Instructions +==== 3.42.6 Type-Declaration Instructions -[cols="1,1,6*3",width="100%"] +[cols="1,1,7*3",width="100%"] |===== -7+|[[OpTypeJointMatrixINTEL]]*OpTypeJointMatrixINTEL* + +8+|[[OpTypeJointMatrixINTEL]]*OpTypeJointMatrixINTEL* + + -Declare a matrix type. + +Declare a joint matrix type. + + 'Component Type' is the type of each component in the resulting type. It must be a scalar 'numerical type'. + + -'Row Count' is the number of rows in the matrix type. It must be a constant -unsigned 32-bit integer. Behavior is undefined when 'Row Count' is 0 or +'Row Count' is the number of rows in the joint matrix type. It must be an '' +of 'constant instruction' with scalar 32-bit integer. It is invalid for 'Row Count' to be 0 or <>. + + -'Column Count' is the number of columns in the matrix type. It must be a -constant unsigned 32-bit integer. Behavior is undefined when 'Column Count' is -0 or <>. + +'Column Count' is the number of columns in the joint matrix type. 
It must be an '' +of 'constant instruction' with scalar 32-bit integer. It is invalid for 'Column Count' to be 0 or +<>. + + -'Layout' indicates how the values are arranged internally in the matrix type. -It must be the result of a constant instruction. + +Execution is a 'Scope'. Must be an '' of 'constant instruction' +with scalar 32-bit integer. Its value must be either _Workgroup_ or +_Subgroup_ from the table 3.27. Scope . + + + +'Use' parameter shows location of the matrix in a math operation. +Must be an '' of 'constant instruction' with scalar 32-bit integer type. Its +value must be one of the values in the table 3.XX, <>. + + + +_Optional_ 'Component Type Interpretation' specifies how to interpret +'Component Type' when components of a joint matrix are storages for values of +different types. Must be an '' of 'constant instruction' with scalar 32-bit +integer type. Its value must be one of the values in the table 3.XX, +<>. + + -'Scope' is memory scope for operations on the matrix. It must be the -result of a constant instruction with scalar 'integer type'. + - 1+|Capability: + -*{capability_name}* -1+| 7 | {OpTypeJointMatrixINTEL_token} +*{main_capability_name}* +1+| 7+ | {OpTypeJointMatrixINTEL_token} | 'Result ' | '' + 'Component Type' @@ -205,41 +347,51 @@ result of a constant instruction with scalar 'integer type'. + | '' + 'Column Count' | '' + -'Layout' -| '' + 'Scope' +| '' + +'Use' +|_Optional_ '' + +'Component Type Interpretation' |===== -==== 3.37.8. Memory Instructions +==== 3.42.8. Memory Instructions -[cols="1,1,7*3",width="100%"] +[cols="1,1,6*3",width="100%"] |===== -8+|[[OpJointMatrixLoadINTEL]]*OpJointMatrixLoadINTEL* + +7+|[[OpJointMatrixLoadINTEL]]*OpJointMatrixLoadINTEL* + + -Load a matrix through a pointer. + +Load a joint matrix through a pointer. + + -'Result Type' is the type of the loaded matrix. It must be +'Result Type' is the type of the loaded joint matrix. It must be <>. + + 'Pointer' is the pointer to load through. 
It specifies start of memory region -where elements of the matrix are stored and arranged according to 'Layout'. + +where elements of the joint matrix are stored and arranged according to 'Layout'. +The <> of 'Pointer' must be *Workgroup*, +*CrossWorkgroup*, *StorageBuffer*, *Generic* or *PhysicalStorageBuffer*. + + -'Stride' is the number of elements in memory between beginnings of successive -rows, columns (or words) in the result. It must be a scalar integer type. + +'Stride' describes the number of elements between consecutive rows for the +RowMajor 'layout', or between columns for the ColumnMajor 'layout'. + + -'Layout' indicates how the values loaded from memory are arranged. -It must be the result of a constant instruction. + - + -'Scope' is syncronization scope for operation on the matrix. It must be the -result of a constant instruction with scalar 'integer type'. + +'Layout' indicates how the values in memory are arranged. +Must be an '' of 'constant instruction' with scalar 32-bit integer type. Its +value must be one of the values in the table 3.XX, +<>. + + If present, any 'Memory Operands' must begin with a <> literal. If not present, it is the same as specifying the <> *None*. + + + +For a given dynamic instance of this instruction, all operands of this +instruction must be the same for all invocations in a given scope instance +(where the scope is the scope the joint matrix type was created with). +All invocations in a given scope instance must be active or all must be +inactive. + + 1+|Capability: + -*{capability_name}* -1+| 7 + variable | {OpJointMatrixLoadINTEL_token} +*{main_capability_name}* +1+| 6 + variable | {OpJointMatrixLoadINTEL_token} | '' + 'Result Type' |'Result ' @@ -248,41 +400,47 @@ specifying the <> *None*. 
+ | '' + 'Stride' | '' + -'<>' -| '' + -'Scope' +'<>' | Optional + -'Memory Access' +'Memory Operands' |===== -[cols="1,1,6*3",width="100%"] +[cols="1,1,5*3",width="100%"] |===== -7+|[[OpJointMatrixStoreINTEL]]*OpJointMatrixStoreINTEL* + +6+|[[OpJointMatrixStoreINTEL]]*OpJointMatrixStoreINTEL* + + -Store a matrix through a pointer. + +Store a joint matrix through a pointer. + + 'Pointer' is the pointer to store through. It specifies start of memory region -where elements of the matrix must be stored and arranged according to 'Layout'. + +where elements of the joint matrix must be stored and arranged according to 'Layout'. + +The <> of 'Pointer' must be *Workgroup*, +*CrossWorkgroup*, *StorageBuffer*, *Generic* or *PhysicalStorageBuffer*. + + -'Object' is the matrix to store. It must be +'Object' is the joint matrix to store. It must be <>. + + -'Stride' is the number of elements in memory between beginnings of successive -rows, columns (or words) of the 'Object'. It must be a scalar integer type. + +'Stride' describes the number of elements between consecutive rows for the +RowMajor 'layout', or between columns for the ColumnMajor 'layout'. + + -'Layout' indicates how the values stored to memory are arranged. It must be the -result of a constant instruction. + - + -'Scope' is syncronization scope for operation on the matrix. It must be the -result of a constant instruction with scalar 'integer type'. + +'Layout' indicates how the values stored are arranged in memory. +Must be an '' of 'constant instruction' with scalar 32-bit integer type. Its +value must be one of the values in the table 3.XX, +<>. + + If present, any 'Memory Operands' must begin with a <> literal. If not present, it is the same as specifying the <> *None*. + + + +For a given dynamic instance of this instruction, all operands of this +instruction must be the same for all invocations in a given scope instance +(where the scope is the scope the joint matrix type was created with). 
+All invocations in a given scope instance must be active or all must be +inactive. + + 1+|Capability: + -*{capability_name}* -1+| 6 + variable | {OpJointMatrixStoreINTEL_token} +*{main_capability_name}* +1+| 5 + variable | {OpJointMatrixStoreINTEL_token} | '' + 'Pointer' | '' + @@ -290,20 +448,25 @@ specifying the <> *None*. + | '' + 'Stride' | '' + -'<>' -| '' + -'Scope' +'<>' | Optional + -'Memory Access' +'Memory Operands' |===== -==== 3.37.12. Composite Instructions +==== 3.42.12. Composite Instructions + +Modify OpCompositeConstruct to make an exception for joint matrix types: +"If the 'Result Type' is <> +then there must be only one 'Constituent' and it will be used to initialize all +elements of the joint matrix." + Modify *OpVectorExtractDynamic* and *OpVectorInsertDynamic* to accept <> as the 'Vector' operand. In this case the instructions operate on an implicit vector which represents part of the joint matrix and holds components owned by the current work-item. -If the 'index' operand of these instructions exceeds the value returned by +If the 'index' operand of these instructions is less than zero or exceeds the +value returned by <>, behavior is undefined. @@ -313,13 +476,14 @@ behavior is undefined. + Return number of components owned by the current work-item in a joint matrix. + + -'Result Type' must be an 32-bit unsigned integer type scalar. + +'Result Type' must be an 'integer type' scalar. + + -'Matrix' is the <> to query the -number of the components. + +'Matrix' is an ID of <>. +The instruction returns the number of the components of this joint matrix type +owned by the current work-item. + 1+|Capability: + -*{capability_name}* +*{wi_capability_name}* 1+| 4 | {OpJointMatrixWorkItemLengthINTEL_token} | '' + 'Result Type' @@ -328,38 +492,80 @@ number of the components. + 'Matrix' |===== -==== 3.37.13. 
Arithmetic Instructions +[cols="1,1,4*3",width="100%"] +|===== +5+|[[OpJointMatrixGetElementCoordINTEL]]*OpJointMatrixGetElementCoordINTEL* + + +Returns the (Row, Column) coordinate of a dynamically selected element of a joint matrix. + + +'Result Type' must be a 2-component integer vector, where the first component +contains the row of the selected element, and the second component contains the +column of the selected element. + + +'Matrix' is an ID of <>. +The instruction returns the element's coordinate of this joint matrix type. + + +'Index' must be a 'scalar integer'. It is interpreted as an index into the list +of components owned by this work-item in the joint matrix. The behavior is +undefined if 'Index' is less than zero or greater than or equal to the number +that <> +returns for this work-item. + + + +1+|Capability: + +*{wi_capability_name}* +1+| 5 | {OpJointMatrixGetElementCoordINTEL_token} +| '' + +'Result Type' +| 'Result ' +| '' + +'Matrix' +| '' + +'Index' +|===== + +==== 3.42.13. Arithmetic Instructions -[cols="1,1,6*3",width="100%"] +[cols="1,1,5*3",width="100%"] |===== -7+|[[OpJointMatrixMadINTEL]]*OpJointMatrixMadINTEL* + +6+|[[OpJointMatrixMadINTEL]]*OpJointMatrixMadINTEL* + + Multiply matrix 'A' by matrix 'B' and add matrix 'C' to the result of the -multiplication: `A*B+C`. Here 'A' is a `M x K` matrix, 'B' is a `K x N` +multiplication: `A x B + C`. Here 'A' is a `M x K` matrix, 'B' is a `K x N` matrix and 'C' is a `M x N` matrix. + + -Behavior is undefined if sizes of operands do not meet the conditions above. +It is invalid to have sizes of operands that do not meet the conditions above. All operands and the 'Result Type' must be <>. + + 'A' must be a <> whose -'Component Type' is a signed 'numerical type', 'Row Count' equals to 'M' and -'Column Count' equals to 'K' + +'Row Count' equals to 'M' and 'Column Count' equals to 'K'. +'Use' argument of matrix type must be 'MatrixA'. 
+ + 'B' must be a <> whose -'Component Type' is a signed 'numerical type', 'Row Count' equals to 'K' and -'Column Count' equals to 'N' + +'Row Count' equals to 'K' and 'Column Count' equals to 'N'. +'Use' argument of matrix type must be 'MatrixB'. + + 'C' and 'Result Type' must be a -<> with 'Row Count' equals to -'M' and 'Column Count' equals to 'N' + +<> with 'Row Count' equals +to 'M' and 'Column Count' equals to 'N'. 'Use' argument of joint matrix type +must be 'Accumulator'. + + + +'Scope' of 'A', 'B', 'C' and 'Result' matrices must match. + + +All invocations in a given 'Scope' instance must be active or all must be +inactive. + + +Behavior is undefined if not all invocations of this module within 'Scope' of +'Result' reach this point of execution. + + + +Behavior is undefined unless all invocations within 'Scope' of 'Result' +execute the same dynamic instance of this instruction. + + -'Scope' is syncronization scope for operation on the matrix. It must be the -result of a constant instruction with scalar 'integer type'. + 1+|Capability: + -*{capability_name}* -1+| 7 | {OpJointMatrixMadINTEL_token} +*{main_capability_name}* +1+| 6 | {OpJointMatrixMadINTEL_token} | '' + 'Result Type' |'Result ' @@ -369,40 +575,49 @@ result of a constant instruction with scalar 'integer type'. + 'B' | '' + 'C' -| '' + -'Scope' |===== -[cols="1,1,6*3",width="100%"] +[cols="1,1,5*3",width="100%"] |===== -7+|[[OpJointMatrixSUMadINTEL]]*OpJointMatrixSUMadINTEL* + +6+|[[OpJointMatrixSUMadINTEL]]*OpJointMatrixSUMadINTEL* + + Multiply matrix 'A' by matrix 'B' and add matrix 'C' to the result of the -multiplication: `A*B+C`. Here 'A' is a `M x K` matrix, 'B' is a `K x N` +multiplication: `A x B + C`. Here 'A' is a `M x K` matrix, 'B' is a `K x N` matrix and 'C' is a `M x N` matrix. + + -Behavior is undefined if sizes of operands do not meet the conditions above. +It is invalid to have sizes of operands that do not meet the conditions above. 
All operands and the 'Result Type' must be <>. + + 'A' must be a <> whose -'Component Type' is a signed 'numerical type', 'Row Count' equals to 'M' and -'Column Count' equals to 'K' + +'Component Type' is signed 'integer type', 'Row Count' equals to 'M' and +'Column Count' equals to 'K'. 'Use' argument of matrix type must be 'MatrixA'. + + 'B' must be a <> whose -'Component Type' is an unsigned 'numerical type', 'Row Count' equals to 'K' and -'Column Count' equals to 'N' + +'Component Type' is unsigned 'integer type', 'Row Count' equals to 'K' +and 'Column Count' equals to 'N'. 'Use' argument of joint matrix type must be +'MatrixB'. + + 'C' and 'Result Type' must be a -<> with 'Row Count' equals to -'M' and 'Column Count' equals to 'N' + +<> with signed 'integer type' +'Component Type', 'Row Count' equals to 'M' and 'Column Count' equals to 'N'. +'Use' argument of joint matrix type must be 'Accumulator'. + + + +'Scope' of 'A', 'B', 'C' and 'Result' matrices must match. + + +All invocations in a given 'Scope' instance must be active or all must be +inactive. + + +Behavior is undefined if not all invocations of this module within 'Scope' of +'Result' reach this point of execution. + + + +Behavior is undefined unless all invocations within 'Scope' of 'Result' +execute the same dynamic instance of this instruction. + + -'Scope' is syncronization scope for operation on the matrix. It must be the -result of a constant instruction with scalar 'integer type'. + 1+|Capability: + -*{capability_name}* -1+| 7 | {OpJointMatrixSUMadINTEL_token} +*{main_capability_name}* +1+| 6 | {OpJointMatrixSUMadINTEL_token} | '' + 'Result Type' |'Result ' @@ -412,40 +627,48 @@ result of a constant instruction with scalar 'integer type'. 
+ 'B' | '' + 'C' -| '' + -'Scope' |===== -[cols="1,1,6*3",width="100%"] +[cols="1,1,5*3",width="100%"] |===== -7+|[[OpJointMatrixUSMadINTEL]]*OpJointMatrixUSMadINTEL* + +6+|[[OpJointMatrixUSMadINTEL]]*OpJointMatrixUSMadINTEL* + + Multiply matrix 'A' by matrix 'B' and add matrix 'C' to the result of the -multiplication: `A*B+C`. Here 'A' is a `M x K` matrix, 'B' is a `K x N` +multiplication: `A x B + C`. Here 'A' is a `M x K` matrix, 'B' is a `K x N` matrix and 'C' is a `M x N` matrix. + + -Behavior is undefined if sizes of operands do not meet the conditions above. +It is invalid to have sizes of operands that do not meet the conditions above. All operands and the 'Result Type' must be <>. + + 'A' must be a <> whose -'Component Type' is an unsigned 'numerical type', 'Row Count' equals to 'M' and -'Column Count' equals to 'K' + +'Component Type' is unsigned 'integer type', 'Row Count' equals to 'M' and +'Column Count' equals to 'K'. 'Use' argument of joint matrix type must be 'MatrixA'. + + 'B' must be a <> whose -'Component Type' is a signed 'numerical type', 'Row Count' equals to 'K' and -'Column Count' equals to 'N' + +'Component Type' is signed 'integer type', 'Row Count' equals to 'K' and +'Column Count' equals to 'N'. 'Use' argument of matrix type must be 'MatrixB'. + + 'C' and 'Result Type' must be a -<> with 'Row Count' equals to -'M' and 'Column Count' equals to 'N' + +<> with signed 'integer type' +'Component Type', 'Row Count' equals to 'M' and 'Column Count' equals to 'N'. +'Use' argument of joint matrix type must be 'Accumulator'. + + + +'Scope' of 'A', 'B', 'C' and 'Result' matrices must match. + + +All invocations in a given 'Scope' instance must be active or all must be +inactive. + + +Behavior is undefined if not all invocations of this module within 'Scope' of +'Result' reach this point of execution. + + + +Behavior is undefined unless all invocations within 'Scope' of 'Result' +execute the same dynamic instance of this instruction. 
+ + -'Scope' is syncronization scope for operation on the matrix. It must be the -result of a constant instruction with scalar 'integer type'. + 1+|Capability: + -*{capability_name}* -1+| 7 | {OpJointMatrixUSMadINTEL_token} +*{main_capability_name}* +1+| 6 | {OpJointMatrixUSMadINTEL_token} | '' + 'Result Type' |'Result ' @@ -455,40 +678,48 @@ result of a constant instruction with scalar 'integer type'. + 'B' | '' + 'C' -| '' + -'Scope' |===== -[cols="1,1,6*3",width="100%"] +[cols="1,1,5*3",width="100%"] |===== -7+|[[OpJointMatrixUUMadINTEL]]*OpJointMatrixUUMadINTEL* + +6+|[[OpJointMatrixUUMadINTEL]]*OpJointMatrixUUMadINTEL* + + Multiply matrix 'A' by matrix 'B' and add matrix 'C' to the result of the -multiplication: `A*B+C`. Here 'A' is a `M x K` matrix, 'B' is a `K x N` +multiplication: `A x B + C`. Here 'A' is a `M x K` matrix, 'B' is a `K x N` matrix and 'C' is a `M x N` matrix. + + -Behavior is undefined if sizes of operands do not meet the conditions above. +It is invalid to have sizes of operands that do not meet the conditions above. All operands and the 'Result Type' must be <>. + + 'A' must be a <> whose -'Component Type' is an unsigned 'numerical type', 'Row Count' equals to 'M' and -'Column Count' equals to 'K' + +'Component Type' is unsigned 'integer type', 'Row Count' equals to 'M' and +'Column Count' equals to 'K'. 'Use' argument of joint matrix type must be 'MatrixA'. + + 'B' must be a <> whose -'Component Type' is an unsigned 'numerical type', 'Row Count' equals to 'K' and -'Column Count' equals to 'N' + +'Component Type' is unsigned 'integer type', 'Row Count' equals to 'K' and +'Column Count' equals to 'N'. 'Use' argument of joint matrix type must be 'MatrixB'. + + 'C' and 'Result Type' must be a -<> with 'Row Count' equals to -'M' and 'Column Count' equals to 'N' + +<> with unsigned 'integer type' +'Component Type', 'Row Count' equals to 'M' and 'Column Count' equals to 'N'. +'Use' argument of joint matrix type must be 'Accumulator'. 
+ + + +'Scope' of 'A', 'B', 'C' and 'Result' matrices must match. + + +All invocations in a given 'Scope' instance must be active or all must be +inactive. + + +Behavior is undefined if not all invocations of this module within 'Scope' of +'Result' reach this point of execution. + + + +Behavior is undefined unless all invocations within 'Scope' of 'Result' +execute the same dynamic instance of this instruction. + + -'Scope' is syncronization scope for operation on the matrix. It must be the -result of a constant instruction with scalar 'integer type'. + 1+|Capability: + -*{capability_name}* -1+| 7 | {OpJointMatrixUUMadINTEL_token} +*{main_capability_name}* +1+| 6 | {OpJointMatrixUUMadINTEL_token} | '' + 'Result Type' |'Result ' @@ -498,17 +729,8 @@ result of a constant instruction with scalar 'integer type'. + 'B' | '' + 'C' -| '' + -'Scope' |===== -=== 3.42.12. Composite Instructions - -Modify OpCompositeConstruct to make an exception for joint matrix types: -"If the 'Result Type' is <> and -there is only one 'Constituent', it will be used to initialize all elements of -the matrix." - === Issues None @@ -525,4 +747,10 @@ Revision History |2|2021-09-06|Dmitry Sidorov|Split OpJointMatrixMadINTEL instruction into 4 |3|2021-12-28|Dmitry Sidorov|Add Joint Matrix to Composite definition |4|2022-03-10|Dmitry Sidorov|Add OpJointMatrixWorkItemLengthINTEL instruction +|5|2022-04-01|Dmitry Sidorov|Add Use parameter to TypeJointMatrixINTEL +|6|2022-09-07|Dmitry Sidorov|Make Use parameter to be mandatory +|7|2022-10-13|Dmitry Sidorov|Add ComponentTypeInterpretation decoration and OpJointMatrixGetElementCoordINTEL +|8|2022-12-02|Dmitry Sidorov|Remove Scope from the instructions and Layout from the type +|9|2022-12-07|Dmitry Sidorov|Split main capability into 3 +|10|2023-02-01|Dmitry Sidorov|Move ComponentTypeInterpretation to an optional type parameter |========================================
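
The operand rules this revision states for the four Mad instructions ('A' is `M x K` with Use *MatrixA*, 'B' is `K x N` with Use *MatrixB*, 'C' and the result are `M x N` with Use *Accumulator*, and all scopes match) can be sketched as a small checker. This is an illustrative sketch only, not part of the patch or of any SPIR-V tool: the names `Use`, `JointMatrixType` and `check_mad_operands` are hypothetical, and the per-instruction signedness rules for 'Component Type' are deliberately left out.

```python
from dataclasses import dataclass
from enum import IntEnum


class Use(IntEnum):
    # Values follow the "Joint Matrix Use" table added by the patch.
    MATRIX_A = 0
    MATRIX_B = 1
    ACCUMULATOR = 2


@dataclass(frozen=True)
class JointMatrixType:
    # Hypothetical stand-in for one OpTypeJointMatrixINTEL declaration.
    rows: int
    cols: int
    scope: str  # "Subgroup" or "Workgroup"
    use: Use


def check_mad_operands(a, b, c, result):
    """Return a list of rule violations; an empty list means the operands pass."""
    errors = []

    # Row Count and Column Count must not be 0 for any operand type.
    for name, m in (("A", a), ("B", b), ("C", c), ("Result", result)):
        if m.rows == 0 or m.cols == 0:
            errors.append(f"{name}: Row Count and Column Count must be non-zero")

    # Each operand must be declared with the matching Use.
    if a.use != Use.MATRIX_A:
        errors.append("A: Use must be MatrixA")
    if b.use != Use.MATRIX_B:
        errors.append("B: Use must be MatrixB")
    if c.use != Use.ACCUMULATOR:
        errors.append("C: Use must be Accumulator")
    if result.use != Use.ACCUMULATOR:
        errors.append("Result: Use must be Accumulator")

    # A is M x K, B is K x N, C and Result are M x N.
    if a.cols != b.rows:
        errors.append("K mismatch: Column Count of A != Row Count of B")
    for name, m in (("C", c), ("Result", result)):
        if (m.rows, m.cols) != (a.rows, b.cols):
            errors.append(f"{name}: must be M x N")

    # Scope of A, B, C and Result must match.
    if len({a.scope, b.scope, c.scope, result.scope}) != 1:
        errors.append("Scope of A, B, C and Result must match")

    return errors
```

For example, an 8 x 16 'A', a 16 x 8 'B' and an 8 x 8 accumulator in the same scope pass, while changing the Row Count of 'B' to 32 produces a single `K mismatch` entry.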