Add Gemm+Elementwise+Gemm support #1774
Conversation
```cpp
RewritePatternSet patternsGemmElementwiseGemm(&ctx);
patternsGemmElementwiseGemm.add<GemmElementwiseGemmRewritePattern>(&ctx);
if (failed(applyOpPatternsGreedily(
        getOperations<rock::GemmElementwiseGemmOp>(func),
        std::move(patternsGemmElementwiseGemm), config)))
```
I wonder, instead of applying each set of patterns separately, whether you could assign a ranking as is done in TosaToRock, or whether there is some other way.
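A minimal sketch of what such a ranking-based registration might look like (the pattern names other than GemmElementwiseGemmRewritePattern and the benefit values are illustrative assumptions, not the actual code):

```cpp
// Hypothetical: register all patterns in one set and let PatternBenefit
// decide priority, instead of applying each pattern set separately.
RewritePatternSet patterns(&ctx);
patterns.add<AttentionRewritePattern>(&ctx, /*benefit=*/3);           // assumed name
patterns.add<GemmElementwiseGemmRewritePattern>(&ctx, /*benefit=*/2);
patterns.add<GemmRewritePattern>(&ctx, /*benefit=*/1);                // assumed name
```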
The issue here is that we are replacing (for example) a GemmOp with another GemmOp. So we need to use GreedyRewriteStrictness::ExistingOps with applyOpPatternsGreedily and pass exactly the operations the patterns should run on. Otherwise the matcher would match the new GemmOp again and we'd go into an infinite loop.
It might work by passing all operations and all rewrite patterns, but I think that can be done in another clean-up PR; I'm already making too many changes here.
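For reference, the strictness setting being referred to is configured on the greedy driver roughly like this (a sketch; the field name follows upstream MLIR):

```cpp
// Only ops in the initial worklist are eligible for matching; ops created
// by a rewrite (e.g. the replacement GemmOp) are never re-matched.
GreedyRewriteConfig config;
config.strictMode = GreedyRewriteStrictness::ExistingOps;
```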
Codecov Report

```
@@            Coverage Diff             @@
##           develop    #1774     +/-   ##
===========================================
- Coverage    78.60%   78.37%    -0.23%
===========================================
  Files           99      100       +1
  Lines        29389    29768     +379
  Branches      4379     4442      +63
===========================================
+ Hits         23100    23332     +232
- Misses        4492     4600     +108
- Partials      1797     1836      +39
```
```tablegen
: Rock_Op<"gemm_elementwise_gemm", [DeclareOpInterfaceMethods<
                                        MemoryEffectsOpInterface>,
                                    RockFusionRoot]>,
  AllElementTypesMatch<["a", "b", "c"]>,
```
What if the elementwise op converts the data type in between? In that case a * b may produce an output of a different dtype than c expects. We should check for allowed elementwise ops.
See the comment below: we don't know whether the element-wise tensors are used for indirect purposes (for example, a mask), so forcing the types to match with AllElementTypesMatch would rule out valid fusions.
But you are right; I'm not sure how this is verified for attention either. I think we probably fail when lowering instead of at the rock.attention verifier.
> But you are right, I'm not sure how this is verified for attention either. I think we probably fail when lowering instead of at the rock.attention verifier.

It is better to fail during verification than during lowering, IMO.
I agree; I'd prefer to fix this in a future PR, because it affects attention as well. Also, in practice I'm not sure whether this is an issue when lowering from MIGraphX or only in hand-crafted Rock IR kernels.
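A rough sketch of what such a verifier-time check could look like (a hypothetical helper, not part of this PR; the getter names follow the snippets below):

```cpp
// Hypothetical: reject dtype mismatches at verification instead of lowering.
static LogicalResult verifyElementTypes(rock::GemmElementwiseGemmOp op) {
  Type abType = cast<ShapedType>(op.getA().getType()).getElementType();
  Type cType = cast<ShapedType>(op.getC().getType()).getElementType();
  if (abType != cType)
    return op.emitOpError(
        "element type produced by the first GEMM must match the second "
        "GEMM's input element type");
  return success();
}
```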
```tablegen
AllElementTypesMatch<["a", "b", "c"]>,
Arguments<(ins TensorOrMemRefOf<[F32]>:$a, TensorOrMemRefOf<[F32]>:$b,
    TensorOrMemRefOf<[F32]>:$c,
    Variadic<AnyTensorOrMemRef>:$elemwiseInputs,
```
I think the dtypes allowed for the elementwise inputs will be tied to the dtypes the second GEMM expects later on, so this can't be AnyTensorOrMemRef.
You can have a tensor that is used to indirectly modify a * b (for example, a causal mask in attention kernels), so the tensors don't have to share the element type.
```cpp
Type elemTypeQ = cast<MemRefType>(op.getA().getType()).getElementType();
Type elemTypeK = cast<MemRefType>(op.getB().getType()).getElementType();
Type elemTypeV = cast<MemRefType>(op.getC().getType()).getElementType();
```
I was wondering whether we could classify both Attention and GemmEwiseGemm under a GemmPlusGemmWrapperInterface and use that wherever we want to apply the same treatment to both. You could then use dyn_cast on GemmPlusGemmWrapperInterface to check whether it is Attention or GemmEwGemm and, based on that, enable or disable softmax and other logic.
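A sketch of the dispatch being proposed (GemmPlusGemmWrapperInterface is the name suggested above; the PR itself defines RockGemmGemmWrapperInterface):

```cpp
// Hypothetical: share the GEMM+GEMM handling through the interface and
// branch on the concrete op only where behavior differs, e.g. softmax.
if (auto wrapper = dyn_cast<GemmPlusGemmWrapperInterface>(op)) {
  bool needsSoftmax = isa<rock::AttentionOp>(wrapper.getOperation());
  // ... common GEMM+GEMM treatment; softmax only for attention ...
}
```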
@dhernandez0 this PR has merge conflicts.
Solved.
```tablegen
/*desc=*/[{
  Set the tuning parameters attribute of the first GEMM

  This is needed for --affix-tuning-params to work and can go away if it does
```
Do you mean it can go away if --affix-tuning-params doesn't require it?
Sorry, this is just copy-paste from RockGemmWrapperInterface. I'm not sure what the original context was; we use the method, so it can't go away currently. Should I remove the sentence?
Yes, please remove the sentence.
```tablegen
let summary = "Attention operation of transformer models";
let description = [{
  Performs the operation out = SOFTMAX((queries * keys) .* scale) * values.
```
I think scale here is meant to stand for the preSoftmaxElemwise ops. Do we want to keep it?
You can see here: b1e26db
scale used to be an actual input of the attention op; I guess that was before pre-softmax fusion was implemented, so it made sense to mention scale since it was an input. Now scale is just one possible fusion, so to me it no longer makes sense to have it in the description. What do you think?
Perhaps we can mention that it also applies elementwise ops before doing the softmax.
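For example, the description could be reworded along these lines (a suggestion, not the final text):

```tablegen
let description = [{
  Performs out = SOFTMAX(elementwise(queries * keys)) * values, where the
  elementwise stage is an optional fused pre-softmax computation
  (e.g. scaling by a constant or applying a mask).
}];
```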
```cpp
  return GemmGemmSize(g, m, k, n, o);
}

static LogicalResult verifyAttentionOp(RockGemmGemmWrapperInterface op,
```
nit:
```diff
-static LogicalResult verifyAttentionOp(RockGemmGemmWrapperInterface op,
+static LogicalResult verifyGemmPlusGemmLikeOp(RockGemmGemmWrapperInterface op,
```
I don't have any pressing comments in particular. Looks good.
This PR introduces GEMM+GEMM fusion. There are some things left for future PRs; I've created an epic for the pending tasks: https://github.com/ROCm/rocMLIR-internal/issues/1791
I've done an initial perf comparison of these GEMM+GEMM problems:
These correspond to the following 4 GEMMs:
Using tuningRunner and perfRunner, I get the following run-times: