Skip to content

Commit 621fcf8

Browse files
authored
[mlir][OpenMP] rewrite conversion of privatisation for omp.parallel (#111844)
The existing conversion inlined private alloc regions and firstprivate copy regions in mlir, then undoing the modification of the mlir module before completing the conversion. To make this work, LLVM IR had to be generated using the wrong mapping for privatised values and then later fixed inside of OpenMPIRBuilder. This approach violated an assumption in OpenMPIRBuilder that private variables would be values not constants. Flang sometimes generates code where private variables are promoted to globals, the address of which is treated as a constant in LLVM IR. This caused the incorrect values for the private variable from being replaced by OpenMPIRBuilder: ultimately resulting in programs producing incorrect results. This patch rewrites delayed privatisation for omp.parallel to work more similarly to reductions: translating directly into LLVMIR with correct mappings for private variables. RFC: https://discourse.llvm.org/t/rfc-openmp-fix-issue-in-mlir-to-llvmir-translation-for-delayed-privatisation/81225 Tested against the gfortran testsuite and our internal test suite. Linaro's post-commit bots will check against the fujitsu test suite. I decided to add the new tests as flang integration tests rather than in mlir/test/Target/LLVMIR: - The regression test is for an issue filed against flang. i wanted to keep the reproducer similar to the code in the ticket. - I found the "worst case" CFG test difficult to reason about in abstract it helped me to think about what was going on in terms of a Fortran program. Fixes #106297
1 parent a3010c7 commit 621fcf8

File tree

5 files changed

+490
-263
lines changed

5 files changed

+490
-263
lines changed
Lines changed: 262 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,262 @@
1+
! RUN: %flang_fc1 -fopenmp -emit-llvm %s -o - | FileCheck %s
2+
3+
! Combinational testing of control flow graph and builder insertion points
4+
! in mlir-to-llvm conversion:
5+
! - mixing multiple delayed privatizations and multiple reductions
6+
! - multiple blocks in the private alloc region
7+
! - private alloc region has to read from the mold variable
8+
! - firstprivate
9+
! - multiple blocks in the private copy region
10+
! - multiple blocks in the reduction init region
11+
! - reduction init region has to read from the mold variable
12+
! - re-used omp.private ops
13+
! - re-used omp.reduction.declare ops
14+
! - unstructured code inside of the parallel region
15+
! - needs private dealloc region, and this has multiple blocks
16+
! - needs reduction cleanup region, and this has multiple blocks
17+
18+
! This maybe belongs in the mlir tests, but what we are doing here is complex
19+
! enough that I find the kind of minimised mlir code preferred by mlir reviewers
20+
! hard to read without some fortran here for reference. Nothing like this would
21+
! be generated by other upstream users of the MLIR OpenMP dialect.
22+
23+
subroutine worst_case(a, b, c, d)
24+
real, allocatable :: a(:), b(:), c(:), d(:)
25+
integer i
26+
27+
!$omp parallel firstprivate(a,b) reduction(+:c,d)
28+
if (sum(a) == 1) stop 1
29+
!$omp end parallel
30+
end subroutine
31+
32+
! CHECK-LABEL: define internal void @worst_case_..omp_par
33+
! CHECK-NEXT: omp.par.entry:
34+
! [reduction alloc regions inlined here]
35+
! CHECK: br label %omp.private.latealloc
36+
37+
! CHECK: omp.private.latealloc: ; preds = %omp.par.entry
38+
! CHECK-NEXT: br label %omp.private.alloc5
39+
40+
! CHECK: omp.private.alloc5: ; preds = %omp.private.latealloc
41+
! [begin private alloc for first var]
42+
! [read the length from the mold argument]
43+
! [if it is non-zero...]
44+
! CHECK: br i1 {{.*}}, label %omp.private.alloc6, label %omp.private.alloc7
45+
46+
! CHECK: omp.private.alloc7: ; preds = %omp.private.alloc5
47+
! [finish private alloc for first var with zero extent]
48+
! CHECK: br label %omp.private.alloc8
49+
50+
! CHECK: omp.private.alloc8: ; preds = %omp.private.alloc6, %omp.private.alloc7
51+
! CHECK-NEXT: br label %omp.region.cont4
52+
53+
! CHECK: omp.region.cont4: ; preds = %omp.private.alloc8
54+
! CHECK-NEXT: %{{.*}} = phi ptr
55+
! CHECK-NEXT: br label %omp.private.alloc
56+
57+
! CHECK: omp.private.alloc: ; preds = %omp.region.cont4
58+
! [begin private alloc for first var]
59+
! [read the length from the mold argument]
60+
! [if it is non-zero...]
61+
! CHECK: br i1 %{{.*}}, label %omp.private.alloc1, label %omp.private.alloc2
62+
63+
! CHECK: omp.private.alloc2: ; preds = %omp.private.alloc
64+
! [finish private alloc for second var with zero extent]
65+
! CHECK: br label %omp.private.alloc3
66+
67+
! CHECK: omp.private.alloc3: ; preds = %omp.private.alloc1, %omp.private.alloc2
68+
! CHECK-NEXT: br label %omp.region.cont
69+
70+
! CHECK: omp.region.cont: ; preds = %omp.private.alloc3
71+
! CHECK-NEXT: %{{.*}} = phi ptr
72+
! CHECK-NEXT: br label %omp.private.copy
73+
74+
! CHECK: omp.private.copy: ; preds = %omp.region.cont
75+
! CHECK-NEXT: br label %omp.private.copy10
76+
77+
! CHECK: omp.private.copy10: ; preds = %omp.private.copy
78+
! [begin firstprivate copy for first var]
79+
! [read the length, is it non-zero?]
80+
! CHECK: br i1 %{{.*}}, label %omp.private.copy11, label %omp.private.copy12
81+
82+
! CHECK: omp.private.copy12: ; preds = %omp.private.copy11, %omp.private.copy10
83+
! CHECK-NEXT: br label %omp.region.cont9
84+
85+
! CHECK: omp.region.cont9: ; preds = %omp.private.copy12
86+
! CHECK-NEXT: %{{.*}} = phi ptr
87+
! CHECK-NEXT: br label %omp.private.copy14
88+
89+
! CHECK: omp.private.copy14: ; preds = %omp.region.cont9
90+
! [begin firstprivate copy for second var]
91+
! [read the length, is it non-zero?]
92+
! CHECK: br i1 %{{.*}}, label %omp.private.copy15, label %omp.private.copy16
93+
94+
! CHECK: omp.private.copy16: ; preds = %omp.private.copy15, %omp.private.copy14
95+
! CHECK-NEXT: br label %omp.region.cont13
96+
97+
! CHECK: omp.region.cont13: ; preds = %omp.private.copy16
98+
! CHECK-NEXT: %{{.*}} = phi ptr
99+
! CHECK-NEXT: br label %omp.reduction.init
100+
101+
! CHECK: omp.reduction.init: ; preds = %omp.region.cont13
102+
! [deffered stores for results of reduction alloc regions]
103+
! CHECK: br label %[[VAL_96:.*]]
104+
105+
! CHECK: omp.reduction.neutral: ; preds = %omp.reduction.init
106+
! [start of reduction initialization region]
107+
! [null check:]
108+
! CHECK: br i1 %{{.*}}, label %omp.reduction.neutral18, label %omp.reduction.neutral19
109+
110+
! CHECK: omp.reduction.neutral19: ; preds = %omp.reduction.neutral
111+
! [malloc and assign the default value to the reduction variable]
112+
! CHECK: br label %omp.reduction.neutral20
113+
114+
! CHECK: omp.reduction.neutral20: ; preds = %omp.reduction.neutral18, %omp.reduction.neutral19
115+
! CHECK-NEXT: br label %omp.region.cont17
116+
117+
! CHECK: omp.region.cont17: ; preds = %omp.reduction.neutral20
118+
! CHECK-NEXT: %{{.*}} = phi ptr
119+
! CHECK-NEXT: br label %omp.reduction.neutral22
120+
121+
! CHECK: omp.reduction.neutral22: ; preds = %omp.region.cont17
122+
! [start of reduction initialization region]
123+
! [null check:]
124+
! CHECK: br i1 %{{.*}}, label %omp.reduction.neutral23, label %omp.reduction.neutral24
125+
126+
! CHECK: omp.reduction.neutral24: ; preds = %omp.reduction.neutral22
127+
! [malloc and assign the default value to the reduction variable]
128+
! CHECK: br label %omp.reduction.neutral25
129+
130+
! CHECK: omp.reduction.neutral25: ; preds = %omp.reduction.neutral23, %omp.reduction.neutral24
131+
! CHECK-NEXT: br label %omp.region.cont21
132+
133+
! CHECK: omp.region.cont21: ; preds = %omp.reduction.neutral25
134+
! CHECK-NEXT: %{{.*}} = phi ptr
135+
! CHECK-NEXT: br label %omp.par.region
136+
137+
! CHECK: omp.par.region: ; preds = %omp.region.cont21
138+
! CHECK-NEXT: br label %omp.par.region27
139+
140+
! CHECK: omp.par.region27: ; preds = %omp.par.region
141+
! [call SUM runtime function]
142+
! [if (sum(a) == 1)]
143+
! CHECK: br i1 %{{.*}}, label %omp.par.region28, label %omp.par.region29
144+
145+
! CHECK: omp.par.region29: ; preds = %omp.par.region27
146+
! CHECK-NEXT: br label %omp.region.cont26
147+
148+
! CHECK: omp.region.cont26: ; preds = %omp.par.region28, %omp.par.region29
149+
! [omp parallel region done, call into the runtime to complete reduction]
150+
! CHECK: %[[VAL_233:.*]] = call i32 @__kmpc_reduce(
151+
! CHECK: switch i32 %[[VAL_233]], label %reduce.finalize [
152+
! CHECK-NEXT: i32 1, label %reduce.switch.nonatomic
153+
! CHECK-NEXT: i32 2, label %reduce.switch.atomic
154+
! CHECK-NEXT: ]
155+
156+
! CHECK: reduce.switch.atomic: ; preds = %omp.region.cont26
157+
! CHECK-NEXT: unreachable
158+
159+
! CHECK: reduce.switch.nonatomic: ; preds = %omp.region.cont26
160+
! CHECK-NEXT: %[[red_private_value_0:.*]] = load ptr, ptr %{{.*}}, align 8
161+
! CHECK-NEXT: br label %omp.reduction.nonatomic.body
162+
163+
! [various blocks implementing the reduction]
164+
165+
! CHECK: omp.region.cont35: ; preds =
166+
! CHECK-NEXT: %{{.*}} = phi ptr
167+
! CHECK-NEXT: call void @__kmpc_end_reduce(
168+
! CHECK-NEXT: br label %reduce.finalize
169+
170+
! CHECK: reduce.finalize: ; preds =
171+
! CHECK-NEXT: br label %omp.par.pre_finalize
172+
173+
! CHECK: omp.par.pre_finalize: ; preds = %reduce.finalize
174+
! CHECK-NEXT: %{{.*}} = load ptr, ptr
175+
! CHECK-NEXT: br label %omp.reduction.cleanup
176+
177+
! CHECK: omp.reduction.cleanup: ; preds = %omp.par.pre_finalize
178+
! [null check]
179+
! CHECK: br i1 %{{.*}}, label %omp.reduction.cleanup41, label %omp.reduction.cleanup42
180+
181+
! CHECK: omp.reduction.cleanup42: ; preds = %omp.reduction.cleanup41, %omp.reduction.cleanup
182+
! CHECK-NEXT: br label %omp.region.cont40
183+
184+
! CHECK: omp.region.cont40: ; preds = %omp.reduction.cleanup42
185+
! CHECK-NEXT: %{{.*}} = load ptr, ptr
186+
! CHECK-NEXT: br label %omp.reduction.cleanup44
187+
188+
! CHECK: omp.reduction.cleanup44: ; preds = %omp.region.cont40
189+
! [null check]
190+
! CHECK: br i1 %{{.*}}, label %omp.reduction.cleanup45, label %omp.reduction.cleanup46
191+
192+
! CHECK: omp.reduction.cleanup46: ; preds = %omp.reduction.cleanup45, %omp.reduction.cleanup44
193+
! CHECK-NEXT: br label %omp.region.cont43
194+
195+
! CHECK: omp.region.cont43: ; preds = %omp.reduction.cleanup46
196+
! CHECK-NEXT: br label %omp.private.dealloc
197+
198+
! CHECK: omp.private.dealloc: ; preds = %omp.region.cont43
199+
! [null check]
200+
! CHECK: br i1 %{{.*}}, label %omp.private.dealloc48, label %omp.private.dealloc49
201+
202+
! CHECK: omp.private.dealloc49: ; preds = %omp.private.dealloc48, %omp.private.dealloc
203+
! CHECK-NEXT: br label %omp.region.cont47
204+
205+
! CHECK: omp.region.cont47: ; preds = %omp.private.dealloc49
206+
! CHECK-NEXT: br label %omp.private.dealloc51
207+
208+
! CHECK: omp.private.dealloc51: ; preds = %omp.region.cont47
209+
! [null check]
210+
! CHECK: br i1 %{{.*}}, label %omp.private.dealloc52, label %omp.private.dealloc53
211+
212+
! CHECK: omp.private.dealloc53: ; preds = %omp.private.dealloc52, %omp.private.dealloc51
213+
! CHECK-NEXT: br label %omp.region.cont50
214+
215+
! CHECK: omp.region.cont50: ; preds = %omp.private.dealloc53
216+
! CHECK-NEXT: br label %omp.par.outlined.exit.exitStub
217+
218+
! CHECK: omp.private.dealloc52: ; preds = %omp.private.dealloc51
219+
! [dealloc memory]
220+
! CHECK: br label %omp.private.dealloc53
221+
222+
! CHECK: omp.private.dealloc48: ; preds = %omp.private.dealloc
223+
! [dealloc memory]
224+
! CHECK: br label %omp.private.dealloc49
225+
226+
! CHECK: omp.reduction.cleanup45: ; preds = %omp.reduction.cleanup44
227+
! CHECK-NEXT: call void @free(
228+
! CHECK-NEXT: br label %omp.reduction.cleanup46
229+
230+
! CHECK: omp.reduction.cleanup41: ; preds = %omp.reduction.cleanup
231+
! CHECK-NEXT: call void @free(
232+
! CHECK-NEXT: br label %omp.reduction.cleanup42
233+
234+
! CHECK: omp.par.region28: ; preds = %omp.par.region27
235+
! CHECK-NEXT: call {} @_FortranAStopStatement
236+
237+
! CHECK: omp.reduction.neutral23: ; preds = %omp.reduction.neutral22
238+
! [source length was zero: finish initializing array]
239+
! CHECK: br label %omp.reduction.neutral25
240+
241+
! CHECK: omp.reduction.neutral18: ; preds = %omp.reduction.neutral
242+
! [source length was zero: finish initializing array]
243+
! CHECK: br label %omp.reduction.neutral20
244+
245+
! CHECK: omp.private.copy15: ; preds = %omp.private.copy14
246+
! [source length was non-zero: call assign runtime]
247+
! CHECK: br label %omp.private.copy16
248+
249+
! CHECK: omp.private.copy11: ; preds = %omp.private.copy10
250+
! [source length was non-zero: call assign runtime]
251+
! CHECK: br label %omp.private.copy12
252+
253+
! CHECK: omp.private.alloc1: ; preds = %omp.private.alloc
254+
! [var extent was non-zero: malloc a private array]
255+
! CHECK: br label %omp.private.alloc3
256+
257+
! CHECK: omp.private.alloc6: ; preds = %omp.private.alloc5
258+
! [var extent was non-zero: malloc a private array]
259+
! CHECK: br label %omp.private.alloc8
260+
261+
! CHECK: omp.par.outlined.exit.exitStub: ; preds = %omp.region.cont50
262+
! CHECK-NEXT: ret void
Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
!RUN: %flang_fc1 -emit-llvm -fopenmp %s -o - | FileCheck %s
2+
3+
! Regression test for https://github.com/llvm/llvm-project/issues/106297
4+
5+
program bug
6+
implicit none
7+
integer :: table(10)
8+
!$OMP PARALLEL PRIVATE(table)
9+
table = 50
10+
if (any(table/=50)) then
11+
stop 'fail 3'
12+
end if
13+
!$OMP END PARALLEL
14+
print *,'ok'
15+
End Program
16+
17+
18+
! CHECK-LABEL: define internal void {{.*}}..omp_par(
19+
! CHECK: omp.par.entry:
20+
! CHECK: %[[VAL_9:.*]] = alloca i32, align 4
21+
! CHECK: %[[VAL_10:.*]] = load i32, ptr %[[VAL_11:.*]], align 4
22+
! CHECK: store i32 %[[VAL_10]], ptr %[[VAL_9]], align 4
23+
! CHECK: %[[VAL_12:.*]] = load i32, ptr %[[VAL_9]], align 4
24+
! CHECK: %[[PRIV_TABLE:.*]] = alloca [10 x i32], i64 1, align 4
25+
! ...
26+
! check that we use the private copy of table for the assignment
27+
! CHECK: omp.par.region1:
28+
! CHECK: %[[ELEMENTAL_TMP:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
29+
! CHECK: %[[TABLE_BOX_ADDR:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
30+
! CHECK: %[[BOXED_FIFTY:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }, align 8
31+
! CHECK: %[[TABLE_BOX_ADDR2:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
32+
! CHECK: %[[TABLE_BOX_VAL:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } { ptr undef, i64 ptrtoint (ptr getelementptr (i32, ptr null, i32 1) to i64), i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]] {{\[\[}}3 x i64] [i64 1, i64 10, i64 ptrtoint (ptr getelementptr (i32, ptr null, i32 1) to i64)]] }, ptr %[[PRIV_TABLE]], 0
33+
! CHECK: store { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[TABLE_BOX_VAL]], ptr %[[TABLE_BOX_ADDR]], align 8
34+
! CHECK: %[[TABLE_BOX_VAL2:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[TABLE_BOX_ADDR]], align 8
35+
! CHECK: store { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[TABLE_BOX_VAL2]], ptr %[[TABLE_BOX_ADDR2]], align 8
36+
! CHECK: %[[VAL_26:.*]] = call {} @_FortranAAssign(ptr %[[TABLE_BOX_ADDR2]], ptr %[[BOXED_FIFTY]], ptr @{{.*}}, i32 9)
37+
! ...
38+
! check that we use the private copy of table for table/=50
39+
! CHECK: omp.par.region3:
40+
! CHECK: %[[VAL_44:.*]] = sub nsw i64 %{{.*}}, 1
41+
! CHECK: %[[VAL_45:.*]] = mul nsw i64 %[[VAL_44]], 1
42+
! CHECK: %[[VAL_46:.*]] = mul nsw i64 %[[VAL_45]], 1
43+
! CHECK: %[[VAL_47:.*]] = add nsw i64 %[[VAL_46]], 0
44+
! CHECK: %[[VAL_48:.*]] = getelementptr i32, ptr %[[PRIV_TABLE]], i64 %[[VAL_47]]
45+
! CHECK: %[[VAL_49:.*]] = load i32, ptr %[[VAL_48]], align 4
46+
! CHECK: %[[VAL_50:.*]] = icmp ne i32 %[[VAL_49]], 50

0 commit comments

Comments
 (0)