Skip to content

Commit 83bd76f

Browse files
Implementation of DPCTLQueue_Fill8/16/32/64/128 function to SyclInterface library (#831)
* Implementation of DPCTLQueue_Fill8/16/32/64/128 function Also added tests and docstrings. * genhtml uses --no-source to work around issue with non-existing integration headers * gen_coverage script runs llvm-cov rather than lcov-genhtml target, since HTMLs are not used for CI coverage collection * Added commended out use of -save-temps=obj Whenever C++ source code contains device code, DPC++ would perform two compilation passes (for host and for device portions) and combine these two into fat object file. In the process of doing this it generates temporary integration headers. So functions from these headers get instrumented and .lcov file would contain references to deleted header files, in the form of /tmp/dpctl_sycl_queue_integrface-header-hex.h Added -save-temps instructs DPC++ to keep all the temporaries around and makes these headers available for llvm-profdata and llvm-cov to use. The side-effect of using -save-temps=obj is that preprocessor output files are also kept around and referenced by instrumentation tools. These files are very large. So much so that generated .json file can not be uploaded to coverall service. Therefore the -save-temps=obj is commented out. May still be useful for manual profile collection. References to deleted header files are still present (and wreck llvm-lcov calls), but that can be addressed using llvm-cov option to ignore certain files (like /tmp/dpctl_*.h) * increase compute intensity of kern3 in the test * Fixed logic for collection of profiling binaries Since addition of --save-temps=obj, a slew of host/device object files were added and llvm-cov can not work with some of them. Since binaries can actually be shared objects, use that instead. Process traces for libtensor and libsyclinterface shared objects * llvm-cov target must export to dpctl.lcov for coveralls to work * added diagnostic to the workflow * Replaced use of numba-dpex in example with use of dpnp This is until we have GEMM implemented in dpctl * Use -ignore-filename-regex option of llvm-cov to exclude references to deleted integration headers Co-authored-by: Vladislav Perevezentsev <[email protected]>
2 parents f0a4f63 + d975549 commit 83bd76f

File tree

10 files changed

+546
-69
lines changed

10 files changed

+546
-69
lines changed

.github/workflows/conda-package.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -380,7 +380,7 @@ jobs:
380380
source $CONDA/etc/profile.d/conda.sh
381381
conda activate
382382
CHANNELS="-c $GITHUB_WORKSPACE/channel -c dppy/label/dev -c intel --override-channels"
383-
conda install -n examples -y $CHANNELS numba-dppy numpy dpctl || exit 1
383+
conda install -n examples -y $CHANNELS numpy dpctl dpnp || exit 1
384384
- name: Build and run examples with native extensions
385385
shell: bash -l {0}
386386
run: |

.github/workflows/generate-coverage.yaml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -110,6 +110,9 @@ jobs:
110110
fh3 = open('combined-dpctl-c-api-coverage.json', 'w'); \
111111
json.dump(f3, fh3); fh3.close()" || exit 1
112112
# merge combined file with coverage data and upload
113+
ls -lh dpctl-c-api-coverage.json pytest-dpctl-c-api-coverage.json \
114+
combined-dpctl-c-api-coverage.json \
115+
$(find _skbuild -name dpctl.lcov) $(find . -name dpctl_pytest.lcov)
113116
coveralls --service=github --merge=combined-dpctl-c-api-coverage.json
114117
env:
115118
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

dpctl/tests/test_sycl_kernel_submit.py

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -135,6 +135,11 @@ def test_async_submit():
135135
" global unsigned int *arg2)"
136136
"{"
137137
" size_t index = get_global_id(0);"
138+
" size_t i = 0; "
139+
" size_t unused_sum = 0;"
140+
" for (i = 0; i < 4000; i++) { "
141+
" unused_sum += i;"
142+
" } "
138143
" res[index] = "
139144
" (arg1[index] < arg2[index]) ? arg1[index] : arg2[index];"
140145
"}"

examples/python/dppy_kernel.py renamed to examples/python/sycl_timer.py

Lines changed: 31 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -15,49 +15,50 @@
1515
# limitations under the License.
1616

1717

18-
import numba_dppy
18+
import dpnp
1919
import numpy as np
2020

2121
import dpctl
22+
import dpctl.tensor as dpt
2223
from dpctl import SyclTimer
2324

24-
25-
@numba_dppy.kernel
26-
def dppy_gemm(a, b, c):
27-
i = numba_dppy.get_global_id(0)
28-
j = numba_dppy.get_global_id(1)
29-
if i >= c.shape[0] or j >= c.shape[1]:
30-
return
31-
c[i, j] = 0
32-
for k in range(c.shape[0]):
33-
c[i, j] += a[i, k] * b[k, j]
34-
35-
36-
X = 1024
37-
Y = 16
38-
global_size = X, X
39-
40-
griddim = X, X
41-
blockdim = Y, Y
42-
43-
a = np.arange(X * X, dtype=np.float32).reshape(X, X)
44-
b = np.array(np.random.random(X * X), dtype=np.float32).reshape(X, X)
45-
c = np.ones_like(a).reshape(X, X)
25+
n = 4000
4626

4727
try:
48-
q = dpctl.SyclQueue("opencl:gpu", property="enable_profiling")
28+
q = dpctl.SyclQueue(property="enable_profiling")
4929
except dpctl.SyclQueueCreationError:
5030
print(
5131
"Skipping the example, as dpctl.SyclQueue targeting "
52-
"opencl:gpu device could not be created"
32+
"default device could not be created"
5333
)
5434
exit(0)
35+
36+
a = dpt.reshape(dpt.arange(n * n, dtype=np.float32, sycl_queue=q), (n, n))
37+
b = dpt.reshape(
38+
dpt.asarray(np.random.random(n * n), dtype=np.float32, sycl_queue=q), (n, n)
39+
)
40+
5541
timer = SyclTimer(time_scale=1)
56-
with dpctl.device_context(q):
42+
43+
wall_times = []
44+
device_times = []
45+
print(
46+
f"Performing matrix multiplication of two {n} by {n} matrices "
47+
f"on {q.sycl_device.name}, repeating 5 times."
48+
)
49+
for _ in range(5):
5750
with timer(q):
58-
dppy_gemm[griddim, blockdim](a, b, c)
59-
cc = np.dot(a, b)
51+
a_matmul_b = dpnp.matmul(a, b)
6052
host_time, device_time = timer.dt
53+
wall_times.append(host_time)
54+
device_times.append(device_time)
55+
56+
c = dpnp.asnumpy(a_matmul_b)
57+
cc = np.dot(dpnp.asnumpy(a), dpnp.asnumpy(b))
6158

62-
print("Wall time: ", host_time, "\nDevice time: ", device_time)
63-
print(np.allclose(c, cc))
59+
print("Wall time: ", wall_times, "\nDevice time: ", device_times)
60+
print(
61+
"Accuracy test: passed."
62+
if np.allclose(c, cc)
63+
else (f"Accuracy test: failed. Discrepancy {np.max(np.abs(c-cc))}")
64+
)

libsyclinterface/cmake/modules/SetupCoverage.cmake

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@ function(setup_coverage_generation)
1111
"-fprofile-instr-generate "
1212
"-fcoverage-mapping "
1313
"-fno-sycl-use-footer "
14+
# "-save-temps=obj "
1415
"-DDPCTL_COVERAGE "
1516
)
1617

libsyclinterface/include/dpctl_sycl_queue_interface.h

Lines changed: 114 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -205,7 +205,7 @@ DPCTLQueue_GetDevice(__dpctl_keep const DPCTLSyclQueueRef QRef);
205205
* @ingroup QueueInterface
206206
*/
207207
DPCTL_API
208-
DPCTLSyclEventRef
208+
__dpctl_give DPCTLSyclEventRef
209209
DPCTLQueue_SubmitRange(__dpctl_keep const DPCTLSyclKernelRef KRef,
210210
__dpctl_keep const DPCTLSyclQueueRef QRef,
211211
__dpctl_keep void **Args,
@@ -253,7 +253,7 @@ DPCTLQueue_SubmitRange(__dpctl_keep const DPCTLSyclKernelRef KRef,
253253
* @ingroup QueueInterface
254254
*/
255255
DPCTL_API
256-
DPCTLSyclEventRef
256+
__dpctl_give DPCTLSyclEventRef
257257
DPCTLQueue_SubmitNDRange(__dpctl_keep const DPCTLSyclKernelRef KRef,
258258
__dpctl_keep const DPCTLSyclQueueRef QRef,
259259
__dpctl_keep void **Args,
@@ -287,10 +287,11 @@ void DPCTLQueue_Wait(__dpctl_keep const DPCTLSyclQueueRef QRef);
287287
* @ingroup QueueInterface
288288
*/
289289
DPCTL_API
290-
DPCTLSyclEventRef DPCTLQueue_Memcpy(__dpctl_keep const DPCTLSyclQueueRef QRef,
291-
void *Dest,
292-
const void *Src,
293-
size_t Count);
290+
__dpctl_give DPCTLSyclEventRef
291+
DPCTLQueue_Memcpy(__dpctl_keep const DPCTLSyclQueueRef QRef,
292+
void *Dest,
293+
const void *Src,
294+
size_t Count);
294295

295296
/*!
296297
* @brief C-API wrapper for ``sycl::queue::prefetch``.
@@ -303,9 +304,10 @@ DPCTLSyclEventRef DPCTLQueue_Memcpy(__dpctl_keep const DPCTLSyclQueueRef QRef,
303304
* @ingroup QueueInterface
304305
*/
305306
DPCTL_API
306-
DPCTLSyclEventRef DPCTLQueue_Prefetch(__dpctl_keep DPCTLSyclQueueRef QRef,
307-
const void *Ptr,
308-
size_t Count);
307+
__dpctl_give DPCTLSyclEventRef
308+
DPCTLQueue_Prefetch(__dpctl_keep DPCTLSyclQueueRef QRef,
309+
const void *Ptr,
310+
size_t Count);
309311

310312
/*!
311313
* @brief C-API wrapper for ``sycl::queue::mem_advise``.
@@ -321,10 +323,11 @@ DPCTLSyclEventRef DPCTLQueue_Prefetch(__dpctl_keep DPCTLSyclQueueRef QRef,
321323
* @ingroup QueueInterface
322324
*/
323325
DPCTL_API
324-
DPCTLSyclEventRef DPCTLQueue_MemAdvise(__dpctl_keep DPCTLSyclQueueRef QRef,
325-
const void *Ptr,
326-
size_t Count,
327-
int Advice);
326+
__dpctl_give DPCTLSyclEventRef
327+
DPCTLQueue_MemAdvise(__dpctl_keep DPCTLSyclQueueRef QRef,
328+
const void *Ptr,
329+
size_t Count,
330+
int Advice);
328331

329332
/*!
330333
* @brief C-API wrapper for sycl::queue::is_in_order that indicates whether
@@ -365,7 +368,7 @@ size_t DPCTLQueue_Hash(__dpctl_keep const DPCTLSyclQueueRef QRef);
365368
* ``sycl::queue::submit_barrier()`` function.
366369
*/
367370
DPCTL_API
368-
DPCTLSyclEventRef
371+
__dpctl_give DPCTLSyclEventRef
369372
DPCTLQueue_SubmitBarrier(__dpctl_keep const DPCTLSyclQueueRef QRef);
370373

371374
/*!
@@ -380,7 +383,7 @@ DPCTLQueue_SubmitBarrier(__dpctl_keep const DPCTLSyclQueueRef QRef);
380383
* ``sycl::queue::submit_barrier()`` function.
381384
*/
382385
DPCTL_API
383-
DPCTLSyclEventRef DPCTLQueue_SubmitBarrierForEvents(
386+
__dpctl_give DPCTLSyclEventRef DPCTLQueue_SubmitBarrierForEvents(
384387
__dpctl_keep const DPCTLSyclQueueRef QRef,
385388
__dpctl_keep const DPCTLSyclEventRef *DepEvents,
386389
size_t NDepEvents);
@@ -397,9 +400,101 @@ DPCTLSyclEventRef DPCTLQueue_SubmitBarrierForEvents(
397400
* @ingroup QueueInterface
398401
*/
399402
DPCTL_API
400-
DPCTLSyclEventRef DPCTLQueue_Memset(__dpctl_keep const DPCTLSyclQueueRef QRef,
401-
void *USMRef,
402-
uint8_t Value,
403-
size_t Count);
403+
__dpctl_give DPCTLSyclEventRef
404+
DPCTLQueue_Memset(__dpctl_keep const DPCTLSyclQueueRef QRef,
405+
void *USMRef,
406+
uint8_t Value,
407+
size_t Count);
408+
409+
/*!
410+
* @brief C-API wrapper for ``sycl::queue::fill``.
411+
*
412+
* @param QRef An opaque pointer to the ``sycl::queue``.
413+
* @param USMRef An USM pointer to the memory to fill.
414+
* @param Value A uint8_t value to fill.
415+
* @param Count A number of uint8_t elements to fill.
416+
* @return An opaque pointer to the ``sycl::event`` returned by the
417+
* ``sycl::queue::fill`` function.
418+
* @ingroup QueueInterface
419+
*/
420+
DPCTL_API
421+
__dpctl_give DPCTLSyclEventRef
422+
DPCTLQueue_Fill8(__dpctl_keep const DPCTLSyclQueueRef QRef,
423+
void *USMRef,
424+
uint8_t Value,
425+
size_t Count);
426+
427+
/*!
428+
* @brief C-API wrapper for ``sycl::queue::fill``.
429+
*
430+
* @param QRef An opaque pointer to the ``sycl::queue``.
431+
* @param USMRef An USM pointer to the memory to fill.
432+
* @param Value A uint16_t value to fill.
433+
* @param Count A number of uint16_t elements to fill.
434+
* @return An opaque pointer to the ``sycl::event`` returned by the
435+
* ``sycl::queue::fill`` function.
436+
* @ingroup QueueInterface
437+
*/
438+
DPCTL_API
439+
__dpctl_give DPCTLSyclEventRef
440+
DPCTLQueue_Fill16(__dpctl_keep const DPCTLSyclQueueRef QRef,
441+
void *USMRef,
442+
uint16_t Value,
443+
size_t Count);
444+
445+
/*!
446+
* @brief C-API wrapper for ``sycl::queue::fill``.
447+
*
448+
* @param QRef An opaque pointer to the ``sycl::queue``.
449+
* @param USMRef An USM pointer to the memory to fill.
450+
* @param Value A uint32_t value to fill.
451+
* @param Count A number of uint32_t elements to fill.
452+
* @return An opaque pointer to the ``sycl::event`` returned by the
453+
* ``sycl::queue::fill`` function.
454+
* @ingroup QueueInterface
455+
*/
456+
DPCTL_API
457+
__dpctl_give DPCTLSyclEventRef
458+
DPCTLQueue_Fill32(__dpctl_keep const DPCTLSyclQueueRef QRef,
459+
void *USMRef,
460+
uint32_t Value,
461+
size_t Count);
462+
463+
/*!
464+
* @brief C-API wrapper for ``sycl::queue::fill``.
465+
*
466+
* @param QRef An opaque pointer to the ``sycl::queue``.
467+
* @param USMRef An USM pointer to the memory to fill.
468+
* @param Value A uint64_t value to fill.
469+
* @param Count A number of uint64_t elements to fill.
470+
* @return An opaque pointer to the ``sycl::event`` returned by the
471+
* ``sycl::queue::fill`` function.
472+
* @ingroup QueueInterface
473+
*/
474+
DPCTL_API
475+
__dpctl_give DPCTLSyclEventRef
476+
DPCTLQueue_Fill64(__dpctl_keep const DPCTLSyclQueueRef QRef,
477+
void *USMRef,
478+
uint64_t Value,
479+
size_t Count);
480+
481+
/*!
482+
* @brief C-API wrapper for ``sycl::queue::fill``.
483+
*
484+
* @param QRef An opaque pointer to the ``sycl::queue``.
485+
* @param USMRef An USM pointer to the memory to fill.
486+
* @param Value A pointer to uint64_t array of 2 elements with value
487+
* to fill.
488+
* @param Count A number of 128-bit elements to fill.
489+
* @return An opaque pointer to the ``sycl::event`` returned by the
490+
* ``sycl::queue::fill`` function.
491+
* @ingroup QueueInterface
492+
*/
493+
DPCTL_API
494+
__dpctl_give DPCTLSyclEventRef
495+
DPCTLQueue_Fill128(__dpctl_keep const DPCTLSyclQueueRef QRef,
496+
void *USMRef,
497+
uint64_t *Value,
498+
size_t Count);
404499

405500
DPCTL_C_EXTERN_C_END

0 commit comments

Comments
 (0)