[AsyncAlloc][UR][Exp] Initial spec for async alloc entry points #17117

Seanst98 · 2025-02-21T16:14:02Z

Introduce unimplemented UR API for the creation and use of memory pools with enqueued allocs/frees.

Currently, this only supports device allocated memory pools.

This extension is experimental and may change drastically between revisions.

co-authored-by: Sean Stirling [email protected]
co-authored-by: Hugh Delaney [email protected]

Introduce UR API for the creation and use of memory pools with enqueued allocs/frees.

Currently, this only supports device allocated memory pools.

This is only a spec introduced to enable parallel working between teams. Since this spec is experimental, it may change drastically between revisions.

co-authored-by: Sean Stirling <[email protected]>
co-authored-by: Hugh Delaney <[email protected]>

unified-runtime/source/adapters/level_zero/async_alloc.cpp

pbalcer · 2025-02-24T12:59:09Z

This failure:

[----------] Global test environment set-up.
/home/test-user/actions-runner/sycl-ur-01/_work/llvm/llvm/unified-runtime/test/conformance/source/environment.cpp:97: Failure
Failed
Could not find any devices to test
/home/test-user/actions-runner/sycl-ur-01/_work/llvm/llvm/unified-runtime/test/conformance/source/environment.cpp:151: Failure
Failed
Could not find any devices to test
/home/test-user/actions-runner/sycl-ur-01/_work/llvm/llvm/unified-runtime/test/conformance/source/environment.cpp:318: Failure
Failed
Could not find any devices to test

Is likely because the adapter does not define the new pool-related symbols.

kswiecicki · 2025-02-24T15:49:25Z

unified-runtime/scripts/core/exp-async-alloc.yml

+      name: hQueue
+      desc: "[in] handle of the queue object"
+    - type: $x_usm_pool_handle_t
+      desc: "[in][optional] USM pool descriptor"


Do you know if it's certain that the pool handle parameter for the free function should be optional?

I don't think it's certain whether we need this parameter at all. I've included it as optional just in case we do need the pool to determine where the ptr is located so that it can be freed. In the future, we could just remove it entirely if it's not necessary.

kswiecicki · 2025-02-24T16:03:20Z

unified-runtime/scripts/core/exp-async-alloc.yml

+    - type: const size_t
+      desc: "[in] minimum size in bytes of the USM memory object to be allocated"
+      name: size
+    - type: const $x_exp_async_usm_alloc_properties_t*


These alloc operations are prefixed with enqueue, so perhaps ur_exp_enqueue_usm_alloc_properties_t would be more fitting.

I completely see your point, and I had thought about this initially. I did this since I felt it was clearer that it fell under the async alloc extension name and followed that naming convention which can be found in other enums/structs, e.g. ASYNC_USM_ALLOCATIONS_EXP, exp_async_usm_alloc_flags_t.

However, I'm open to changing it if the consensus is that enqueue would be more appropriate here, so long as the naming stays consistent across the extension.

kswiecicki · 2025-02-24T16:11:53Z

unified-runtime/scripts/core/exp-async-alloc.yml

+name: PoolSetDevicePoolExp
+ordinal: "0"
+details:
+  - "Set the current pool for a device."


What's the use-case for the Pool[Set|Get]DevicePoolExp operations?

There may be a use case where a user would like to:

1. Create a memory pool with their own desired properties
2. Set a device's pool to be this newly created pool
3. Any subsequent allocs where no pool is specified will automatically resort to using this set pool

Subsequent allocs may come not just from user-written code, but also from libraries that call async_malloc (without specifying a pool). This gives the user some control over how libraries using asynchronous allocation will behave when interacting with their application.

The getter then allows the user to retrieve that pool for whatever reason.

Since there's PoolGetDevicePoolExp, was PoolGetDefaultDevicePoolExp added to the spec by mistake? Or does it imply that there should be an internal pool for async operations created in the context by default?

That's right. There should be an internal pool by default to service async_malloc when no pool has been set for a device. This pool must always be present and so must not be destroyed.

If a user sets a pool for a device for the first time, then that is replacing the default pool. The user must then be able to retrieve the default pool again if they wanted to re-set the device's pool as the default.

Hm, that's a bit tricky, since the user could get the default pool handle and misuse it somehow.

I'm not sure in what way they could misuse it that they couldn't do with any other handle. Do you have a potential problem in mind?

I thought it over, and the user would probably need to call PoolGetDefaultDevicePoolExp and then set the obtained pool via PoolSetDevicePoolExp. Initially, I assumed that the default pool would be implicitly used for every device when no pool was explicitly set for a given device, which made its accessibility to the user unclear.

I'm not 100% understanding the issue you're presenting.

For a device, there will be a default pool initialised at application startup. Any device-specific call to async_malloc that does not specify a pool will make use of that pool. This means any call to either PoolGetDefaultDevicePoolExp or PoolGetDevicePoolExp before any call to PoolSetDevicePoolExp will return this default pool.

At any point in the program, the user can replace that pool with their own via PoolSetDevicePoolExp. Any calls to PoolGetDevicePoolExp will now return the newly set pool, and any device-specific calls to async_malloc will use this pool. Without the PoolGetDefaultDevicePoolExp function, a user who lost the handle to the default pool would be unable to retrieve it and re-set it as the device's pool.

These pools are particular to the device they live on. For instance, each device will have its own set pool that it uses to service calls to async_malloc when no pool is specified.
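
The get/set/default semantics described in this thread can be modelled in a few lines. This is only a minimal sketch under stated assumptions: `Pool`, `PoolHandle`, `DeviceId`, and `DevicePools` are hypothetical stand-ins and not the real UR handles or entry points. It shows that each device keeps its default pool retrievable after a user pool has been set, and that other devices are unaffected.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-ins for UR handles; these are not the real UR types.
struct Pool { int id; };
using PoolHandle = Pool *;
using DeviceId = int;

// Models the per-device "default pool" and "current pool" discussed above.
class DevicePools {
public:
  // Register a device together with its always-present default pool.
  void addDevice(DeviceId dev, PoolHandle defaultPool) {
    defaults_[dev] = defaultPool;
    current_[dev] = defaultPool; // the default is current until replaced
  }

  // Plays the role of PoolGetDefaultDevicePoolExp.
  PoolHandle getDefaultPool(DeviceId dev) const { return defaults_.at(dev); }

  // Plays the role of PoolGetDevicePoolExp.
  PoolHandle getPool(DeviceId dev) const { return current_.at(dev); }

  // Plays the role of PoolSetDevicePoolExp.
  void setPool(DeviceId dev, PoolHandle pool) { current_.at(dev) = pool; }

  // Pool an allocation would use: the explicit pool if one is given,
  // otherwise the device's current pool.
  PoolHandle resolve(DeviceId dev, PoolHandle explicitPool = nullptr) const {
    return explicitPool ? explicitPool : getPool(dev);
  }

private:
  std::unordered_map<DeviceId, PoolHandle> defaults_;
  std::unordered_map<DeviceId, PoolHandle> current_;
};
```

In this model, restoring the default is `pools.setPool(dev, pools.getDefaultPool(dev))`, which is the pattern the dedicated default getter is meant to enable.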

Seanst98 · 2025-02-25T13:34:53Z

You're right. I was missing the definitions in L0 V2. Fixed. Thanks!
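
For readers unfamiliar with why a missing definition matters here: every adapter has to define each new entry point, even when the feature is not implemented. One plausible shape for such a placeholder is sketched below. The `Result` enum, the `Queue` and `UsmPool` types, and both function names are hypothetical stand-ins, not the actual UR declarations. The unnamed parameters follow the approach mentioned in the commit list for this PR ("using no-names over std::ignore").

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins; the real UR headers define their own result
// codes and handle types.
enum Result { RESULT_SUCCESS, RESULT_ERROR_UNSUPPORTED_FEATURE };
struct Queue;
struct UsmPool;

// Placeholder for an enqueued device allocation. Parameters are left
// unnamed, so no unused-parameter warnings fire and std::ignore is not needed.
Result enqueueUsmDeviceAllocStub(Queue *, UsmPool *, std::size_t, void **) {
  return RESULT_ERROR_UNSUPPORTED_FEATURE;
}

// Placeholder for the matching enqueued free.
Result enqueueUsmFreeStub(Queue *, UsmPool *, void *) {
  return RESULT_ERROR_UNSUPPORTED_FEATURE;
}
```

A caller can then check the result and fall back to a synchronous allocation path when the adapter reports the feature as unsupported.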

kswiecicki

Overall lgtm, it's experimental, we can change it later.

kswiecicki · 2025-02-27T10:44:16Z

unified-runtime/scripts/core/EXP-ASYNC-ALLOC.rst

+* ${x}_device_info_t
+    * ${X}_DEVICE_INFO_ASYNC_USM_ALLOCATIONS_EXP
+* ${x}_usm_pool_flags_t
+    * ${X}_USM_POOL_FLAG_USE_NATIVE_MEMORY_POOL_EXP


I don't get the idea behind this UR_USM_POOL_FLAG_USE_NATIVE_MEMORY_POOL_EXP flag. Was it meant to fulfill the async API requirement "Create a memory pool from a user provided USM allocation"? If so, there's currently no API that allows the user to provide such a pointer for pool creation.

Btw, my comments aren’t meant to delay merging this PR. This discussion will help with the future spec changes in separate PRs.

It's to indicate that we intend to use the pool handled natively by the backend instead of having the SYCL runtime handle the pools/allocs/frees. Because there's no CUDA equivalent to host-side or shared memory pools, we will need to handle these in the SYCL runtime ourselves. We may even be able to migrate the device-side pools to the infrastructure we build in the SYCL runtime, at which point we could use the flag to choose between the implementations.

But since it's currently undecided whether we want to go ahead with this functionality, we don't technically need this flag, similarly to the enqueueUSM{X}AllocExp funcs, but we'll keep them in since we may implement them later.
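
To illustrate how such a flag could select between implementations, here is a small sketch. Everything in it is an assumption made for illustration: the bit value of `kUseNativeMemoryPool`, the `MemKind` and `PoolImpl` enums, and the fallback rule (a native pool is only chosen for device memory, as the reply above notes there is no CUDA equivalent for host or shared pools). It is not the behaviour defined by the spec.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit value; the real value of
// UR_USM_POOL_FLAG_USE_NATIVE_MEMORY_POOL_EXP comes from the UR headers.
constexpr std::uint32_t kUseNativeMemoryPool = 1u << 0;

enum class MemKind { Device, Host, Shared };
enum class PoolImpl { NativeBackend, SyclRuntime };

// Decide which implementation services a pool. Assumption: only device
// pools can be native; every other case is handled by the SYCL runtime.
PoolImpl choosePoolImpl(std::uint32_t flags, MemKind kind) {
  const bool wantsNative = (flags & kUseNativeMemoryPool) != 0;
  if (wantsNative && kind == MemKind::Device)
    return PoolImpl::NativeBackend;
  return PoolImpl::SyclRuntime;
}
```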

unified-runtime/source/adapters/cuda/usm.cpp

npmiller

CUDA/HIP changes LGTM

Seanst98 · 2025-02-27T15:16:49Z

Friendly ping @intel/dpcpp-nativecpu-reviewers and @ldrumm for reviews.

Seanst98 · 2025-02-28T09:58:53Z

@intel/llvm-gatekeepers can we merge this please?

sommerlukas · 2025-02-28T10:10:28Z

@Seanst98 I feel like the title and description don't really match what's in this PR. It says "initial spec" and "this is only a spec", but changes 43 files and adds APIs in code.

Can you change the title and description please?

Seanst98 requested review from a team as code owners February 21, 2025 16:14

Seanst98 requested a review from ldrumm February 21, 2025 16:14

Seanst98 added 2 commits February 24, 2025 10:08

Add missing L0 immediate in order queue declarations/definitions

4f15509

Merge branch 'sycl' into sean/async-alloc-ur-spec

e93e1ea

pbalcer approved these changes Feb 24, 2025

kswiecicki mentioned this pull request Feb 24, 2025

[UR][L0] Add initial USM alloc enqueue API #17112

Merged

pbalcer reviewed Feb 24, 2025

unified-runtime/source/adapters/level_zero/async_alloc.cpp (resolved)

kswiecicki reviewed Feb 24, 2025

Seanst98 added 2 commits February 25, 2025 10:34

Merge branch 'sycl' into sean/async-alloc-ur-spec

e935b82

Fix double defintion in L0

ef87475

Add missing L0 v2 definitions

bc21a7f

Minor cleanup

8aed328

Merge branch 'sycl' into sean/async-alloc-ur-spec

c7556f9

Merge branch 'sycl' into sean/async-alloc-ur-spec

52d484f

EwanC mentioned this pull request Feb 26, 2025

run_prebuilt_e2e_tests CI jobs fail in cases with UR API changes #16982

Closed

Seanst98 added 2 commits February 26, 2025 15:36

Merge branch 'sycl' into sean/async-alloc-ur-spec

825d8d8

Merge branch 'sycl' into sean/async-alloc-ur-spec

f322313

kswiecicki approved these changes Feb 27, 2025

pbalcer approved these changes Feb 27, 2025

kswiecicki reviewed Feb 27, 2025

npmiller reviewed Feb 27, 2025

unified-runtime/source/adapters/cuda/usm.cpp (outdated, resolved)

Merge branch 'sycl' into sean/async-alloc-ur-spec

ebbd4ef

npmiller approved these changes Feb 27, 2025

Address nit feedback by using no-names over std::ignore

258b972

coldav approved these changes Feb 27, 2025

Seanst98 mentioned this pull request Feb 28, 2025

DRAFT: [AsyncAlloc][CUDA] Initial UR spec and implementation for the async oneapi-src/unified-runtime#2668

Closed

sommerlukas merged commit b183751 into intel:sycl Feb 28, 2025
29 checks passed

Seanst98 deleted the sean/async-alloc-ur-spec branch April 4, 2025 16:39
