Skip to content

[AsyncAlloc][UR][Exp] Initial spec for async alloc entry points #17117

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Feb 28, 2025

Conversation

Seanst98
Copy link
Contributor

@Seanst98 Seanst98 commented Feb 21, 2025

Introduce unimplemented UR API for the creation and use of memory pools with enqueued allocs/frees.

Currently, this only supports device allocated memory pools.

This extension is experimental and may change drastically between revisions.

co-authored-by: Sean Stirling [email protected]
co-authored-by: Hugh Delaney [email protected]

Introduce UR API for the creation and use of memory pools with
enqueued allocs/frees.

Currently, this only supports device allocated memory pools.

This is only a spec introduced to enable parallel working between
teams. Since this spec is experimental, it may change drastically
between revisions.

co-authored-by: Sean Stirling <[email protected]>
co-authored-by: Hugh Delaney <[email protected]>
@Seanst98 Seanst98 requested review from a team as code owners February 21, 2025 16:14
@Seanst98 Seanst98 requested a review from ldrumm February 21, 2025 16:14
@pbalcer
Copy link
Contributor

pbalcer commented Feb 24, 2025

This failure:

[----------] Global test environment set-up.
/home/test-user/actions-runner/sycl-ur-01/_work/llvm/llvm/unified-runtime/test/conformance/source/environment.cpp:97: Failure
Failed
Could not find any devices to test
/home/test-user/actions-runner/sycl-ur-01/_work/llvm/llvm/unified-runtime/test/conformance/source/environment.cpp:151: Failure
Failed
Could not find any devices to test
/home/test-user/actions-runner/sycl-ur-01/_work/llvm/llvm/unified-runtime/test/conformance/source/environment.cpp:3[18](https://github.com/intel/llvm/actions/runs/13495515027/job/37702784573?pr=17117#step:8:19): Failure
Failed
Could not find any devices to test

Is likely because the adapter does not define the new pool-related symbols.

name: hQueue
desc: "[in] handle of the queue object"
- type: $x_usm_pool_handle_t
desc: "[in][optional] USM pool descriptor"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if it's certain that the pool handle parameter for the free function should be optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's certain whether we need this parameter at all. I've included it as optional just in case we do need the pool to determine where the ptr is located so that it can be freed. In the future, we could just remove it entirely if it's not necessary.

- type: const size_t
desc: "[in] minimum size in bytes of the USM memory object to be allocated"
name: size
- type: const $x_exp_async_usm_alloc_properties_t*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those alloc operations are prefixed with enqueue, perhaps ur_exp_enqueue_usm_alloc_properties_t would be more fitting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely see your point, and I had thought about this initially. I did this since I felt it was clearer that it fell under the async alloc extension name and followed that naming convention which can be found in other enums/structs, e.g. ASYNC_USM_ALLOCATIONS_EXP, exp_async_usm_alloc_flags_t.

However, I'm open to changing it if the consensus agrees that enqueue would be more appropriate here, so long as there is some consistency holding the extension together.

name: PoolSetDevicePoolExp
ordinal: "0"
details:
- "Set the current pool for a device."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use-case for the Pool[Set|Get]DevicePoolExp operations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be a use case where a user would like to:

  1. Create a memory pool with their own desired properties
  2. Set a device's pool to be this newly created pool
  3. Any subsequent allocs where no pool is specified will automatically resort to using this set pool

Subsequent allocs may not just include user written code, but also libraries which call async_malloc (without specifying a pool). This gives the user some control over how libraries using asynchronous allocation will behave when interacting with their application.

The getter then allows the user to retrieve that pool for whatever reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there's PoolGetDevicePoolExp was PoolGetDefaultDevicePoolExp added to the spec by mistake? Or does it imply that there should be an internal pool for async operations created in context by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. There should be an internal pool by default to service async_malloc when no pool has been set for a device. This pool must always be present and so must not be destroyed.

If a user sets a pool for a device for the first time, then that is replacing the default pool. The user must then be able to retrieve the default pool again if they wanted to re-set the device's pool as the default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that's a bit tricky, since user could get the default pool handle and misuse it somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure in what way they could misuse it that they couldn't do with any other handle. Do you have a potential problem in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure in what way they could misuse it that they couldn't do with any other handle. Do you have a potential problem in mind?

I thought it over, and the user would probably need to call PoolGetDefaultDevicePoolExp and then set obtained pool via PoolSetDevicePoolExp. Initially, I assumed that the default pool would be implicitly used for every device when no pool was explicitly set for a given device, which made its accessibility to the user unclear.

Copy link
Contributor Author

@Seanst98 Seanst98 Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% understanding the issue you're presenting.

For a device, there will be a default pool initialised at application startup. Any device specific calls to async_malloc when not specifying a pool will make use of that pool. Meaning any calls to both PoolGetDefaultDevicePoolExp or PoolGetDevicePoolExp before any call to PoolSetDevicePoolExp will return this default pool.

At any point in the program, the user can replace that pool with their own, PoolSetDevicePoolExp. Any calls to PoolGetDevicePoolExp will now return the newly set pool and any device specific calls to async_malloc will use this pool. If we didn't have the PoolGetDefaultDevicePoolExp function, then if the user lost the handle to the default pool then they would be unable to retrieve it and potentially re-set it as the device's pool.

These pools are particular to the device they live on. For instance, each device will have its own set pool that it uses to service calls to async_malloc when no pool is specified.

@Seanst98
Copy link
Contributor Author

This failure:

[----------] Global test environment set-up.
/home/test-user/actions-runner/sycl-ur-01/_work/llvm/llvm/unified-runtime/test/conformance/source/environment.cpp:97: Failure
Failed
Could not find any devices to test
/home/test-user/actions-runner/sycl-ur-01/_work/llvm/llvm/unified-runtime/test/conformance/source/environment.cpp:151: Failure
Failed
Could not find any devices to test
/home/test-user/actions-runner/sycl-ur-01/_work/llvm/llvm/unified-runtime/test/conformance/source/environment.cpp:3[18](https://github.com/intel/llvm/actions/runs/13495515027/job/37702784573?pr=17117#step:8:19): Failure
Failed
Could not find any devices to test

Is likely because the adapter does not define the new pool-related symbols.

You're right. I was missing the definitions in L0 V2. Fixed. Thanks!

Copy link
Contributor

@kswiecicki kswiecicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm, it's experimental, we can change it later.

* ${x}_device_info_t
* ${X}_DEVICE_INFO_ASYNC_USM_ALLOCATIONS_EXP
* ${x}_usm_pool_flags_t
* ${X}_USM_POOL_FLAG_USE_NATIVE_MEMORY_POOL_EXP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get the idea behind this UR_USM_POOL_FLAG_USE_NATIVE_MEMORY_POOL_EXP flag. Was it meant to fulfill the async API requirement Create a memory pool from a user provided USM allocation? If so, there's currently no API that allows user to provide such pointer for pool creation.

Btw, my comments aren’t meant to delay merging this PR. This discussion will help with the future spec changes in separate PRs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to indicate that we intend to use the native backend handled pool instead of the SYCL runtime handling the pools/allocs/frees. Because there's no CUDA equivalent to host side or shared memory pools, this will need to be handled in the SYCL runtime ourselves. And we may even be able to migrate the device side pools to using the infrastructure we build in the SYCL runtime, at which point we could use the flag to choose between the implementations.

But since this functionality is currently undecided whether we want to go ahead with it or not, we don't technically need this flag, similarly to the enqueueUSM{X}AllocExp funcs but we'll keep them in since we may implement them later.

Copy link
Contributor

@npmiller npmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CUDA/HIP changes LGTM

@Seanst98
Copy link
Contributor Author

Friendly ping @intel/dpcpp-nativecpu-reviewers and @ldrumm for reviews.

@Seanst98
Copy link
Contributor Author

@intel/llvm-gatekeepers can we merge this please?

@sommerlukas
Copy link
Contributor

@Seanst98 I feel like the title and description don't really match what's in this PR. It says "initial spec" and "this is only a spec", but changes 43 files and adds APIs in code.

Can you change the title and description please?

@sommerlukas sommerlukas merged commit b183751 into intel:sycl Feb 28, 2025
29 checks passed
@Seanst98 Seanst98 deleted the sean/async-alloc-ur-spec branch April 4, 2025 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants