-
Notifications
You must be signed in to change notification settings - Fork 797
[AsyncAlloc][UR][Exp] Initial spec for async alloc entry points #17117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AsyncAlloc][UR][Exp] Initial spec for async alloc entry points #17117
Conversation
Introduce UR API for the creation and use of memory pools with enqueued allocs/frees. Currently, this only supports device allocated memory pools. This is only a spec introduced to enable parallel working between teams. Since this spec is experimental, it may change drastically between revisions. co-authored-by: Sean Stirling <[email protected]> co-authored-by: Hugh Delaney <[email protected]>
This failure:
Is likely because the adapter does not define the new pool-related symbols. |
name: hQueue | ||
desc: "[in] handle of the queue object" | ||
- type: $x_usm_pool_handle_t | ||
desc: "[in][optional] USM pool descriptor" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if it's certain that the pool handle parameter for the free function should be optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's certain whether we need this parameter at all. I've included it as optional just in case we do need the pool to determine where the ptr is located so that it can be freed. In the future, we could just remove it entirely if it's not necessary.
- type: const size_t | ||
desc: "[in] minimum size in bytes of the USM memory object to be allocated" | ||
name: size | ||
- type: const $x_exp_async_usm_alloc_properties_t* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those alloc operations are prefixed with enqueue
, perhaps ur_exp_enqueue_usm_alloc_properties_t
would be more fitting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely see your point, and I had thought about this initially. I did this since I felt it was clearer that it fell under the async alloc
extension name and followed that naming convention which can be found in other enums/structs, e.g. ASYNC_USM_ALLOCATIONS_EXP
, exp_async_usm_alloc_flags_t
.
However, I'm open to changing it if the consensus agrees that enqueue
would be more appropriate here, so long as there is some consistency holding the extension together.
name: PoolSetDevicePoolExp | ||
ordinal: "0" | ||
details: | ||
- "Set the current pool for a device." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the use-case for the Pool[Set|Get]DevicePoolExp
operations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There may be a use case where a user would like to:
- Create a memory pool with their own desired properties
- Set a device's pool to be this newly created pool
- Any subsequent allocs where no pool is specified will automatically resort to using this set pool
Subsequent allocs may not just include user written code, but also libraries which call async_malloc
(without specifying a pool). This gives the user some control over how libraries using asynchronous allocation will behave when interacting with their application.
The getter then allows the user to retrieve that pool for whatever reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there's PoolGetDevicePoolExp
was PoolGetDefaultDevicePoolExp
added to the spec by mistake? Or does it imply that there should be an internal pool for async operations created in context by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. There should be an internal pool by default to service async_malloc
when no pool has been set for a device. This pool must always be present and so must not be destroyed.
If a user sets a pool for a device for the first time, then that is replacing the default pool. The user must then be able to retrieve the default pool again if they wanted to re-set the device's pool as the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, that's a bit tricky, since user could get the default pool handle and misuse it somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure in what way they could misuse it that they couldn't do with any other handle. Do you have a potential problem in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure in what way they could misuse it that they couldn't do with any other handle. Do you have a potential problem in mind?
I thought it over, and the user would probably need to call PoolGetDefaultDevicePoolExp
and then set obtained pool via PoolSetDevicePoolExp
. Initially, I assumed that the default pool would be implicitly used for every device when no pool was explicitly set for a given device, which made its accessibility to the user unclear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% understanding the issue you're presenting.
For a device, there will be a default pool initialised at application startup. Any device specific calls to async_malloc
when not specifying a pool will make use of that pool. Meaning any calls to both PoolGetDefaultDevicePoolExp
or PoolGetDevicePoolExp
before any call to PoolSetDevicePoolExp
will return this default pool.
At any point in the program, the user can replace that pool with their own, PoolSetDevicePoolExp
. Any calls to PoolGetDevicePoolExp
will now return the newly set pool and any device specific calls to async_malloc
will use this pool. If we didn't have the PoolGetDefaultDevicePoolExp
function, then if the user lost the handle to the default pool then they would be unable to retrieve it and potentially re-set it as the device's pool.
These pools are particular to the device they live on. For instance, each device will have its own set pool that it uses to service calls to async_malloc
when no pool is specified.
You're right. I was missing the definitions in L0 V2. Fixed. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall lgtm, it's experimental, we can change it later.
* ${x}_device_info_t | ||
* ${X}_DEVICE_INFO_ASYNC_USM_ALLOCATIONS_EXP | ||
* ${x}_usm_pool_flags_t | ||
* ${X}_USM_POOL_FLAG_USE_NATIVE_MEMORY_POOL_EXP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get the idea behind this UR_USM_POOL_FLAG_USE_NATIVE_MEMORY_POOL_EXP
flag. Was it meant to fulfill the async API requirement Create a memory pool from a user provided USM allocation
? If so, there's currently no API that allows user to provide such pointer for pool creation.
Btw, my comments aren’t meant to delay merging this PR. This discussion will help with the future spec changes in separate PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's to indicate that we intend to use the native backend handled pool instead of the SYCL runtime handling the pools/allocs/frees. Because there's no CUDA equivalent to host side or shared memory pools, this will need to be handled in the SYCL runtime ourselves. And we may even be able to migrate the device side pools to using the infrastructure we build in the SYCL runtime, at which point we could use the flag to choose between the implementations.
But since this functionality is currently undecided whether we want to go ahead with it or not, we don't technically need this flag, similarly to the enqueueUSM{X}AllocExp
funcs but we'll keep them in since we may implement them later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CUDA/HIP changes LGTM
Friendly ping @intel/dpcpp-nativecpu-reviewers and @ldrumm for reviews. |
@intel/llvm-gatekeepers can we merge this please? |
@Seanst98 I feel like the title and description don't really match what's in this PR. It says "initial spec" and "this is only a spec", but changes 43 files and adds APIs in code. Can you change the title and description please? |
Introduce unimplemented UR API for the creation and use of memory pools with enqueued allocs/frees.
Currently, this only supports device allocated memory pools.
This extension is experimental and may change drastically between revisions.
co-authored-by: Sean Stirling [email protected]
co-authored-by: Hugh Delaney [email protected]