Skip to content

[SYCL] Defer buffer release when no host memory to be updated #6837

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 68 commits into from
Dec 8, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
68 commits
Select commit Hold shift + click to select a range
0601210
[SYCL] Mark mem object which may have not blocking dtor according to …
KseniyaTikhomirova Sep 16, 2022
aff3be6
Add draft how to delay buffer_impl release
KseniyaTikhomirova Sep 21, 2022
1195b59
Update symbols for non-breaking change
KseniyaTikhomirova Sep 21, 2022
8d05802
Update abi test vtable.cpp - non-breaking change
KseniyaTikhomirova Sep 21, 2022
b54b8e4
Update SYCL_MINOR_VERSION for non-breaking ABI change
KseniyaTikhomirova Sep 21, 2022
965a015
Remove ABI break
KseniyaTikhomirova Sep 21, 2022
27ccbff
Update symbols to new version
KseniyaTikhomirova Sep 21, 2022
9540fe0
Tiny rename
KseniyaTikhomirova Sep 21, 2022
c00c7cb
Revert "Update abi test vtable.cpp - non-breaking change"
KseniyaTikhomirova Sep 21, 2022
661dace
Remove isDefault method, reimplemented
KseniyaTikhomirova Sep 21, 2022
6615db3
Fix symbols again
KseniyaTikhomirova Sep 21, 2022
8174dc3
Add handling of deferred mem objects release
KseniyaTikhomirova Sep 22, 2022
d55405e
Remove unused function and restore XPTI traces collection
KseniyaTikhomirova Sep 22, 2022
bb2c4fb
Add skeleton for unit test
KseniyaTikhomirova Sep 22, 2022
5db9e85
Fix shared_ptr use_count check
KseniyaTikhomirova Sep 23, 2022
53a1892
Test draft
KseniyaTikhomirova Sep 23, 2022
4b0a3fa
[SYCL] Align usm_allocator ctor and operators with SYCCL2020
KseniyaTikhomirova Sep 23, 2022
8daea20
Update attach scheduler logic
KseniyaTikhomirova Sep 26, 2022
c855f13
Make cleanup iterative
KseniyaTikhomirova Sep 28, 2022
8dbcd1c
Fix test utils impl error
KseniyaTikhomirova Sep 28, 2022
0f61c64
Add other tests for buffer contructors
KseniyaTikhomirova Sep 28, 2022
ddf215b
Other tests for high level buffer destruction deferring logic
KseniyaTikhomirova Sep 29, 2022
23bea82
Add unittest for waitForRecordToFinish
KseniyaTikhomirova Oct 3, 2022
aa41d76
Remove debug flags uploaded by mistake
KseniyaTikhomirova Oct 3, 2022
e296d03
Merge branch 'sycl' into buff_detach
KseniyaTikhomirova Oct 3, 2022
179c472
Fix clang-format
KseniyaTikhomirova Oct 3, 2022
2076c7c
Update test to not keep ill-formed objects
KseniyaTikhomirova Oct 4, 2022
e911d0a
Check command destruction
KseniyaTikhomirova Oct 4, 2022
81c2b09
Fix clang-format
KseniyaTikhomirova Oct 4, 2022
edcfcfc
Handle set_final_data usage
KseniyaTikhomirova Oct 4, 2022
b5e85de
Fix code-review comments (round 1)
KseniyaTikhomirova Oct 5, 2022
a5980a0
Fix missed comments
KseniyaTikhomirova Oct 5, 2022
ac06f1b
Merge branch 'sycl' into buff_detach
KseniyaTikhomirova Oct 5, 2022
09b8359
Remove nagation from variable name and logic
KseniyaTikhomirova Oct 5, 2022
1e75448
Simplify deferred mem objects release - do not aggregate to capture lock
KseniyaTikhomirova Oct 5, 2022
484b1cf
Return trace of stream buffer emptyness to scheduler destructor
KseniyaTikhomirova Oct 5, 2022
1061322
Fix comments (round 2)
KseniyaTikhomirova Oct 7, 2022
d4537a3
Fix comments (round 3)
KseniyaTikhomirova Oct 7, 2022
342ff91
Fix build
KseniyaTikhomirova Oct 7, 2022
fdab0e7
Fix comments & tests (round 4)
KseniyaTikhomirova Oct 7, 2022
0872d7c
Predict comments: restore removeMemoryObject content
KseniyaTikhomirova Oct 7, 2022
3a25f1e
Fix root cause of hang when host task is not even started upon releas…
KseniyaTikhomirova Oct 11, 2022
28f008d
Fix comments (round n)
KseniyaTikhomirova Oct 14, 2022
92b5e15
Merge branch 'sycl' into buff_detach
KseniyaTikhomirova Oct 17, 2022
6247f8a
[ESIMD] Implement piEventGetInfo for event execution status
KseniyaTikhomirova Oct 18, 2022
4133862
Move comment to the right place
KseniyaTikhomirova Oct 18, 2022
79b2125
cv.notify_all should not be called under mutex paired with cv
KseniyaTikhomirova Oct 19, 2022
868973c
Remove default allocator check after SYCL2020 update
KseniyaTikhomirova Oct 21, 2022
1bc8e57
Update unittests due to default allocator check removal
KseniyaTikhomirova Oct 21, 2022
6e0943b
Update symbols after parameter removal
KseniyaTikhomirova Oct 21, 2022
1c62d08
Try to align hip context destruction handling with cuda WA
KseniyaTikhomirova Oct 26, 2022
30dfaf2
Merge branch 'sycl' into buff_detach
KseniyaTikhomirova Oct 27, 2022
60e3011
Fix unit test after mock plugin rework
KseniyaTikhomirova Nov 7, 2022
6964876
Merge branch 'sycl' into buff_detach
KseniyaTikhomirova Nov 16, 2022
3d5315e
DRAFT: try to release scheduler resources earlier using thread_local …
KseniyaTikhomirova Nov 16, 2022
467a9ea
Draft: try to release scheduler resources earlier, fix counter declar…
KseniyaTikhomirova Nov 17, 2022
0b9032a
Release scheduler resources earlier
KseniyaTikhomirova Nov 16, 2022
9d570ce
change location for buff release attempt
KseniyaTikhomirova Dec 5, 2022
c6d5dc7
Code cleanup
KseniyaTikhomirova Dec 6, 2022
a0b37ef
Code cleanup Part 2
KseniyaTikhomirova Dec 6, 2022
619ee4e
Revert "Try to align hip context destruction handling with cuda WA"
KseniyaTikhomirova Dec 6, 2022
3187f0a
Return cleanup deferred buffers to cleanupCommands call
KseniyaTikhomirova Dec 6, 2022
dbe88e2
Remove unnecessary variable in ObjectRefCounter
KseniyaTikhomirova Dec 6, 2022
06e2608
Fix hang
KseniyaTikhomirova Dec 6, 2022
71e9048
Fix comments
KseniyaTikhomirova Dec 6, 2022
a89e577
Merge branch 'sycl' into buff_detach
KseniyaTikhomirova Dec 6, 2022
ceea7f8
Prevent warning as error for release build
KseniyaTikhomirova Dec 7, 2022
1f201a9
wprotectMDeferredMemObjRelease modification with mutex
KseniyaTikhomirova Dec 7, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion sycl/include/sycl/buffer.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,8 @@ class __SYCL_EXPORT buffer_plain {

size_t getSize() const;

void handleRelease() const;

std::shared_ptr<detail::buffer_impl> impl;
};

Expand Down Expand Up @@ -466,7 +468,7 @@ class buffer : public detail::buffer_plain,

buffer &operator=(buffer &&rhs) = default;

~buffer() = default;
~buffer() { buffer_plain::handleRelease(); }

bool operator==(const buffer &rhs) const { return impl == rhs.impl; }

Expand Down
36 changes: 34 additions & 2 deletions sycl/plugins/esimd_emulator/pi_esimd_emulator.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1381,8 +1381,40 @@ pi_result piKernelRelease(pi_kernel) { DIE_NO_IMPLEMENTATION; }

pi_result piEventCreate(pi_context, pi_event *) { DIE_NO_IMPLEMENTATION; }

pi_result piEventGetInfo(pi_event, pi_event_info, size_t, void *, size_t *) {
DIE_NO_IMPLEMENTATION;
pi_result piEventGetInfo(pi_event Event, pi_event_info ParamName,
size_t ParamValueSize, void *ParamValue,
size_t *ParamValueSizeRet) {
if (ParamName != PI_EVENT_INFO_COMMAND_EXECUTION_STATUS) {
DIE_NO_IMPLEMENTATION;
}

auto CheckAndFillStatus = [&](const cm_support::CM_STATUS &State) {
pi_int32 Result = PI_EVENT_RUNNING;
Copy link
Contributor

@kbobrovs kbobrovs Oct 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this an appropriate default?
@dongkyunahn-intel, your help is needed to review this part. I believe the Result initialization should be a switch on State.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It think it would be better to set Result with switch/case statement as Konst suggested. CM_STATUS_* are defined in link below.

https://github.com/intel/cm-cpu-emulation/blob/0c5fc287f34ae38d3184ab70ea5513d9fb1ff338/common/type_status.h#L13

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @kbobrovs and @dongkyunahn-intel I did only two states intentionally, because other statuses (!= CM_STATUS_FINISHED) seems logical to map to running since it is all "work in progress" and RT does not need more details. I also used L0 plugin as reference which also has differentiation for two states - finished and not finished = running. I think it would be better to align plugins and not introduce extra value handling on level above. What do you think?
Although it may be needed to report CM_STATUS_RESET separately as error. Could you please educate me what does this status stands for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @kbobrovs and @dongkyunahn-intel I did only two states intentionally, because other statuses (!= CM_STATUS_FINISHED) seems logical to map to running since it is all "work in progress" and RT does not need more details. I also used L0 plugin as reference which also has differentiation for two states - finished and not finished = running. I think it would be better to align plugins and not introduce extra value handling on level above. What do you think?

I was not aware that L0 has such implementation. Meanwhile, I looked into CM_EMU's GetStatus() implementation and found out that the function is dummy one returning only CM_STATUS_FINISHED. Now it looks like okay to maintain current implementation - with comments about FINISHED/NOT_FINISHED.

Although it may be needed to report CM_STATUS_RESET separately as error. Could you please educate me what does this status stands for?

'CM_STATUS_RESET' is defined, but never used in CM_EMU. You can disregard it.

if (State == cm_support::CM_STATUS_FINISHED)
Result = PI_EVENT_COMPLETE;
if (ParamValue) {
if (ParamValueSize < sizeof(Result))
return PI_ERROR_INVALID_VALUE;
*static_cast<pi_int32 *>(ParamValue) = Result;
}
if (ParamValueSizeRet) {
*ParamValueSizeRet = sizeof(Result);
}
return PI_SUCCESS;
};
// Dummy event is already completed ones done by CM.
if (Event->IsDummyEvent)
return CheckAndFillStatus(cm_support::CM_STATUS_FINISHED);

if (Event->CmEventPtr == nullptr)
return PI_ERROR_INVALID_EVENT;

cm_support::CM_STATUS Status;
int32_t Result = Event->CmEventPtr->GetStatus(Status);
if (Result != cm_support::CM_SUCCESS)
return PI_ERROR_COMMAND_EXECUTION_FAILURE;

return CheckAndFillStatus(Status);
}

pi_result piEventGetProfilingInfo(pi_event Event, pi_profiling_info ParamName,
Expand Down
7 changes: 7 additions & 0 deletions sycl/source/buffer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,13 @@ void buffer_plain::addOrReplaceAccessorProperties(

size_t buffer_plain::getSize() const { return impl->getSizeInBytes(); }

void buffer_plain::handleRelease() const {
// Try to detach memory object only if impl is going to be released.
// Buffer copy will have pointer to the same impl.
if (impl.use_count() == 1)
impl->detachMemoryObject(impl);
}

} // namespace detail
} // __SYCL_INLINE_VER_NAMESPACE(_V1)
} // namespace sycl
23 changes: 15 additions & 8 deletions sycl/source/detail/event_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -80,17 +80,19 @@ void event_impl::waitInternal() {

void event_impl::setComplete() {
if (MHostEvent || !MEvent) {
std::unique_lock<std::mutex> lock(MMutex);
{
std::unique_lock<std::mutex> lock(MMutex);
#ifndef NDEBUG
int Expected = HES_NotComplete;
int Desired = HES_Complete;
int Expected = HES_NotComplete;
int Desired = HES_Complete;

bool Succeeded = MState.compare_exchange_strong(Expected, Desired);
bool Succeeded = MState.compare_exchange_strong(Expected, Desired);

assert(Succeeded && "Unexpected state of event");
assert(Succeeded && "Unexpected state of event");
#else
MState.store(static_cast<int>(HES_Complete));
MState.store(static_cast<int>(HES_Complete));
#endif
}
cv.notify_all();
return;
}
Expand Down Expand Up @@ -144,8 +146,8 @@ event_impl::event_impl(RT::PiEvent Event, const context &SyclContext)
}

event_impl::event_impl(const QueueImplPtr &Queue)
: MQueue{Queue}, MIsProfilingEnabled{Queue->is_host() ||
Queue->MIsProfilingEnabled} {
: MQueue{Queue},
MIsProfilingEnabled{Queue->is_host() || Queue->MIsProfilingEnabled} {
this->setContextImpl(Queue->getContextImplPtr());

if (Queue->is_host()) {
Expand Down Expand Up @@ -429,6 +431,11 @@ void event_impl::cleanDepEventsThroughOneLevel() {
}
}

bool event_impl::isCompleted() {
return get_info<info::event::command_execution_status>() ==
info::event_command_status::complete;
}

} // namespace detail
} // __SYCL_INLINE_VER_NAMESPACE(_V1)
} // namespace sycl
2 changes: 2 additions & 0 deletions sycl/source/detail/event_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -243,6 +243,8 @@ class event_impl {
/// state.
bool isInitialized() const noexcept { return MIsInitialized; }

bool isCompleted();

void attachEventToComplete(const EventImplPtr &Event) {
std::lock_guard<std::mutex> Lock(MMutex);
MPostCompleteEvents.push_back(Event);
Expand Down
58 changes: 57 additions & 1 deletion sycl/source/detail/global_handler.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,36 @@
namespace sycl {
__SYCL_INLINE_VER_NAMESPACE(_V1) {
namespace detail {

// Utility class to track references on object.
// Used for Scheduler now and created as thread_local object.
// Origin idea is to track usage of Scheduler from main and other used threads -
// they increment MCounter; and to use but not add extra reference by our
// thread_pool threads. For this control MIncrementCounter class member is used.
template <class ResourceHandler> class ObjectUsageCounter {
public:
ObjectUsageCounter(std::unique_ptr<ResourceHandler> &Obj, bool ModifyCounter)
: MModifyCounter(ModifyCounter), MObj(Obj) {
if (MModifyCounter)
MCounter++;
}
~ObjectUsageCounter() {
if (!MModifyCounter)
return;

MCounter--;
if (!MCounter && MObj)
MObj->releaseResources();
}

private:
static std::atomic_uint MCounter;
bool MModifyCounter;
std::unique_ptr<ResourceHandler> &MObj;
};
template <class ResourceHandler>
std::atomic_uint ObjectUsageCounter<ResourceHandler>::MCounter{0};

using LockGuard = std::lock_guard<SpinLock>;

GlobalHandler::GlobalHandler() = default;
Expand All @@ -47,7 +77,24 @@ T &GlobalHandler::getOrCreate(InstWithLock<T> &IWL, Types... Args) {
return *IWL.Inst;
}

Scheduler &GlobalHandler::getScheduler() { return getOrCreate(MScheduler); }
void GlobalHandler::attachScheduler(Scheduler *Scheduler) {
// The method is used in unit tests only. Do not protect with lock since
// releaseResources will cause dead lock due to host queue release
if (MScheduler.Inst)
MScheduler.Inst->releaseResources();
MScheduler.Inst.reset(Scheduler);
}

Scheduler &GlobalHandler::getScheduler() {
getOrCreate(MScheduler);
registerSchedulerUsage();
return *MScheduler.Inst;
}

void GlobalHandler::registerSchedulerUsage(bool ModifyCounter) {
thread_local ObjectUsageCounter SchedulerCounter(MScheduler.Inst,
ModifyCounter);
}

ProgramManager &GlobalHandler::getProgramManager() {
return getOrCreate(MProgramManager);
Expand Down Expand Up @@ -141,9 +188,18 @@ void GlobalHandler::unloadPlugins() {
GlobalHandler::instance().getPlugins().clear();
}

void GlobalHandler::drainThreadPool() {
if (MHostTaskThreadPool.Inst)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we lock MHostTaskThreadPool.Lock here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct me if my understanding is wrong but I thought that lock in InstWithLock exists to protect during getOrCreate call to avoid data race for object creation. Drain call is done in releaseResources called on program exit and do not introduce any data races related to it.

MHostTaskThreadPool.Inst->drain();
}

void shutdown() {
// Ensure neither host task is working so that no default context is accessed
// upon its release

if (GlobalHandler::instance().MScheduler.Inst)
GlobalHandler::instance().MScheduler.Inst->releaseResources();

if (GlobalHandler::instance().MHostTaskThreadPool.Inst)
GlobalHandler::instance().MHostTaskThreadPool.Inst->finishAndWait();

Expand Down
5 changes: 5 additions & 0 deletions sycl/source/detail/global_handler.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,7 @@ class GlobalHandler {
GlobalHandler(const GlobalHandler &) = delete;
GlobalHandler(GlobalHandler &&) = delete;

void registerSchedulerUsage(bool ModifyCounter = true);
Scheduler &getScheduler();
ProgramManager &getProgramManager();
Sync &getSync();
Expand All @@ -74,6 +75,10 @@ class GlobalHandler {
static void registerDefaultContextReleaseHandler();

void unloadPlugins();
void drainThreadPool();

// For testing purposes only
void attachScheduler(Scheduler *Scheduler);

private:
friend void releaseDefaultContexts();
Expand Down
Loading