Skip to content

[SYCL] Fix kernel program cache for multiple devices and refactor some unit tests #5017

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Jan 11, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions sycl/source/detail/kernel_program_cache.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ KernelProgramCache::~KernelProgramCache() {
Plugin.call<PiApiKind::piKernelRelease>(Kern);
}
}
MKernelsPerProgramCache.erase(KernIt);
}

const detail::plugin &Plugin = MParentContext->getPlugin();
Expand Down
7 changes: 5 additions & 2 deletions sycl/source/detail/program_manager/program_manager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1813,7 +1813,11 @@ device_image_plain ProgramManager::build(const device_image_plain &DeviceImage,
// Cache supports key with once device only, but here we have multiple
// devices a program is built for, so add the program to the cache for all
// other devices.
auto CacheOtherDevices = [ResProgram]() { return ResProgram; };
const detail::plugin &Plugin = ContextImpl->getPlugin();
auto CacheOtherDevices = [ResProgram, &Plugin]() {
Plugin.call<PiApiKind::piProgramRetain>(ResProgram);
return ResProgram;
};

// The program for device "0" is already added to the cache during the first
// call to getOrBuild, so starting with "1"
Expand All @@ -1833,7 +1837,6 @@ device_image_plain ProgramManager::build(const device_image_plain &DeviceImage,
// devive_image_impl shares ownership of PIProgram with, at least, program
// cache. The ref counter will be descremented in the destructor of
// device_image_impl
const detail::plugin &Plugin = ContextImpl->getPlugin();
Plugin.call<PiApiKind::piProgramRetain>(ResProgram);

DeviceImageImplPtr ExecImpl = std::make_shared<detail::device_image_impl>(
Expand Down
1 change: 0 additions & 1 deletion sycl/unittests/helpers/CommonRedefinitions.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,6 @@
//
//===----------------------------------------------------------------------===//

#include <CL/sycl.hpp>
#include <helpers/PiImage.hpp>
#include <helpers/PiMock.hpp>

Expand Down
1 change: 0 additions & 1 deletion sycl/unittests/helpers/PiImage.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,6 @@

#pragma once

#include <CL/sycl.hpp>
#include <CL/sycl/detail/common.hpp>
#include <CL/sycl/detail/pi.hpp>
#include <detail/platform_impl.hpp>
Expand Down
4 changes: 3 additions & 1 deletion sycl/unittests/helpers/PiMock.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -27,9 +27,11 @@

#pragma once

#include <CL/sycl.hpp>
#include <CL/sycl/detail/common.hpp>
#include <CL/sycl/detail/pi.hpp>
#include <CL/sycl/device_selector.hpp>
#include <CL/sycl/platform.hpp>
#include <CL/sycl/queue.hpp>
#include <detail/platform_impl.hpp>

#include <functional>
Expand Down
55 changes: 55 additions & 0 deletions sycl/unittests/helpers/TestKernel.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#pragma once

#include "PiImage.hpp"

class TestKernel;

__SYCL_INLINE_NAMESPACE(cl) {
namespace sycl {
namespace detail {
template <> struct KernelInfo<TestKernel> {
static constexpr unsigned getNumParams() { return 0; }
static const kernel_param_desc_t &getParamDesc(int) {
static kernel_param_desc_t Dummy;
return Dummy;
}
static constexpr const char *getName() { return "TestKernel"; }
static constexpr bool isESIMD() { return false; }
static constexpr bool callsThisItem() { return false; }
static constexpr bool callsAnyThisFreeFunction() { return false; }
};

} // namespace detail
} // namespace sycl
} // __SYCL_INLINE_NAMESPACE(cl)

static sycl::unittest::PiImage generateDefaultImage() {
using namespace sycl::unittest;

PiPropertySet PropSet;

std::vector<unsigned char> Bin{0, 1, 2, 3, 4, 5}; // Random data

PiArray<PiOffloadEntry> Entries = makeEmptyKernels({"TestKernel"});

PiImage Img{PI_DEVICE_BINARY_TYPE_SPIRV, // Format
__SYCL_PI_DEVICE_BINARY_TARGET_SPIRV64, // DeviceTargetSpec
"", // Compile options
"", // Link options
std::move(Bin),
std::move(Entries),
std::move(PropSet)};

return Img;
}

static sycl::unittest::PiImage Img = generateDefaultImage();
static sycl::unittest::PiImageArray<1> ImgArray{&Img};
1 change: 1 addition & 0 deletions sycl/unittests/kernel-and-program/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
add_sycl_unittest(KernelAndProgramTests OBJECT
Cache.cpp
MultipleDevsCache.cpp
KernelRelease.cpp
KernelInfo.cpp
DeviceInfo.cpp
Expand Down
3 changes: 1 addition & 2 deletions sycl/unittests/kernel-and-program/Cache.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -51,11 +51,9 @@ struct MockKernelInfo {
template <> struct KernelInfo<TestKernel> : public MockKernelInfo {
static constexpr const char *getName() { return "TestKernel"; }
};

template <> struct KernelInfo<TestKernel2> : public MockKernelInfo {
static constexpr const char *getName() { return "TestKernel2"; }
};

} // namespace detail
} // namespace sycl
} // __SYCL_INLINE_NAMESPACE(cl)
Expand Down Expand Up @@ -117,6 +115,7 @@ static pi_result redefinedKernelCreate(pi_program program,
pi_kernel *ret_kernel) {
return PI_SUCCESS;
}

static pi_result redefinedKernelRelease(pi_kernel kernel) { return PI_SUCCESS; }

class KernelAndProgramCacheTest : public ::testing::Test {
Expand Down
184 changes: 184 additions & 0 deletions sycl/unittests/kernel-and-program/MultipleDevsCache.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,184 @@
//==--- KPCache.cpp --- KernelProgramCache for multiple devices unit test --==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#define SYCL2020_DISABLE_DEPRECATION_WARNINGS

#include "detail/context_impl.hpp"
#include "detail/kernel_bundle_impl.hpp"
#include "detail/kernel_program_cache.hpp"
#include <helpers/CommonRedefinitions.hpp>
#include <helpers/PiImage.hpp>
#include <helpers/PiMock.hpp>
#include <helpers/TestKernel.hpp>

#include <gtest/gtest.h>

#include <iostream>

using namespace sycl;

static pi_result redefinedContextCreate(
const pi_context_properties *properties, pi_uint32 num_devices,
const pi_device *devices,
void (*pfn_notify)(const char *errinfo, const void *private_info, size_t cb,
void *user_data),
void *user_data, pi_context *ret_context) {
*ret_context = reinterpret_cast<pi_context>(123);
return PI_SUCCESS;
}

static pi_result redefinedContextRelease(pi_context context) {
return PI_SUCCESS;
}

static pi_result redefinedDevicesGet(pi_platform platform,
pi_device_type device_type,
pi_uint32 num_entries, pi_device *devices,
pi_uint32 *num_devices) {
if (num_devices) {
*num_devices = static_cast<pi_uint32>(2);
return PI_SUCCESS;
}

if (num_entries == 2 && devices) {
devices[0] = reinterpret_cast<pi_device>(1111);
devices[1] = reinterpret_cast<pi_device>(2222);
}
return PI_SUCCESS;
}

static pi_result redefinedDeviceGetInfo(pi_device device,
pi_device_info param_name,
size_t param_value_size,
void *param_value,
size_t *param_value_size_ret) {
if (param_name == PI_DEVICE_INFO_TYPE) {
auto *Result = reinterpret_cast<_pi_device_type *>(param_value);
*Result = PI_DEVICE_TYPE_GPU;
}
if (param_name == PI_DEVICE_INFO_COMPILER_AVAILABLE) {
auto *Result = reinterpret_cast<pi_bool *>(param_value);
*Result = true;
}
return PI_SUCCESS;
}

static pi_result redefinedDeviceRetain(pi_device device) { return PI_SUCCESS; }

static pi_result redefinedDeviceRelease(pi_device device) { return PI_SUCCESS; }

static pi_result redefinedQueueCreate(pi_context context, pi_device device,
pi_queue_properties properties,
pi_queue *queue) {
*queue = reinterpret_cast<pi_queue>(1234);
return PI_SUCCESS;
}

static pi_result redefinedQueueRelease(pi_queue command_queue) {
return PI_SUCCESS;
}

static size_t ProgramNum = 12345;
static pi_result redefinedProgramCreate(pi_context context, const void *il,
size_t length,
pi_program *res_program) {
size_t CurrentProgram = ProgramNum;
*res_program = reinterpret_cast<pi_program>(CurrentProgram);
++ProgramNum;
return PI_SUCCESS;
}

static int RetainCounter = 0;
static pi_result redefinedProgramRetain(pi_program program) {
++RetainCounter;
return PI_SUCCESS;
}

static int KernelReleaseCounter = 0;
static pi_result redefinedKernelRelease(pi_kernel kernel) {
++KernelReleaseCounter;
return PI_SUCCESS;
}

class MultipleDeviceCacheTest : public ::testing::Test {
public:
MultipleDeviceCacheTest() : Plt{default_selector()} {}

protected:
void SetUp() override {
if (Plt.is_host() || Plt.get_backend() != backend::opencl) {
return;
}

Mock = std::make_unique<unittest::PiMock>(Plt);

setupDefaultMockAPIs(*Mock);
Mock->redefine<detail::PiApiKind::piDevicesGet>(redefinedDevicesGet);
Mock->redefine<detail::PiApiKind::piDeviceGetInfo>(redefinedDeviceGetInfo);
Mock->redefine<detail::PiApiKind::piDeviceRetain>(redefinedDeviceRetain);
Mock->redefine<detail::PiApiKind::piDeviceRelease>(redefinedDeviceRelease);
Mock->redefine<detail::PiApiKind::piContextCreate>(redefinedContextCreate);
Mock->redefine<detail::PiApiKind::piContextRelease>(
redefinedContextRelease);
Mock->redefine<detail::PiApiKind::piQueueCreate>(redefinedQueueCreate);
Mock->redefine<detail::PiApiKind::piQueueRelease>(redefinedQueueRelease);
Mock->redefine<detail::PiApiKind::piProgramRetain>(redefinedProgramRetain);
Mock->redefine<detail::PiApiKind::piProgramCreate>(redefinedProgramCreate);
Mock->redefine<detail::PiApiKind::piKernelRelease>(redefinedKernelRelease);
}

protected:
std::unique_ptr<unittest::PiMock> Mock;
platform Plt;
};

// Test that program is retained for each device and each kernel is released
// once
TEST_F(MultipleDeviceCacheTest, ProgramRetain) {
if (Plt.is_host() || Plt.get_backend() != backend::opencl) {
return;
}
{
std::vector<sycl::device> Devices = Plt.get_devices(info::device_type::gpu);
sycl::context Context(Devices);
sycl::queue Queue(Context, Devices[0]);
assert(Devices.size() == 2);

auto Bundle = cl::sycl::get_kernel_bundle<sycl::bundle_state::input>(
Queue.get_context());
Queue.submit(
[&](cl::sycl::handler &cgh) { cgh.single_task<TestKernel>([]() {}); });

auto BundleObject = cl::sycl::build(Bundle, Bundle.get_devices());
auto KernelID = cl::sycl::get_kernel_id<TestKernel>();
auto Kernel = BundleObject.get_kernel(KernelID);

// Because of emulating 2 devices program is retained for each one in
// build(). It is also depends on number of device images. This test has one
// image, but other tests can create other images. Additional variable is
// added to control count of piProgramRetain calls
auto BundleImpl = getSyclObjImpl(Bundle);
int NumRetains = BundleImpl->size() * 2;

EXPECT_EQ(RetainCounter, NumRetains)
<< "Expect " << NumRetains << " piProgramRetain calls";

auto CtxImpl = detail::getSyclObjImpl(Context);
detail::KernelProgramCache::KernelCacheT &KernelCache =
CtxImpl->getKernelProgramCache().acquireKernelsPerProgramCache().get();

EXPECT_EQ(KernelCache.size(), (size_t)2) << "Expect 2 kernels in cache";
}
// First kernel creating is called in handler::single_task().
// kernel_bundle::get_kernel() creates a kernel and shares it with created
// programs. Also the kernel is retained in kernel_bundle::get_kernel(). A
// kernel is removed from cache if piKernelRelease was called for it, so it
// will not be removed twice for the other programs. As a result we must
// expect 3 piKernelRelease calls.
EXPECT_EQ(KernelReleaseCounter, 3) << "Expect 3 piKernelRelease calls";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add comment describing how number 3 was retrieved. Why is it expected and not 2 or four?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why not test to see if KernelReleaseCounter is equal to the RetainCounter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added clarifying comments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why not test to see if KernelReleaseCounter is equal to the RetainCounter?

AFAIR, there's some mem-leak test which checks for retain-release parity.
Although, I'm not sure it checks for kernels.

}
4 changes: 3 additions & 1 deletion sycl/unittests/pi/PiMock.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,12 @@
//
//===----------------------------------------------------------------------===//

#include <gtest/gtest.h>
#include <helpers/PiMock.hpp>

#include <detail/queue_impl.hpp>

#include <gtest/gtest.h>

using namespace cl::sycl;

pi_result piProgramBuildRedefine(pi_program, pi_uint32, const pi_device *,
Expand Down
3 changes: 2 additions & 1 deletion sycl/unittests/pi/TestGetPlatforms.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,8 @@

#pragma once

#include <CL/sycl.hpp>
#include <CL/sycl/platform.hpp>

#include <algorithm>
#include <functional>
#include <vector>
Expand Down
Loading