First pull request to implicit sycl graph #3

Closed
wants to merge 44 commits into from
7c62056
Inital version of sycl graph prototype
reble Feb 18, 2022
59bb7da
Adding initial sycl graph doc
reble Feb 18, 2022
528017a
Reusing command list for re-execution (WIP)
reble Jan 7, 2022
aee48a5
Adding lazy execution property to queue
reble Feb 15, 2022
24fa5a9
fix merge
reble Feb 22, 2022
b7ce271
Update pi_level_zero.cpp
reble Feb 22, 2022
f3d30ed
update extension proposal started to incorporate feedback
reble Mar 11, 2022
0e96d12
typo
reble Mar 11, 2022
8f1a8dc
Apply suggestions from code review
reble Mar 14, 2022
a8c7265
fix typos and syntax issues
reble May 3, 2022
5055f59
Merge branch 'sycl' of github.com:reble/llvm into sycl
reble May 3, 2022
60507c1
Propagate lazy queue property
julianmi May 3, 2022
d0069e5
Update pi_level_zero.cpp
reble May 25, 2022
ed74e86
Update pi_level_zero.cpp
reble May 25, 2022
38a6fda
Avoid flag redefinition and fix semantics
reble May 26, 2022
058c875
Update pi_level_zero.cpp
reble May 26, 2022
ff540ee
replaced with Julian's graph.hpp
Jun 15, 2022
47251a2
added makefile and run.sh
Jun 15, 2022
aa03781
updated makefile
Jun 15, 2022
cae401e
added submit with executable_graph
Jun 15, 2022
eead9e5
added submit definition
Jun 15, 2022
ef1cf00
added capture_mode
Jun 15, 2022
d396da8
fixed typo
Jun 15, 2022
5ae2628
added two capture APIs
Jun 16, 2022
345683e
added is_lazy and is_capture APIs
Jun 16, 2022
1aa81e0
queue is eager if both properties are not set
Jun 16, 2022
238028c
inlined functions in graph.hpp
Jun 17, 2022
b0ff69d
added APIs to query #nodes and edges
Jun 22, 2022
5382989
Modefied parallel_for and add_node to support q.parallel_for
Jun 22, 2022
73324ca
q.parallel_for can take one event
Jun 29, 2022
75ac07d
q.parallel_for can take events
Jun 29, 2022
1de215e
added unittest.cpp
Jul 6, 2022
d7439b2
added std::fill in the test
Jul 6, 2022
6ecbf93
added APIs for capture mode use
cheng-hsiang-chiu Aug 10, 2022
0659d3c
If neither lazy exec nor capture mode
cheng-hsiang-chiu Aug 10, 2022
c3f492f
Added in_capture variable in queue_impl::wait()
cheng-hsiang-chiu Aug 10, 2022
16c9904
Added two APIs query is_lazy() and is_capture()
cheng-hsiang-chiu Aug 10, 2022
14d691d
Added and Modified APIs
cheng-hsiang-chiu Aug 10, 2022
da762c0
Added comments
cheng-hsiang-chiu Aug 10, 2022
7f1ef1d
Overloaded parallel_for()
cheng-hsiang-chiu Aug 10, 2022
4530a2c
Deleted debugging msg
cheng-hsiang-chiu Aug 12, 2022
a056e08
Deleted debugging msg
cheng-hsiang-chiu Aug 12, 2022
95bb4a9
Deleted debugging msg
cheng-hsiang-chiu Aug 12, 2022
2442991
Added comments and Deleted debugging msg
cheng-hsiang-chiu Aug 12, 2022
274 changes: 274 additions & 0 deletions sycl/doc/extensions/experimental/SYCL_EXT_ONEAPI_GRAPH.asciidoc
@@ -0,0 +1,274 @@
= SYCL_EXT_ONEAPI_GRAPH
:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en

:blank: pass:[ +]

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}

== Notice

Copyright (c) 2022 Intel Corporation. All rights reserved.

IMPORTANT: This specification is a draft.

NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc.
used by permission by Khronos.

NOTE: This document is better viewed when rendered as html with asciidoctor.
GitHub does not render image icons.

This extension is written against the SYCL 2020 revision 4 specification. All
references below to the "core SYCL specification" or to section numbers in the
SYCL specification refer to that revision.

NOTE: This extension is experimental: interfaces are subject to change later.

== Introduction

This extension introduces an interface that enables lazy execution and easy replay of a kernel graph by separating
its definition from its execution.

== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification section 6.3.3 "Feature test macros". Therefore, an
implementation supporting this extension must predefine the macro
`SYCL_EXT_ONEAPI_GRAPH` to one of the values defined in the table below.
Applications can test for the existence of this macro to determine if the
implementation supports this feature, or applications can test the macro's
value to determine which of the extension's APIs the implementation supports.

Table 1. Values of the `SYCL_EXT_ONEAPI_GRAPH` macro.
[%header,cols="1,5"]
|===
|Value |Description
|1 |Initial extension version. Base features are supported.
|===
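
As a minimal sketch of how an application might test for this extension, the helpers below report the macro value at compile time. Only the `SYCL_EXT_ONEAPI_GRAPH` macro comes from this document. The helper names `graph_extension_version` and `has_graph_extension` are hypothetical and are not part of the extension.

```cpp
// Illustrative helpers (hypothetical names, not part of the extension).
// Only the SYCL_EXT_ONEAPI_GRAPH macro itself is defined by this document.

// Returns the value of the feature-test macro, or 0 when it is not defined.
constexpr int graph_extension_version() {
#ifdef SYCL_EXT_ONEAPI_GRAPH
  return SYCL_EXT_ONEAPI_GRAPH;
#else
  return 0;
#endif
}

// True when at least the initial extension version (value 1) is available.
constexpr bool has_graph_extension() {
  return graph_extension_version() >= 1;
}
```

An application could guard its graph code path with `if constexpr (has_graph_extension())`, or use `#ifdef SYCL_EXT_ONEAPI_GRAPH` directly when the guarded code does not compile without the extension.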

== SYCL Graph Terminology

Table 2. Terminology.
[%header,cols="1,5"]
|===
|Concept |Description
|graph |Class that stores structured work units and their dependencies.
|node |The unit of work. Can have different attributes.
|edge |Dependency between work units. Represents a happens-before relation.
|===

== Node

Node is a class that encapsulates tasks, such as SYCL kernel functions or host tasks, for deferred execution.
A graph has to be created first. The structure of the graph is then defined by adding nodes and edges.

[source,c++]
----
namespace sycl::ext::oneapi::experimental {

class node{
};
}
----

NOTE:

== Edge

An edge is a dependency between two nodes that represents a happens-before relationship. `sender` and `receiver` may be associated with different graphs.

[source,c++]
----
namespace sycl::ext::oneapi::experimental {

// Adding dependency between two nodes.
void make_edge(node sender, node receiver);
}
----
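
The happens-before meaning of an edge can be illustrated with a plain C++ model that does not use SYCL. This is only a sketch under stated assumptions: `toy_graph`, `toy_node`, and `execution_order` are hypothetical names, the model keeps all nodes in a single graph (it does not cover edges between different graphs), and a real runtime may run independent nodes concurrently instead of producing one linear order.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Plain C++ model of the terminology above; it does not use SYCL.
// toy_node, toy_graph and execution_order are hypothetical names.
using toy_node = std::size_t;

struct toy_graph {
  std::vector<std::vector<toy_node>> successors;  // outgoing edges per node

  toy_node add_node() {
    successors.emplace_back();
    return successors.size() - 1;
  }

  // Records that sender happens-before receiver.
  void make_edge(toy_node sender, toy_node receiver) {
    assert(sender < successors.size() && receiver < successors.size());
    successors[sender].push_back(receiver);
  }

  // Kahn's algorithm: returns one order in which every sender
  // appears before all of its receivers.
  std::vector<toy_node> execution_order() const {
    std::vector<std::size_t> pending(successors.size(), 0);
    for (const auto &out : successors)
      for (toy_node r : out)
        ++pending[r];

    std::queue<toy_node> ready;
    for (toy_node n = 0; n < successors.size(); ++n)
      if (pending[n] == 0)
        ready.push(n);  // root nodes have no incoming edges

    std::vector<toy_node> order;
    while (!ready.empty()) {
      toy_node n = ready.front();
      ready.pop();
      order.push_back(n);
      for (toy_node r : successors[n])
        if (--pending[r] == 0)
          ready.push(r);  // all senders of r have been ordered
    }
    return order;
  }
};
```

With two root nodes that both have an edge to a third node, the third node is always last in the returned order, which mirrors how a leaf node only runs after all of its senders.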

== Graph

Graph is a class that represents a directed acyclic graph of nodes.
A graph can be in different states, can be nested, and can have multiple root nodes, which are scheduled for execution first, and multiple leaf nodes, which are scheduled for execution last. Execution of a graph is complete when the tasks of all leaf nodes have completed.
The member functions listed in Table 4 and Table 5 can be used to add nodes to a graph.

[source,c++]
----
namespace sycl::ext::oneapi::experimental {

enum class graph_state {
  modifiable,
  executable
};

template <graph_state State>
class graph {
public:
  operator graph<graph_state::executable>();
};

graph<graph_state::modifiable> make_graph();

graph<graph_state::executable> compile(const graph<graph_state::modifiable> Graph);

}

// Queue member function that submits an executable graph.
sycl::event sycl::queue::submit(const graph<graph_state::executable> Graph);

----
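
The split between a modifiable and an executable state can be sketched in plain C++ without SYCL. The model below is an assumption-laden illustration, not the extension's implementation: `toy_state`, `typed_graph`, and `toy_compile` are hypothetical names, and nodes are reduced to integer ids. It shows the design idea that the executable type simply has no member function for adding nodes, so modification after compilation is rejected at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Plain C++ model of the two graph states; all names are hypothetical.
enum class toy_state { modifiable, executable };

template <toy_state State> class typed_graph;

// A modifiable graph can grow.
template <> class typed_graph<toy_state::modifiable> {
public:
  int add_node() {
    nodes_.push_back(static_cast<int>(nodes_.size()));
    return nodes_.back();
  }
  const std::vector<int> &nodes() const { return nodes_; }

private:
  std::vector<int> nodes_;
};

// An executable graph is a frozen snapshot; it offers no add_node().
template <> class typed_graph<toy_state::executable> {
public:
  explicit typed_graph(std::vector<int> nodes) : nodes_(std::move(nodes)) {}
  std::size_t size() const { return nodes_.size(); }
  int node(std::size_t i) const {
    assert(i < nodes_.size());
    return nodes_[i];
  }

private:
  std::vector<int> nodes_;
};

static_assert(!std::is_same<typed_graph<toy_state::modifiable>,
                            typed_graph<toy_state::executable>>::value,
              "the two states are distinct types");

// Copies the current structure into an immutable executable graph.
inline typed_graph<toy_state::executable>
toy_compile(const typed_graph<toy_state::modifiable> &g) {
  return typed_graph<toy_state::executable>(g.nodes());
}
```

In this model, nodes added to the modifiable graph after `toy_compile` do not affect an executable graph that was created earlier.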

=== Executable Graph

A `graph` object in `graph_state::executable` represents a user-generated, device- and context-specific execution object that is submitted to a queue for execution.
The structure of a `graph` object in this state is immutable, and so are the tasks assigned to each node.
Support for submitting a graph for execution before a previous execution of the same graph has completed is backend-specific. The runtime may throw an error.

Table 3. Constructor of the `graph` class.
|===
|Constructor|Description

|`graph()`
|Creates a `graph` object. Its default state is `graph_state::modifiable`.

|===

Table 4. Member functions of the `graph` class.
|===
|Member function|Description

|`node add_node(const std::vector<node>& dep = {});`
|Creates an empty node that is not associated with any task. One intended use is as a connection point inside a graph between groups of nodes, which can significantly reduce the number of edges (O(n) instead of O(n^2)). Another use case is building the structure of a graph first and adding tasks later.

|`template<typename T>
node add_node(T cgf, const std::vector<node>& dep = {});`
|Creates a node that captures a command group function object. The command group contains either a host task, which is scheduled by the SYCL runtime, or a SYCL function for invoking kernels, with all restrictions that apply as described in the core SYCL specification.

|===
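
The edge reduction mentioned for empty nodes can be made concrete with a small count. The sketch below assumes the common case where every node of one group must complete before any node of a second group starts. The function names are hypothetical and the code is plain C++ that does not depend on SYCL.

```cpp
#include <cstddef>

// Edges needed so that every node in a group of `senders` nodes
// happens-before every node in a group of `receivers` nodes.

// Direct connection: one edge per (sender, receiver) pair.
constexpr std::size_t edges_direct(std::size_t senders, std::size_t receivers) {
  return senders * receivers;
}

// Via one empty node: each sender -> empty node, empty node -> each receiver.
constexpr std::size_t edges_via_empty_node(std::size_t senders,
                                           std::size_t receivers) {
  return senders + receivers;
}
```

For two groups of 100 nodes each, this gives 10000 direct edges compared with 200 edges through a single empty node.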

Table 5. Member functions of the `graph` class (memory operations).
|===
|Member function|Description

|`node add_memcpy_node(void* dest, const void* src, size_t numBytes, const std::vector<node>& dep = {});`
|Adds a node that encapsulates a `memcpy` operation.

|`node add_memset_node(void* ptr, int value, size_t numBytes, const std::vector<node>& dep = {});`
|Adds a node that encapsulates a `memset` operation.

|`node add_malloc_node(void* data, size_t numBytes, usm::alloc kind, const std::vector<node>& dep = {});`
|Adds a node that encapsulates a `malloc` operation.

|`node add_free_node(void* data, const std::vector<node>& dep = {});`
|Adds a node that encapsulates a `free` operation.

|===


== Examples

NOTE: The examples below demonstrate intended usage of the extension, but are not compatible with the proof-of-concept implementation. The proof-of-concept implementation currently requires different syntax, as described in the "Non-implemented features" section at the end of this document.

1. Dot product

[source,c++]
----
...

#include <sycl/ext/oneapi/experimental/graph.hpp>

int main() {
  const size_t n = 10;
  float alpha = 1.0f;
  float beta = 2.0f;
  float gamma = 3.0f;

  sycl::queue q;

  auto g = sycl::ext::oneapi::experimental::make_graph();

  float *x = sycl::malloc_shared<float>(n, q);
  float *y = sycl::malloc_shared<float>(n, q);
  float *z = sycl::malloc_shared<float>(n, q);

  float *dotp = sycl::malloc_shared<float>(1, q);

  for (size_t i = 0; i < n; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
    z[i] = 3.0f;
  }
  // The reduction combines with the existing value, so start from zero.
  *dotp = 0.0f;

  // node_a: x = alpha * x + beta * y
  auto node_a = g.add_node([&](sycl::handler &h) {
    h.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> it) {
      const size_t i = it[0];
      x[i] = alpha * x[i] + beta * y[i];
    });
  });

  // node_b: z = gamma * z + beta * y
  auto node_b = g.add_node([&](sycl::handler &h) {
    h.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> it) {
      const size_t i = it[0];
      z[i] = gamma * z[i] + beta * y[i];
    });
  });

  // node_c: dot product of x and z; depends on node_a and node_b.
  auto node_c = g.add_node(
      [&](sycl::handler &h) {
        h.parallel_for(sycl::range<1>{n},
                       sycl::reduction(dotp, 0.0f, std::plus()),
                       [=](sycl::id<1> it, auto &sum) {
                         const size_t i = it[0];
                         sum += x[i] * z[i];
                       });
      },
      {node_a, node_b});

  // Compile the modifiable graph into an executable graph.
  auto exec = sycl::ext::oneapi::experimental::compile(g);

  q.submit(exec).wait();

  sycl::free(x, q);
  sycl::free(y, q);
  sycl::free(z, q);
  sycl::free(dotp, q);

  return 0;
}


...
----
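
To make the expected result of the dot product example easier to check, the following host-only sketch performs the same three steps sequentially in plain C++. It does not use SYCL or the graph extension, and `reference_dot_product` is a hypothetical helper name. The loop order mirrors the dependencies of the graph: the last loop reads `x` and `z`, so it must run after the first two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only reference for the dot product example; it does not use SYCL.
// reference_dot_product is a hypothetical helper name.
inline float reference_dot_product(std::size_t n) {
  const float alpha = 1.0f, beta = 2.0f, gamma = 3.0f;
  std::vector<float> x(n, 1.0f), y(n, 2.0f), z(n, 3.0f);

  // Step of node_a: x = alpha * x + beta * y (each element becomes 5).
  for (std::size_t i = 0; i < n; ++i)
    x[i] = alpha * x[i] + beta * y[i];

  // Step of node_b: z = gamma * z + beta * y (each element becomes 13).
  for (std::size_t i = 0; i < n; ++i)
    z[i] = gamma * z[i] + beta * y[i];

  // Step of node_c: sum of x[i] * z[i]; needs the results of both steps above.
  float dotp = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    dotp += x[i] * z[i];
  return dotp;
}
```

For `n = 10`, each product is 5 * 13 = 65, so `assert(reference_dot_product(10) == 650.0f);` holds.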

== Issues for later investigations

. Explicit memory movement can cause the proof-of-concept (POC) implementation to stall.

== Non-implemented features
The following features are not yet implemented in the proof-of-concept implementation:

. Backends other than Level Zero. Only the Level Zero backend is supported.
. Memory operation nodes.
. Host nodes.
. The `submit(graph)` overload of a queue. Use a combination of `executable_graph::exec_and_wait()` and the queue property `sycl::ext::oneapi::property::queue::lazy_execution{}` instead.
. `class graph<graph_state>`. Use the dedicated `class graph` (equivalent to `graph_state == modifiable`) and `class executable_graph` (equivalent to `graph_state == executable`) instead.

== Revision History

[cols="5,15,15,70"]
[grid="rows"]
[options="header"]
|========================================
|Rev|Date|Author|Changes
|1|2022-02-11|Pablo Reble|Initial public working draft
|2|2022-03-11|Pablo Reble|Incorporate feedback from PR
|========================================