33
33
// /
34
34
// / The SYCL framework defines command group (\ref CG) as an entity that
35
35
// / represents minimal execution block. The command group is submitted to SYCL
36
- // / queue and consists of a kernel and its requirements. The SYCL queue defines
37
- // / the device and context using which the kernel should be executed.
36
+ // / queue and consists of a kernel or an explicit memory operation, and their
37
+ // / requirements. The SYCL queue defines the device and context using which the
38
+ // / kernel should be executed.
38
39
// /
39
- // / There are also command groups that consist of memory requirements and
40
- // / an explicit memory operation, such as copy, fill, update_host. In this case
41
- // / it's up to an implementation how to implement these operations.
40
+ // / The commands that contain explicit memory operations include copy, fill,
41
+ // / update_host and other operations. It's up to the implementation how to
42
+ // / define these operations.
42
43
// /
43
44
// / The relative order of command groups submission defines the order in which
44
45
// / kernels must be executed if their memory requirements intersect. For
93
94
// /
94
95
// / // "Host accessor creation" section
95
96
// / // Request the latest data of BufferC for the moment
96
- // / // This is a synchronization point, which means that the DPC++ RT blocks on creation of
97
- // / // the accessor until requested data is available.
97
+ // / // This is a synchronization point, which means that the DPC++ RT blocks
98
+ // / // on creation of the accessor until requested data is available.
98
99
// / auto C = BufferC.get_access<read>();
99
100
// / }
100
101
// / \endcode
101
102
// /
102
103
// / In the example above the DPC++ RT does the following:
103
104
// /
104
105
// / 1. **Copy command group**.
105
- // / The DPC++ RT allocates memory for BufferA and BufferB on CPU then executes
106
- // / an explicit copy operation on CPU.
106
+ // / The DPC++ RT allocates memory for BufferA and BufferB on CPU then
107
+ // / executes an explicit copy operation on CPU.
107
108
// /
108
109
// / 2. **Multi command group**
109
110
// / DPC++ RT allocates memory for BufferC and BufferB on GPU and copy
@@ -266,8 +267,8 @@ struct MemObjRecord {
266
267
// / executing the first command group memory allocation must be performed.
267
268
// /
268
269
// / At some point Scheduler enqueues commands to the underlying devices. To do
269
- // / this, Scheduler performs topological sort to get the order in which commands should
270
- // / be enqueued. For example, the following graph (D depends on B and C,
270
+ // / this, Scheduler performs topological sort to get the order in which commands
271
+ // / should be enqueued. For example, the following graph (D depends on B and C,
271
272
// / B and C depend on A) will be enqueued in the following order:
272
273
// / \code{.cpp}
273
274
// / EventA = Enqueue(A, /*Deps=*/{});
@@ -308,8 +309,7 @@ struct MemObjRecord {
308
309
// /
309
310
// / \section sched_impl Implementation details
310
311
// /
311
- // / The Scheduler is split up into two parts: graph builder and graph
312
- // / processor.
312
+ // / The Scheduler is split up into two parts: graph builder and graph processor.
313
313
// /
314
314
// / To build dependencies, Scheduler needs to memorize memory objects and
315
315
// / commands that modify them.
@@ -338,9 +338,9 @@ struct MemObjRecord {
338
338
// / 1. errors that happen during command enqueue process
339
339
// / 2. the error that happened during command execution.
340
340
// /
341
- // / If an error occurs during command enqueue process, the Command::enqueue method
342
- // / returns the faulty command. Scheduler then reschedules the command and all
343
- // / dependent commands (if any).
341
+ // / If an error occurs during command enqueue process, the Command::enqueue
342
+ // / method returns the faulty command. Scheduler then reschedules the command
343
+ // / and all dependent commands (if any).
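The rescheduling step described above needs the faulty command plus everything that transitively depends on it. A minimal sketch of that collection step follows. `Node`, `Users` and `collectForReschedule` are hypothetical names used only for illustration; they are not the DPC++ RT's `Command` class or Scheduler API.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical node type (assumption); the real Command class differs.
struct Node {
  std::string Name;
  std::vector<Node *> Users; // commands that depend on this one
};

// Collects Faulty and every command that transitively depends on it.
// The set doubles as the "visited" marker, so shared dependents are
// collected once even when reachable through several paths.
void collectForReschedule(Node *Faulty, std::set<Node *> &Out) {
  if (!Out.insert(Faulty).second)
    return; // already collected
  for (Node *User : Faulty->Users)
    collectForReschedule(User, Out);
}
```

Commands that the faulty one depends on are not collected, since under this reading they have already been enqueued successfully.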
344
344
// /
345
345
// / An error with command processing can happen in underlying runtime, in this
346
346
// / case Scheduler is notified asynchronously (using callback mechanism) what
@@ -378,26 +378,23 @@ class Scheduler {
378
378
379
379
// / Removes buffer from the graph.
380
380
// /
381
- // / The lifetime of memory object descriptor begins when the first command group
382
- // / that uses the memory object is submitted and ends when "removeMemoryObject(...)"
383
- // / method is called which means there will be no command group that uses the
384
- // / memory object. When removeMemoryObject is called Scheduler will enqueue
385
- // / and wait on all release commands associated with the memory object, which
386
- // / effectively guarantees that all commands accessing the memory object are
387
- // / complete and then the resources allocated for the memory object are freed. Then all the
388
- // / commands affecting the memory object are removed.
389
- // /
390
- // / On destruction Scheduler triggers destruction of all memory object
391
- // / descriptors in order to wait on all commands not yet executed and all
392
- // / memory it manages.
381
+ // / The lifetime of memory object descriptor begins when the first command
382
+ // / group that uses the memory object is submitted and ends when
383
+ // / "removeMemoryObject(...)" method is called which means there will be no
384
+ // / command group that uses the memory object. When removeMemoryObject is
385
+ // / called Scheduler will enqueue and wait on all release commands associated
386
+ // / with the memory object, which effectively guarantees that all commands
387
+ // / accessing the memory object are complete and then the resources allocated
388
+ // / for the memory object are freed. Then all the commands affecting the
389
+ // / memory object are removed.
393
390
// /
394
391
// / This member function is used by \ref buffer and \ref image.
395
392
// /
396
393
// / \param MemObj is a memory object that points to the buffer being removed.
397
394
void removeMemoryObject (detail::SYCLMemObjI *MemObj);
398
395
399
- // / Removes finished non-leaf non-alloca commands from the subgraph
400
- // / (assuming that all its commands have been waited for).
396
+ // / Removes finished non-leaf non-alloca commands from the subgraph (assuming
397
+ // / that all its commands have been waited for).
401
398
// / \sa GraphBuilder::cleanupFinishedCommands
402
399
// /
403
400
// / \param FinishedEvent is a cleanup candidate event.
@@ -458,13 +455,12 @@ class Scheduler {
458
455
Command *addCGUpdateHost (std::unique_ptr<detail::CG> CommandGroup,
459
456
QueueImplPtr HostQueue);
460
457
461
- // / Registers a \ref CG "command group" to update memory to the latest
462
- // / state.
458
+ // / Enqueues a command to update memory to the latest state.
463
459
// /
464
460
// / \param Req is a requirement, that describes memory object.
465
461
Command *addCopyBack (Requirement *Req);
466
462
467
- // / Registers a \ref CG " command group" to create a host accessor.
463
+ // / Enqueues a command to create a host accessor.
468
464
// /
469
465
// / \param Req points to memory being accessed.
470
466
Command *addHostAccessor (Requirement *Req, const bool destructor = false );
@@ -483,8 +479,9 @@ class Scheduler {
483
479
// / Reschedules the command passed using Queue provided.
484
480
// /
485
481
// / This can lead to rescheduling of all dependent commands. This can be
486
- // / used when the user provides a "secondary" queue to the submit method which may
487
- // / be used when the command fails to enqueue/execute in the primary queue.
482
+ // / used when the user provides a "secondary" queue to the submit method
483
+ // / which may be used when the command fails to enqueue/execute in the
484
+ // / primary queue.
488
485
void rescheduleCommand (Command *Cmd, QueueImplPtr Queue);
489
486
490
487
// / \return a pointer to the corresponding memory object record for the
@@ -516,7 +513,8 @@ class Scheduler {
516
513
std::vector<SYCLMemObjI *> MMemObjs;
517
514
518
515
private:
519
- // / Inserts the command required to update the memory object state in the context.
516
+ // / Inserts the command required to update the memory object state in the
517
+ // / context.
520
518
// /
521
519
// / Copy/map/unmap operations can be inserted depending on the source and
522
520
// / destination.
@@ -579,26 +577,21 @@ class Scheduler {
579
577
// / Member functions of this class do not modify the graph.
580
578
// /
581
579
// / \section sched_enqueue Command enqueueing
582
- // / \todo lazy mode is not implemented.
583
- // /
584
- // / The Scheduler can work in two modes of enqueueing commands: eager (default)
585
- // / and lazy. In eager mode commands are enqueued whenever they come to the
586
- // / Scheduler. In lazy mode they are not enqueued until the content of the buffer
587
- // / they are accessing is requested by user.
588
580
// /
589
- // / Each command has enqueue method which takes vector of events that
590
- // / represents dependencies and returns event which represents the command.
591
- // / GraphProcessor performs topological sort to get the order in which commands have to
592
- // / be enqueued. Then it enqueues each command, passing a vector of events
593
- // / that this command needs to wait on. If an error happens during command
594
- // / enqueue, the whole process is stopped, the faulty command is propagated back
595
- // / to the Scheduler.
581
+ // / Commands are enqueued whenever they come to the Scheduler. Each command
582
+ // / has an enqueue method which takes a vector of events that represent
583
+ // / dependencies and returns an event which represents the command.
584
+ // / GraphProcessor performs topological sort to get the order in which
585
+ // / commands have to be enqueued. Then it enqueues each command, passing a
586
+ // / vector of events that this command needs to wait on. If an error happens
587
+ // / during command enqueue, the whole process is stopped, the faulty command
588
+ // / is propagated back to the Scheduler.
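The enqueue flow described above can be sketched as a depth-first topological sort followed by an in-order enqueue that stops at the first failure. This is only an illustration under stated assumptions: `Cmd`, `Event`, `topoSort`, `enqueueAll` and the `FailOnEnqueue` flag are hypothetical stand-ins, not the actual GraphProcessor or `Command::enqueue` interfaces.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins (assumptions), not the DPC++ RT types.
using Event = int;

struct Cmd {
  std::string Name;
  std::vector<Cmd *> Deps;    // commands this one depends on
  bool FailOnEnqueue = false; // simulates an enqueue error
};

// Depth-first post-order walk: every dependency lands before its user.
void topoSort(Cmd *C, std::set<Cmd *> &Visited, std::vector<Cmd *> &Order) {
  if (!Visited.insert(C).second)
    return;
  for (Cmd *Dep : C->Deps)
    topoSort(Dep, Visited, Order);
  Order.push_back(C);
}

// Enqueues Root and all of its dependencies in topological order.
// Returns nullptr on success, or the faulty command if an enqueue fails,
// in which case the remaining commands are not enqueued.
Cmd *enqueueAll(Cmd *Root, std::vector<std::string> &Log) {
  std::set<Cmd *> Visited;
  std::vector<Cmd *> Order;
  topoSort(Root, Visited, Order);

  std::map<Cmd *, Event> Events;
  Event NextEvent = 0;
  for (Cmd *C : Order) {
    // Events of the dependencies: what this command has to wait on.
    std::vector<Event> WaitList;
    for (Cmd *Dep : C->Deps)
      WaitList.push_back(Events.at(Dep));
    if (C->FailOnEnqueue)
      return C; // stop and hand the faulty command back to the caller
    Events[C] = NextEvent++;
    Log.push_back(C->Name);
  }
  return nullptr;
}
```

For the graph where D depends on B and C, and B and C depend on A, calling `enqueueAll` on D enqueues A first and D last. If C is marked as failing, the function returns C after A and B have been enqueued.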
596
589
// /
597
- // / The command with dependencies that belong to a context different from its own
598
- // / can't be enqueued directly (limitation of OpenCL runtime).
599
- // / Instead, for each dependency, a proxy event is created in the target context
600
- // / and linked using OpenCL callback mechanism with original one. For example,
601
- // / the following SYCL code:
590
+ // / The command with dependencies that belong to a context different from its
591
+ // / own can't be enqueued directly (limitation of OpenCL runtime).
592
+ // / Instead, for each dependency, a proxy event is created in the target
593
+ // / context and linked using OpenCL callback mechanism with the original one.
594
+ // / For example, the following SYCL code:
602
595
// /
603
596
// / \code{.cpp}
604
597
// / // The ContextA and ContextB are different OpenCL contexts