From a8ad72c414ce0dcf89d549b6e2e9f0cefb846b45 Mon Sep 17 00:00:00 2001
From: Ruyman Reyes <ruyman@codeplay.com>
Date: Fri, 28 Feb 2020 11:33:36 +0000
Subject: [PATCH 1/2] [SYCL][CUDA] Improve CUDA backend documentation

Co-Authored-By: Alexander Johnston <alexander@codeplay.com>
Signed-off-by: Ruyman Reyes <ruyman@codeplay.com>
---
 sycl/CMakeLists.txt         |  3 +--
 sycl/doc/GetStartedGuide.md | 46 ++++++++++++++++++++++++++++++++++---
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/sycl/CMakeLists.txt b/sycl/CMakeLists.txt
index a7c9ef2732f38..f1c8bb9b07f28 100644
--- a/sycl/CMakeLists.txt
+++ b/sycl/CMakeLists.txt
@@ -142,8 +142,7 @@ install(DIRECTORY ${OPENCL_INCLUDE}/CL
 )
 
 option(SYCL_BUILD_PI_CUDA
-  "Selects the PI API backend. When set to ON, the CUDA backend is selected. \
-   When set to OFF, the OpenCL backend is selected." OFF)
+  "Enables the CUDA backend for the Plugin Interface" OFF)
 
 # Configure SYCL version macro
 set(sycl_inc_dir ${CMAKE_CURRENT_SOURCE_DIR}/include)
diff --git a/sycl/doc/GetStartedGuide.md b/sycl/doc/GetStartedGuide.md
index 6a60594bbee24..d52821c0b5e28 100644
--- a/sycl/doc/GetStartedGuide.md
+++ b/sycl/doc/GetStartedGuide.md
@@ -123,10 +123,15 @@ should be used.
 
 There is experimental support for DPC++ for CUDA devices.
 
-To enable support for CUDA devices, the following arguments need to be added to
-the CMake command when building the DPC++ compiler.
+To enable support for CUDA devices, follow the instructions for the Linux
+DPC++ toolchain, but replace the cmake command with the following one:
+
 
 ```
+cmake -DCMAKE_BUILD_TYPE=Release \
+-DLLVM_EXTERNAL_PROJECTS="llvm-spirv;sycl" \
+-DLLVM_EXTERNAL_SYCL_SOURCE_DIR=$SYCL_HOME/llvm/sycl \
+-DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR=$SYCL_HOME/llvm/llvm-spirv \
 -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda/ \
 -DLLVM_ENABLE_PROJECTS="clang;llvm-spirv;sycl;libclc" \
 -DSYCL_BUILD_PI_CUDA=ON \
@@ -145,6 +150,24 @@ above.
 
 # Use DPC++ toolchain
 
+## Using the SYCL toolchain on CUDA platforms
+
+The SYCL toolchain support on CUDA platforms is still in an experimental phase.
+Currently, the SYCL toolchain relies on having a recent OpenCL implementation
+on the system in order to link applications to the SYCL runtime.
+The OpenCL implementation is not used at runtime if only the CUDA backend is 
+used in the application, but must be installed.
+
+The OpenCL implementation provided by the CUDA SDK is OpenCL 1.2, which is
+too old to link with the SYCL runtime and lacks some symbols.
+
+We recommend installing the low level CPU runtime, following the instructions 
+in the next section.
+
+As an alternative to installing the low level CPU runtime, you can build and
+install the [Khronos ICD loader](https://github.com/KhronosGroup/OpenCL-ICD-Loader),
+which provides all the required symbols.
+
 ## Install low level runtime
 
 To run DPC++ applications on OpenCL devices, OpenCL implementation(s) must be
@@ -262,6 +285,9 @@ ninja check-all
 If no OpenCL GPU/CPU runtimes are available, the corresponding tests are
 skipped.
 
+If CUDA support has been built, it is tested only if there are CUDA devices 
+available.
+
 ### Run Khronos\* SYCL\* conformance test suite (optional)
 
 Khronos\* SYCL\* conformance test suite (CTS) is intended to validate
@@ -394,6 +420,19 @@ clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
 This `simple-sycl-app.exe` application doesn't specify SYCL device for
 execution, so SYCL runtime will use `default_selector` logic to select one
 of accelerators available in the system or SYCL host device.
+In this case, the behaviour of the `default_selector` can be altered
+using the `SYCL_BE` environment variable: setting it to `PI_CUDA` forces
+the use of the CUDA backend (if available), and setting it to `PI_OPENCL`
+forces the use of the OpenCL backend.
+
+```bash
+SYCL_BE=PI_CUDA ./simple-sycl-app-cuda.exe
+```
+
+If `SYCL_BE` is not set, the OpenCL backend is used by default if available.
+If there are no OpenCL or CUDA devices available, the SYCL host device is used.
+The SYCL host device executes the SYCL application directly on the host,
+without using any low-level API.
 
 Note: `nvptx64-nvidia-cuda-sycldevice` is usable with `-fsycl-targets`
 if clang was built with the cmake option `SYCL_BUILD_PI_CUDA=ON`.
@@ -403,6 +442,7 @@ if clang was built with the cmake option `SYCL_BUILD_PI_CUDA=ON`.
 ./simple-sycl-app.exe
 The results are correct!
 ```
+
 **Note**:
 Currently, when the application has been built with the CUDA target, the CUDA
 backend must be selected at runtime using the `SYCL_BE` environment variable.
@@ -411,7 +451,7 @@ backend must be selected at runtime using the `SYCL_BE` environment variable.
 SYCL_BE=PI_CUDA ./simple-sycl-app-cuda.exe
 ```
 
-NOTE: DPC++/SYCL developer can specify SYCL device for execution using device
+NOTE: DPC++/SYCL developers can specify SYCL device for execution using device
 selectors (e.g. `cl::sycl::cpu_selector`, `cl::sycl::gpu_selector`,
 [Intel FPGA selector(s)](extensions/IntelFPGA/FPGASelector.md)) as
 explained in following section [Code the program for a specific

From d2641e5e81be9ecc434a8c5d334f2d39929dcc16 Mon Sep 17 00:00:00 2001
From: Ruyman Reyes <ruyman@codeplay.com>
Date: Mon, 16 Mar 2020 15:27:12 +0000
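Subject: [PATCH 2/2] Addressing comments from review

Use $DPCPP_HOME instead of $SYCL_HOME in the CUDA cmake command, and
refer to the DPC++ toolchain and runtime instead of SYCL in the CUDA
platform section of the Get Started guide.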

---
 sycl/doc/GetStartedGuide.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/sycl/doc/GetStartedGuide.md b/sycl/doc/GetStartedGuide.md
index d52821c0b5e28..f7de12125cc03 100644
--- a/sycl/doc/GetStartedGuide.md
+++ b/sycl/doc/GetStartedGuide.md
@@ -130,8 +130,8 @@ DPC++ toolchain, but replace the cmake command with the following one:
 ```
 cmake -DCMAKE_BUILD_TYPE=Release \
 -DLLVM_EXTERNAL_PROJECTS="llvm-spirv;sycl" \
--DLLVM_EXTERNAL_SYCL_SOURCE_DIR=$SYCL_HOME/llvm/sycl \
--DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR=$SYCL_HOME/llvm/llvm-spirv \
+-DLLVM_EXTERNAL_SYCL_SOURCE_DIR=$DPCPP_HOME/llvm/sycl \
+-DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR=$DPCPP_HOME/llvm/llvm-spirv \
 -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda/ \
 -DLLVM_ENABLE_PROJECTS="clang;llvm-spirv;sycl;libclc" \
 -DSYCL_BUILD_PI_CUDA=ON \
@@ -150,16 +150,16 @@ above.
 
 # Use DPC++ toolchain
 
-## Using the SYCL toolchain on CUDA platforms
+## Using the DPC++ toolchain on CUDA platforms
 
-The SYCL toolchain support on CUDA platforms is still in an experimental phase.
-Currently, the SYCL toolchain relies on having a recent OpenCL implementation
-on the system in order to link applications to the SYCL runtime.
+The DPC++ toolchain support on CUDA platforms is still in an experimental phase.
+Currently, the DPC++ toolchain relies on having a recent OpenCL implementation
+on the system in order to link applications to the DPC++ runtime.
 The OpenCL implementation is not used at runtime if only the CUDA backend is 
 used in the application, but must be installed.
 
 The OpenCL implementation provided by the CUDA SDK is OpenCL 1.2, which is
-too old to link with the SYCL runtime and lacks some symbols.
+too old to link with the DPC++ runtime and lacks some symbols.
 
 We recommend installing the low level CPU runtime, following the instructions 
 in the next section.