[SYCL][CUDA] Wrong global range when sycl::range is passed a prime #10051

@joeatodd

Describe the bug

I believe #9787 introduces a bug whereby prime sycl::range values will be rounded up, producing an incorrect final global range.

To Reproduce

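#include <cstdio>
#include <sycl/sycl.hpp>

int main() {
  sycl::queue q{};
  q.submit([=](sycl::handler& cgh) {
    // Requested global range: 256 x 5 (the second extent is prime).
    unsigned int x_max = 256;
    unsigned int y_max = 5;
    cgh.parallel_for(sycl::range<2>{x_max, y_max}, [=](sycl::item<2> item) {
      unsigned int y = item.get_id(1);
      // Every work-item should have y < y_max; this prints if the range overshoots.
      if (y >= y_max)
        printf("Oh no! y: %u\n", y);
    });
  });
  q.wait();  // make sure the kernel has finished before the program exits
}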

Compile & run:

clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda mwe.cpp && ./a.out

On my system (RTX 3060) this produces the following grid & block size:

grid:  <<<3, 1, 1>>>
block: <<<2, 256, 1>>>
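
To make the mismatch explicit, here is a small sketch of the arithmetic behind those launch dimensions. It assumes that the CUDA back-end maps SYCL dimension 1 (here `y_max = 5`) to CUDA's x axis and SYCL dimension 0 (`x_max = 256`) to CUDA's y axis. That mapping is an assumption of this sketch, not something verified above. The helper names `launched_extent` and `overshoot` are hypothetical and exist only for illustration.

```cpp
#include <cstddef>

// Threads launched along one axis = number of blocks * threads per block.
constexpr std::size_t launched_extent(std::size_t grid_dim, std::size_t block_dim) {
  return grid_dim * block_dim;
}

// Extra work-items launched beyond the requested extent along one axis.
constexpr std::size_t overshoot(std::size_t launched, std::size_t requested) {
  return launched > requested ? launched - requested : 0;
}

// Assumed mapping: SYCL dim 1 (y) -> CUDA x, SYCL dim 0 (x) -> CUDA y.
constexpr std::size_t launched_y = launched_extent(3, 2);    // grid.x * block.x
constexpr std::size_t launched_x = launched_extent(1, 256);  // grid.y * block.y

static_assert(launched_x == 256, "x axis matches the requested extent of 256");
static_assert(launched_y == 6, "y axis launches 6 work-items");
static_assert(overshoot(launched_y, 5) == 1, "one more than the requested 5");
```

Under that assumed mapping, the launch covers 256 x 6 work-items rather than the requested 256 x 5, so work-items with `y == 5` would run and trigger the `y >= y_max` check in the reproducer.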

Labels: bug, cuda
