Support std::memcpy or improve detail::memcpy #3816

@Pennycook

Is your feature request related to a problem? Please describe
As noted in #3815, the performance of sycl::detail::memcpy differs from that of std::memcpy. In my tests, performance is up to 2x better with std::memcpy.

Describe the solution you would like
I think there are two options:

  1. Support std::memcpy in device code.
    This appears to work already, but the function isn't explicitly listed here. This is my preferred solution, because it would let the implementation call std::memcpy and let users call std::memcpy in their kernels.

  2. Implement sycl::detail::memcpy the same way as std::memcpy where possible.
    The implementation of sycl::detail::memcpy here is just a simple loop, and the compiler doesn't seem to optimize it as aggressively as it does std::memcpy. Making sycl::detail::memcpy faster wouldn't help user code, but it would improve performance for the parts of the implementation that currently rely on it.
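
To make the contrast in option 2 concrete, here is a minimal host-only sketch. `loop_memcpy` and `forwarding_memcpy` are hypothetical names introduced for illustration: the first is a stand-in for the "simple loop" described above (it is not the actual sycl::detail::memcpy source), and the second shows what forwarding to std::memcpy would look like. It does not demonstrate device-code behavior or the performance difference.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a byte-wise copy loop, as the issue describes.
// Assumption: this is NOT copied from the real sycl::detail::memcpy.
inline void loop_memcpy(void *dst, const void *src, std::size_t n) {
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i];
}

// Hypothetical wrapper sketching option 2: forward to std::memcpy so the
// compiler can apply whatever memcpy-specific optimizations it has.
inline void forwarding_memcpy(void *dst, const void *src, std::size_t n) {
  std::memcpy(dst, src, n);
}
```

Both functions produce the same bytes in the destination; the question raised in this issue is only how well the compiler optimizes each form.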

Describe alternatives you have considered
Calling __builtin_memcpy might also work, but adding a third variant of memcpy to the mix seems more confusing.
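
For reference, a rough sketch of what the `__builtin_memcpy` alternative could look like on the host. `copy_bytes` is a hypothetical helper name, and the `__has_builtin` guard is an assumption about how one might select the builtin on GCC/Clang. Whether this is acceptable in SYCL device code is not established here.

```cpp
#include <cstddef>
#include <cstring>

// Older compilers may not provide __has_builtin; treat it as "not available".
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Hypothetical helper: use the compiler builtin when it is detected,
// otherwise fall back to std::memcpy. Returns dst, like memcpy.
inline void *copy_bytes(void *dst, const void *src, std::size_t n) {
#if __has_builtin(__builtin_memcpy)
  return __builtin_memcpy(dst, src, n);
#else
  return std::memcpy(dst, src, n);
#endif
}
```

This illustrates the concern above: it introduces a third spelling of the same operation alongside std::memcpy and sycl::detail::memcpy.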

Additional context
I think there are other headers that currently assume std::memcpy works in device kernels, and I wouldn't be surprised if there was also user code relying on this behavior.

Labels: enhancement (New feature or request)
