CUDA: add mean operation #14313

Merged: 3 commits into ggml-org:master from cuda_add_mean on Jun 22, 2025
Collaborator

@am17an am17an commented Jun 21, 2025

Refactor sum_rows so that it can also do the normalization, which gives mean. Added a performance test as well. Sum-rows and mean could be abstracted even further, but I think keeping them as separate ops makes for a cleaner API.
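To make the idea concrete, here is a minimal host-side sketch of how one row reduction can serve both ops. It is not the CUDA kernel from this PR: the names `reduce_rows_ref`, `sum_rows_ref`, and `mean_ref`, the `Norm` template flag, and the row-major `std::vector<float>` layout are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, not the CUDA kernel from this PR.
// One row reduction serves both ops: Norm == false yields the row sums,
// Norm == true additionally divides by the column count and yields the row means.
template <bool Norm>
std::vector<float> reduce_rows_ref(const std::vector<float> & x, std::size_t ncols) {
    assert(ncols > 0 && x.size() % ncols == 0);
    const std::size_t nrows = x.size() / ncols;
    std::vector<float> dst(nrows);
    for (std::size_t row = 0; row < nrows; ++row) {
        float sum = 0.0f;
        for (std::size_t col = 0; col < ncols; ++col) {
            sum += x[row * ncols + col];  // row-major layout (assumed)
        }
        // Norm is a compile-time constant, so the compiler can drop the unused branch.
        dst[row] = Norm ? sum / static_cast<float>(ncols) : sum;
    }
    return dst;
}

// Thin wrappers keep the two ops as separate entry points.
inline std::vector<float> sum_rows_ref(const std::vector<float> & x, std::size_t ncols) {
    return reduce_rows_ref<false>(x, ncols);
}

inline std::vector<float> mean_ref(const std::vector<float> & x, std::size_t ncols) {
    return reduce_rows_ref<true>(x, ncols);
}
```

For a 2x3 input `{1, 2, 3, 4, 5, 6}`, `sum_rows_ref(x, 3)` gives `{6, 15}` and `mean_ref(x, 3)` gives `{2, 5}`. Keeping two thin wrappers over one shared template is one way to share the reduction while still exposing the ops separately.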

| Backend | Device | us/run | Bandwidth | Speedup |
| --- | --- | --- | --- | --- |
| CPU | Ryzen 3800XT 8-core | 116.68 | 6.30 | 1.00 |
| GPU | RTX 3090 | 2.72 | 270.31 | 42.9 |
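The Speedup column in the table above appears to be the CPU time per run divided by the GPU time per run. A small sketch of that arithmetic follows; the helper name `speedup` is hypothetical and not part of the test harness.

```cpp
#include <cassert>

// Hypothetical helper: speedup relative to a baseline is the baseline time
// divided by the measured time (both in the same unit, here us/run).
inline double speedup(double baseline_us, double measured_us) {
    assert(measured_us > 0.0);
    return baseline_us / measured_us;
}
```

With the table's numbers, `speedup(116.68, 2.72)` comes out at roughly 42.9, matching the reported value. The ratio of the two Bandwidth entries (270.31 / 6.30) is also roughly 42.9, as expected if both runs process the same amount of data.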

@github-actions github-actions bot added the testing, Nvidia GPU, and ggml labels Jun 21, 2025
@am17an am17an requested a review from JohannesGaessler June 21, 2025 07:48
Collaborator

@JohannesGaessler JohannesGaessler left a comment

Since you're already working on reduction ops, you could take a look at the discussion in ggml-org/ggml#1005. The person who said they'd do it has so far not delivered anything, so I think it's safe to say they won't in the future.

For the CUDA code specifically, my preference would be to have the reduction ops in a single file so that the template and the code using it are close together, but this is a minor issue.

@am17an am17an merged commit aa064b2 into ggml-org:master Jun 22, 2025
87 of 88 checks passed
@am17an am17an deleted the cuda_add_mean branch June 22, 2025 04:39
2 participants