Description
| | |
| --- | --- |
| Bugzilla Link | 26859 |
| Resolution | FIXED |
| Resolved on | Oct 12, 2018 09:56 |
| Version | trunk |
| OS | All |
| Depends On | #32148 |
| Blocks | #29972 |
| CC | @adibiagio,@RKSimon |
Extended Description
Packed horizontal add - phaddd / phaddw:
These are SSSE3 (yes, 3 Ss) instructions that should probably never be generated when optimizing for speed; they are only worth using to save code size. They're just about guaranteed to be slow because they operate across vector lanes.
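To make "horizontal add" concrete, here is a scalar C model (no intrinsics, so it builds on any target) of what the 128-bit `phaddd` computes, next to a shuffle-and-vertical-add sequence that reduces four 32-bit lanes without it. The function names are hypothetical and the lane behaviour is written from the documented instruction semantics; this is an illustration, not a claim about what LLVM emits.

```c
#include <assert.h>  /* assert() for the usage checks */
#include <stdint.h>

/* Wrapping 32-bit add, matching the modular arithmetic of the vector ops
   (computed in unsigned to avoid signed-overflow UB in C). */
static int32_t wrap_add32(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Model of 128-bit PHADDD: adjacent pairs are summed. The low two result
   lanes come from the destination, the high two from the source. */
static void model_phaddd(int32_t dst[4], const int32_t src[4]) {
    int32_t r[4];  /* read everything first so dst == src is safe */
    r[0] = wrap_add32(dst[0], dst[1]);
    r[1] = wrap_add32(dst[2], dst[3]);
    r[2] = wrap_add32(src[0], src[1]);
    r[3] = wrap_add32(src[2], src[3]);
    for (int i = 0; i < 4; ++i)
        dst[i] = r[i];
}

/* 4-lane reduction using PHADDD of a register with itself, twice
   (like "phaddd %xmm1, %xmm1" issued two times). */
static int32_t sum4_via_phaddd(const int32_t in[4]) {
    int32_t v[4] = { in[0], in[1], in[2], in[3] };
    model_phaddd(v, v);  /* {a0+a1, a2+a3, a0+a1, a2+a3} */
    model_phaddd(v, v);  /* total in every lane */
    return v[0];
}

/* Same reduction with lane shuffles plus vertical adds, modelled on
   PSHUFD $0x4e / PADDD followed by PSHUFD $0xb1 / PADDD (SSE2 only). */
static int32_t sum4_via_shuffles(const int32_t in[4]) {
    int32_t v[4] = { in[0], in[1], in[2], in[3] };
    int32_t t[4] = { v[2], v[3], v[0], v[1] };  /* swap 64-bit halves */
    for (int i = 0; i < 4; ++i)
        v[i] = wrap_add32(v[i], t[i]);
    int32_t u[4] = { v[1], v[0], v[3], v[2] };  /* swap adjacent lanes */
    for (int i = 0; i < 4; ++i)
        v[i] = wrap_add32(v[i], u[i]);
    return v[0];
}
```

For example, both reductions return 10 for `{1, 2, 3, 4}`. The shuffle version needs the same number of steps but uses only per-lane adds plus shuffles.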
Here we're not only generating these instructions, but doing so for a target that doesn't have SSSE3:
```
$ cat accum.c
int please_no_phaddd(int *x) {
  int sum = 0;
  for (int i=0; i<1024; ++i)
    sum += x[i];
  return sum;
}

short please_no_phaddw(short *x) {
  short sum = 0;
  for (int i=0; i<1024; ++i)
    sum += x[i];
  return sum;
}
```
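A rough scalar model of why a horizontal add appears at all in these plain loops, assuming the loop vectorizer keeps four 32-bit partial sums in one 128-bit register (vectorization factor 4, ignoring any interleaving): the loop body needs only vertical adds, and only the final fold of the four lanes into one value is cross-lane. The names `vectorized_sum_model`, `scalar_sum`, and the `VF` constant are hypothetical, chosen for illustration.

```c
#include <assert.h>  /* assert() for the usage checks */
#include <stdint.h>

#define N  1024  /* trip count from accum.c */
#define VF 4     /* assumed vectorization factor: 4 x i32 per 128-bit register */

static int32_t wrap_add32(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Reference: the scalar loop from please_no_phaddd, with wrapping adds. */
static int32_t scalar_sum(const int32_t *x) {
    int32_t sum = 0;
    for (int i = 0; i < N; ++i)
        sum = wrap_add32(sum, x[i]);
    return sum;
}

/* Model of the vectorized loop: lane l accumulates x[l], x[l+VF], ... */
static int32_t vectorized_sum_model(const int32_t *x) {
    int32_t acc[VF] = {0};
    for (int i = 0; i < N; i += VF)        /* vertical, per-lane adds */
        for (int l = 0; l < VF; ++l)
            acc[l] = wrap_add32(acc[l], x[i + l]);
    /* Epilogue: log2(VF) halving steps fold the lanes into acc[0]. In this
       model, this cross-lane fold is the only horizontal step. */
    for (int half = VF / 2; half > 0; half /= 2)
        for (int l = 0; l < half; ++l)
            acc[l] = wrap_add32(acc[l], acc[l + half]);
    return acc[0];
}
```

With `x[i] = i`, both functions return 523776. Because the horizontal work is confined to a one-time epilogue, emitting it as shuffles plus vertical adds instead of `phaddd` costs nothing inside the hot loop.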
```
bin $ ./clang -O2 -S -o - accum.c -msse -fno-unroll-loops|grep phadd
    .globl _please_no_phaddd
_please_no_phaddd:                      ## @please_no_phaddd
    phaddd %xmm1, %xmm1
    .globl _please_no_phaddw
_please_no_phaddw:                      ## @please_no_phaddw
    phaddw %xmm0, %xmm0
```