
[x86, SSE] only use phaddw / phaddd when optimizing for minsize? #27233

@rotateright

Description

Bugzilla Link 26859
Resolution FIXED
Resolved on Oct 12, 2018 09:56
Version trunk
OS All
Depends On #32148
Blocks #29972
CC @adibiagio,@RKSimon

Extended Description

Packed horizontal add - phaddd / phaddw:
These are SSSE3 (yes, three Ss) instructions. For performance, they should probably never be generated; they are only worth using to save code size. They're just about guaranteed to be slow because they operate across vector lanes.
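To make "horizontal" concrete, here is a small portable C model of what `phaddd` and `phaddw` compute on a 128-bit register. It is a sketch of the lane semantics only, not compiler or intrinsic code. The helper names `model_phaddd` and `model_phaddw` are hypothetical, operands are in Intel order (`a` is the destination/first source), and unsigned types are used so that lane overflow wraps the way the hardware does.

```c
#include <stdint.h>

/* Hypothetical model of PHADDD a, b (4 x 32-bit lanes).
 * Each output lane is the sum of an adjacent pair of input lanes:
 * the low half of the result comes from a, the high half from b. */
void model_phaddd(const uint32_t a[4], const uint32_t b[4], uint32_t out[4]) {
    uint32_t tmp[4];                  /* lets out alias a or b safely */
    tmp[0] = a[0] + a[1];
    tmp[1] = a[2] + a[3];
    tmp[2] = b[0] + b[1];
    tmp[3] = b[2] + b[3];
    for (int i = 0; i < 4; ++i)
        out[i] = tmp[i];
}

/* Hypothetical model of PHADDW a, b (8 x 16-bit lanes), same pairing. */
void model_phaddw(const uint16_t a[8], const uint16_t b[8], uint16_t out[8]) {
    uint16_t tmp[8];
    for (int i = 0; i < 4; ++i) {
        /* the conversion back to uint16_t wraps modulo 2^16 */
        tmp[i]     = (uint16_t)(a[2 * i] + a[2 * i + 1]);
        tmp[4 + i] = (uint16_t)(b[2 * i] + b[2 * i + 1]);
    }
    for (int i = 0; i < 8; ++i)
        out[i] = tmp[i];
}
```

For example, `model_phaddd` applied to {1, 2, 3, 4} with itself (the analogue of `phaddd %xmm1, %xmm1`) yields {3, 7, 3, 7}: every output lane depends on two neighbouring input lanes, which is the cross-lane data movement the hardware has to perform.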

Here we're not only generating these instructions, but generating them for a target that doesn't have SSSE3:

$ cat accum.c
int please_no_phaddd(int *x) {
    int sum = 0;
    for (int i=0; i<1024; ++i)
        sum += x[i];
    return sum;
}

short please_no_phaddw(short *x) {
    short sum = 0;
    for (int i=0; i<1024; ++i)
        sum += x[i];
    return sum;
}
}
bin $ ./clang -O2 -S -o - accum.c -msse -fno-unroll-loops|grep phadd
    .globl _please_no_phaddd
_please_no_phaddd: ## @please_no_phaddd
    phaddd %xmm1, %xmm1
    .globl _please_no_phaddw
_please_no_phaddw: ## @please_no_phaddw
phaddw %xmm0, %xmm0
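After the vectorized loop in `please_no_phaddd`, the partial sums held in the vector accumulator still have to be collapsed into a single `int`. The sketch below models two ways to do that final step for a 4-lane accumulator, on plain arrays: rounds of pairwise horizontal adds (the `phaddd` approach), and a shuffle followed by an ordinary vertical add (in SSE2 terms, `pshufd` then `paddd`), which is the usual alternative. The function names and the shuffle order are illustrative assumptions, not what LLVM actually emits; unsigned lanes are used to model the wrapping adds. Both functions produce the same total, so the choice between the two lowerings is purely a cost question.

```c
#include <stdint.h>

/* Hypothetical helper: one "phaddd v, v" round on a 4-lane vector,
 * producing {v0+v1, v2+v3, v0+v1, v2+v3}. */
static void hadd_round(uint32_t v[4]) {
    uint32_t lo = v[0] + v[1];
    uint32_t hi = v[2] + v[3];
    v[0] = lo; v[1] = hi; v[2] = lo; v[3] = hi;
}

/* Reduce 4 lanes using only horizontal adds: two rounds. */
uint32_t reduce_with_hadd(const uint32_t acc[4]) {
    uint32_t v[4] = {acc[0], acc[1], acc[2], acc[3]};
    hadd_round(v);   /* {a0+a1, a2+a3, a0+a1, a2+a3} */
    hadd_round(v);   /* every lane now holds a0+a1+a2+a3 */
    return v[0];
}

/* Model of an immediate-controlled shuffle: out[i] = v[idx[i]]. */
static void shuffle4(const uint32_t v[4], const int idx[4], uint32_t out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = v[idx[i]];
}

/* Reduce 4 lanes with shuffle + vertical add (pshufd + paddd style). */
uint32_t reduce_with_shuffles(const uint32_t acc[4]) {
    static const int swap_halves[4] = {2, 3, 0, 1};
    static const int swap_pairs[4]  = {1, 0, 3, 2};
    uint32_t v[4] = {acc[0], acc[1], acc[2], acc[3]};
    uint32_t s[4];

    shuffle4(v, swap_halves, s);
    for (int i = 0; i < 4; ++i)
        v[i] += s[i];  /* {a0+a2, a1+a3, a2+a0, a3+a1} */
    shuffle4(v, swap_pairs, s);
    for (int i = 0; i < 4; ++i)
        v[i] += s[i];  /* every lane now holds the total */
    return v[0];
}
```

For an accumulator of {10, 20, 30, 40}, both functions return 100. Each round of the second version is one lane permutation plus one ordinary element-wise add, which is the kind of sequence the title suggests preferring unless optimizing for minsize.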
