Description
Context
Consider the following code:
#include <array>
#include <cstddef>
constexpr size_t MAX_DATA_INDEX = 1024;
constexpr std::array<size_t, 256> GET_DATA_INDEX_V1 = {};
bool foo(unsigned *data, unsigned char first_idx)
{
size_t second_idx = GET_DATA_INDEX_V1[first_idx];
if (second_idx >= MAX_DATA_INDEX)
return false;
data[second_idx] = true;
return true;
}
constexpr std::array<size_t, 256> GET_DATA_INDEX_V2 = {0, 1, 2, 0, 0, /* ... */};
bool bar(unsigned *data, unsigned char first_idx)
{
size_t second_idx = GET_DATA_INDEX_V2[first_idx];
if (second_idx >= MAX_DATA_INDEX)
return false;
data[second_idx] = true;
return true;
}
And the assembly that gets generated with clang 19.1.0 (and trunk), with -O3
enabled:
foo(unsigned int*, unsigned char):
mov dword ptr [rdi], 1
mov al, 1
ret
bar(unsigned int*, unsigned char):
mov eax, esi
lea rcx, [rip + GET_DATA_INDEX_V2]
mov rax, qword ptr [rcx + 8*rax]
cmp rax, 1023
ja .LBB1_2
mov dword ptr [rdi + 4*rax], 1
.LBB1_2:
cmp rax, 1024
setb al
ret
GET_DATA_INDEX_V2:
.quad 0
.quad 1
.quad 2
.zero 2024
Discussion
When the lookup table is filled with a single value (e.g. zero), like in foo()
, clang is able to get rid of the second_idx >= MAX_DATA_INDEX
branch, because it sees at compile time that the only possible value that second_idx
can have is zero, which is less than MAX_DATA_INDEX
.
But in bar()
, where the lookup table is filled with other values besides 0, clang doesn't see that all possible values of second_idx
satisfy second_idx < MAX_DATA_INDEX
, even though this is also derivable at compile time. So the second_idx >= MAX_DATA_INDEX
branch is also redundant in bar()
, and ideally that branch would be deleted.
How hard would it be to add an optimization to clang/LLVM such that the second_idx >= MAX_DATA_INDEX
branch in bar()
gets pruned?
You could imagine some code that has a bunch of transformation layers through which it passes an initial index (first_idx -> second_idx -> third_idx -> ... -> last_idx
). And when moving from one layer to the next, it does bounds checking to verify that the current index is less than MAX_DATA_INDEX
. In this case, it seems like this kind of optimization could get rid of a pretty significant amount of effectively dead code. If this were to work with more complex predicates (and not just bounds checking), then the utility would be even greater.