Skip to content

Missed optimization: unreachable branch could be pruned after taking into account the possible values of a constexpr lookup table #125003

@inicula

Description

@inicula

Context

Consider the following code:

#include <array>
#include <cstddef>

constexpr size_t MAX_DATA_INDEX = 1024;

constexpr std::array<size_t, 256> GET_DATA_INDEX_V1 = {};

bool foo(unsigned *data, unsigned char first_idx)
{
    size_t second_idx = GET_DATA_INDEX_V1[first_idx];
    if (second_idx >= MAX_DATA_INDEX)
        return false;

    data[second_idx] = true;
    return true;
}

constexpr std::array<size_t, 256> GET_DATA_INDEX_V2 = {0, 1, 2, 0, 0, /* ... */};

bool bar(unsigned *data, unsigned char first_idx)
{
    size_t second_idx = GET_DATA_INDEX_V2[first_idx];
    if (second_idx >= MAX_DATA_INDEX)
        return false;

    data[second_idx] = true;
    return true;
}

And the assembly that gets generated with clang 19.1.0 (and trunk), with -O3 enabled:

foo(unsigned int*, unsigned char):
        mov     dword ptr [rdi], 1
        mov     al, 1
        ret

bar(unsigned int*, unsigned char):
        mov     eax, esi
        lea     rcx, [rip + GET_DATA_INDEX_V2]
        mov     rax, qword ptr [rcx + 8*rax]
        cmp     rax, 1023
        ja      .LBB1_2
        mov     dword ptr [rdi + 4*rax], 1
.LBB1_2:
        cmp     rax, 1024
        setb    al
        ret

GET_DATA_INDEX_V2:
        .quad   0
        .quad   1
        .quad   2
        .zero   2024

Discussion

When the lookup table is filled with a single value (e.g. zero), like in foo(), clang is able to get rid of the second_idx >= MAX_DATA_INDEX branch, because it sees at compile time that the only possible value that second_idx can have is zero, which is less than MAX_DATA_INDEX.

But in bar(), where the lookup table is filled with other values besides 0, clang doesn't see that all possible values of second_idx satisfy second_idx < MAX_DATA_INDEX, even though this is also derivable at compile time. So the second_idx >= MAX_DATA_INDEX branch is also redundant in bar(), and ideally that branch would be deleted.

How hard would it be to add an optimization to clang/LLVM such that the second_idx >= MAX_DATA_INDEX branch in bar() gets pruned?

You could imagine some code that has a bunch of transformation layers through which it passes an initial index (first_idx -> second_idx -> third_idx -> ... -> last_idx). And when moving from one layer to the next, it does bounds checking to verify that the current index is less than MAX_DATA_INDEX. In this case, it seems like this kind of optimization could get rid of a pretty significant amount of effectively dead code. If this were to work with more complex predicates (and not just bounds checking), then the utility would be even greater.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions