Skip to content

LLVM and SPIRV-LLVM-Translator pulldown (WW20-21) #3779

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1,751 commits into from
May 25, 2021

Conversation

vmaksimo
Copy link
Contributor

@vmaksimo vmaksimo commented May 18, 2021

xling-liao and others added 30 commits May 17, 2021 11:30
1.[bool, char, short] bitfields have the same alignment as unsigned int
2.Adjust alignment on typedef field decls/honor align attribute
3.Fix alignment for scoped enum class
4.Long long bitfield has 4bytes alignment and StorageUnitSize under 32 bit
  compile mode

Differential Revision: https://reviews.llvm.org/D87029
This is to allow disasm with any bits in the unused fields.

Differential Revision: https://reviews.llvm.org/D102526
This patch adds a new test for loop-unrolling with multiple exiting
blocks, where the latch does not exit, but the header does. This can
happen when the loop has not been rotated, e.g. due to minsize.

Inspired by the following end-to-end test, using -Oz
https://godbolt.org/z/fP6sna8qK

    bool foo(int *ptr, int limit) {
        #pragma clang loop unroll(full)
        for (unsigned int i = 0; i < 4; i++) {
            if (ptr[i] > limit)
            return false;
            ptr[i]++;
        }
        return true;
    }
Bug 49356 (https://bugs.llvm.org/show_bug.cgi?id=49356) reports crash in
the test case `tasking/bug_taskwait_detach.cpp`, which is caused by the wrong
function declaration. `gtid` in `__kmpc_omp_task` should be `kmp_int32`.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D102584
Since we have both aliasing mode and Intel LAM on x86_64, we need to
choose the mode at either run time or compile time.  This patch
implements the plumbing to build both and choose between them at
compile time.

Reviewed By: vitalybuka, eugenis

Differential Revision: https://reviews.llvm.org/D102286
Mutli-line headers are not allowed in RST, reformat the header to be a
single wide line.
…ync instructions

Adds NVPTX builtins and intrinsics for the CUDA PTX `redux.sync` instructions
for `sm_80` architecture or newer.

PTX ISA description of `redux.sync`:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-redux-sync

Authored-by: Steffen Larsen <[email protected]>

Differential Revision: https://reviews.llvm.org/D100124
Initial version of pooling assumed normalization was accross all elements
equally. TOSA actually requires the noramalization is perform by how
many elements were summed (edges are not artifically dimmer). Updated
the lowering to reflect this change with corresponding tests.

Reviewed By: NatashaKnk

Differential Revision: https://reviews.llvm.org/D102540
Missing or duplicate spack package should not cause error, since
users may only installed llvm/clang package, or users may installed
duplicate HIP package but will use environment variable or compiler
option to choose HIP path.

The message about missing or duplicate spack package is informational,
therefore should be emitted only when -v is specified.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D102556
This change makes the conversion of an mlir::OpState to bool `explicit`. Idiomatic boolean uses continue to work as before, but questionable implicit uses (e.g. accumulating over a range of OpStates to count "true" states) become ill-formed. This makes the class interface a lilttle less error-prone.

I tested this change on our internal (fairly large) codebase, and only one fix was needed, which was ultimately an improvement of the affected code.

Reviewed By: rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D101989
Alias mode is not expected work on non-x86, so don't build it there.
Should fix the aarch64 bot.
…n steroids" idiom recognition.

I think i've added exhaustive test coverage, and i have verified that alive2 is happy with all the tests,
so in principle i'm fine with landing this without review, but just in case..

This adds support for the "count active bits" pattern, i.e.:
```
int countActiveBits(unsigned val) {
    int cnt = 0;
    for( ; (val >> cnt) != 0; ++cnt)
        ;
    return cnt;
}
```
but a somewhat more general one, since that is what i need:
```
int countActiveBits(unsigned val, int start, int off) {
    int cnt;
    for (cnt = start; val >> (cnt + off); cnt++)
        ;
    return cnt;
}
```

I've followed in footstep of 'left-shift until bittest' idiom (D91038),
in the sense that iff the `ctlz` intrinsic is cheap, we'll transform,
regardless of all other factors.

This can have a shocking effect on certain benchmarks:
```
raw.pixls.us-unique/Olympus/XZ-1$ /repositories/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf
RUNNING: /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp49_28zcm
2021-05-09T01:06:05+03:00
Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench
Run on (32 X 3600.24 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 5.26, 6.29, 3.49
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean          145 ms          145 ms          128   0.145319         0.999981   10.1568M       69.8949M        69.8936M      6.88159       6.88146   0.145322
p1319978.orf/threads:32/process_time/real_time_median        145 ms          145 ms          128   0.145317         0.999986   10.1568M       69.8941M        69.8931M      6.88151       6.88141   0.145319
p1319978.orf/threads:32/process_time/real_time_stddev      0.766 ms        0.766 ms          128   766.586u         15.1302u          0       354.167k        354.098k    0.0348699     0.0348631   766.469u
RUNNING: /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpwb9sw2x0
2021-05-09T01:06:24+03:00
Running /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Run on (32 X 3599.95 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 4.05, 5.95, 3.43
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean         99.8 ms         99.8 ms          128  0.0997758         0.999972   10.1568M       101.797M        101.794M      10.0225       10.0222  0.0997786
p1319978.orf/threads:32/process_time/real_time_median       99.7 ms         99.7 ms          128  0.0997165         0.999985   10.1568M       101.857M        101.854M      10.0284       10.0281  0.0997195
p1319978.orf/threads:32/process_time/real_time_stddev      0.224 ms        0.224 ms          128   224.166u          34.345u          0        226.81k        227.231k    0.0223309     0.0223723   224.586u
Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Benchmark                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 128 vs 128
p1319978.orf/threads:32/process_time/real_time_mean                  -0.3134         -0.3134           145           100           145           100
p1319978.orf/threads:32/process_time/real_time_median                -0.3138         -0.3138           145           100           145           100
p1319978.orf/threads:32/process_time/real_time_stddev                -0.7073         -0.7078             1             0             1             0

```

Reviewed By: craig.topper, zhuhan0

Differential Revision: https://reviews.llvm.org/D102116
…pe->GetByteSize() in ParseSingleMember

We have a bug in which using member_clang_type.GetByteSize() triggers record
layout and during this process since the record was not yet complete we ended
up reaching a record that had not been layed out yet.
Using member_type->GetByteSize() avoids this situation since it relies on size
from DWARF and will not trigger record layout.

For reference: rdar://77293040

Differential Revision: https://reviews.llvm.org/D102445
This patch contains the bare minimum to run the new Pass Manager from the LLVM-C APIs. It does not feature PGOOptions, PassPlugins or Debugify in its current state. Bugzilla: PR48499

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D102136
These are intended to mimic warnings available in gcc.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D100581
MSVC has a `try-except` statement.
This statement could containt a `__leave` keyword, which is similar to
`goto` to the end of the try block. The semantic of this keyword is not
implemented.

We should at least parse such code without crashing.

https://docs.microsoft.com/en-us/cpp/cpp/try-except-statement?view=msvc-160

Patch By: AbbasSabra!

Reviewed By: steakhal

Differential Revision: https://reviews.llvm.org/D102280
Has the effect that `__mh_execute_header` stays in the symbol table of
outputs even after running `strip` on the output. I don't know if that's
important for anything -- my motivation for the patch is just is to make
the output more similar to ld64.

(Corresponds to symbolTableInAndNeverStrip in ld64.)

Differential Revision: https://reviews.llvm.org/D102619
  CONFLICT (content): Merge conflict in llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
  CONFLICT (content): Merge conflict in llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
The experimental flag for "inplace" bufferization in the sparse
compiler can be replaced with the new inplace attribute. This gives
a uniform way of expressing the more efficient way of bufferization.

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D102538
This patch contains the bare minimum to run the new Pass Manager from the LLVM-C APIs. It does not feature PGOOptions, PassPlugins or Debugify in its current state. Bugzilla: PR48499

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D102136
- Enables inferring return type for ConstShape, takes into account valid return types;
- The compatible return type function could be reused, leaving that for next use refactoring;

Differential Revision: https://reviews.llvm.org/D102182
The LAM mode is currently untested by check-hwasan, so we only need
to build the runtime in aliasing mode. Because LAM mode will always
need to be conditional (because only certain hardware will support
it) we can always just disable the LAM lit tests if it ever starts
being tested.
Follow up to D88631 but for aarch64; the Linux kernel uses the command
line flags:

1. -mstack-protector-guard=sysreg
2. -mstack-protector-guard-reg=sp_el0
3. -mstack-protector-guard-offset=0

to use the system register sp_el0 for the stack canary, enabling the
kernel to have a unique stack canary per task (like a thread, but not
limited to userspace as the kernel can preempt itself).

Address pr/47341 for aarch64.

Fixes: ClangBuiltLinux/linux#289
Signed-off-by: Nick Desaulniers <[email protected]>

Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen

Differential Revision: https://reviews.llvm.org/D100919
This is one of the folds requested in:
https://llvm.org/PR39480

https://alive2.llvm.org/ce/z/NczU3V

Note - this uses the normal FMF propagation logic
(flags transfer from the final value to new/intermediate ops).
It's not clear if this matches what Alive2 implements,
so we may want to adjust one or the other.
@vmaksimo
Copy link
Contributor Author

/summary:run

@vmaksimo vmaksimo force-pushed the llvmspirv_pulldown branch from 3068e96 to 4c65c65 Compare May 21, 2021 11:21
@vmaksimo
Copy link
Contributor Author

vmaksimo commented May 21, 2021

Hi @steffenlarsen, could you please help to investigate a build failure (check-sycl target) on CUDA?
Example of failure: http://ci.llvm.intel.com:8010/#/builders/37/builds/8925/steps/16/logs/stdio

Possibly, it can be related to google test update in LLORG: d4d80a2

@vmaksimo
Copy link
Contributor Author

/summary:run

@steffenlarsen
Copy link
Contributor

Possibly, it can be related to google test update in LLORG: d4d80a2

Good intuition! That was exactly it. steffenlarsen@0331956 did the trick on my machine.

@vmaksimo vmaksimo force-pushed the llvmspirv_pulldown branch from dcea17c to 0621c56 Compare May 24, 2021 11:05
@vmaksimo
Copy link
Contributor Author

Good intuition! That was exactly it. steffenlarsen@0331956 did the trick on my machine.

Thanks @steffenlarsen! This helped to fix the failure

@vmaksimo
Copy link
Contributor Author

/summary:run

bader
bader previously approved these changes May 25, 2021
Copy link
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except typo in the comments.

@@ -1,3 +1,7 @@
// XFAIL: *
// Failure is expected untill fixed in LLORG upstream.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

untill -> until

@@ -1,3 +1,7 @@
// XFAIL: *
// Failure is expected untill fixed in LLORG upstream.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

untill -> until

@vladimirlaz vladimirlaz merged commit 611746c into intel:sycl May 25, 2021
@vmaksimo vmaksimo mentioned this pull request Jun 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.