LLVM and SPIRV-LLVM-Translator pulldown (WW20-21) #3779
1. [bool, char, short] bitfields have the same alignment as unsigned int.
2. Adjust alignment on typedef field decls/honor the align attribute.
3. Fix alignment for scoped enum class.
4. A long long bitfield has 4-byte alignment and StorageUnitSize in 32-bit compile mode.
Differential Revision: https://reviews.llvm.org/D87029
This is to allow disasm with any bits in the unused fields. Differential Revision: https://reviews.llvm.org/D102526
This patch adds a new test for loop unrolling with multiple exiting blocks, where the latch does not exit, but the header does. This can happen when the loop has not been rotated, e.g. due to minsize. Inspired by the following end-to-end test, using -Oz https://godbolt.org/z/fP6sna8qK

```
bool foo(int *ptr, int limit) {
#pragma clang loop unroll(full)
  for (unsigned int i = 0; i < 4; i++) {
    if (ptr[i] > limit)
      return false;
    ptr[i]++;
  }
  return true;
}
```
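A hand-written sketch of what full unrolling means for `foo` above: each of the four copies of the body keeps its own early exit, and the loop control (`i < 4`, `i++`) disappears. `foo_unrolled` is a hypothetical name used only for illustration; this is not the IR LLVM actually produces, and the pragma is dropped so the snippet builds with any C++ compiler.

```cpp
#include <cassert>

// Loop from the commit message. Without loop rotation (e.g. at -Oz) the
// `i < 4` test stays in the header, so both the header and the
// `ptr[i] > limit` block exit, while the latch branches back unconditionally.
bool foo(int *ptr, int limit) {
  for (unsigned int i = 0; i < 4; i++) {
    if (ptr[i] > limit)
      return false;
    ptr[i]++;
  }
  return true;
}

// Hypothetical fully unrolled equivalent: the trip count (4) is known, so
// the loop control is gone and only the early exits remain.
bool foo_unrolled(int *ptr, int limit) {
  assert(ptr != nullptr);
  if (ptr[0] > limit) return false;
  ptr[0]++;
  if (ptr[1] > limit) return false;
  ptr[1]++;
  if (ptr[2] > limit) return false;
  ptr[2]++;
  if (ptr[3] > limit) return false;
  ptr[3]++;
  return true;
}
```

Both functions must agree on the return value and on which elements were incremented before an early exit.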
Bug 49356 (https://bugs.llvm.org/show_bug.cgi?id=49356) reports a crash in the test case `tasking/bug_taskwait_detach.cpp`, which is caused by a wrong function declaration: `gtid` in `__kmpc_omp_task` should be `kmp_int32`. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D102584
Since we have both aliasing mode and Intel LAM on x86_64, we need to choose the mode at either run time or compile time. This patch implements the plumbing to build both and choose between them at compile time. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D102286
Multi-line headers are not allowed in RST; reformat the header as a single wide line.
…c instructions Adds NVPTX builtins and intrinsics for the CUDA PTX `cp.async` instructions for `sm_80` architecture or newer. PTX ISA description of `cp.async`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-asynchronous-copy https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive Authored-by: Stuart Adams <[email protected]> Co-Authored-by: Alexander Johnston <[email protected]> Differential Revision: https://reviews.llvm.org/D100394
…ync instructions Adds NVPTX builtins and intrinsics for the CUDA PTX `redux.sync` instructions for `sm_80` architecture or newer. PTX ISA description of `redux.sync`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-redux-sync Authored-by: Steffen Larsen <[email protected]> Differential Revision: https://reviews.llvm.org/D100124
The initial version of pooling assumed normalization was across all elements equally. TOSA actually requires that the normalization be performed by how many elements were summed (edges are not artificially dimmer). Updated the lowering to reflect this change, with corresponding tests. Reviewed By: NatashaKnk Differential Revision: https://reviews.llvm.org/D102540
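A 1-D sketch of the normalization rule described above: each output is divided by the number of input elements actually summed, so windows that overlap the padded border are not darkened. `avgPool1D` and its parameters are hypothetical; the real lowering operates on 2-D TOSA/Linalg tensors, not on `std::vector`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical 1-D average pool with `pad` padding elements on both sides.
// Padded positions are skipped: the divisor is the count of in-bounds
// elements under the window, not the full kernel size.
std::vector<float> avgPool1D(const std::vector<float>& in, int kernel,
                             int stride, int pad) {
  assert(kernel > 0 && stride > 0 && pad >= 0);
  const int n = static_cast<int>(in.size());
  const int outLen = (n + 2 * pad - kernel) / stride + 1;
  std::vector<float> out;
  for (int o = 0; o < outLen; ++o) {
    const int start = o * stride - pad;  // window start in input coordinates
    float sum = 0.0f;
    int count = 0;
    for (int k = 0; k < kernel; ++k) {
      const int idx = start + k;
      if (idx >= 0 && idx < n) {  // only real input elements contribute
        sum += in[idx];
        ++count;
      }
    }
    // Guard against a window lying entirely inside the padding.
    out.push_back(count > 0 ? sum / static_cast<float>(count) : 0.0f);
  }
  return out;
}
```

For input `{2, 4, 6}` with a kernel of 3, stride 1 and padding 1, this yields `{3, 4, 5}`; dividing every window by the kernel size would instead give smaller values at both edges.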
A missing or duplicate Spack package should not cause an error, since users may have installed only the llvm/clang package, or may have installed duplicate HIP packages but use an environment variable or compiler option to choose the HIP path. The message about a missing or duplicate Spack package is informational and should therefore be emitted only when -v is specified. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D102556
This change makes the conversion of an mlir::OpState to bool `explicit`. Idiomatic boolean uses continue to work as before, but questionable implicit uses (e.g. accumulating over a range of OpStates to count "true" states) become ill-formed. This makes the class interface a little less error-prone. I tested this change on our internal (fairly large) codebase, and only one fix was needed, which was ultimately an improvement of the affected code. Reviewed By: rriddle, mehdi_amini Differential Revision: https://reviews.llvm.org/D101989
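A minimal sketch of what an `explicit` conversion operator changes, using a hypothetical `OpStateLike` struct as a stand-in for `mlir::OpState` (the real class holds an `Operation *`; everything else here is an assumption kept deliberately small). The two `static_assert`s show that `bool` can still be constructed from the object while implicit conversion is gone. Requires C++17.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for mlir::OpState: wraps a possibly-null pointer.
struct OpStateLike {
  const void* op = nullptr;
  // `explicit` rejects silent conversions such as `bool b = state;` or
  // `int n = 0 + state;`, but contextual uses like `if (state)`, `!state`
  // and `state && x` still compile.
  explicit operator bool() const { return op != nullptr; }
};

// Direct initialization of a bool is still allowed...
static_assert(std::is_constructible_v<bool, OpStateLike>);
// ...but implicit conversion is not.
static_assert(!std::is_convertible_v<OpStateLike, bool>);

// With the explicit operator, something like
//   std::accumulate(states.begin(), states.end(), 0)
// no longer compiles (int + OpStateLike is ill-formed), so counting
// "true" states has to spell out the test.
int countValid(const std::vector<OpStateLike>& states) {
  int n = 0;
  for (const OpStateLike& s : states)
    if (s) ++n;  // contextual conversion to bool: still fine
  return n;
}
```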
Alias mode is not expected to work on non-x86, so don't build it there. This should fix the aarch64 bot.
…n steroids" idiom recognition.

I think I've added exhaustive test coverage, and I have verified that alive2 is happy with all the tests, so in principle I'm fine with landing this without review, but just in case...

This adds support for the "count active bits" pattern, i.e.:

```
int countActiveBits(unsigned val) {
  int cnt = 0;
  for ( ; (val >> cnt) != 0; ++cnt)
    ;
  return cnt;
}
```

but a somewhat more general one, since that is what I need:

```
int countActiveBits(unsigned val, int start, int off) {
  int cnt;
  for (cnt = start; val >> (cnt + off); cnt++)
    ;
  return cnt;
}
```

I've followed in the footsteps of the 'left-shift until bittest' idiom (D91038), in the sense that iff the `ctlz` intrinsic is cheap, we'll transform, regardless of all other factors.

This can have a shocking effect on certain benchmarks:

```
raw.pixls.us-unique/Olympus/XZ-1$ /repositories/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf
RUNNING: /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp49_28zcm
2021-05-09T01:06:05+03:00
Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench
Run on (32 X 3600.24 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 5.26, 6.29, 3.49
------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean 145 ms 145 ms 128 0.145319 0.999981 10.1568M 69.8949M 69.8936M 6.88159 6.88146 0.145322
p1319978.orf/threads:32/process_time/real_time_median 145 ms 145 ms 128 0.145317 0.999986 10.1568M 69.8941M 69.8931M 6.88151 6.88141 0.145319
p1319978.orf/threads:32/process_time/real_time_stddev 0.766 ms 0.766 ms 128 766.586u 15.1302u 0 354.167k 354.098k 0.0348699 0.0348631 766.469u
RUNNING: /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpwb9sw2x0
2021-05-09T01:06:24+03:00
Running /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Run on (32 X 3599.95 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 4.05, 5.95, 3.43
------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean 99.8 ms 99.8 ms 128 0.0997758 0.999972 10.1568M 101.797M 101.794M 10.0225 10.0222 0.0997786
p1319978.orf/threads:32/process_time/real_time_median 99.7 ms 99.7 ms 128 0.0997165 0.999985 10.1568M 101.857M 101.854M 10.0284 10.0281 0.0997195
p1319978.orf/threads:32/process_time/real_time_stddev 0.224 ms 0.224 ms 128 224.166u 34.345u 0 226.81k 227.231k 0.0223309 0.0223723 224.586u
Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Benchmark Time CPU Time Old Time New CPU Old CPU New
------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 128 vs 128
p1319978.orf/threads:32/process_time/real_time_mean -0.3134 -0.3134 145 100 145 100
p1319978.orf/threads:32/process_time/real_time_median -0.3138 -0.3138 145 100 145 100
p1319978.orf/threads:32/process_time/real_time_stddev -0.7073 -0.7078 1 0 1 0
```

Reviewed By: craig.topper, zhuhan0 Differential Revision: https://reviews.llvm.org/D102116
…pe->GetByteSize() in ParseSingleMember We have a bug in which using member_clang_type.GetByteSize() triggers record layout, and during this process, since the record was not yet complete, we ended up reaching a record that had not been laid out yet. Using member_type->GetByteSize() avoids this situation, since it relies on the size from DWARF and does not trigger record layout. For reference: rdar://77293040 Differential Revision: https://reviews.llvm.org/D102445
This patch contains the bare minimum to run the new Pass Manager from the LLVM-C APIs. It does not feature PGOOptions, PassPlugins or Debugify in its current state. Bugzilla: PR48499 Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D102136
Differential Revision: https://reviews.llvm.org/D102562
This reverts commit cd220a0. Doesn't build.
These are intended to mimic warnings available in gcc. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D100581
MSVC has a `try-except` statement. This statement can contain a `__leave` keyword, which is similar to a `goto` to the end of the try block. The semantics of this keyword are not implemented, but we should at least parse such code without crashing. https://docs.microsoft.com/en-us/cpp/cpp/try-except-statement?view=msvc-160 Patch By: AbbasSabra! Reviewed By: steakhal Differential Revision: https://reviews.llvm.org/D102280
Differential Revision: https://reviews.llvm.org/D102636
Has the effect that `__mh_execute_header` stays in the symbol table of outputs even after running `strip` on the output. I don't know if that's important for anything; my motivation for the patch is just to make the output more similar to ld64. (Corresponds to symbolTableInAndNeverStrip in ld64.) Differential Revision: https://reviews.llvm.org/D102619
CONFLICT (content): Merge conflict in llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
CONFLICT (content): Merge conflict in llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
The experimental flag for "inplace" bufferization in the sparse compiler can be replaced with the new inplace attribute. This gives a uniform way of expressing the more efficient way of bufferization. Reviewed By: bixia Differential Revision: https://reviews.llvm.org/D102538
This patch contains the bare minimum to run the new Pass Manager from the LLVM-C APIs. It does not feature PGOOptions, PassPlugins or Debugify in its current state. Bugzilla: PR48499 Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D102136
- Enables inferring the return type for ConstShape, taking valid return types into account;
- The compatible-return-type function could be reused; that refactoring is left for its next use.
Differential Revision: https://reviews.llvm.org/D102182
LAM mode is currently untested by check-hwasan, so we only need to build the runtime in aliasing mode. Because LAM mode will always need to be conditional (only certain hardware supports it), we can simply disable the LAM lit tests if LAM ever starts being tested.
Follow up to D88631 but for aarch64; the Linux kernel uses the command line flags:
1. -mstack-protector-guard=sysreg
2. -mstack-protector-guard-reg=sp_el0
3. -mstack-protector-guard-offset=0
to use the system register sp_el0 for the stack canary, enabling the kernel to have a unique stack canary per task (like a thread, but not limited to userspace, as the kernel can preempt itself). Address pr/47341 for aarch64. Fixes: ClangBuiltLinux/linux#289 Signed-off-by: Nick Desaulniers <[email protected]> Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen Differential Revision: https://reviews.llvm.org/D100919
This is one of the folds requested in: https://llvm.org/PR39480 https://alive2.llvm.org/ce/z/NczU3V Note - this uses the normal FMF propagation logic (flags transfer from the final value to new/intermediate ops). It's not clear if this matches what Alive2 implements, so we may want to adjust one or the other.
/summary:run
3068e96 to 4c65c65
Hi @steffenlarsen, could you please help investigate a build failure (check-sycl target) on CUDA? It may be related to the Google Test update in LLORG: d4d80a2
/summary:run
Good intuition! That was exactly it. steffenlarsen@0331956 did the trick on my machine.
Signed-off-by: Steffen Larsen <[email protected]>
dcea17c to 0621c56
Thanks @steffenlarsen! This fixed the failure.
/summary:run
LGTM, except for a typo in the comments.
@@ -1,3 +1,7 @@
// XFAIL: *
// Failure is expected untill fixed in LLORG upstream.
untill -> until
@@ -1,3 +1,7 @@
// XFAIL: *
// Failure is expected untill fixed in LLORG upstream.
untill -> until
LLVM: llvm/llvm-project@d30dfa867
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@c62ef5e