Description
I'll preface this by saying that the compiler generated much worse code before 1.52, with up to 4 `memcpy`
calls where there should have been one. So the LLVM 12 upgrade was great :)
Here's the sample:
https://godbolt.org/z/TdPdWqq6q
As you can see, both `allocate_naive` and `allocate_ptr_write` generate two `memcpy` calls (and significant stack usage). Manual unsafe optimization with `allocate_separate_write` generates a single `memcpy`. The same can be observed on x86_64: https://godbolt.org/z/6Mq9vKP3s
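For readers who can't open the links, here is a rough sketch of what the three variants might look like. The godbolt samples are the authoritative code; the field names, the `header` field, and the function signatures below are assumptions made for illustration, and this sketch is not guaranteed to reproduce the exact codegen described here.

```rust
use std::cell::RefCell;
use std::mem::MaybeUninit;
use std::ptr;

pub type Payload = RefCell<[u8; 1000]>;

// Hypothetical layout; the real definition is in the godbolt sample.
pub struct MyStruct<T> {
    pub header: u32,
    pub payload: T,
}

// Build the whole value, then move it into a fresh heap allocation.
pub fn allocate_naive(payload: Payload) -> Box<MyStruct<Payload>> {
    Box::new(MyStruct { header: 0, payload })
}

// Allocate uninitialized memory first, then write the whole struct at once.
pub fn allocate_ptr_write(payload: Payload) -> Box<MyStruct<Payload>> {
    let mut b = Box::new(MaybeUninit::<MyStruct<Payload>>::uninit());
    unsafe {
        (*b).as_mut_ptr().write(MyStruct { header: 0, payload });
        // Every field is initialized, so the box can be reinterpreted.
        Box::from_raw(Box::into_raw(b) as *mut MyStruct<Payload>)
    }
}

// Allocate uninitialized memory first, then write each field separately,
// so no complete MyStruct value is ever built outside the allocation.
pub fn allocate_separate_write(payload: Payload) -> Box<MyStruct<Payload>> {
    let mut b = Box::new(MaybeUninit::<MyStruct<Payload>>::uninit());
    let p = (*b).as_mut_ptr();
    unsafe {
        ptr::addr_of_mut!((*p).header).write(0);
        ptr::addr_of_mut!((*p).payload).write(payload);
        // Both fields are initialized, so the box can be reinterpreted.
        Box::from_raw(Box::into_raw(b) as *mut MyStruct<Payload>)
    }
}
```

All three are meant to be observably equivalent; the only difference is how the value reaches the heap, which is what the `memcpy` count in the generated code reflects.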
If I replace `pub type Payload = RefCell<[u8; 1000]>;` with just `[u8; 1000]`, all three functions seem to generate good code with a single `memcpy` call: https://godbolt.org/z/8K7s8vbPa
Here's the weirdest part: with the original `Payload`, if I manually inline the `MyStruct` type parameter and turn `MyStruct` into a simple non-generic struct, it still generates bad code on wasm, but on x86_64 all functions are correctly optimized into a single `memcpy`: https://godbolt.org/z/jsdrbGExW
Why does a seemingly equivalent generic struct appear to generate worse code than a non-generic one?