Open
Labels
A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
C-bug (Category: This is a bug.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
O-x86_64 (Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64))
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
Description
I tried this code:
    pub fn count_non_ascii(buffer: &[u8]) -> u64 {
        let mut count = 0;
        for &b in buffer {
            if b >= 0x80 {
                count += 1;
            }
        }
        count
    }
I expected to see this happen: the compiler autovectorizes the loop along the lines of:
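    pub fn count_non_ascii_sse2(buffer: &[u8]) -> u64 {
        let mut count = 0;
        // Split into an unaligned head, 16-byte-aligned SSE2 vectors, and an unaligned tail.
        let (prefix, simd, suffix) = unsafe { buffer.align_to::<core::arch::x86_64::__m128i>() };
        for &b in prefix {
            if b >= 0x80 {
                count += 1;
            }
        }
        for &s in simd {
            // movemask gathers the high bit of each of the 16 bytes into an integer mask;
            // the number of set bits is the number of bytes >= 0x80.
            count += unsafe { core::arch::x86_64::_mm_movemask_epi8(s) }.count_ones() as u64;
        }
        for &b in suffix {
            if b >= 0x80 {
                count += 1;
            }
        }
        count
    }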
Instead, this happened: It is autovectorized to something more complex and considerably slower than the manual vectorization given above. (The above manual vectorization becomes even faster when compiled with a target_cpu that supports the POPCNT instruction.)
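The manual version relies on one observation: a byte is `>= 0x80` exactly when its high bit is set, so counting non-ASCII bytes reduces to collecting high bits and taking a population count. As a rough, portable illustration of the same idea (not the code from this report, and not what LLVM emits), the sketch below processes 8 bytes at a time in a plain `u64` instead of using SSE2. The names `count_non_ascii_scalar`, `count_non_ascii_swar`, and `HIGH_BITS` are hypothetical and introduced only for this example.

```rust
/// Scalar reference: counts bytes with the high bit set (b >= 0x80).
pub fn count_non_ascii_scalar(buffer: &[u8]) -> u64 {
    buffer.iter().filter(|&&b| b >= 0x80).count() as u64
}

/// Portable sketch of the high-bit trick, 8 bytes per step (hypothetical helper).
pub fn count_non_ascii_swar(buffer: &[u8]) -> u64 {
    // Mask selecting the high bit of each of the 8 bytes in a u64.
    const HIGH_BITS: u64 = 0x8080_8080_8080_8080;

    let mut count = 0u64;
    let mut chunks = buffer.chunks_exact(8);
    for chunk in &mut chunks {
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(chunk);
        // Byte order does not matter here: only the number of set high bits is used.
        let word = u64::from_ne_bytes(bytes);
        count += u64::from((word & HIGH_BITS).count_ones());
    }
    // Handle the remaining 0..=7 bytes with the scalar loop.
    count + count_non_ascii_scalar(chunks.remainder())
}
```

Both functions should return the same count for any input, which gives a simple way to sanity-check a hand-vectorized variant against the scalar loop. No claim is made here about how this sketch performs relative to the SSE2 version.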
Meta
rustc --version --verbose:
rustc 1.45.0-nightly (a74d1862d 2020-05-14)