Description
In both ISel paths, the generic combines fold various selects of constants into binary ops with a zext/sext operand, for example:
select Cond, C1, C1-1 --> add (zext Cond), (C1-1)
select Cond, Pow2, 0 --> shl (zext Cond), log2(Pow2)
select Cond, C1, C1+1 --> add (sext Cond), (C1+1)
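The three rewrites above can be checked with a small model of i32 arithmetic. This is only an illustrative sketch: the helper names (`zext`, `sext`, `select`, `check_identities`) are hypothetical and not part of LLVM, and 32-bit wraparound is emulated with a mask.

```python
MASK = 0xFFFFFFFF  # emulate i32 wraparound arithmetic


def zext(cond: bool) -> int:
    # Zero-extending an i1: true becomes 1, false becomes 0.
    return 1 if cond else 0


def sext(cond: bool) -> int:
    # Sign-extending an i1: true becomes all ones (-1), false becomes 0.
    return MASK if cond else 0


def select(cond: bool, true_val: int, false_val: int) -> int:
    return true_val if cond else false_val


def check_identities(c1: int, log2_pow2: int) -> bool:
    """Return True if all three select rewrites hold for both values of cond."""
    c1 &= MASK
    pow2 = 1 << log2_pow2  # log2_pow2 is assumed to be in the range 0..31
    c1_minus_1 = (c1 - 1) & MASK
    c1_plus_1 = (c1 + 1) & MASK
    for cond in (False, True):
        # select Cond, C1, C1-1 --> add (zext Cond), (C1-1)
        if select(cond, c1, c1_minus_1) != (zext(cond) + c1_minus_1) & MASK:
            return False
        # select Cond, Pow2, 0 --> shl (zext Cond), log2(Pow2)
        if select(cond, pow2, 0) != (zext(cond) << log2_pow2) & MASK:
            return False
        # select Cond, C1, C1+1 --> add (sext Cond), (C1+1)
        if select(cond, c1, c1_plus_1) != (sext(cond) + c1_plus_1) & MASK:
            return False
    return True
```

The identities hold at the wraparound boundaries as well (C1 = 0 and C1 = 0xFFFFFFFF), which is why the combines are valid for any constant.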
On various architectures, materializing a zext/sext may be cheaper than materializing a select, so the combines above make sense there.
On AMDGPU, however, both zext/sext and select (for f32 with inline constants) materialize as v_cndmask_b32_e64. The above optimization therefore increases the cost by introducing an additional binary instruction.
Seen from a different perspective: on AMDGPU, zext/sext and select canonically boil down to the same machine instruction, so the combine effectively undoes the folding of the binop into the select. For example:
select Cond, 7, 6 --> add (zext Cond), 6
materializes as:
v_cndmask_b32_e64 v0, 0, 1, vcc
v_add_u32_e32 v0, 6, v0
instead of
v_cndmask_b32_e64 v0, 6, 7, vcc
Here the fold of the binop into the select is missed: the select has been eliminated, yet (zext Cond) materializes the same way as (select Cond, 1, 0). So for AMDGPU, add (zext Cond), 6 <==> add (select Cond, 1, 0), 6
after instruction selection is done. This shows that introducing the zext (via the select combine) caused the binop-into-select fold to be skipped, which adds the extra binary instruction.
This extra instruction is the root cause of SWDEV-505394, since it increases code length.
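To make the missed fold concrete, here is a toy sketch of the rewrite in the opposite direction, folding add (zext Cond), C back into a single select. It works on hypothetical tuple-based expressions, not on LLVM's DAG or MIR data structures, and the node count is only a rough proxy for the number of machine instructions.

```python
MASK = 0xFFFFFFFF  # emulate i32 wraparound arithmetic


def fold_add_of_zext(expr):
    """Rewrite ('add', ('zext', cond), C) into ('select', cond, C + 1, C)."""
    if (isinstance(expr, tuple) and len(expr) == 3 and expr[0] == "add"
            and isinstance(expr[1], tuple) and len(expr[1]) == 2
            and expr[1][0] == "zext" and isinstance(expr[2], int)):
        cond, c = expr[1][1], expr[2]
        return ("select", cond, (c + 1) & MASK, c & MASK)
    return expr  # anything else is left untouched


def evaluate(expr, env):
    """Evaluate a toy expression; strings are looked up in env."""
    if isinstance(expr, int):
        return expr & MASK
    if isinstance(expr, str):
        return env[expr]
    op = expr[0]
    if op == "zext":
        return 1 if evaluate(expr[1], env) else 0
    if op == "add":
        return (evaluate(expr[1], env) + evaluate(expr[2], env)) & MASK
    if op == "select":
        chosen = expr[2] if evaluate(expr[1], env) else expr[3]
        return evaluate(chosen, env)
    raise ValueError(f"unknown op: {op}")


def count_ops(expr):
    """Count operation nodes, a rough stand-in for instruction count."""
    if isinstance(expr, tuple):
        return 1 + sum(count_ops(child) for child in expr[1:])
    return 0


before = ("add", ("zext", "cond"), 6)  # v_cndmask_b32 + v_add_u32
after = fold_add_of_zext(before)       # a single v_cndmask_b32
```

In this model `before` has two operation nodes and `after` has one, mirroring the two-instruction and one-instruction sequences shown above, while both evaluate to the same value for either value of `cond`.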