Closed
Labels
enhancement (New feature or request)
Description
Does joint matrix support an operation similar to CUDA's bmma_sync? The CUDA documentation describes it as follows:
bmma_sync
Waits until all warp lanes have executed bmma_sync, and then performs the warp-synchronous bit matrix multiply-accumulate operation D = (A op B) + C, where op consists of a logical operation bmmaBitOp followed by the accumulation defined by bmmaAccumulateOp. The available operations are:
bmmaBitOpXOR, a 128-bit XOR of a row in matrix_a with the 128-bit column of matrix_b
bmmaBitOpAND, a 128-bit AND of a row in matrix_a with the 128-bit column of matrix_b, available on devices with compute capability 8.0 and higher.
The accumulate op is always bmmaAccumulateOpPOPC which counts the number of set bits.
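To make the request concrete, here is a scalar host-side sketch in C++ of the semantics described above: each output element is the population count of (row of A op column of B), added to the matching element of C. It is only a reference model, not the CUDA intrinsic and not an existing joint matrix API. The names `BitOp`, `Bits128`, `Tile`, and `bmma_reference` are hypothetical, and the 8x8 output tile is an assumption based on the 8x8x128 shape CUDA documents for single-bit WMMA.

```cpp
#include <array>
#include <bitset>
#include <cstdint>

// Hypothetical names for illustration only; none of these exist in CUDA or SYCL.
enum class BitOp { XOR, AND };

constexpr int M = 8;  // rows of A and D (assumed tile shape)
constexpr int N = 8;  // columns of B and D (assumed tile shape)

// One 128-bit row of matrix_a or one 128-bit column of matrix_b.
using Bits128 = std::array<std::uint64_t, 2>;
using Tile = std::array<std::array<int, N>, M>;

// Count the set bits in a 64-bit word (portable, no compiler builtins).
inline int popcount64(std::uint64_t x) {
    return static_cast<int>(std::bitset<64>(x).count());
}

// Scalar model of D = (A op B) + C with POPC accumulation:
// d[i][j] = popcount(a[i] op b[j]) + c[i][j].
void bmma_reference(const std::array<Bits128, M>& a,
                    const std::array<Bits128, N>& b,
                    const Tile& c, Tile& d, BitOp op) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            int acc = 0;
            for (int w = 0; w < 2; ++w) {  // two 64-bit halves of the 128 bits
                const std::uint64_t v = (op == BitOp::XOR)
                                            ? (a[i][w] ^ b[j][w])
                                            : (a[i][w] & b[j][w]);
                acc += popcount64(v);
            }
            d[i][j] = acc + c[i][j];
        }
    }
}
```

A joint matrix equivalent would need to expose the same two pieces: a selectable bitwise op (XOR or AND) applied across the 128-bit K dimension, and a popcount reduction into an integer accumulator.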