Call rot to perform eigenvector update of steqr #1120

sh-zheng · 2025-04-09T12:31:45Z

Call *rot instead of *lasr to perform eigenvector update of *steqr to fully utilize blas subroutines.

Code equivalence is verified with norm $\Vert{A-ZDZ^T}\Vert_2$. Entries is generated randomly.

Single Precision Case

Dimension	Norm before PR	Norm after PR
1000	8.621327e-05	8.621327e-05
2000	1.890181e-04	1.890181e-04

Double Precision Case

Dimension	Norm before PR	Norm after PR
1000	2.297470e-13	2.297470e-13
2000	4.402642e-13	4.402642e-13

Performance is measured in millisecond and shows an improvement. The platform is a Intel Xeon 1660 v3 @ 3GHz. As an additional test, we also measured the performance of subroutine *kteqr in PR #1049 .

Single Precision Case

Dimension	Elapse before PR	Elapse after PR	Elapse of skteqr
1000	4281	3703	2359
2000	37063	31625	18781

Double Precision Case

Dimension	Elapse before PR	Elapse after PR	Elapse of dkteqr
1000	6063	5234	2562
2000	47499	40406	22469

angsch

I see this PR as a fit in an optimized library that lacks an optimized version of LASR. Reference LAPACK prioritizes algorithmic clarity and potential. From an algorithmic standpoint, LASR is better than ROT. Below is a breakdown of the arithmetic intensity to support this perspective. The analysis assumes real arithmetic (in complex arithmetic, the flops are higher, but the conclusion is the same).

ROT applies a single rotation to two adjacent columns.

Load the 2 columns: Z(1:n,l:l+1): 2 * n * sizeof(datatype)
Compute Z(1:n,l:l+1) * P**T: 6 flops (4 multiplications, 2 additions) per row, so 6n flops in total
Store the 2 columns: 2 * n * sizeof(datatype)
arithmetic intensity = compute/data = 6n / (4n*sizeof(datatype)) ~ 1.5

LASR can be implemented to achieve higher arithmetic intensity. Although reference LAPACK does not currently support this, it has been discussed for example in #710. For the approach in #710, the analysis is as follows.

Load 3 columns. Z(1:n,l:l+2): 3 * n * sizeof(datatype)
Compute Z(1:n,l:l+1) * P1**T * P2**T: 12n flops
Store 3 columns: 3 * n * sizeof(datatype)
arithmetic intensity = compute/data = 12n / (6n*sizeof(datatype) ~ 2

In the limit, LASR can achieve:

Load n-by-n matrix: $n^2$ * sizeof(datatype)
Apply (n-1) Givens rotations to n-by-n matrix: $6n^2$ flops
Store n-by-n matrix: $n^2$ * sizeof(datatype)
arithmetic intensity = compute/data = $6n^2$ / ($2n^2$*sizeof(datatype) ~ 3

This MR has initiated LASR3, where the arithmetic intensity is O(n). A more detailed analysis can be found in this paper.

In summary, LASR offers the potential to reduce data movement and thereby achieve higher arithmetic intensity. In contrast, ROT limits the vector update to an operation with low arithmetic intensity that cannot be improved. Even if ROT is currently faster (when linked against optimized BLAS), algorithmically, it is not the right direction for future development in reference LAPACK.

sh-zheng · 2025-06-29T09:12:24Z

Thank you very much for the review @angsch . I agree with your point. I indeed didn't take it into account in terms of algorithm efficiency in this PR. My original intention was to unify the existing algorithm of symmetric matrices and skew-symmetric matrices in #1049 . It seems that a better approach would be to implement #1049 in the same way as steqr. I'll update that PR later. This PR will be closed.

Call *rot to perform eigenvector update of *steqr

8a10f2d

angsch reviewed Jun 29, 2025

View reviewed changes

sh-zheng closed this Jun 29, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Call rot to perform eigenvector update of steqr #1120

Call rot to perform eigenvector update of steqr #1120

Uh oh!

sh-zheng commented Apr 9, 2025

Uh oh!

angsch left a comment

Uh oh!

sh-zheng commented Jun 29, 2025

Uh oh!

Uh oh!

Call *rot to perform eigenvector update of *steqr #1120

Call *rot to perform eigenvector update of *steqr #1120

Uh oh!

Conversation

sh-zheng commented Apr 9, 2025

Uh oh!

angsch left a comment

Choose a reason for hiding this comment

Uh oh!

sh-zheng commented Jun 29, 2025

Uh oh!

Uh oh!

Call rot to perform eigenvector update of steqr #1120

Call rot to perform eigenvector update of steqr #1120