Skip to content

Call *rot to perform eigenvector update of *steqr #1120

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

sh-zheng
Copy link

@sh-zheng sh-zheng commented Apr 9, 2025

Call *rot instead of *lasr to perform eigenvector update of *steqr to fully utilize blas subroutines.

Code equivalence is verified with norm $\Vert{A-ZDZ^T}\Vert_2$. Entries is generated randomly.

Single Precision Case

Dimension Norm before PR Norm after PR
1000 8.621327e-05 8.621327e-05
2000 1.890181e-04 1.890181e-04

Double Precision Case

Dimension Norm before PR Norm after PR
1000 2.297470e-13 2.297470e-13
2000 4.402642e-13 4.402642e-13

Performance is measured in millisecond and shows an improvement. The platform is a Intel Xeon 1660 v3 @ 3GHz. As an additional test, we also measured the performance of subroutine *kteqr in PR #1049 .

Single Precision Case

Dimension Elapse before PR Elapse after PR Elapse of skteqr
1000 4281 3703 2359
2000 37063 31625 18781

Double Precision Case

Dimension Elapse before PR Elapse after PR Elapse of dkteqr
1000 6063 5234 2562
2000 47499 40406 22469

Copy link
Collaborator

@angsch angsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this PR as a fit in an optimized library that lacks an optimized version of LASR. Reference LAPACK prioritizes algorithmic clarity and potential. From an algorithmic standpoint, LASR is better than ROT. Below is a breakdown of the arithmetic intensity to support this perspective. The analysis assumes real arithmetic (in complex arithmetic, the flops are higher, but the conclusion is the same).

ROT applies a single rotation to two adjacent columns.

  • Load the 2 columns: Z(1:n,l:l+1): 2 * n * sizeof(datatype)
  • Compute Z(1:n,l:l+1) * P**T: 6 flops (4 multiplications, 2 additions) per row, so 6n flops in total
  • Store the 2 columns: 2 * n * sizeof(datatype)
  • arithmetic intensity = compute/data = 6n / (4n*sizeof(datatype)) ~ 1.5

LASR can be implemented to achieve higher arithmetic intensity. Although reference LAPACK does not currently support this, it has been discussed for example in #710. For the approach in #710, the analysis is as follows.

  • Load 3 columns. Z(1:n,l:l+2): 3 * n * sizeof(datatype)
  • Compute Z(1:n,l:l+1) * P1**T * P2**T: 12n flops
  • Store 3 columns: 3 * n * sizeof(datatype)
  • arithmetic intensity = compute/data = 12n / (6n*sizeof(datatype) ~ 2

In the limit, LASR can achieve:

  • Load n-by-n matrix: $n^2$ * sizeof(datatype)
  • Apply (n-1) Givens rotations to n-by-n matrix: $6n^2$ flops
  • Store n-by-n matrix: $n^2$ * sizeof(datatype)
  • arithmetic intensity = compute/data = $6n^2$ / ($2n^2$*sizeof(datatype) ~ 3

This MR has initiated LASR3, where the arithmetic intensity is O(n). A more detailed analysis can be found in this paper.

In summary, LASR offers the potential to reduce data movement and thereby achieve higher arithmetic intensity. In contrast, ROT limits the vector update to an operation with low arithmetic intensity that cannot be improved. Even if ROT is currently faster (when linked against optimized BLAS), algorithmically, it is not the right direction for future development in reference LAPACK.

@sh-zheng
Copy link
Author

Thank you very much for the review @angsch . I agree with your point. I indeed didn't take it into account in terms of algorithm efficiency in this PR. My original intention was to unify the existing algorithm of symmetric matrices and skew-symmetric matrices in #1049 . It seems that a better approach would be to implement #1049 in the same way as steqr. I'll update that PR later. This PR will be closed.

@sh-zheng sh-zheng closed this Jun 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants