Skip to content

Create .so symlinks for driver libraries in container #326

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 23, 2025

Conversation

elezar
Copy link
Member

@elezar elezar commented Feb 1, 2024

This change explicitly creates .so and SONAME symlinks for injected driver libraries. This means that the create-soname-symlinks hook that was added in #947 is not longer required and is therefore removed.

$ docker run --rm -ti --runtime=nvidia --gpus all busybox  ls -al /usr/lib/x86_64-linux-gnu/
total 263564
drwxr-xr-x    2 root     root          4096 Jul 21 10:00 .
drwxr-xr-x    3 root     root          4096 Jul 21 10:00 ..
lrwxrwxrwx    1 root     root            12 Jul 21 10:00 libcuda.so -> libcuda.so.1
lrwxrwxrwx    1 root     root            21 Jul 21 10:00 libcuda.so.1 -> libcuda.so.570.133.20
-rw-r--r--    1 root     root      71365624 Apr 13 04:46 libcuda.so.570.133.20
lrwxrwxrwx    1 root     root            20 Jul 21 10:00 libcudadebugger.so -> libcudadebugger.so.1
lrwxrwxrwx    1 root     root            29 Jul 21 10:00 libcudadebugger.so.1 -> libcudadebugger.so.570.133.20
-rw-r--r--    1 root     root      10244640 Apr 13 04:17 libcudadebugger.so.570.133.20
lrwxrwxrwx    1 root     root            17 Jul 21 10:00 libnvidia-ml.so -> libnvidia-ml.so.1
lrwxrwxrwx    1 root     root            26 Jul 21 10:00 libnvidia-ml.so.1 -> libnvidia-ml.so.570.133.20
-rw-r--r--    1 root     root       2217912 Apr 13 04:23 libnvidia-ml.so.570.133.20
lrwxrwxrwx    1 root     root            19 Jul 21 10:00 libnvidia-nvvm.so -> libnvidia-nvvm.so.4
lrwxrwxrwx    1 root     root            28 Jul 21 10:00 libnvidia-nvvm.so.4 -> libnvidia-nvvm.so.570.133.20
-rw-r--r--    1 root     root      81978912 Apr 13 05:03 libnvidia-nvvm.so.570.133.20
lrwxrwxrwx    1 root     root            21 Jul 21 10:00 libnvidia-opencl.so -> libnvidia-opencl.so.1
lrwxrwxrwx    1 root     root            30 Jul 21 10:00 libnvidia-opencl.so.1 -> libnvidia-opencl.so.570.133.20
-rw-r--r--    1 root     root      65758768 Apr 13 04:46 libnvidia-opencl.so.570.133.20
-rw-r--r--    1 root     root         10176 Apr 13 04:21 libnvidia-pkcs11-openssl3.so.570.133.20
-rw-r--r--    1 root     root         10168 Apr 13 04:21 libnvidia-pkcs11.so.570.133.20
lrwxrwxrwx    1 root     root            29 Jul 21 10:00 libnvidia-ptxjitcompiler.so -> libnvidia-ptxjitcompiler.so.1
lrwxrwxrwx    1 root     root            38 Jul 21 10:00 libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.570.133.20
-rw-r--r--    1 root     root      38251952 Apr 13 04:33 libnvidia-ptxjitcompiler.so.570.133.20

And:

$ docker run --rm -ti --runtime=nvidia --gpus all ubuntu ldconfig -p | grep libcuda
        libcudadebugger.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudadebugger.so.1
        libcudadebugger.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudadebugger.so
        libcuda.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so.1
        libcuda.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so

showing that the .so and .so.1 entries are present in the ldcache.

@elezar elezar self-assigned this Feb 13, 2024
@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch 4 times, most recently from e20826f to 7b5f6b3 Compare April 3, 2024 15:10
@elezar elezar marked this pull request as ready for review April 3, 2024 15:10
@elezar elezar requested review from klueska and cdesiniotis April 3, 2024 15:11
Copy link
Contributor

@klueska klueska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial pass. I lost some steam near the end, but wanted to drop the comments I had for now.

Copy link
Member Author

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some of the issues were because this was out of date.

@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch 3 times, most recently from 93753a9 to 020f598 Compare April 15, 2024 14:31
@elezar elezar requested a review from klueska May 15, 2024 12:32
@ArangoGutierrez
Copy link
Collaborator

@elezar does #906 supersede this one?

@elezar
Copy link
Member Author

elezar commented Feb 25, 2025

@elezar does #906 supersede this one?

No. This is not related to the CUDA Forward Compat libraries at all. This is about creating .so -> SONAME symlinks for driver libraries in the container. #906 is about ensuring that the CUDA Forward compat libraries are properly detectable in a container if required.

@elezar elezar marked this pull request as draft June 18, 2025 21:42
@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch from 020f598 to 81a6b00 Compare July 17, 2025 09:51
@coveralls
Copy link

coveralls commented Jul 17, 2025

Pull Request Test Coverage Report for Build 16469131076

Details

  • 70 of 99 (70.71%) changed or added relevant lines in 4 files are covered.
  • 16 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.4%) to 35.391%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/nvcdi/driver-nvml.go 0 1 0.0%
internal/discover/symlinks.go 68 96 70.83%
Files with Coverage Reduction New Missed Lines %
internal/ldconfig/ldconfig.go 2 0.0%
internal/discover/list.go 14 0.0%
Totals Coverage Status
Change from base Build 16421142187: 0.4%
Covered Lines: 4511
Relevant Lines: 12746

💛 - Coveralls

@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch from 81a6b00 to 3a1f96c Compare July 21, 2025 10:01
@elezar elezar requested a review from klueska July 21, 2025 10:12
@elezar elezar marked this pull request as ready for review July 21, 2025 10:12
@elezar elezar added this to the v1.18.0 milestone Jul 21, 2025
@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch from 20abfcc to 3fea29c Compare July 21, 2025 10:22
Copilot

This comment was marked as outdated.

@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch 9 times, most recently from e205edd to bc7ed44 Compare July 22, 2025 20:18
Comment on lines 142 to 144
if soname == "" || soname == libraryName {
return nil, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not still want the .so link even if the SONAME is the same as the libraryName?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think you're right. We want the .so link but not the soname link. I don't think this comes up in typical driver installations though, so we may want to skip this for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine then. We can always revisit this if anyone complains.


// Create the .so -> SONAME symlink
soLink := getSoLink(soname)
if soLink != "" && soLink != soname && soLink != libraryName {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the need for all these special cases

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we get "" this means that soname is not a .so.* file which one could argue is unexpected since we do filter on driver libraries at the start of this function. The other cases make sure that we don't try to create a link that links to itself. I think these are unlikely to occur in our driver installations, but are there for completenes.

Comment on lines 157 to 174
// Create the .so -> SONAME symlink
if soLink := getSoLink(soname); soLink != "" && soLink != soname {
s := Symlink{
target: soname,
link: filepath.Join(libraryDir, soLink),
}
soSymlinks = append(soSymlinks, s.String())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this could be a bit more readable (in terms of the conditional cases). Maybe just a comment on what the cases are / mean would be sufficient, e.g.:

// Create the .so -> SONAME symlink
// If soname does not have a '.so' extension or soname itself ends with '.so'
// we skip this (because we will have just created the same symlink above).

@klueska
Copy link
Contributor

klueska commented Jul 23, 2025

A couple of minor comments, otherwise LGTM

@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch 2 times, most recently from 87082a6 to a560b8b Compare July 23, 2025 10:40
elezar added 6 commits July 23, 2025 13:16
This change ensures that .so and SONAME symlinks are created for
driver libraries in the container.

Signed-off-by: Evan Lezar <[email protected]>
This change removes the create-soname-symlinks hook introduced
in v1.18.0-rc.1. Instead we rely on explicitly creating the
.so -> SONAME -> .so.RM_VERSION symlink chain through the
create-symlink hook.

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the CNT-4766/create-so-symlinks branch from a560b8b to 3a59233 Compare July 23, 2025 11:16
@klueska
Copy link
Contributor

klueska commented Jul 23, 2025

Looking good. Thanks.

Copy link
Collaborator

@ArangoGutierrez ArangoGutierrez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@elezar elezar merged commit db68c39 into NVIDIA:main Jul 23, 2025
16 checks passed
@elezar elezar deleted the CNT-4766/create-so-symlinks branch July 23, 2025 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants