Skip to content

Add vfio mode to generate CDI specs for NVIDIA passthrough GPUs #315

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

elezar
Copy link
Member

@elezar elezar commented Jan 29, 2024

This change adds a vfio mode to the nvcdi API (and nvidia-ctk generate command).

---
cdiVersion: 0.5.0
kind: nvidia.com/pgpu
devices:
    - name: "0"
      containerEdits:
        deviceNodes:
            - path: /dev/vfio/5
              hostPath: /dev/vfio/5
containerEdits:
    env:
        - NVIDIA_VISIBLE_DEVICES=void
    deviceNodes:
        - path: /dev/vfio/vfio
          hostPath: /dev/vfio/vfio

@elezar
Copy link
Member Author

elezar commented Jan 29, 2024

@cdesiniotis if this is still valid, please rebase or close.

@elezar
Copy link
Member Author

elezar commented Jan 15, 2025

@varunrsekar would this be interesting for what you're looking at in NVIDIA/k8s-dra-driver-gpu#183?

// GetCommonEdits returns common edits for ALL devices.
// Note, currently there are no common edits.
func (l *vfiolib) GetCommonEdits() (*cdi.ContainerEdits, error) {
return &cdi.ContainerEdits{ContainerEdits: &specs.ContainerEdits{}}, nil

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing a device node for /dev/vfio/vfio:

&cdi.ContainerEdits{
    ContainerEdits: &specs.ContainerEdits{
        DeviceNodes: []*specs.DeviceNode{
	    &specs.DeviceNode{
	        Path: "/dev/vfio/vfio",
	    },
	},
    },
}

@varunrsekar
Copy link

@varunrsekar would this be interesting for what you're looking at in NVIDIA/k8s-dra-driver#183?

yeah this will help!

@cdesiniotis
Copy link
Contributor

@varunrsekar I am no longer working on this -- feel free to take ownership of this PR if you need it.

@coveralls
Copy link

coveralls commented Jul 7, 2025

Pull Request Test Coverage Report for Build 16119484166

Details

  • 69 of 81 (85.19%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 34.398%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/nvcdi/mode.go 0 1 0.0%
pkg/nvcdi/lib.go 6 8 75.0%
pkg/nvcdi/lib-vfio.go 59 68 86.76%
Totals Coverage Status
Change from base Build 16115644541: 0.3%
Covered Lines: 4565
Relevant Lines: 13271

💛 - Coveralls

@elezar elezar requested a review from cdesiniotis July 7, 2025 12:10
@elezar elezar marked this pull request as ready for review July 7, 2025 12:10
Copilot

This comment was marked as outdated.

@elezar elezar changed the title Add 'vfio' mode to pkg/nvcdi for generating CDI specs for NVIDIA pass… Add vfio mode to generate CDI specs for NVIDIA passthrough GPUs Jul 7, 2025
Copy link
Collaborator

@ArangoGutierrez ArangoGutierrez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 non blocking nits

@elezar elezar requested a review from varunrsekar July 7, 2025 14:20
@elezar elezar self-assigned this Jul 7, 2025
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a new VFIO mode to the nvcdi API and the nvidia-ctk generate command to support NVIDIA GPU passthrough via VFIO.

  • Introduces a ModeVfio constant and integrates it into the mode resolver.
  • Adds WithPCILib option and dependency on nvpci for querying PCI devices.
  • Implements vfiolib to generate VFIO-based CDI specs, including unit tests.

Reviewed Changes

Copilot reviewed 6 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pkg/nvcdi/options.go Import nvpci and add WithPCILib option
pkg/nvcdi/mode.go Register ModeVfio in the list of supported modes
pkg/nvcdi/lib.go Wire up nvpcilib in New for ModeVfio
pkg/nvcdi/lib-vfio.go Implement vfiolib and VFIO device spec generation
pkg/nvcdi/lib-vfio_test.go Add unit tests for the new VFIO mode
go.mod Bump github.com/NVIDIA/go-nvlib for PCI library support
Comments suppressed due to low confidence (3)

pkg/nvcdi/lib-vfio.go:93

  • [nitpick] Consider using %q instead of %v to quote the invalid ID in the error message for clarity ("invalid channel ID %q: %w").
			return nil, fmt.Errorf("invalid channel ID %v: %w", id, err)

pkg/nvcdi/lib-vfio_test.go:28

  • Tests cover positive VFIO scenarios but don’t verify that non-vfio-pci devices are filtered out or that error paths (e.g., invalid ID parsing, PCI library failures) behave as expected. Consider adding cases for those.
func TestModeVfio(t *testing.T) {

pkg/nvcdi/lib-vfio.go:33

  • [nitpick] The field group in vfioDevice is ambiguous — consider renaming it to iommuGroup to make its purpose clearer.
	group   int

@@ -43,6 +44,13 @@ func WithInfoLib(infolib info.Interface) Option {
}
}

// WithPCILib sets the pci library to be used for CDI spec generation.
Copy link
Preview

Copilot AI Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment should use the uppercase acronym PCI for consistency with the function name (WithPCILib).

Suggested change
// WithPCILib sets the pci library to be used for CDI spec generation.
// WithPCILib sets the PCI library to be used for CDI spec generation.

Copilot uses AI. Check for mistakes.

DeviceNodes: []*specs.DeviceNode{
{
Path: path,
HostPath: filepath.Join(l.devRoot, path),
Copy link
Preview

Copilot AI Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When path starts with a slash, filepath.Join(l.devRoot, path) will ignore l.devRoot. You may need to strip the leading slash or construct the host path as filepath.Join(l.devRoot, path[1:]).

Suggested change
HostPath: filepath.Join(l.devRoot, path),
HostPath: filepath.Join(l.devRoot, path[1:]),

Copilot uses AI. Check for mistakes.

DeviceNodes: []*specs.DeviceNode{
{
Path: "/dev/vfio/vfio",
HostPath: filepath.Join(l.devRoot, "/dev/vfio/vfio"),
Copy link
Preview

Copilot AI Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here: joining an absolute path to l.devRoot will ignore the root override. Consider constructing this path without the leading slash or using path.Join on relative segments.

Suggested change
HostPath: filepath.Join(l.devRoot, "/dev/vfio/vfio"),
HostPath: filepath.Join(l.devRoot, "dev/vfio/vfio"),

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants