L0: switch to zesInit() #695

Merged
merged 7 commits into from
Nov 13, 2024

Conversation

@bgoglin (Contributor) commented Nov 8, 2024

  • only support implementations with zesDriverGetDeviceByUuidExp()
  • the above means we may assume zesInit() is available and works
  • remove the constructor that forces ZES_ENABLE_SYSMAN in the environment (FINALLY!!!)
  • use different handles for ZE and ZES devices during discovery
  • cleanup/update helpers and tests accordingly

Still needs to be tested on a machine with subdevices (PVC), but I don't have any such machine with a recent-enough L0 implementation.

L0 osdevs may have subdevices, hence the PCI parent isn't always the first parent.

Signed-off-by: Brice Goglin <[email protected]>
They are the same for now because we use the legacy way
to enable Sysman, which allows casting between handles,
but this won't be the case for much longer.

By the way, clear an array on error to help coverity.

Signed-off-by: Brice Goglin <[email protected]>
zesDriverGetDeviceByUuidExp() will be mandatory in the next commits.

Now we assume zesInit() exists (and works), since it was added much earlier
than zesDriverGetDeviceByUuidExp().

Signed-off-by: Brice Goglin <[email protected]>
Instead of casting from ZE handles.

Disable tests/hwloc/levelzero for now since the core library is still
transitioning to zesInit(), and helpers haven't been updated yet.

Signed-off-by: Brice Goglin <[email protected]>
Just rely on zesInit() being supported (because we use features added later).
Remove the constructor that was setting ZES_ENABLE_SYSMAN=1 in the environment.

tests/hwloc/levelzero.c is updated too, but still disabled
until helpers are updated.

Signed-off-by: Brice Goglin <[email protected]>
We cannot use the same handle for ZE/Core and ZES/Sysman APIs anymore,
so split the helpers in levelzero.h into normal and sysman functions,
and update tests to check both.

Reenable tests/hwloc/levelzero.c and update it to test both ZE and
ZES APIs.

By the way, clean up random things and clarify some docs.

Signed-off-by: Brice Goglin <[email protected]>
@bgoglin bgoglin merged commit 0474e06 into open-mpi:master Nov 13, 2024
1 check passed
@bgoglin bgoglin deleted the l0-byuuid branch November 13, 2024 08:05
@TApplencourt

> Still needs to be tested on a machine with subdevices (PVC), but I don't have any such machine with a recent-enough L0 implementation.

BTW, when I tested it, it was on a PVC.

Thanks a lot!
