hwloc-affinity #359

Closed

Conversation

NicolasDenoyelle

Extend hwloc utilities with a tool to bind threads to single processing units with a policy.

@sthibaul
Contributor

Interesting :)

In cpuaffinity_exec, I believe you need to add synchronization between the child and the parent. Otherwise the child might have time to clone() before the parent manages to call ptrace().
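
One common way to get that ordering on Linux is to have the child request tracing and stop itself, so that the parent can set its ptrace options before the child runs any further. The block below is a minimal sketch of that pattern, not the PR's cpuaffinity_exec code; `run_traced_child` and its exit-code parameter are hypothetical names used only for illustration.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the PR): fork a child that stops itself
 * until the parent has installed its ptrace options, then let it run.
 * Returns the child's exit code, or -1 on error. */
static int run_traced_child(int child_exit_code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced, then stop so that no clone() can
         * happen before the parent is ready. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(127);
        raise(SIGSTOP);
        /* A real tool would exec() the target program here. */
        _exit(child_exit_code);
    }

    /* Parent: wait for the child's self-inflicted stop. */
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* The child is stopped, so requesting clone events cannot race. */
    if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
               (void *)(long)PTRACE_O_TRACECLONE) < 0 ||
        ptrace(PTRACE_CONT, pid, NULL, NULL) < 0) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }

    /* Only now does the child resume; here it simply exits. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A pipe or semaphore between the two processes would also work; the self-stop has the advantage of needing no extra file descriptor in the traced program.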

@bgoglin
Contributor

bgoglin commented May 23, 2019

Options -i and --input should use the existing helpers in utils.h so that they handle XML/CPUID/Synthetic etc.

There's also some duplication in hwloc_obj_from_string() which already exists as generic helpers in hwloc-calc.h with support for ranges etc.

If it's only needed for --restrict obj:X, we should extend --restrict in other tools to also support object specifications. Things should be uniform there. Unfortunately, once you start doing that, it means you need to support both physical and logical specification with command-line options. I wonder if we should just use --restrict cpuset and keep telling people to use hwloc-calc for generating the cpuset. Or we should support foo:Px and foo:Lx for logical and physical indexes.
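
To make the foo:Px / foo:Lx idea concrete, here is a rough sketch of how such a specification could be parsed. The syntax is only a proposal in this comment, so everything below is an assumption: `struct obj_spec` and `parse_obj_spec` are hypothetical names and are not part of hwloc or its utilities.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of "type:Pn" (physical) or "type:Ln" (logical). */
struct obj_spec {
    char type[32];   /* object type name, e.g. "core" */
    int physical;    /* 1 for P (physical index), 0 for L (logical index) */
    unsigned index;  /* the index that follows P or L */
};

/* Returns 0 on success, -1 if the string does not match type:Pn or type:Ln. */
static int parse_obj_spec(const char *s, struct obj_spec *out)
{
    const char *colon = strchr(s, ':');
    const char *digits;
    char *end;
    unsigned long idx;
    size_t len;

    if (!colon || colon == s)
        return -1;
    len = (size_t)(colon - s);
    if (len >= sizeof(out->type))
        return -1;
    if (colon[1] != 'P' && colon[1] != 'L')
        return -1;

    /* Require at least one digit, and nothing after the number. */
    digits = colon + 2;
    if (*digits < '0' || *digits > '9')
        return -1;
    idx = strtoul(digits, &end, 10);
    if (*end != '\0')
        return -1;

    memcpy(out->type, s, len);
    out->type[len] = '\0';
    out->physical = (colon[1] == 'P');
    out->index = (unsigned)idx;
    return 0;
}
```

With this sketch, "core:P3" and "core:L3" name the same type but different index spaces, while a bare "core:3" is rejected, which is exactly the ambiguity the explicit prefix is meant to remove.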

The scatter policy already exists in hwloc-distrib, which could also be extended to export objects and/or indexes.

I think this should be split into two parts:

  1. one for generating the list of objects, which should be integrated in hwloc-calc. It's just a matter of adding new output options. We already have options for exporting with different separators, physical or logical indexes, etc. Those will be needed for your code anyway.
    Also, -i and --input are already supported there.
  2. one for binding new threads using such a list, which doesn't need -i or --input since it uses binding

Nicolas Denoyelle added 28 commits May 23, 2019 10:42
Signed-off-by: Nicolas Denoyelle <[email protected]>
…lstopo into hwloc-affinity

@bgoglin
Contributor

bgoglin commented May 29, 2019

You should likely split this PR into smaller, easier-to-review pieces so that it converges and gets merged more easily. A good first step would be to just add hwloc-bind-threads, with this as a possible usage:

hwloc-bind-threads -t core -- myprog

Here -t core means binding the first thread to the first core, the second thread to the second core, and so on. The last threads are left unbound if there aren't enough cores, or an option could make the binding wrap around.

hwloc-bind-threads <location> ... -- myprog

This would bind each thread to its respective location if any, where location is implemented exactly as in hwloc-calc/bind/...

Add --single and --strict just like in hwloc-bind.
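
The -t core policy described above boils down to mapping the rank of each newly created thread to a target index. The helper below is a small sketch of that mapping only; `target_index` and its `wrap` flag are hypothetical names, and the actual binding call is left out (on Linux it could be something like hwloc_linux_set_tid_cpubind() on the chosen object's cpuset).

```c
/* Hypothetical helper, not part of hwloc: pick the target (for example a
 * core) for the thread_rank-th created thread.
 * Returns the target index, or -1 if the thread should stay unbound. */
static int target_index(unsigned thread_rank, unsigned ntargets, int wrap)
{
    if (ntargets == 0)
        return -1;                     /* nothing to bind to */
    if (thread_rank < ntargets)
        return (int)thread_rank;       /* first thread -> first core, etc. */
    /* More threads than targets: wrap around or leave unbound. */
    return wrap ? (int)(thread_rank % ntargets) : -1;
}
```

For instance, with 4 cores and no wrap-around the fifth thread (rank 4) gets -1 and stays unbound, whereas with wrap-around it goes back to core 0.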

Unless I am mistaken, Linux is the only OS where a process can bind a thread of another process, hence all this can be made Linux-specific.

Also add a basic test that counts the number of cores and spawns N+1 threads with pthread_create; the threads then check their own binding once ptrace has had time to bind them. I'm not sure testing with OpenMP is a good idea, because the way OS threads are created and assigned to user OpenMP tasks can vary from one implementation to another.
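
The thread side of such a test could look roughly like the following Linux-only sketch. It makes several assumptions: the names `report_binding` and `count_singly_bound_threads` are hypothetical, each thread checks its affinity immediately instead of waiting for the tracer, and the caller chooses the thread count (a real hwloc test would more likely derive N from hwloc_get_nbobjs_by_type() with HWLOC_OBJ_CORE).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for sched_getaffinity() and the CPU_* macros */
#endif
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

/* Each thread records how many CPUs it is currently allowed to run on. */
static void *report_binding(void *arg)
{
    int *allowed = arg;
    cpu_set_t set;

    CPU_ZERO(&set);
    /* On Linux, pid 0 means "the calling thread" here. */
    if (sched_getaffinity(0, sizeof(set), &set) == 0)
        *allowed = CPU_COUNT(&set);
    else
        *allowed = -1;
    return NULL;
}

/* Spawn nthreads threads and return how many of them report being bound
 * to exactly one CPU, or -1 on error. */
static int count_singly_bound_threads(int nthreads)
{
    pthread_t *tids;
    int *allowed;
    int started, i, bound = 0, failed = 0;

    if (nthreads <= 0)
        return -1;
    tids = malloc((size_t)nthreads * sizeof(*tids));
    allowed = malloc((size_t)nthreads * sizeof(*allowed));
    if (!tids || !allowed) {
        free(tids);
        free(allowed);
        return -1;
    }

    for (started = 0; started < nthreads; started++) {
        allowed[started] = -1;
        if (pthread_create(&tids[started], NULL, report_binding,
                           &allowed[started]) != 0) {
            failed = 1;
            break;
        }
    }
    for (i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (allowed[i] < 0)
            failed = 1;
        else if (allowed[i] == 1)
            bound++;
    }

    free(tids);
    free(allowed);
    return failed ? -1 : bound;
}
```

Run under a tool that binds one thread per core without wrap-around, a call with N+1 threads would be expected to return N, since the last thread stays unbound. Run on its own, the result simply reflects whatever affinity the process inherited.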

@sthibaul
Contributor

Actually I believe Windows can as well, by using OpenThread to get a handle on the thread to be bound.

@NicolasDenoyelle
Author

NicolasDenoyelle commented Jun 14, 2019

> Actually I believe windows can as well, by using OpenThread to get a handle on the thread to be bound

To call OpenThread, you have to know which thread to open. ptrace lets you hook thread creation; I can't find an equivalent in the Windows documentation.
I found this: https://docs.microsoft.com/en-us/windows/desktop/Dlls/dllmain
which describes how a DLL is notified when a thread is created. However, you don't get the thread ID needed to then call OpenThread.
Edit: Never mind, I found this: https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/ntddk/nf-ntddk-pssetcreatethreadnotifyroutine. This is probably what is needed to hook thread creation.

@bgoglin
Contributor

bgoglin commented Jun 14, 2019

Forget about Windows for now, there are too many missing pieces there. Make all your thread-binding stuff Linux-specific as I said 17 days ago.

@jsquyres
Member

Thanks for all the updates.

Taking a step back, though, I see a lot of other pthread tests in configure already, and I see pthread code used throughout the rest of hwloc. Is there really a need for your new configure test? Or does your new pthread-using C code just need to be protected by existing macros?

@NicolasDenoyelle
Author

> it looks like this test is only run from hwloc's configure.ac. Is this macro also defined if hwloc is embedded?

The ptrace configure tests run within the scope of the macros used for the hwloc utilities.
The pthread tests do indeed run from configure.ac, and I guess they also run in embedded mode:
I could not find anything in configure.ac or config/hwloc_internal.m4 that prevents it.

> Is there really a need for your new configure test? Or does your new pthread-using C code just need to be protected by existing macros?

hwloc does not define a HAVE_PTHREAD macro. pthread is required to enable some tests at make check; if it is not detected, those tests do not run. I think this test is needed so that
pthread-using C code is protected by this macro, which has a different meaning than the existing C macros.
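
As an illustration of what "protected by this macro" would mean in the C sources, here is a minimal sketch. It assumes that the new configure test defines HAVE_PTHREAD when pthread is usable; `set_flag` and `run_thread_check` are hypothetical names, not code from the PR.

```c
#include <stddef.h>

#ifdef HAVE_PTHREAD   /* assumed to be defined by the new configure test */
#include <pthread.h>

static void *set_flag(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}
#endif

/* Returns 1 if a thread was created and ran, 0 if the code was built
 * without pthread support, and -1 on a pthread error. */
static int run_thread_check(void)
{
#ifdef HAVE_PTHREAD
    pthread_t tid;
    int flag = 0;

    if (pthread_create(&tid, NULL, set_flag, &flag) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return flag;
#else
    /* No pthread: compile the feature out instead of failing the build. */
    return 0;
#endif
}
```

The point of the guard is that the same source file builds in both configurations, with the thread-using part silently disabled when the macro is absent.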

> Make all your thread-binding stuff Linux-specific as I said 17 days ago.

I am working on it. I'll do one branch and one PR per feature, as you proposed.

@jsquyres
Member

@NicolasDenoyelle Thanks for investigating the configury stuff (embedded, etc.).

@bgoglin
Contributor

bgoglin commented Mar 24, 2020

Note to open pull requests: some things changed in the CI yesterday, you'll need to rebase on top of master to avoid total CI failure.

@NicolasDenoyelle NicolasDenoyelle deleted the hwloc-affinity branch April 29, 2020 14:42