-
Notifications
You must be signed in to change notification settings - Fork 30
Make dppl rt threadlocal #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Also contains the changes from PR #7 |
dppl/oneapi_interface.pyx
Outdated
|
||
# Initialize a thread local instance of _DpplRuntime | ||
_tls = threading_local() | ||
_tls._in_dppl_ctxt = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we ever discussed this scenario. However, I think I agree that in a new thread, you go back to the default of not being in a context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am removing the flag. Instead of the flag let me add a public getter that returns True/False based on if we are inside any context by checking the size of the stack of contexts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the change in 8318d44
dppl/oneapi_interface.pyx
Outdated
print("Removing the context from the deque of active contexts") | ||
runtime._deactivate_current_queue() | ||
_tls._runtime._deactivate_current_queue() | ||
_tls._in_dppl_ctxt = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be a virtual stack of contexts? I would think that on "finally" that you pop the top off the stack and then only set in_dppl_ctxt to false if the stack is empty. Even better, you don't have this in_dppl_ctxt at all and just check the size of the stack if you need to know if you're in a context or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stack of contexts is maintained inside the _runtime.rt object. I am using a std::deque to store the list of currently activated queues. The first entry of the deque is always the global or default queue. Using an example:
default_q = dppl.get_current_queue()
with dppl.device_context(dppl.device_type.cpu, 0) as cpu_q:
...
with dppl.device_context(dppl.device_type.gpu, 0) as gpu_0_q:
...
with dppl.device_context(dppl.device_type.gpu, 1) as gpu_1_q:
...
Here the stack will be maintained inside dppl._tls._runtime as:
default_q
(initial state)
cpu_q --> default_q
(on entering first with context)
gpu_0_q --> cpu_q --> default_q
(on entering the first nested with context)
cpu_q --> default_q
(on exiting the first nested with context)
gpu_1_q --> cpu_q --> default_q
(on entering the second nested with context)
cpu_q --> default_q
(on exiting the second nested with context)
default_q
(on exiting the top-level with context)
The _deactivate_current_queue()
call does the pop from the deque. dppl.get_current_queue
always returns the current top of stack queue.
About your second point about in_dppl_ctxt
, yes the way it is currently implemented it returns wrong result when contexts are nested. Let me fix it to return the size of the deque -1 (since the default queue is always going to be inside the deque).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the change in 8318d44
-- Adds a new helper function to check the size of the deque to infer if inside a device_context. Works for nested contexts as well. -- Add unit test cases.
_tls = threading_local() | ||
_tls._runtime = _DpplRuntime() | ||
|
||
|
||
################################################################################ | ||
#--------------------------------- Public API ---------------------------------# | ||
################################################################################ | ||
|
||
|
||
dump = _tls._runtime.dump | ||
dump_queue_info = _tls._runtime.dump_queue_info | ||
get_current_queue = _tls._runtime.get_current_queue | ||
get_num_platforms = _tls._runtime.get_num_platforms | ||
set_default_queue = _tls._runtime.set_default_queue | ||
is_in_dppl_ctxt = _tls._runtime.is_in_dppl_ctxt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the entire runtime thread-local? The only thing which is thread-local is the stack of contexts. Making shared data thread-local is calling for trouble.
I think the runtime class should encapsulate the thread-local business and return the thread-local values in its member functions like get_current_queue. Maybe this needs to be done in C++ (e.g. your std::deque be thread-local).
Stores the sycl device queues in a thread local storage. The DpplRuntime object is no longer exposed, but new global prebound functions exposes the same API.