Skip to content

Make dppl rt threadlocal #9

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

diptorupd
Copy link
Contributor

Stores the sycl device queues in a thread local storage. The DpplRuntime object is no longer exposed, but new global prebound functions exposes the same API.

@diptorupd
Copy link
Contributor Author

Also contains the changes from PR #7


# Initialize a thread local instance of _DpplRuntime
_tls = threading_local()
_tls._in_dppl_ctxt = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we ever discussed this scenario. However, I think I agree that in a new thread, you go back to the default of not being in a context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am removing the flag. Instead of the flag let me add a public getter that returns True/False based on if we are inside any context by checking the size of the stack of contexts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the change in 8318d44

print("Removing the context from the deque of active contexts")
runtime._deactivate_current_queue()
_tls._runtime._deactivate_current_queue()
_tls._in_dppl_ctxt = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a virtual stack of contexts? I would think that on "finally" that you pop the top off the stack and then only set in_dppl_ctxt to false if the stack is empty. Even better, you don't have this in_dppl_ctxt at all and just check the size of the stack if you need to know if you're in a context or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stack of contexts is maintained inside the _runtime.rt object. I am using a std::deque to store the list of currently activated queues. The first entry of the deque is always the global or default queue. Using an example:

default_q = dppl.get_current_queue()

with dppl.device_context(dppl.device_type.cpu, 0) as cpu_q:
    ...
    with dppl.device_context(dppl.device_type.gpu, 0) as gpu_0_q:
        ...
    with dppl.device_context(dppl.device_type.gpu, 1) as gpu_1_q:
        ...

Here the stack will be maintained inside dppl._tls._runtime as:

default_q (initial state)
cpu_q --> default_q (on entering first with context)
gpu_0_q --> cpu_q --> default_q (on entering the first nested with context)
cpu_q --> default_q (on exiting the first nested with context)
gpu_1_q --> cpu_q --> default_q (on entering the second nested with context)
cpu_q --> default_q (on exiting the second nested with context)
default_q (on exiting the top-level with context)

The _deactivate_current_queue() call does the pop from the deque. dppl.get_current_queue always returns the current top of stack queue.

About your second point about in_dppl_ctxt, yes the way it is currently implemented it returns wrong result when contexts are nested. Let me fix it to return the size of the deque -1 (since the default queue is always going to be inside the deque).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the change in 8318d44

    -- Adds a new helper function to check the size of the deque
       to infer if inside a device_context. Works for nested contexts
       as well.
    -- Add unit test cases.
@diptorupd diptorupd requested a review from DrTodd13 August 11, 2020 06:38
Comment on lines +147 to +161
_tls = threading_local()
_tls._runtime = _DpplRuntime()


################################################################################
#--------------------------------- Public API ---------------------------------#
################################################################################


dump = _tls._runtime.dump
dump_queue_info = _tls._runtime.dump_queue_info
get_current_queue = _tls._runtime.get_current_queue
get_num_platforms = _tls._runtime.get_num_platforms
set_default_queue = _tls._runtime.set_default_queue
is_in_dppl_ctxt = _tls._runtime.is_in_dppl_ctxt

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the entire runtime thread-local? The only thing which is thread-local is the stack of contexts. Making shared data thread-local is calling for trouble.
I think the runtime class should encapsulate the thread-local business and return the thread-local values in its member functions like get_current_queue. Maybe this needs to be done in C++ (e.g. your std::deque be thread-local).

@diptorupd diptorupd marked this pull request as draft August 13, 2020 23:07
@diptorupd diptorupd closed this Aug 18, 2020
@diptorupd diptorupd deleted the make_dppl_rt_threadlocal branch August 18, 2020 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants