Skip to content

Shared memory data transfer between Functions Host and Python worker #765

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 16 commits into from

Conversation

gohar94
Copy link
Contributor

@gohar94 gohar94 commented Oct 14, 2020

Description

Fixes Azure/azure-functions-host#6791


PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made and CI is passing.

Quality of Code and Contribution Guidelines

@gohar94 gohar94 marked this pull request as draft October 14, 2020 17:51
@gohar94
Copy link
Contributor Author

gohar94 commented Oct 14, 2020

cc: @goiri

Opening a draft PR for now. You can skim through whenever.

@vrdmr vrdmr added the blocked label Oct 19, 2020
@gohar94 gohar94 force-pushed the gochaudh/shared_mem_data_transfer branch from 23e80ec to b81dbcc Compare December 16, 2020 23:57
# Having a mapping of the name and the memory map is then later used to close a given
# memory map by its name, after it has been used.
# key: map_name, val: mmap.mmap
self.allocated_mem_maps: Dict[str, mmap.mmap] = {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little convoluted, but what if the worker restarts and the mmaps stay there?
Should we try to repopulate them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ummm... that's a good point. If the worker restarts in case of error, then there is already trouble.
The reason why I don't want to repopulate this is because if the memory map was allocated by the functions host or a different worker process, then things can get weird.
So, we need to think on how to avoid such leaks if the worker restarts.
But then, what if the functions host restarts - even that could possibly leave maps around, right?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm pretty sure it would be fine but worth giving a though to those situations.

This is an interface that must be implemented by sub-classes to provide platform-specific
support for accessing memory maps.
Currently the following two sub-classes are implemented:
1) FileAccessorWindows
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows is not a file right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will rename to SharedMemoryMapAccessor*.
I am also thinking of refactoring this similar to what we have in C# i.e. have a class called SharedMemoryMap and do all the operations which are being done in FileReader/FileWriter as part of that class.

if datum.type == 'bytes':
value = datum.value
map_name = shmem_mgr.put_bytes(value)
if map_name is not None:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else of this if would end up into a not implemented for a datum type while it is not the issue here.

out_name: str,
shmem_mgr: SharedMemoryManager) -> protos.ParameterBinding:
datum = get_datum(binding, obj, pytype)
# TODO gochaudh: IMPORTANT: Right now we set the AppSetting to disable this
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now from C# we check the capabilities of the python worker; which is the case we would expect while rollout.
I know is unlikely but do we handle the host C# side not supporting shared memory but Python yes?

# Having a mapping of the name and the memory map is then later used to close a given
# memory map by its name, after it has been used.
# key: map_name, val: mmap.mmap
self.allocated_mem_maps: Dict[str, mmap.mmap] = {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm pretty sure it would be fine but worth giving a though to those situations.

# Zero byte.
# E.g. Used to compare the first byte of a newly created memory map against this; if it is a
# non-zero byte then the memory map was already created.
ZERO_BYTE = b'\x00'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call it EMPTY_MAP_MARK and then in the comment we say this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what does the first byte contain in principle that guarantees a non-0 first byte? Is there any corner case like if the content has a specific lenght that happens to generate a first byte as 0?

@vrdmr vrdmr force-pushed the dev branch 2 times, most recently from a4a3be3 to 698c524 Compare January 13, 2021 21:45
@microsoft-github-updates microsoft-github-updates bot changed the base branch from dev to main January 14, 2021 01:41
@gohar94 gohar94 closed this Feb 5, 2021
@gohar94
Copy link
Contributor Author

gohar94 commented Feb 5, 2021

Closed and moved to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Shared memory data transfer between Functions host and out-of-proc workers for binary data
3 participants