-
Notifications
You must be signed in to change notification settings - Fork 107
Shared memory data transfer between Functions Host and Python worker #765
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…fore passing to the function
cc: @goiri Opening a draft PR for now. You can skim through whenever. |
23e80ec
to
b81dbcc
Compare
# Having a mapping of the name and the memory map is then later used to close a given | ||
# memory map by its name, after it has been used. | ||
# key: map_name, val: mmap.mmap | ||
self.allocated_mem_maps: Dict[str, mmap.mmap] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little convoluted, but what if the worker restarts and the mmaps stay there?
Should we try to repopulate them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ummm... that's a good point. If the worker restarts in case of error, then there is already trouble.
The reason why I don't want to repopulate this is because if the memory map was allocated by the functions host or a different worker process, then things can get weird.
So, we need to think on how to avoid such leaks if the worker restarts.
But then, what if the functions host restarts - even that could possibly leave maps around, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm pretty sure it would be fine but worth giving a though to those situations.
This is an interface that must be implemented by sub-classes to provide platform-specific | ||
support for accessing memory maps. | ||
Currently the following two sub-classes are implemented: | ||
1) FileAccessorWindows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows is not a file right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will rename to SharedMemoryMapAccessor*
.
I am also thinking of refactoring this similar to what we have in C# i.e. have a class called SharedMemoryMap
and do all the operations which are being done in FileReader
/FileWriter
as part of that class.
if datum.type == 'bytes': | ||
value = datum.value | ||
map_name = shmem_mgr.put_bytes(value) | ||
if map_name is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The else of this if would end up into a not implemented for a datum type while it is not the issue here.
out_name: str, | ||
shmem_mgr: SharedMemoryManager) -> protos.ParameterBinding: | ||
datum = get_datum(binding, obj, pytype) | ||
# TODO gochaudh: IMPORTANT: Right now we set the AppSetting to disable this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now from C# we check the capabilities of the python worker; which is the case we would expect while rollout.
I know is unlikely but do we handle the host C# side not supporting shared memory but Python yes?
# Having a mapping of the name and the memory map is then later used to close a given | ||
# memory map by its name, after it has been used. | ||
# key: map_name, val: mmap.mmap | ||
self.allocated_mem_maps: Dict[str, mmap.mmap] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm pretty sure it would be fine but worth giving a though to those situations.
# Zero byte. | ||
# E.g. Used to compare the first byte of a newly created memory map against this; if it is a | ||
# non-zero byte then the memory map was already created. | ||
ZERO_BYTE = b'\x00' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call it EMPTY_MAP_MARK and then in the comment we say this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what does the first byte contain in principle that guarantees a non-0 first byte? Is there any corner case like if the content has a specific lenght that happens to generate a first byte as 0?
a4a3be3
to
698c524
Compare
Closed and moved to this PR. |
Description
Fixes Azure/azure-functions-host#6791
PR information
Quality of Code and Contribution Guidelines