Skip to content

Fix issue when isolating libraries when file not ready #854

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jun 4, 2021

Conversation

Hazhzeng
Copy link
Contributor

@Hazhzeng Hazhzeng commented May 28, 2021

Description

Fixes an issue in Python 3.9 Linux Consumption plan. This occurs when the customer's azure.functions library is not presented as we tried to clear out its cache prematurely. This will result in a longer cold start time since the function host will restart several times to wait for the customer's content before doing specialization. The worst scenario will cause the website goes into a 404 state for a few seconds during the code start.

Result:
Failure Exception:
KeyError: 'azure.functions'
Stack:
   File '/azure-functions-host/workers/python/3.9/LINUX/X64/azure_functions_worker/dispatcher.py', line 482, in _handle__function_environment_reload_request     DependencyManager.reload_azure_google_namespace(
   File '/azure-functions-host/workers/python/3.9/LINUX/X64/azure_functions_worker/utils/dependency.py', line 155, in reload_azure_google_namespace     cls.reload_all_namespaces_from_customer_deps(cx_working_dir)
   File '/azure-functions-host/workers/python/3.9/LINUX/X64/azure_functions_worker/utils/dependency.py', line 207, in reload_all_namespaces_from_customer_deps     cls._remove_from_sys_path(cls.worker_deps_path)
   File '/azure-functions-host/workers/python/3.9/LINUX/X64/azure_functions_worker/utils/dependency.py', line 259, in _remove_from_sys_path     cls._clear_path_importer_cache_and_modules(path)
   File '/azure-functions-host/workers/python/3.9/LINUX/X64/azure_functions_worker/utils/dependency.py', line 278, in _clear_path_importer_cache_and_modules     cls._remove_module_cache(path)
   File '/azure-functions-host/workers/python/3.9/LINUX/X64/azure_functions_worker/utils/dependency.py', line 353, in _remove_module_cache     elif getattr(module, '__path__', None) and getattr(
   File '<frozen importlib._bootstrap_external>', line 1255, in __len__
   File '<frozen importlib._bootstrap_external>', line 1234, in _recalculate
   File '<frozen importlib._bootstrap_external>', line 1230, in _get_parent_path 

Discussion

  1. When customer's trigger name collided with customer's libraries, should give a warning.

Mitigation

This is currently impacting the customers using Python 3.9 Linux Consumption. Setting the PYTHON_ISOLATE_WORKER_DEPENDENCIES to 0 will mitigate this issue for now.

Validation:

  1. Use pytest --instafail --cov=./azure_functions_worker --cov-report xml --cov-branch .\tests\unittests\test_extension.py .\tests\unittests\test_utilities_dependency.py --randomly-seed=3973973583 to introduce unexpected module loading order. This should fails with azure.functions key not found before this fix.

Also found that improper namespacing in worker nuget artifact:

The __init__.py file should only be placed in https://github.com/Azure/azure-functions-python-library source code. However, this shouldn't be included in a release package.
image

According to Python namespace specification https://www.python.org/dev/peps/pep-0420/#specification. Quote:

Namespace packages cannot contain an init.py. As a consequence, pkgutil.extend_path and pkg_resources.declare_namespace become obsolete for purposes of namespace package creation. There will be no marker file or directory for specifying a namespace package.


PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made and CI is passing.

Quality of Code and Contribution Guidelines

Hanzhang Zeng (Roger) added 2 commits June 2, 2021 22:06
Adding namespace import scenarios

Ensure changes

Add unittests

Amend test cases

Fix an issue when customer defines a module with collided name

Fix test_paths_resolution

Fix case
@Hazhzeng Hazhzeng force-pushed the hazeng/fix-azure-functions branch from 6f43ad1 to 8ac6543 Compare June 3, 2021 05:12
@Hazhzeng
Copy link
Contributor Author

Hazhzeng commented Jun 3, 2021

Validation:

Without the fix from Azure/azure-functions-python-library#90:
image

With the fix from Azure/azure-functions-python-library#90:
image

Tested with the following mesh docker image and command:
run_host_mesh_custom.ps1

.\run_host_mesh_custom.ps1 -ImageName mcr.microsoft.com/azure-functions/mesh:3.0.15828-python3.9 -PythonVersion 3.9

@Hazhzeng
Copy link
Contributor Author

Hazhzeng commented Jun 3, 2021

Validation on Linux Consumption Python 3.9 (image: mcr.microsoft.com/azure-functions/mesh:3.0.15828-python3.9)
Without the change, able to reproduce the issue (expect to fail):
image
image

With the latest Worker and Library mounted, able to mitigate this issue. Confirmed with multiple runs (expect to succeed):
image
image

@Hazhzeng
Copy link
Contributor Author

Hazhzeng commented Jun 3, 2021

Validation on Linux Consumption Python 3.8 (image: mcr.microsoft.com/azure-functions/mesh:3.0.15828-python3.8)
Without any changes, feature flag disabled by default (expect to succeed):
image
image

With the latest worker and library mounted, feature flag disabled by default (expect to succeed):
image
image

@Hazhzeng
Copy link
Contributor Author

Hazhzeng commented Jun 3, 2021

Validation on Linux Dedicated Python 3.9 (image: mcr.microsoft.com/azure-functions/python:3.0.15828-python3.9)
With the original image (expect to succeed):
image
image

With the latest change (expect to succeed):
image
image

@Hazhzeng
Copy link
Contributor Author

Hazhzeng commented Jun 3, 2021

Validation on Linux Dedicated Python 3.8 (image: mcr.microsoft.com/azure-functions/python:3.0.15828-python3.8)
With the original image, feature flag is disabled by default. Customer brings grpcio-1.35.0 / Worker has grpcio-1.33.2 (collision detected, function is not able to start):
image

With the original image, setting the PYTHON_ISOLATE_WORKER_DEPENDENCIES = 1, function app is now working:
image
image

With the latest worker update, feature flag is disabled by default. (collision detected, function is not able to start):
image

With the latest worker update, enable the feature flag, function is recovered:
image
image

@Hazhzeng
Copy link
Contributor Author

Hazhzeng commented Jun 4, 2021

A new round of test with the __init__.py removed from worker dependencies azure/ folder.
Python 3.9 consumption:
image

Python 3.9 dedicated:
image

Python 3.8 consumption:
image

Python 3.8 dedicated (expect to fail, since collision in [customer grpcio-1.35.0 / worker grpcio-1.33.2] library):
image

Copy link
Member

@vrdmr vrdmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments.

LGTM

@@ -480,7 +483,7 @@ async def _handle__function_environment_reload_request(self, req):
)

# Reload azure google namespaces
DependencyManager.reload_azure_google_namespace(
DependencyManager.reload_customer_libraries(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the # Reload azure google namespaces comment here. Not needed.

@@ -18,6 +18,7 @@

# Platform Environment Variables
AZURE_WEBJOBS_SCRIPT_ROOT = "AzureWebJobsScriptRoot"
CONTAINER_NAME = "CONTAINER_NAME"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have this constant in dependency.py itself? This isn't being used globally.

if cx_working_dir:
working_directory: str = os.path.abspath(cx_working_dir)
if not working_directory:
working_directory = cls.cx_working_dir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can there ever be a case that cx_working_dir isn't a property in cls?

# The modules defined in customer's working directory should have the
# least priority since we uses the new folder structure.
# Please check the "Message to customer" section in the following PR:
# https://github.com/Azure/azure-functions-python-worker/pull/726
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make a pinned issue as well (or check if this is added to Python Dev Docs itself)?


@classmethod
def reload_azure_google_namespace(cls, cx_working_dir: str):
def reload_customer_libraries(cls, cx_working_dir: str):
"""Reload azure and google namespace, this including any modules in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not only doing Azure and Google right? Comment should reflect this now.

@@ -12,7 +12,10 @@ steps:
inputs:
disableAutoCwd: true
scriptPath: 'pack/scripts/nix_deps.sh'
# The following task is disabled since Python 3.9 to fix azure/ namespace
# https://github.com/Azure/azure-functions-python-worker/pull/854
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we re-enable this particular task later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is an actual fix for Python 3.9 and afterward.



def main(req: func.HttpRequest) -> func.HttpResponse:
"""This function is an HttpTrigger to check if the modules are loaded from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test! :)

@vivek-bsn
Copy link

@Hazhzeng Currently the mitigation provided works to avoid 404 errors. Any update on when this fix is expected to roll out to the production so that the app setting 'PYTHON_ISOLATE_WORKER_DEPENDENCIES ' can be removed?

@vrdmr
Copy link
Member

vrdmr commented Aug 5, 2021

The fix is currently being rolled out and is halfway across all our fleet. It should be completed by mid-next week to all the geo-locations.

@vivek-bsn
Copy link

Once the fix is deployed, we can remove the setting 'PYTHON_ISOLATE_WORKER_DEPENDENCIES' to 0 right? (or) do we need to retain it in the app settings?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants