Skip to content

feature: Add invocation request hooks #757

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from
Closed

feature: Add invocation request hooks #757

wants to merge 11 commits into from

Conversation

lzchen
Copy link
Member

@lzchen lzchen commented Oct 9, 2020

Description

Fixes #


From talking to @Hazhzeng , we want to be able to execute custom logic in a customer's function, especially when it comes to reading/modifying the context before the request is made. This PR adds hooks for before and after the handling of the invocation request.

An example using OpenCensus AzureExporter which exports telemetry to App Insights. This is in a normal HTTPTrigger function.

import json
import logging
import requests

import azure.functions as func
from azure_functions_worker.extensions import register_before_invocation_request, register_after_invocation_request
from opencensus.ext.azure.trace_exporter import AzureExporter
from opencensus.trace import config_integration
from opencensus.trace.samplers import ProbabilitySampler
from opencensus.trace.tracer import Tracer
from opencensus.trace import execution_context
from opencensus.trace.propagation.trace_context_http_header_format import TraceContextPropagator

config_integration.trace_integrations(['requests'])

exporter = AzureExporter(instrumentation_key='<your-ikey-here>')

def callback(context):
    print("TRACEPARENT: " + context.trace_context.Traceparent)
    span_context = TraceContextPropagator().from_headers({"traceparent": context.trace_context.Traceparent, "tracestate": context.trace_context.Tracestate})
    # Create a tracer using the current traceparent passed in from the context in the callback
    tracer = Tracer(span_context=span_context, exporter=exporter, sampler=ProbabilitySampler(1.0))
    context.tracer = tracer #  < -- modify the context by passing in a tracer

def callback2(context):
    print("DONE")

register_before_invocation_request(callback)
register_after_invocation_request(callback2)

def main(req: func.HttpRequest, context: func.Context) -> func.HttpResponse:
    tracer = context.tracer
    execution_context.set_opencensus_tracer(tracer)  # < -- sets the passed in tracer as the current tracer
    with tracer.span("parent"):
        response = requests.get(url='http://example.com')
    return json.dumps({
        'method': req.method,
        'ctx_func_name': context.function_name,
        'ctx_func_dir': context.function_directory,
        'ctx_invocation_id': context.invocation_id,
        'ctx_trace_context_Traceparent': context.trace_context.Traceparent,
        'ctx_trace_context_Tracestate': context.trace_context.Tracestate,
    })

In the above code, the context is passed through these hooks and made available to the callback functions. Within the callbacks, OpenCensus SDK treats the context as the parent context, creates a new tracer with this context as the parent, and sets this tracer to be the one that is used currently (execution_context.set_opencensus_tracer(tracer)). Then any custom dependencies that will be tracked using Opencensus SDK (like response = requests.get(url='http://example.com'), will be connected with the current context within the functions environment.

PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made and CI is passing.

Quality of Code and Contribution Guidelines

@lzchen
Copy link
Member Author

lzchen commented Oct 9, 2020

/azp run

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@anthonychu
Copy link
Member

This is a good idea, but wondering if there should be a discussion in this repo about how this is implemented. (if the issue exists, please link to it, would love to take a look)

Specifically interested in examples of how these callbacks are registered, and whether there was any discussions on pros/cons compared to a middleware/pipeline model.

We also had asks for this in other languages, so it feels like there might be a broader discussion here to ensure the right scenarios are covered and see if there's a possibility to create a consistent approach.

@lzchen
Copy link
Member Author

lzchen commented Oct 13, 2020

@anthonychu
There isn't an issue that currently exists for this. There is an internal email thread that proposes the idea (but not any implementation details) by @Hazhzeng and I. We need this feature on the monitoring team to be able to hook on with our SDKs and I took a stab at it to expedite the discussions.

@markwolff is implementing this on the Node side and we are essentially doing the same thing. What would you recommend for next steps if we want to get this feature in ASAP?

@anthonychu
Copy link
Member

@lzchen OK so this is internal to the workers and it's specific to supporting App Insights correlation and is not something that's exposed to the customer? If that's the case, I don't have any concerns.

@Hazhzeng
Copy link
Contributor

Hi @lzchen, thanks for making the PR, overall it looks good.
Could you update the PR description and give us a code example on how does AppInsight integrate with the extension interface? (e.g. register_before_invocation_request and clear_before_invocation_request). It would be great if you can explain it in a few sentences.

@lzchen
Copy link
Member Author

lzchen commented Oct 15, 2020

@Hazhzeng
Updated the PR description with an example and info on how this lights up our correlation scenario.

@vrdmr
Copy link
Member

vrdmr commented Oct 19, 2020

@Hazhzeng - In the example, I can see the following -

from azure_functions_worker.extensions import register_before_invocation_request, register_after_invocation_request
  • Would this mean that the customer needs to add azure_functions_worker to the requirements.txt?

/cc @lzchen

@vrdmr vrdmr added the blocked label Oct 19, 2020
@Hazhzeng
Copy link
Contributor

@vrdmr - Well spotted

The customer writing the Azure Functions code shouldn't need to import the azure_functions_worker, but the library writing the extension will need to do it. I would expect the appinsight will publish a separate library and wrapping the opencensus (e.g. azure-functions-appinsight) and place the initialization code into it. So that the customer only need to import azure-functions-appinsight.

But this still introduces an issue since Linux Consumption SKU prioritizes the customer sys.path over the Azure Functions Worker sys.path. This has a potential to trigger the backward-compatibility issue like #729 since azure-functions-appinsight will also introduce an installation of azure_functions_worker -> grpcio, grpcio-tools -> protobuf.
image

I think to resolve this issue from the root, we do need to fork the grpcio (Apache-2.0), grpcio-tools (Apache-2.0) and protobuf (license) library from google. I consider other alternatives but none of them actually solve the issue.

  1. If we try to remove the azure_functions_worker library in remote build, we still don't have control on grpcio, grpcio-tools and protobuf dependent libraries (e.g. say if we remove grpcio, grpcio-tools and protobuf if customer define azure_functions_worker in requirements.txt, this will break customers using tensorflow since it also depends on grpcio and grpcio-tools).
  2. Use pipdeptree to resolve grpcio-tools dependent libraries and remove them only if used by azure_functions_worker. This is a generally safer solution, but the oryx build does not generate virtual environment and thus the packages cannot be analyze correctly.
    image
  3. Load azure_functions_worker from customer packages. This is dangerous since we have no idea which customer brought old worker into function app. Enabling this will break the current behavior in their function app and they will lose all the bug fixes and update features.

@lzchen - Sorry, I should've caught this much earlier when planning the change. We need to solve this potential backward-compatibility issue before releasing this change, will reset my review.

@Hazhzeng Hazhzeng self-requested a review October 19, 2020 17:11
@vrdmr vrdmr force-pushed the dev branch 2 times, most recently from a4a3be3 to 698c524 Compare January 13, 2021 21:45
@microsoft-github-updates microsoft-github-updates bot changed the base branch from dev to main January 14, 2021 01:41
@lzchen
Copy link
Member Author

lzchen commented Jan 26, 2021

@Hazhzeng

Any updates on this?

@Hazhzeng
Copy link
Contributor

Hey @vrdmr,

I think we should continue merge this PR since we introduces PYTHON_ISOLATE_WORKER_DEPENDENCIES flag.
If the customer runs into the protobuf issue. We can just simply ask them to turn on the feature flag for mitigation.

@vrdmr
Copy link
Member

vrdmr commented Jan 28, 2021

Sounds good @Hazhzeng.

@Hazhzeng/@lzchen - Is there a way to test this end to end? We don't need to block the PR for the E2E, but can be added post this PR. Your thoughts?

@vrdmr vrdmr changed the base branch from main to dev January 28, 2021 02:48
@vrdmr vrdmr requested a review from AnatoliB as a code owner January 28, 2021 02:48
@AnatoliB
Copy link

AnatoliB commented Jan 28, 2021

@anthonychu

@lzchen Leighton Chen FTE OK so this is internal to the workers and it's specific to supporting App Insights correlation and is not something that's exposed to the customer? If that's the case, I don't have any concerns.

I'm a bit confused by this. From the description and the usage sample, my impression is that this is indeed a feature exposed to the customer. Whether we document/advertise this or not, we are going to expose the register_before_invocation_request and register_after_invocation_request functions, so customers can and will start depending on them. Is this correct?

I'm not saying we shouldn't be doing this. But we need to be aware of this, and also consider consistency with other languages. Also, are we going to document this?

@lzchen
Copy link
Member Author

lzchen commented Jan 28, 2021

@vrdmr

@Hazhzeng/@lzchen - Is there a way to test this end to end? We don't need to block the PR for the E2E, but can be added post this PR. Your thoughts?

An e2e test could be created to light up the scenario of connecting custom dependencies via Opencensus with other services yes. I agree this can be added after this PR.

@lzchen
Copy link
Member Author

lzchen commented Jan 28, 2021

@AnatoliB

Whether we document/advertise this or not, we are going to expose the register_before_invocation_request and register_after_invocation_request functions, so customers can and will start depending on them. Is this correct?

Good question. I'm not sure about the functions worker guidelines for not exposing methods to the user (in Azure monitor, we simply make the methods protected and it implies to users that methods are not exposed in our API). Would this be sufficient enough? It is difficult (especially a language like Python) to completely prevent users from using this.

also consider consistency with other languages.

This feature has already been implemented in nodejs, however the methods are marked as private.

Also, are we going to document this?

If this functionality is not being exposed to customers, do we have to document this?

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

_EXTENSIONS_CONTEXT = dict()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hazhzeng @lzchen - I don't know the internal details or discussions of the feature, but wouldn't a function.json would have been a better approach for this? To make the contract clearer for all the languages?

My issue (not strong though) is still with this design choice - to import this - as this isn't the convention and this would start new precedence.

from azure_functions_worker.extensions import register_before_invocation_request, register_after_invocation_request

Can you explain an example app - I am assuming azure_functions_worker would be added to the requirements.txt and then using the extensions view.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the scenario in which we are trying to light up is that incoming requests to the user's function won't be correlated with dependency calls made within the user's function when collecting telemetry via Opencensus. Currently, our workaround is to tell users to connect the telemetry manually by setting the used tracer (Opencensus uses a tracer object to enable distributed tracing), into the passed in context that is received in the function header: def main(req: func.HttpRequest, context: func.Context) -> func.HttpResponse: as well as extracting the context from the headers.

In order to not have to have the user call this manually in their functions app, we have discussed implementing the ability to attach function hooks in the functions worker, which can run code before/after running a user's function. With this, we can run the "connecting traces" logic right before the user's function is run as shown in the example in the PR description.

The example above simply shows how this can be done in code. It is not what we expect the customers to do because we will be planning on adding this "hook" logic (callback) in the Opencensus code upon initiation of the AzureExporter. This means that Opencensus will take a dependency on azure-functions-worker, attach a callback function with the hook on initiation and users will only have to manually set the tracer to be the current tracer in their functions app.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lzchen After this PR is merged, what and how is going to invoke register_before_invocation_request and register_after_invocation_request? From the description and the sample, my impression is that this is going to be up to the function app code (which makes this a feature available to the customer). If you don't intend to make app authors do that, how is this supposed to happen? Are you going to clone and modify the worker (that would explain everything)? Are we expecting another PR that will actually invoke these from the worker code?

from azure_functions_worker import extensions


class TestExtensions(TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a function test in the test_http_functions file with this approach, to get a more complete test.



def register_before_invocation_request(callback):
if _EXTENSIONS_CONTEXT.get("BEFORE_INVOCATION_REQUEST_CALLBACKS"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make BEFORE_INVOCATION_REQUEST_CALLBACKS and AFTER_INVOCATION_REQUEST_CALLBACKS as private variables itself rather than raw strings here.

Also - Any specific reason for the long names here? Would this be better?

BEFORE_INVOCATION_CALLBACKS
AFTER_INVOCATION_CALLBACKS

_EXTENSIONS_CONTEXT["AFTER_INVOCATION_REQUEST_CALLBACKS"] = [callback]


def clear_before_invocation_request_callbacks():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all the functions in here, please add py-doc strings.

def function():
  """Doc here
  """

def tearDown(self):
extensions._EXTENSIONS_CONTEXT.clear()

def test_register_before_invocation_request(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above for PyDocs.

Please add a single line for each test to describe what the test is doing.

@AnatoliB
Copy link

@lzchen Please help me understand this, I might be missing some context. I'm still not clear on how it is not exposed to customers. I thought the sample code in the description demonstrated intended usage. So, after this PR, any Python function app can take advantage of this feature by invoking register_before_invocation_request and register_after_invocation_request. Furthermore, preventing the customer from invoking these would defeat the purpose, as your sample in the description will stop working. Is this correct?

In the nodejs implementation you are referring to, registerBeforeInvocationRequest and registerAfterInvocationRequest seem to be public, probably for exactly the same reason.

Copy link
Member

@vrdmr vrdmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the names of the functions and other variables to - Using pre and post instead of before and after respectively. Pre- and Post- make it clearer of what they are supposed to do.

get_pre_invocation_request_callbacks
get_post_invocation_request_callbacks

callback(context)
except Exception as ex:
logger.warning(
"Before invocation callback failed with: %s.", ex)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lzchen @Hazhzeng - How is this exposed to the user?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically - is this a USER log or System log? I think its system and won't be shown to the user. @Hazhzeng could you please confirm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logger is a system log. Yes, it won't be shown to user.

try:
callback(context)
except Exception as ex:
logger.warning(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah same, this is system log. For extension to expose the log to users, they need to define a new logger without the azure_functions_worker prefix.

import logging

logger = logging.getLogger('SomeExtension')
logger.info('xxx')

@lzchen
Copy link
Member Author

lzchen commented Jan 28, 2021

@AnatoliB
Yes I can see why the example can be confusing. Please look at this comment.

The intention was to simply light up our Azure monitor scenario and make things simpler for functions customers to obtain connected traces without having to manually add too much code. It is not really within the scope of this PR to try to add new functionality. Speaking to @Hazhzeng it was just a nice addition to have to allow users to be able to do this and I believed it to be a simple addition. If you think that exposing this to customers adds too much functionality, and if the customers are okay with manually instrumenting their code in their functions app then that could be brought into the discussion.

@AnatoliB
Copy link

@lzchen I like the feature, I believe it is useful, and the implementation looks good to me. But I keep reading two things that look mutually-exclusive to me:

  1. "This is just a worker implementation detail." In this case, Functions users cannot take (or not take) a dependency on this, no documentation is required, consistency with other languages is irrelevant.
  2. "This is a useful and simple feature." In this case, we should review it from this perspective: consider consistency with other languages, documentation, etc., and make sure we keep supporting this feature moving forward. I'm not saying this PR is not good enough for these purposes, and chances are we'll just take it as is. But we need to know.

I think we need to choose one, it cannot be both :-)

@lzchen
Copy link
Member Author

lzchen commented Jan 28, 2021

@AnatoliB
Got it. I think we would like to take the (2) approach, in which case I will fix the PR to adhere to the feature guidelines.

@AnatoliB
Copy link

AnatoliB commented Jan 29, 2021

@anthonychu Since we agreed that this is actually a customer-facing feature, do you want to review it from this perspective?

UPDATE: We've discussed this offline, and I realized that the Node.js implementation is different. As opposed to this PR, it does not expose a customer-facing feature, so keeping callbacks consistent is pointless, and the question below is irrelevant.

I have a question on consistency with the Node.js implementation. While comparing this PR with Node.js implementation, I noticed that the callback functions in the Node.js case look like this:

type InvocationRequestBefore = (context: Context, userFn: Function) => Function;
type InvocationRequestAfter = (context: Context) => void;

Does anybody know what is the story behind userFn? In the Python case, only context is passed in both cases. Do we have a reason to follow the same pattern?

@lzchen
Copy link
Member Author

lzchen commented May 5, 2021

Closing in favor of #815

@lzchen lzchen closed this May 5, 2021
@vrdmr vrdmr deleted the lzchen/hooks branch February 4, 2022 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants