[Ratelimit Processor] Instrument the ratelimiter service with telemetry metrics #562
Signed-off-by: Andreas Gkizas <[email protected]>
The approach looks good to me. Some thoughts on additional metrics that we would need to introduce:
- Request duration
- Can we measure the number of requests that are over the limits at a given time? I believe we can by using the decision dimension, but could you confirm that's the case?
- Concurrent requests

@lahsivjar Could you think of additional metrics that were useful in MIS so we can add them here? Thanks.
```go
func newGubernatorRateLimiter(cfg *Config, set processor.Settings) (*gubernatorRateLimiter, error) {
	var behavior int32

	telemetryBuilder, err := metadata.NewTelemetryBuilder(set.TelemetrySettings)
```
It would be good to add this also to the local rate limiter, to make sure we track all the incoming requests to this component. I would move this out.
@vigneshshanmugam Could you explain a bit more here please? Sorry I'm lost 😅
That was definitely without much context, apologies. What I meant was to add it to:

```go
func (r *localRateLimiter) RateLimit(ctx context.Context, hits int) error {
```
Could you think of additional metrics that were useful in MIS so we can add them here? Thanks.
Currently in MIS we collect only 2 metrics from the rate limiter (ref):
- A metric to track the status of the ingest requests. This metric has dimensions with bounded cardinality to record the decision and the reason for the decision by the rate limiter service.
- A metric to track the update requests sent between gubernator instances to update them. This metric has just one dimension of status.
While I like the idea of collecting request duration, I am not sure how useful concurrent requests would be. We should also be wary of adding too many metrics in the hot path, as it does add some overhead.
@vigneshshanmugam Could you explain a bit more here please? Sorry I'm lost 😅
I think I can explain this @kaiyan-sheng.
This rate limiter can be local (implemented with a Go library) or backed by Gubernator. Basically, if you define a Gubernator config, we expect the rate limiting to come from there; otherwise it is local:
opentelemetry-collector-components/processor/ratelimitprocessor/factory.go
Lines 60 to 64 in a2d4b06
```go
	if config.Gubernator != nil {
		return newGubernatorRateLimiter(config, set)
	}
	return newLocalRateLimiter(config, set)
})
```
So the rate limiter is later created this way:
opentelemetry-collector-components/processor/ratelimitprocessor/processor.go
Lines 67 to 75 in a2d4b06
```go
	return &LogsRateLimiterProcessor{
		rateLimiterProcessor: rateLimiterProcessor{
			Component: rateLimiter,
			rl:        rateLimiter.Unwrap(),
		},
		count: getLogsCountFunc(strategy),
		next:  next,
	}
}
```
(I am using one signal just for example).
Since the telemetry builder for the local and gubernator limiters is the same, I think @vigneshshanmugam is suggesting to initialize it only once (somewhere in processor.go).
Then you can also check the functions:
```go
func (r *MetricsRateLimiterProcessor) ConsumeMetrics(ctx context.Context, md pmetric.Metrics) error {
```
Could the metric from the rate limiter be set here? Or does it really need to be in local.go and gubernator.go? I don't know the answer to this now; I haven't checked the code in detail since the PR was first opened.
I hope it is clear. We can discuss this in more detail next week if you would like :)
@constanca-m Thanks for the comments! Yep, makes sense to move `telemetryBuilder, err := metadata.NewTelemetryBuilder(set.TelemetrySettings)` into factory.go so we don't create it separately for local.go and gubernator.go. But for the metrics themselves, I think it's still necessary to record them in both places. I will keep them separate for now, move forward with this task, and let's revisit it next week!
Co-authored-by: Vignesh Shanmugam <[email protected]>
Yes we can use the
These two metrics are implemented in a separate PR.
@kaiyan-sheng what is still pending on this PR?
Nothing is pending on this PR, it's ready for review.
Mostly LGTM
@kaiyan-sheng I think we need the
@gizas Oh good point! I forgot there are two different reasons there for the local case. I will put them back!
```go
r.requestTelemetry(ctx, []attribute.KeyValue{
	telemetry.WithReason(telemetry.RequestErr),
	telemetry.WithDecision("accepted"),
})
```
So continuing my comment https://github.com/elastic/opentelemetry-collector-components/pull/562/files#r2098431295: maybe you could check `errors.Is(..., ...)` in processor.go and then set the metric there. This is so that local.go and gubernator.go won't need to have duplicated code.
++, would be good to do as part of the followup.
Few nits, mostly there.
```go
r.requestTelemetry(ctx, []attribute.KeyValue{
	telemetry.WithReason(telemetry.RequestErr),
	telemetry.WithDecision("accepted"),
})
```
++, would be good to do as part of the followup.
processor/ratelimitprocessor/internal/metadatatest/generated_telemetrytest.go
LGTM, thank you 🎉
This is the initial implementation for https://github.com/elastic/hosted-otel-collector/issues/513
The code adds a new metrics reader in the ratelimitprocessor.
At the moment we have implemented only the otelcol_ratelimit.requests metric (a metric.Int64Counter). Also, the new metrics reader makes use of the global otel provider, which we assume is initialized upon processor usage.
To be reviewed:
Sample document
After running `make smoketest` from the hosted-otel-collector repo, I can see this document in Kibana: