-
Notifications
You must be signed in to change notification settings - Fork 34
[ratelimiter] Remove wrong error, remove duplicate code #578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ratelimiter] Remove wrong error, remove duplicate code #578
Conversation
limitPercentUsed = 1.0 - min(1, float64(resp.Remaining)/float64(resp.Limit)) | ||
} | ||
|
||
limitBucket := getLimitThresholdBucket(limitPercentUsed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gizas @kaiyan-sheng I am (temporarily) removing this - I can add it again in this PR, I removed it just to clean up. Can you give me more details on the choice of adding this? And why local.go
does not have this? Maybe by mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also having trouble interpreting this attribute value. What does it mean? I understand the limit % part, but then I get lost on telemetry.WithLimitThreshold(limitBucket),
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the relevant test case here:
opentelemetry-collector-components/processor/ratelimitprocessor/gubernator_test.go
Line 311 in 1c294d1
limitUsedPercent := []int64{20, 30, 60, 80, 90, 95, 100} |
Seems we added a new attribute https://github.com/elastic/opentelemetry-collector-components/blob/main/processor/ratelimitprocessor/gubernator_test.go#L343C7-L343C16
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gizas I am still having trouble on understanding this attribute. Can you help me on this?
So I understand limitPercentUsed
part. But then I get lost on limitBucket
.
I then have two questions (apart from understanding what limitBucket
means):
- Why doesn't local rate limiter have this attribute as well?
- What is the choice of adding this attribute?
If we want to keep this attribute, the code needs to change so RateLimit
inside the local/gubernator rate limiters starts returning a response struct instead of an error. For example:
type response struct {
limitBucket int
err error
}
I am trying to understand if this change is necessary (and how to do the same for the local rate limiter), or if we can discard this attribute for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is dictating the token bucket algorithm limit threshold in the guberator - Basically metric indicating what percentage of limit we are at/available? I am not sure how usable this is, feels like this has been carried over from MIS, I ll let @lahsivjar to comment if this was useful in his experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to see the code, seems that we calculate the limitbucket and we add it as an attribute. I dont see any other use.
And yes we should add it in local.go if we decide to add it.
I will wait for @kaiyan-sheng, @lahsivjar to comment. I would say we can remove it for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we should add it in local.go if we think this metric is useful so we are reporting the same metrics for both. This is from MIS indeed. We can leave it out for now if it's not gonna be useful or unsure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(apologies for not seeing this thread, my git notification management skills are on the decline)
I am not very familiar with the rate limiter code, but reviewing the comments in the above code, it appears that the purpose of limitBucket
is to give histogram-like features that can be plotted as a dimension with fixed cardinality.
While not that familiar with the code, I have used the dashboard that is powered by the above dimension, and it is useful to be able to see all the projects with >= than, say 75%, throttling limit. So, I would be in favor of keeping the limitBucket
in the metric as a dimension. For consistency, we can also add it to local rate limiter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @lahsivjar! I will add limitBucket
back and into local rate limiter in a separate pr.
assert.Equal(t, "ratelimit/abc123", req.Requests[0].Name) | ||
assert.Equal(t, "metadata_key:value2", req.Requests[0].UniqueKey) | ||
} | ||
|
||
// Ensures that specific metrics are returned in case of relevant responses of GetRateLimits | ||
func TestGubernatorRateLimiter_RateLimit_Meterprovider(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why u deleted this test case? I had added in order to check the different scenarios and the attributes :). Is it covered above in TestGubernatorRateLimiter_RateLimit
?
The TestRateLimitThresholdAttribute
is not tested for sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests for the telemetry metrics were moved to processor_test.go
, since the metrics are now only changed there. There is a test on traces, profiles, metrics and logs now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TestRateLimitThresholdAttribute is not tested for sure
Yes, this test is not there at the moment. I am trying to make sense of this attribute and how we can put it in both local and gubernator rate limiter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I am more awake 😆 Here is the link for the tests:
opentelemetry-collector-components/processor/ratelimitprocessor/processor_test.go
Line 269 in 4356def
func testRateLimitRequests(t *testing.T, tt testTelemetry) { |
And you can see that we check the telemetry for every rate limit function. For example, metrics rate limit:
opentelemetry-collector-components/processor/ratelimitprocessor/processor_test.go
Line 188 in 4356def
testRateLimitRequests(t, tt) |
@@ -204,183 +183,16 @@ func TestGubernatorRateLimiter_RateLimit_MetadataKeys(t *testing.T) { | |||
|
|||
// Each unique combination of metadata keys should get its own rate limit. | |||
err = rateLimiter.RateLimit(clientContext1, 1) | |||
assert.EqualError(t, err, "too many requests") | |||
assert.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these test changes come from the error introduced in PR #562 (throwing an error after the rate limiter has waited). I put the unit test back as it was since the beginning until that PR got merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few nits, but Looks good otherwise.
We also need to remove the limit threshold from metadata.yaml
limit_threshold: |
limitPercentUsed = 1.0 - min(1, float64(resp.Remaining)/float64(resp.Limit)) | ||
} | ||
|
||
limitBucket := getLimitThresholdBucket(limitPercentUsed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is dictating the token bucket algorithm limit threshold in the guberator - Basically metric indicating what percentage of limit we are at/available? I am not sure how usable this is, feels like this has been carried over from MIS, I ll let @lahsivjar to comment if this was useful in his experience.
testRateLimitRequests(t, tt) | ||
} | ||
|
||
func testRateLimitRequests(t *testing.T, tt testTelemetry) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a solid blocker, But we can make use of the metadatatest for these instead of creating manualreader, provider etc.
telemetry := componenttest.NewTelemetry()
tb, err := metadata.NewTelemetryBuilder(telemetry.NewTelemetrySettings())
require.NoError(t, err)
defer tb.Shutdown()
// construct expected attributes based on the metrics name and invoke mdatatest
expectedAttrs := attribute.NewSet(attrs...)
metadatatest.AssertEqualRatelimitRequests(t, tt, []metricdata.DataPoint[int64]{
{
Value: 2, // tune this
Attributes: expectedAttrs,
},
}, metricdatatest.IgnoreTimestamp())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love this suggestion, thank you! I wasn't aware this existed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I ll defer to others for adding limit_threshold
if its deemed useful. Thanks
This PR: