Skip to content

[ratelimiter] Remove wrong error, remove duplicate code #578

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 29, 2025

Conversation

constanca-m
Copy link
Contributor

This PR:

@constanca-m constanca-m requested a review from a team as a code owner May 26, 2025 10:13
limitPercentUsed = 1.0 - min(1, float64(resp.Remaining)/float64(resp.Limit))
}

limitBucket := getLimitThresholdBucket(limitPercentUsed)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gizas @kaiyan-sheng I am (temporarily) removing this - I can add it again in this PR, I removed it just to clean up. Can you give me more details on the choice of adding this? And why local.go does not have this? Maybe by mistake?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also having trouble interpreting this attribute value. What does it mean? I understand the limit % part, but then I get lost on telemetry.WithLimitThreshold(limitBucket),.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@constanca-m constanca-m May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gizas I am still having trouble on understanding this attribute. Can you help me on this?

So I understand limitPercentUsed part. But then I get lost on limitBucket.

I then have two questions (apart from understanding what limitBucket means):

  • Why doesn't local rate limiter have this attribute as well?
  • What is the choice of adding this attribute?

If we want to keep this attribute, the code needs to change so RateLimit inside the local/gubernator rate limiters starts returning a response struct instead of an error. For example:

type response struct {
  limitBucket int
  err error
}

I am trying to understand if this change is necessary (and how to do the same for the local rate limiter), or if we can discard this attribute for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is dictating the token bucket algorithm limit threshold in the guberator - Basically metric indicating what percentage of limit we are at/available? I am not sure how usable this is, feels like this has been carried over from MIS, I ll let @lahsivjar to comment if this was useful in his experience.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to see the code, seems that we calculate the limitbucket and we add it as an attribute. I dont see any other use.

And yes we should add it in local.go if we decide to add it.

I will wait for @kaiyan-sheng, @lahsivjar to comment. I would say we can remove it for now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should add it in local.go if we think this metric is useful so we are reporting the same metrics for both. This is from MIS indeed. We can leave it out for now if it's not gonna be useful or unsure.

Copy link
Contributor

@lahsivjar lahsivjar May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(apologies for not seeing this thread, my git notification management skills are on the decline)

https://github.com/elastic/apm-ingest-service/blob/baaf48072f5be1df0ca5e6dbaef351b96233ae0a/internal/ratelimit/ratelimit.go#L90-L98

I am not very familiar with the rate limiter code, but reviewing the comments in the above code, it appears that the purpose of limitBucket is to give histogram-like features that can be plotted as a dimension with fixed cardinality.

While not that familiar with the code, I have used the dashboard that is powered by the above dimension, and it is useful to be able to see all the projects with >= than, say 75%, throttling limit. So, I would be in favor of keeping the limitBucket in the metric as a dimension. For consistency, we can also add it to local rate limiter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @lahsivjar! I will add limitBucket back and into local rate limiter in a separate pr.

assert.Equal(t, "ratelimit/abc123", req.Requests[0].Name)
assert.Equal(t, "metadata_key:value2", req.Requests[0].UniqueKey)
}

// Ensures that specific metrics are returned in case of relevant responses of GetRateLimits
func TestGubernatorRateLimiter_RateLimit_Meterprovider(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why u deleted this test case? I had added in order to check the different scenarios and the attributes :). Is it covered above in TestGubernatorRateLimiter_RateLimit ?

The TestRateLimitThresholdAttribute is not tested for sure

Copy link
Contributor Author

@constanca-m constanca-m May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests for the telemetry metrics were moved to processor_test.go, since the metrics are now only changed there. There is a test on traces, profiles, metrics and logs now

Copy link
Contributor Author

@constanca-m constanca-m May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TestRateLimitThresholdAttribute is not tested for sure

Yes, this test is not there at the moment. I am trying to make sense of this attribute and how we can put it in both local and gubernator rate limiter.

Copy link
Contributor Author

@constanca-m constanca-m May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I am more awake 😆 Here is the link for the tests:

func testRateLimitRequests(t *testing.T, tt testTelemetry) {

And you can see that we check the telemetry for every rate limit function. For example, metrics rate limit:

@@ -204,183 +183,16 @@ func TestGubernatorRateLimiter_RateLimit_MetadataKeys(t *testing.T) {

// Each unique combination of metadata keys should get its own rate limit.
err = rateLimiter.RateLimit(clientContext1, 1)
assert.EqualError(t, err, "too many requests")
assert.NoError(t, err)
Copy link
Contributor Author

@constanca-m constanca-m May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these test changes come from the error introduced in PR #562 (throwing an error after the rate limiter has waited). I put the unit test back as it was since the beginning until that PR got merged.

Copy link
Member

@vigneshshanmugam vigneshshanmugam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nits, but Looks good otherwise.

We also need to remove the limit threshold from metadata.yaml

if we agree to get the attribute removed.

limitPercentUsed = 1.0 - min(1, float64(resp.Remaining)/float64(resp.Limit))
}

limitBucket := getLimitThresholdBucket(limitPercentUsed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is dictating the token bucket algorithm limit threshold in the guberator - Basically metric indicating what percentage of limit we are at/available? I am not sure how usable this is, feels like this has been carried over from MIS, I ll let @lahsivjar to comment if this was useful in his experience.

testRateLimitRequests(t, tt)
}

func testRateLimitRequests(t *testing.T, tt testTelemetry) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a solid blocker, But we can make use of the metadatatest for these instead of creating manualreader, provider etc.

telemetry := componenttest.NewTelemetry()
tb, err := metadata.NewTelemetryBuilder(telemetry.NewTelemetrySettings())
require.NoError(t, err)
defer tb.Shutdown()

// construct expected attributes based on the metrics name and invoke mdatatest
expectedAttrs := attribute.NewSet(attrs...)
metadatatest.AssertEqualRatelimitRequests(t, tt, []metricdata.DataPoint[int64]{
		{
			Value:      2, // tune this
			Attributes: expectedAttrs,
		},
	}, metricdatatest.IgnoreTimestamp())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this suggestion, thank you! I wasn't aware this existed.

Copy link
Member

@vigneshshanmugam vigneshshanmugam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I ll defer to others for adding limit_threshold if its deemed useful. Thanks

@constanca-m constanca-m merged commit 66579a5 into elastic:main May 29, 2025
13 checks passed
@constanca-m constanca-m deleted the ratelimiter-spawn-gubernator branch May 29, 2025 04:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants