[Misc] SLO-aware router with profile support #1192

zhangjyr · 2025-06-12T23:01:15Z

Pull Request Description

This PR introduces an SLO-aware router with profile support. It adds three new SLO-aware routing policies:

slo (or slo-least-load-pulling)
slo-least-load
slo-pack-load
All three routing policies will prioritize requests with a profiled SLO target.

In addition to the slo-family routing policies, this PR adds built-in queues to support request reordering and future delay scheduling. In particular, QueueRouter enables pull mode within the gateway. Below is a comparison of pull mode and the default push mode:

Push mode: The router dispatches requests to the server, possibly overloading the server.
Pull mode: The server pulls requests from the router based on the server's capacity.
With profile support, the gateway now has server capacity knowledge and can achieve pull mode within the gateway.
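
The difference between the two modes can be sketched in Go as follows. This is a minimal illustration, not the PR's QueueRouter: the names Request, PullQueue, Push, and Pull are hypothetical, and the spare-capacity figure passed to Pull is assumed to be derived from the performance profile.

```go
package main

import "sync"

// Request is a hypothetical queued inference request.
type Request struct{ ID int }

// PullQueue is a hypothetical gateway-side queue. Requests wait here until a
// server with spare capacity pulls them, instead of being pushed on arrival.
type PullQueue struct {
	mu      sync.Mutex
	pending []Request
}

// Push enqueues a request without dispatching it to any server.
func (q *PullQueue) Push(r Request) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.pending = append(q.pending, r)
}

// Pull hands out at most spare requests in FIFO order. The caller is assumed
// to compute spare from the server's profiled capacity minus its current load.
func (q *PullQueue) Pull(spare int) []Request {
	q.mu.Lock()
	defer q.mu.Unlock()
	if spare <= 0 || len(q.pending) == 0 {
		return nil
	}
	if spare > len(q.pending) {
		spare = len(q.pending)
	}
	out := make([]Request, spare)
	copy(out, q.pending[:spare])
	q.pending = q.pending[spare:]
	return out
}

// Len reports how many requests are still waiting in the queue.
func (q *PullQueue) Len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.pending)
}
```

In push mode the router would forward each request as soon as it arrives. In this sketch a request stays queued until some server reports spare capacity, so a saturated server is never handed more work.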

Additional features added in this PR:

Add a fallback routing policy mechanism that lets developers designate a default routing policy to use if the specified routing policy fails.
Wrap the routing policy registration mechanism in RouterManager, allowing it to be reused for managing a family of related routing policies. (This includes simplifying Select() to follow RouterProviderFunc.)
Add a profile cache API to manage model-based performance profiles.
Improve the profile generator in the GPU manager to include SLO information and detailed metrics.
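
The registration and fallback mechanism could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's code: the Router interface, the RouterProviderFunc signature, the RouterManager fields, and the helper providers are all hypothetical and only borrow the names mentioned above.

```go
package main

import (
	"errors"
	"fmt"
)

// Router is a hypothetical, simplified routing-policy interface.
type Router interface {
	Route(pods []string) (string, error)
}

// RouterProviderFunc builds a Router. This signature is an assumption.
type RouterProviderFunc func() (Router, error)

// ErrNoRouter is returned when neither the requested nor the fallback policy works.
var ErrNoRouter = errors.New("no usable routing policy")

// RouterManager keeps named providers and the name of a fallback policy.
type RouterManager struct {
	providers map[string]RouterProviderFunc
	fallback  string
}

// NewRouterManager creates a manager that falls back to the named policy.
func NewRouterManager(fallback string) *RouterManager {
	return &RouterManager{providers: map[string]RouterProviderFunc{}, fallback: fallback}
}

// Register associates a policy name with its provider.
func (m *RouterManager) Register(name string, p RouterProviderFunc) {
	m.providers[name] = p
}

// Select builds the requested policy and uses the fallback if that fails.
func (m *RouterManager) Select(name string) (Router, error) {
	if r, err := m.build(name); err == nil {
		return r, nil
	}
	if r, err := m.build(m.fallback); err == nil {
		return r, nil
	}
	return nil, fmt.Errorf("%w: requested %q, fallback %q", ErrNoRouter, name, m.fallback)
}

// build looks up a provider by name and invokes it.
func (m *RouterManager) build(name string) (Router, error) {
	p, ok := m.providers[name]
	if !ok {
		return nil, fmt.Errorf("routing policy %q is not registered", name)
	}
	return p()
}

// firstPod is a trivial policy that makes the sketch runnable.
type firstPod struct{}

func (firstPod) Route(pods []string) (string, error) {
	if len(pods) == 0 {
		return "", errors.New("no pods available")
	}
	return pods[0], nil
}

// firstPodProvider always succeeds.
func firstPodProvider() (Router, error) { return firstPod{}, nil }

// failingProvider simulates a policy that cannot be built, e.g. a missing profile.
func failingProvider() (Router, error) { return nil, errors.New("profile missing") }
```

With firstPodProvider registered as the fallback, selecting a policy backed by failingProvider, or a name that was never registered, still yields a working router.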

Preliminary results show that the SLO policy can achieve the SLO target for a composite workload on heterogeneous GPUs:

Workload: mixed sharegpt and bird workload with a ratio of 7:4
GPU: 1 A10, 4 L20
SLO: Latency per token 0.05s

Related Issues

Resolves: #642 #606




Jeffwan · 2025-06-14T01:29:32Z

I notice there are some refactoring changes (e.g., internal interface changes). Technically, those affect other aspects; could they go into separate changes? I mean splitting the changes into common parts (which stakeholders need to review) and SLO-specific changes (where review can be looser and the feature can be protected by a feature gate).

If the splitting is too complicated, we can do a first round of review and then decide how to move forward.


zhangjyr · 2025-06-14T04:41:26Z

@Jeffwan, I think the only internal interface change is Select(). The function is called in only one place, and if you find the change inappropriate, we can restore it.


Jeffwan · 2025-06-16T23:34:55Z

cmd/plugins/main.go

@@ -77,7 +78,7 @@ func main() {
 		panic(err)
 	}

-	cache.InitForGateway(config, stopCh, redisClient)
+	cache.InitForGateway(config, stopCh, redisClient, routing.NewSLORouter)


Is there a better way to handle this case? routing.NewSLORouter is a specific solution, but cmd/plugins/main.go serves a common purpose. Can we have a factory for this kind of initialization?

Jeffwan · 2025-06-16T23:35:42Z

config/gateway/gateway-plugin/gateway-plugin.yaml

@@ -2,7 +2,7 @@ apiVersion: v1
 kind: Service
 metadata:
  name: gateway-plugins
-  namespace: aibrix-system
+  namespace: system


system will eventually be overwritten to aibrix-system here, right? Is the change to system meant to align with the default setting?

Jeffwan · 2025-06-16T23:38:54Z

pkg/cache/cache_api.go

+	// Parameters:
+	//   deploymentName: Name of the deployment
+	//   modelName: Name of the model
+	GetModelProfileByDeploymentName(deploymentName string, modelName string) (*ModelGPUProfile, error)


TODO: we may use other objects to orchestrate pods in the future, in which case deployment might change. This looks good for now.

One more problem: a deployment name without a namespace cannot uniquely identify a deployment. We need to add the namespace field.

Jeffwan · 2025-06-16T23:40:23Z

pkg/cache/cache_impl.go

-				break
+	// Current implementation assumes AddRequestCount() will not be called concurrently.
+	// TODO: Implement "wait for trace term" logic if AddRequestCount() is called concurrently.
+	if ctx == nil || ctx.CanAddTrace() {


Is this refactor for the common case?

Jeffwan · 2025-06-16T23:41:46Z

pkg/cache/cache_init.go

 	metaModels utils.SyncMap[string, *Model] // model_name -> *Model

+	// Deployment-related storage
+	deploymentProfiles utils.SyncMap[string, *ModelGPUProfile] // deployment_name -> *ModelGPUProfile


same here. we can use namespace/deployment as the key
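
A namespace-qualified key could be built and parsed as in the sketch below. The helper names deploymentKey and splitDeploymentKey are hypothetical; the "namespace/name" shape follows the suggestion above and matches the form Kubernetes tooling commonly uses for object keys.

```go
package main

import (
	"fmt"
	"strings"
)

// deploymentKey builds a "namespace/name" key so that deployments with the
// same name in different namespaces do not collide in the profile map.
func deploymentKey(namespace, name string) string {
	return namespace + "/" + name
}

// splitDeploymentKey is the inverse of deploymentKey. It rejects keys that
// lack a namespace or a name.
func splitDeploymentKey(key string) (namespace, name string, err error) {
	namespace, name, ok := strings.Cut(key, "/")
	if !ok || namespace == "" || name == "" {
		return "", "", fmt.Errorf("invalid deployment key %q, want namespace/name", key)
	}
	return namespace, name, nil
}
```

Under this scheme the lookup API would take the namespace as an extra argument, or accept the combined key directly.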

Jeffwan · 2025-06-16T23:43:59Z

pkg/cache/informers.go

@@ -98,6 +97,7 @@ func (c *Store) addPod(obj interface{}) {
 	// only track pods with model deployments
 	modelName, ok := pod.Labels[modelIdentifier]
 	if !ok {
+		// klog.InfoS("ignored pod without model label", "name", pod.Name)


use log level instead?

Jeffwan · 2025-06-17T00:21:25Z

pkg/plugins/gateway/queue/simple_queue.go

+			}
+		}
+	}
+	q.queue, q.baseCursor = newQueue, q.baseCursor+dequeuePos


Could it be a problem if another goroutine invokes physicalPosRLocked at this point? Can we introduce something like the getter below, use it in physicalPosRLocked, and add a matching setBaseCursor for expand?

func (q *SimpleQueue[V]) getBaseCursor() int64 { return atomic.LoadInt64(&q.baseCursor) }
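
Extending the getter suggested above, a paired setter might look like the following sketch. The struct layout and the physicalPos helper are assumptions made for illustration; only the names SimpleQueue, baseCursor, and getBaseCursor come from the review.

```go
package main

import "sync/atomic"

// SimpleQueue is a reduced, hypothetical stand-in for the queue under review.
type SimpleQueue[V any] struct {
	queue      []V
	baseCursor int64 // logical position of queue[0]; accessed atomically
}

// getBaseCursor atomically loads the base cursor.
func (q *SimpleQueue[V]) getBaseCursor() int64 {
	return atomic.LoadInt64(&q.baseCursor)
}

// setBaseCursor atomically stores a new base cursor.
func (q *SimpleQueue[V]) setBaseCursor(v int64) {
	atomic.StoreInt64(&q.baseCursor, v)
}

// physicalPos converts a logical cursor into an index into q.queue.
// This mapping is an assumption about what physicalPosRLocked computes.
func (q *SimpleQueue[V]) physicalPos(logical int64) int64 {
	return logical - q.getBaseCursor()
}
```

Atomic access only protects baseCursor itself. The combined update of q.queue and q.baseCursor in expand would still need to happen under the write lock for readers to observe a consistent pair.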

Jeffwan · 2025-06-17T00:22:40Z

pkg/cache/output_predictor.go

+	atomic.AddInt32(&hist.size, -hist.Tail().getSkipped())
+}
+
+func NewSimmpleOutputPredictor(maxInputTokens, maxOutputTokens int, window time.Duration) *SimmpleOutputPredictor {


Let's briefly describe the algorithm here, as a comment?

Jeffwan · 2025-06-17T00:25:04Z

pkg/types/router_context.go

 	debugDelay   time.Duration
+	tokens       []int
+	predictor    OutputPredictor


One of my concerns is which fields can be used by routing algorithms that run with profiles disabled. As a routing algorithm developer, which fields should I expect to be available when I enable or disable certain features?

Jeffwan · 2025-06-17T00:27:15Z

pkg/plugins/gateway/queue/slo_queue.go

+	return
+}
+
+func (q *SLOQueue) higherRank(rank1 float64, rank2 float64) float64 {


Directly returning a bool looks simpler.
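
A boolean comparator along these lines might read as follows. This sketch uses free functions rather than methods on SLOQueue, and it assumes that a larger value means a higher rank, which the quoted hunk does not confirm.

```go
package main

import "sort"

// higherRank reports whether rank1 should be served before rank2.
// Assumption for this sketch: a larger value means a higher rank.
func higherRank(rank1, rank2 float64) bool {
	return rank1 > rank2
}

// sortByRank orders ranks from highest to lowest, using higherRank as the
// comparator.
func sortByRank(ranks []float64) {
	sort.Slice(ranks, func(i, j int) bool {
		return higherRank(ranks[i], ranks[j])
	})
}
```

A bool return plugs directly into comparators such as the one sort.Slice expects, without a second comparison at the call site.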

Jeffwan · 2025-06-17T00:28:14Z

pkg/plugins/gateway/queue/slo_queue.go

+	queueOverallSLO          bool = false
+	monogenousGPURouting     bool = true
+	monogenousGPURoutingOnly bool = monogenousGPURouting && false
+	initialTotalSubQueues    int  = 8  // Expect no more than 8 subqueues


Are these magic numbers constants, or should they be adjusted based on the available resources?
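
If such values should be adjustable, one option is to keep the current value as a default and allow an override at startup. The sketch below is only an illustration: the environment variable name AIBRIX_SLO_INITIAL_SUBQUEUES and the helper names are hypothetical and not part of the PR.

```go
package main

import (
	"os"
	"strconv"
)

// defaultInitialSubQueues mirrors the value 8 from the quoted hunk.
const defaultInitialSubQueues = 8

// intSetting returns the positive integer found under key, or def when the
// key is unset or its value is not a positive integer. The lookup function is
// injected so the parsing logic can be exercised without touching the
// process environment.
func intSetting(lookup func(string) (string, bool), key string, def int) int {
	raw, ok := lookup(key)
	if !ok {
		return def
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return def
	}
	return n
}

// initialSubQueues reads the hypothetical override from the environment.
func initialSubQueues() int {
	return intSetting(os.LookupEnv, "AIBRIX_SLO_INITIAL_SUBQUEUES", defaultInitialSubQueues)
}
```

Invalid or missing values fall back to the compiled-in default, so existing deployments keep their current behavior.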


Jingyuan Zhang and others added 30 commits February 12, 2025 16:24

Prepare profile generator for router.

5652f12

Merge branch 'main' into jingyuan/load_aware_routing

e3e4d4f

Finish cache refactor and load utilization accounting

dc7c12c

Fix the missing part: trigger scheduling on there is spare capacity.

02b8392

Merge commit '9990ab80b86467e33eb6b1a12042ff73269a8ffa' into feature/…

91d401e

…load_aware_routing # Conflicts: # pkg/cache/cache.go # pkg/plugins/gateway/gateway.go # pkg/types/router.go

Cache and router refactoring for stateful router.

d00c6b6

remove unused code

c48804f

Pass existing tests.

f2bfc7d

Remove queue router

8cebab0

Add tests for pod model relationship and minor fixes.

9d77b54

Add random routing policy to e2e test.

remove unused file in this refactor

fe2fb18

Merge branch 'main' into feature/cache_router_refactor

632966b

Signed-off-by: Jingyuan <[email protected]>

Bug fix

57f8a03

Lint fix

2a6abc5

Bug fix

8d787d9

Bug fix and remove unnecessary log.

b43d3b0

Bug fix

323acf2

Add more tests for basic classes.

0999e1f

Merge branch 'feature/cache_router_refactor' into feature/load_aware_…

35ec945

…routing

Rebase: Cache and Router refactoring for concurrent performance, conc…

2c047f5

…urrent safety and stateful routing Signed-off-by: Jingyuan Zhang <[email protected]>

bug ifx

cea9419

Signed-off-by: Jingyuan Zhang <[email protected]>

Rename TraceCache to RequestTracker

97ed8c6

Add PodList interface to replace utils.PodArray. Signed-off-by: Jingyuan Zhang <[email protected]>

Bug fix: concurrent registry array update

77cc884

Signed-off-by: Jingyuan Zhang <[email protected]>

Merge branch 'feature/cache_router_refactor' into feature/load_aware_…

313e73e

…routing Signed-off-by: Jingyuan Zhang <[email protected]> # Conflicts: # pkg/cache/cache_api.go # pkg/cache/cache_init.go

Rebase: Cache and Router refactoring for concurrent performance, conc…

4d7347c

…urrent safety and stateful routing Signed-off-by: Jingyuan Zhang <[email protected]>

Jingyuan Zhang added 4 commits June 12, 2025 15:01

Merge branch 'main' into feature/load_aware_routing

daf3890

remove unused file

75b8459

Signed-off-by: Jingyuan Zhang <[email protected]>

Disable unrelated modification

3f92bb7

Signed-off-by: Jingyuan Zhang <[email protected]>

Change default SLO router

38ed814

Signed-off-by: Jingyuan Zhang <[email protected]>

zhangjyr requested review from Jeffwan, varungup90 and nwangfw June 12, 2025 23:01

zhangjyr and others added 7 commits June 12, 2025 16:07

Merge branch 'main' into feature/load_aware_routing

a931453

Python lint

db1ab57

Signed-off-by: Jingyuan Zhang <[email protected]>

Fix unit tests

e19c139

Signed-off-by: Jingyuan Zhang <[email protected]>

Improve for race test.

225bac8

Signed-off-by: Jingyuan Zhang <[email protected]>

Merge branch 'main' into feature/load_aware_routing

6822b84

Bug fix

3270ca8

Signed-off-by: Jingyuan Zhang <[email protected]>

Disable simple_queue_test in race test.

18562f2

Signed-off-by: Jingyuan Zhang <[email protected]>

Jingyuan Zhang added 2 commits June 13, 2025 20:51

Disable race tests.

449bf19

Signed-off-by: Jingyuan Zhang <[email protected]>

Disable race tests.

81d1cbc

Signed-off-by: Jingyuan Zhang <[email protected]>

zhangjyr and others added 2 commits June 13, 2025 22:27

Merge branch 'main' into feature/load_aware_routing

cb00177

Merge commit '6cd953f203f631b6db86d79fb1bd7064cbf1f668' into feature/…

63a03af

…load_aware_routing Signed-off-by: Jingyuan Zhang <[email protected]> # Conflicts: # pkg/plugins/gateway/algorithms/prefix_cache_preble.go

Jeffwan reviewed Jun 16, 2025

Jeffwan reviewed Jun 17, 2025

Jingyuan Zhang added 7 commits June 17, 2025 20:55

Make gpu benchmark support both jsonl and json

57b8c77

Signed-off-by: Jingyuan Zhang <[email protected]>

Bug fix and lint fix

54dfa61

Signed-off-by: Jingyuan Zhang <[email protected]>

Support workload from new generator

7a1cbe6

Signed-off-by: Jingyuan Zhang <[email protected]>

Bug fix

f2faf80

Signed-off-by: Jingyuan Zhang <[email protected]>

Bug fix

7cddb5a

Signed-off-by: Jingyuan Zhang <[email protected]>

Fix interval degradation

a2c4bb8

Signed-off-by: Jingyuan Zhang <[email protected]>

Lint fix

2de4a84

Signed-off-by: Jingyuan Zhang <[email protected]>
