Description
Describe the feature
Right now, the dynamic router config only accepts a JSON file through the --dynamic-config-json option.
It would be nice if we could pass a config file formatted in a more user-friendly manner (top-level keys would be model names). For instance, we implemented some wrappers to let us define a more readable YAML config file (see below).
Why do you need this feature?
With the current format, our JSON config would look like this:
{
  "service_discovery": "static",
  "routing_logic": "roundrobin",
  "static_backends": "https://endpoint1.example.com/bge-m3,https://endpoint2.example.com/bge-m3,https://endpoint3.example.com/bge-m3,https://endpoint4.example.com/bge-m3",
  "static_models": "bge-m3,bge-m3,bge-reranker-v2-m3,bge-reranker-v2-m3"
}
However, this syntax is not convenient once there are several models/backends. Also, some config options (e.g., --callbacks, --static-model-types) are not optional fields, so we would have had to keep the JSON config file in sync with how we actually start vllm-router.
For instance, in order to perform health checks, we have to run it with the --static-backend-health-checks --static-model-types embeddings,rerank options.
Instead, we decided to add extra logic to support the following YAML config file:
---
routing-logic: roundrobin
service-discovery: static
callbacks: callbacks.custom_callbacks.custom_callback_handler_instance
models:
  bge-m3:
    endpoints:
      - https://endpoint1.example.com/bge-m3
      - https://endpoint2.example.com/bge-m3
    model_type: embeddings
  bge-reranker-v2-m3:
    endpoints:
      - https://endpoint3.example.com/bge-reranker-v2-m3
      - https://endpoint4.example.com/bge-reranker-v2-m3
    model_type: rerank
We then parse this file in our Docker entrypoint and start vllm-router with the right config options (context: we deploy it as a Docker image within a Kubernetes cluster).
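For illustration, the conversion from the YAML layout above to the existing flat format could be sketched as follows. This is a minimal sketch, not the actual wrapper code: the function names are hypothetical, the input is assumed to be the dict obtained after loading the YAML file (e.g. with a YAML parser such as PyYAML), and it assumes that --static-model-types takes one entry per backend, parallel to static_models, which may not match what the router expects.

```python
import json


def flatten_models(models):
    """Expand the per-model mapping into parallel backend/model/type lists."""
    backends, names, types = [], [], []
    for name, spec in models.items():
        for endpoint in spec["endpoints"]:
            backends.append(endpoint)
            names.append(name)
            types.append(spec["model_type"])
    return backends, names, types


def to_json_config(config):
    """Build the flat dict accepted today through --dynamic-config-json."""
    backends, names, _ = flatten_models(config["models"])
    return {
        "service_discovery": config["service-discovery"],
        "routing_logic": config["routing-logic"],
        "static_backends": ",".join(backends),
        "static_models": ",".join(names),
    }


def extra_cli_args(config):
    """Build the options that still have to be passed on the command line.

    Assumption: one model type per backend entry (see the note above).
    """
    _, _, types = flatten_models(config["models"])
    args = [
        "--static-backend-health-checks",
        "--static-model-types",
        ",".join(types),
    ]
    if "callbacks" in config:
        args += ["--callbacks", config["callbacks"]]
    return args


# Same content as the YAML example, already loaded into a dict.
EXAMPLE = {
    "routing-logic": "roundrobin",
    "service-discovery": "static",
    "callbacks": "callbacks.custom_callbacks.custom_callback_handler_instance",
    "models": {
        "bge-m3": {
            "endpoints": [
                "https://endpoint1.example.com/bge-m3",
                "https://endpoint2.example.com/bge-m3",
            ],
            "model_type": "embeddings",
        },
        "bge-reranker-v2-m3": {
            "endpoints": [
                "https://endpoint3.example.com/bge-reranker-v2-m3",
                "https://endpoint4.example.com/bge-reranker-v2-m3",
            ],
            "model_type": "rerank",
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(to_json_config(EXAMPLE), indent=2))
    print(" ".join(extra_cli_args(EXAMPLE)))
```

Keeping the flattening in a single helper means the backend, model, and model-type lists cannot drift out of sync, which is the main problem with maintaining the comma-separated strings by hand.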
Additional context
Creating a new --dynamic-config-yaml config option seems logical for this, but we would have to think about how this new YAML file co-exists with the already-available options (JSON and "CLI options").
Maybe the JSON config file would first need to be refactored to align things. This would be a breaking change, though.
WDYT folks?