Description
System Info
text-embeddings-inference:cpu-1.5
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
Run a container from the text-embeddings-inference:cpu-1.5 image so that cpuset.cpus is restricted in cgroups. This can be done with docker run --cpuset-cpus ..., or with Kubernetes NRI resource policies or the CPU manager.
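To confirm from inside the container that the restriction is in effect, the cpuset can be read and expanded into individual CPU ids. The following is a minimal sketch, not part of the original report: it assumes cgroup v2 mounted at /sys/fs/cgroup (the default path argument is an assumption, and cgroup v1 uses a different file), and the helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def parse_cpu_list(text: str) -> set[int]:
    """Expand the kernel's CPU list format, e.g. "0-3,64-67", into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue  # empty string or trailing comma
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


def effective_cpuset(path: str = "/sys/fs/cgroup/cpuset.cpus.effective") -> set[int] | None:
    """Return the CPUs the current cgroup may use, or None if the file is absent.

    The default path is an assumption (cgroup v2); adjust it for other setups.
    """
    p = Path(path)
    if not p.exists():
        return None
    return parse_cpu_list(p.read_text())
```

Calling effective_cpuset() in a restricted container should return only the CPUs passed via --cpuset-cpus or assigned by the Kubernetes policy.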
For instance, on a system with 128 vCPUs / 64 physical CPU cores, the output of text-embeddings-router shows the following.
(The excerpt is from the ChatQnA example application, kubectl logs chatqna-teirerank-...)
2024-09-09T11:54:19.994401Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-********-*ase", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "chatqna-teirerank-7fd4d88d85-z2nzh", port: 2082, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
---8<--- snip --->8---
2024-09-09T11:54:34.747212Z INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
2024-09-09T11:54:34.758273Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 80, index: 0, mask: {1, 65, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758288Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 84, index: 4, mask: {5, 69, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758307Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 81, index: 1, mask: {2, 66, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758353Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 83, index: 3, mask: {4, 68, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758355Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 82, index: 2, mask: {3, 67, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758391Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 85, index: 5, mask: {6, 70, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
...
That is, the model backend launches the wrong number of threads and tries to pin each thread to CPUs that this container is not allowed to use.
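The warnings can be checked mechanically against the container's cpuset. The sketch below parses the thread index and CPU mask from each warning and reports the masks that share no CPU with the allowed set. On Linux, error code 22 is EINVAL, which an affinity call typically returns in exactly that case. The allowed set {8, 9, 10, 11} is a hypothetical example (the report does not state the actual cpuset), and the function names are illustrative.

```python
import re

# Matches the variable part of the ort warning shown in the logs above.
MASK_RE = re.compile(r"thread: (\d+), index: (\d+), mask: \{([^}]*)\}")


def parse_affinity_warning(line):
    """Return (thread_id, index, cpu_mask) for a warning line, or None if it does not match."""
    m = MASK_RE.search(line)
    if m is None:
        return None
    thread, index, mask = m.groups()
    # The mask is printed with a trailing comma, e.g. "1, 65, ", so skip empty tokens.
    cpus = {int(tok) for tok in mask.split(",") if tok.strip()}
    return int(thread), int(index), cpus


def unsatisfiable_masks(lines, allowed):
    """Map thread index -> mask for every warning whose mask has no CPU in `allowed`."""
    bad = {}
    for line in lines:
        parsed = parse_affinity_warning(line)
        if parsed is None:
            continue
        _thread, index, mask = parsed
        if not (mask & allowed):
            bad[index] = mask
    return bad


# Message portions copied from the log excerpt above.
SAMPLE = [
    "pthread_setaffinity_np failed for thread: 80, index: 0, mask: {1, 65, }, error code: 22",
    "pthread_setaffinity_np failed for thread: 84, index: 4, mask: {5, 69, }, error code: 22",
]

# Hypothetical cpuset for illustration only; the report does not give the real one.
ALLOWED = {8, 9, 10, 11}

if __name__ == "__main__":
    print(unsatisfiable_masks(SAMPLE, ALLOWED))
```

In the excerpt, the thread with index i is given the mask {i + 1, i + 65}, which is consistent with masks being derived from the host's 64-core / 128-vCPU topology rather than from the container's cpuset.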
Expected behavior
The model backend should match its thread count to the number of CPUs available to the container, and it should pin its threads only to CPUs in that allowed set.
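The expected behavior can be sketched as follows. This is an illustration of the idea, not the backend's actual implementation: it derives the thread count from the process's own affinity mask, which on Linux already reflects cpuset restrictions, and builds per-thread masks that only contain allowed CPUs. The names available_cpus and plan_thread_affinity are hypothetical.

```python
import os


def available_cpus():
    """Return the sorted CPU ids this process may run on.

    os.sched_getaffinity is Linux-only; elsewhere fall back to the CPU count.
    """
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return list(range(os.cpu_count() or 1))


def plan_thread_affinity(allowed, requested=None):
    """Return one CPU mask per thread, using only CPUs from `allowed`.

    The thread count defaults to len(allowed) and is capped at it.
    Allowed CPUs are dealt round-robin, so every mask is a non-empty
    subset of the allowed set.
    """
    allowed = sorted(allowed)
    if not allowed:
        raise ValueError("no CPUs available")
    n = len(allowed) if requested is None else max(1, min(requested, len(allowed)))
    masks = [set() for _ in range(n)]
    for i, cpu in enumerate(allowed):
        masks[i % n].add(cpu)
    return masks


if __name__ == "__main__":
    cpus = available_cpus()
    print(f"{len(cpus)} usable CPUs -> masks {plan_thread_affinity(cpus)}")
```

With a cpuset of 8-11, this plans four threads pinned to CPUs 8, 9, 10 and 11 respectively, rather than one thread per host core. The alternative suggested by the warning text itself is to specify the number of threads explicitly so that no affinity is set at all.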