Too many model backend threads destroy performance when running on CPU #405

@askervin

Description

System Info

text-embeddings-inference:cpu-1.5

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Run a container from the text-embeddings-inference:cpu-1.5 image with cpuset.cpus restricted in cgroups. This can be done with docker run --cpuset-cpus ..., with Kubernetes NRI resource policies, or with the Kubernetes CPU manager.

For instance, on a system with 128 vCPUs / 64 physical CPU cores, text-embeddings-router logs the following:
(The following clip is from the ChatQnA example application, kubectl logs chatqna-teirerank-...)

2024-09-09T11:54:19.994401Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-********-*ase", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "chatqna-teirerank-7fd4d88d85-z2nzh", port: 2082, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }

---8<--- snip --->8---

2024-09-09T11:54:34.747212Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
2024-09-09T11:54:34.758273Z  WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 80, index: 0, mask: {1, 65, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758288Z  WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 84, index: 4, mask: {5, 69, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758307Z  WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 81, index: 1, mask: {2, 66, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758353Z  WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 83, index: 3, mask: {4, 68, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758355Z  WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 82, index: 2, mask: {3, 67, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758391Z  WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 85, index: 5, mask: {6, 70, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
...

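That is, the model backend launches more threads than the container has CPUs available, and it tries to set the CPU affinity of each thread to CPUs that are not allowed for this container. Each pthread_setaffinity_np call therefore fails with error code 22 (Invalid argument).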
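One way to confirm this diagnosis is to compare the masks in the warnings above with the CPUs the process is actually allowed to use. The following is a minimal sketch, assuming Linux and the log format shown above; parse_mask and disallowed_cpus are hypothetical helper names and are not part of text-embeddings-inference or ort.

```python
import os
import re

# Matches the "mask: {1, 65, }" portion of the ort warning shown above.
MASK_RE = re.compile(r"mask: \{([^}]*)\}")


def parse_mask(log_line):
    """Return the set of CPU ids in a warning line's mask, or None if absent."""
    match = MASK_RE.search(log_line)
    if match is None:
        return None
    # The mask has a trailing comma, so skip empty tokens.
    return {int(tok) for tok in match.group(1).split(",") if tok.strip()}


def disallowed_cpus(log_line, allowed):
    """Return CPUs named in the mask that are outside the allowed set."""
    mask = parse_mask(log_line)
    return set() if mask is None else mask - set(allowed)


if __name__ == "__main__":
    line = (
        "pthread_setaffinity_np failed for thread: 80, index: 0, "
        "mask: {1, 65, }, error code: 22"
    )
    # os.sched_getaffinity is Linux-only; it reflects cpuset restrictions.
    allowed = os.sched_getaffinity(0)
    print("allowed CPUs:", sorted(allowed))
    print("requested but not allowed:", sorted(disallowed_cpus(line, allowed)))
```

Run inside the restricted container, a non-empty "requested but not allowed" list would match the failures reported in the log.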

Expected behavior

The model backend should align the number of threads with the number of CPUs available for it, and it should set CPU affinity of its threads only on available CPUs.
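The expected behavior can be illustrated with a short sketch, assuming Linux, where the scheduler affinity mask of a process reflects the cpuset limit. This is not how ort or text-embeddings-inference is implemented; available_cpus and plan_threads are hypothetical names, and pinning one thread per allowed CPU is only one possible policy.

```python
import os


def available_cpus():
    """Return the CPU ids this process may run on.

    On Linux, os.sched_getaffinity honours cpuset restrictions. Elsewhere
    the function falls back to the total CPU count, which is only an
    upper bound.
    """
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return list(range(os.cpu_count() or 1))


def plan_threads(cpus):
    """Map thread index -> set of CPU ids, using only the allowed CPUs.

    One thread per allowed CPU, so the thread count never exceeds what
    the container can actually run in parallel.
    """
    return {index: {cpu} for index, cpu in enumerate(sorted(cpus))}


if __name__ == "__main__":
    plan = plan_threads(available_cpus())
    print("threads to start:", len(plan))
    for index, mask in plan.items():
        print("thread", index, "-> CPUs", sorted(mask))
```

With this policy every affinity mask is a subset of the allowed CPUs, so a call such as pthread_setaffinity_np would not be asked to pin a thread to a CPU outside the cpuset.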
