Priority
P2-High
OS type
Ubuntu
Hardware type
Xeon-SPR
Installation method
- Pull docker images from hub.docker.com
- Build docker images from source
Deploy method
- Docker compose
- Docker
- Kubernetes
- Helm
Running nodes
Single Node
What's the version?
Observed with the latest chatqna.yaml (git 67394b8), where the tei and teirerank containers use the image ghcr.io/huggingface/text-embeddings-inference:cpu-1.5.
**ctr -n k8s.io images ls | grep text-embeddings**
ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 application/vnd.oci.image.index.v1+json sha256:0502794a4d86974839e701dadd6d06e693ec78a0f6e87f68c391e88c52154f3f 48.2 MiB linux/amd64 io.cri-containerd.image=managed
ghcr.io/huggingface/text-embeddings-inference@sha256:0502794a4d86974839e701dadd6d06e693ec78a0f6e87f68c391e88c52154f3f application/vnd.oci.image.index.v1+json sha256:0502794a4d86974839e701dadd6d06e693ec78a0f6e87f68c391e88c52154f3f 48.2 MiB linux/amd64 io.cri-containerd.image=managed
Description
When CPU affinity on a node is managed (with NRI resource policies or the Kubernetes CPU manager) and ChatQnA/kubernetes/manifests/xeon/chatqna.yaml is deployed, the tei and teirerank containers do not handle their internal threading and thread-CPU affinities properly.
They appear to create a thread for every CPU in the system, whereas they should create threads only for the CPUs the container is allowed to use.
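A minimal Python sketch of the distinction (an illustration only, not TEI or ort code; Linux-only): `os.cpu_count()` reports every logical CPU in the system, while `os.sched_getaffinity(0)` returns only the CPUs the process may run on, which inside a container reflects its cpuset. Sizing a thread pool from the latter is the behaviour expected here; the helper `allowed_thread_count` is hypothetical.

```python
import os


def system_cpu_count() -> int:
    # Logical CPUs in the whole system, ignoring cgroups and affinity.
    return os.cpu_count() or 1


def allowed_cpus() -> set[int]:
    # CPUs the calling process may run on; inside a container this reflects
    # the cpuset assigned by the CPU manager or an NRI policy.
    return os.sched_getaffinity(0)


def allowed_thread_count() -> int:
    # Hypothetical sizing rule: one worker thread per allowed CPU.
    return len(allowed_cpus())


if __name__ == "__main__":
    print(f"CPUs in system:    {system_cpu_count()}")
    print(f"CPUs allowed:      {sorted(allowed_cpus())}")
    print(f"Threads to create: {allowed_thread_count()}")
```

In the situation described here, the first number would be the full system size, while the second would be 8 (CPUs 40-47).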
In the logs it looks like this:
**kubectl logs -n benchmark chatqna-teirerank-674b878d9c-sdkg9**
...
2024-09-06T07:10:06.082735Z INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
2024-09-06T07:10:06.095067Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 80, index: 0, mask: {1, 65, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-06T07:10:06.095106Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 81, index: 1, mask: {2, 66, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-06T07:10:06.095128Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 82, index: 2, mask: {3, 67, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
....
2024-09-06T07:10:06.260526Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 88, index: 8, mask: {9, 73, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-06T07:10:08.576066Z WARN text_embeddings_router: router/src/lib.rs:267: Backend does not support a batch size > 8
2024-09-06T07:10:08.576082Z WARN text_embeddings_router: router/src/lib.rs:268: forcing `max_batch_requests=8`
2024-09-06T07:10:08.576195Z WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
2024-09-06T07:10:08.579399Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1778: Starting HTTP server: 0.0.0.0:2082
2024-09-06T07:10:08.579418Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1779: Ready
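To count these warnings without reading the whole log, a Python sketch like the following could extract the thread ID, index, requested CPU mask, and error code from each `pthread_setaffinity_np` warning. The regular expression is written against the message text shown above and is an assumption about the format in other ort versions; the script name `affinity_failures.py` is hypothetical.

```python
import re
import sys

# Written against the ort warning text shown above, e.g.
# "... failed for thread: 80, index: 0, mask: {1, 65, }, error code: 22 ..."
AFFINITY_FAILURE = re.compile(
    r"thread: (?P<thread>\d+), index: (?P<index>\d+), "
    r"mask: \{(?P<mask>[^}]*)\}, error code: (?P<code>\d+)"
)


def affinity_failures(lines):
    """Yield (thread, index, cpu_mask, error_code) for each failed pinning."""
    for line in lines:
        m = AFFINITY_FAILURE.search(line)
        if m:
            # "1, 65, " -> {1, 65}; the trailing separator yields an empty item.
            mask = {int(c) for c in m["mask"].replace(" ", "").split(",") if c}
            yield int(m["thread"]), int(m["index"]), mask, int(m["code"])


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: affinity_failures.py LOGFILE")
    else:
        with open(sys.argv[1]) as log:
            failures = list(affinity_failures(log))
        print(f"{len(failures)} pthread_setaffinity_np failure(s)")
        for thread, index, mask, code in failures:
            print(f"thread {thread} (index {index}): CPUs {sorted(mask)}, errno {code}")
```

For example, save the output of `kubectl logs -n benchmark chatqna-teirerank-674b878d9c-sdkg9` to a file and pass that file as the only argument.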
And at the level of per-thread CPU affinity in the system, it looks like this:
**grep Cpus_allowed_list /proc/2370247/task/2370*/status**
...
/proc/2370247/task/2370368/status:Cpus_allowed_list: 40-47
/proc/2370247/task/2370369/status:Cpus_allowed_list: 40-47
/proc/2370247/task/2370370/status:Cpus_allowed_list: 40-47
/proc/2370247/task/2370371/status:Cpus_allowed_list: 40-47
/proc/2370247/task/2370372/status:Cpus_allowed_list: 40-47
/proc/2370247/task/2370373/status:Cpus_allowed_list: 40
/proc/2370247/task/2370374/status:Cpus_allowed_list: 41
/proc/2370247/task/2370375/status:Cpus_allowed_list: 42
/proc/2370247/task/2370376/status:Cpus_allowed_list: 43
/proc/2370247/task/2370377/status:Cpus_allowed_list: 44
/proc/2370247/task/2370378/status:Cpus_allowed_list: 45
/proc/2370247/task/2370379/status:Cpus_allowed_list: 46
/proc/2370247/task/2370380/status:Cpus_allowed_list: 47
/proc/2370247/task/2370381/status:Cpus_allowed_list: 40-47
/proc/2370247/task/2370382/status:Cpus_allowed_list: 40-47
/proc/2370247/task/2370383/status:Cpus_allowed_list: 40-47
/proc/2370247/task/2370384/status:Cpus_allowed_list: 40-47
/proc/2370247/task/2370385/status:Cpus_allowed_list: 40-47
/proc/2370247/task/2370386/status:Cpus_allowed_list: 40-47
...
That is, only a few threads got correct CPU pinning; the rest (far too many) float over all CPUs allowed for the container. As a result, tei and teirerank performance on CPU is severely degraded.
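The `grep` above can be summarized per affinity with a small Python sketch (the helper `thread_affinities` and the script name are hypothetical, not part of any tool mentioned here). It groups a process's threads by `Cpus_allowed_list`, which makes it easy to count how many threads are pinned to a single CPU and how many float over the whole cpuset.

```python
import glob
import os
import re
import sys
from collections import defaultdict


def thread_affinities(pid: int) -> dict[str, list[int]]:
    """Group the thread IDs of `pid` by their Cpus_allowed_list value.

    Reads /proc/<pid>/task/<tid>/status, the same files grepped above.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for status in glob.glob(f"/proc/{pid}/task/*/status"):
        tid = int(status.split("/")[4])  # "/proc/<pid>/task/<tid>/status"
        try:
            with open(status) as f:
                text = f.read()
        except FileNotFoundError:
            continue  # the thread exited between glob() and open()
        match = re.search(r"^Cpus_allowed_list:\s*(\S+)", text, re.MULTILINE)
        if match:
            groups[match.group(1)].append(tid)
    return {cpus: sorted(tids) for cpus, tids in groups.items()}


if __name__ == "__main__":
    # Inspect the PID given on the command line, or this process by default.
    arg = sys.argv[1] if len(sys.argv) > 1 else ""
    pid = int(arg) if arg.isdigit() else os.getpid()
    for cpus, tids in sorted(thread_affinities(pid).items()):
        print(f"{cpus:>12}: {len(tids)} thread(s)")
```

Run against the process above (for example `python3 thread_affinities.py 2370247`), it would print one line per distinct CPU list, such as `40-47` next to the count of unpinned threads.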
The log suggests that the ort library tries to create a thread and set its affinity for every CPU in the system, whereas it should use only the CPUs allowed for the container (as limited by the cgroup's cpuset.cpus). I cannot tell whether the root cause is in the ort library itself or in how it is used here.
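Why the warnings report error code 22 (EINVAL) can be sketched in Python: the kernel intersects a requested affinity mask with the CPUs the cpuset cgroup permits, and `pthread_setaffinity_np` fails with EINVAL when nothing is left. Masks such as `{1, 65}` from the warnings do not overlap the `40-47` seen in the grep output. The helper names `parse_cpu_list` and `usable_cpus` are hypothetical, and the parser handles only the comma-separated single-CPU and range syntax shown in this issue.

```python
def parse_cpu_list(text: str) -> set[int]:
    """Parse a Linux CPU list such as "40-47" or "0-3,8,10-11".

    Handles only comma-separated single CPUs and inclusive ranges, the form
    printed in Cpus_allowed_list and cpuset.cpus in this issue.
    """
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def usable_cpus(requested: set[int], cpuset: str) -> set[int]:
    # The kernel keeps only the part of a requested affinity mask that lies
    # inside the cpuset; an empty result means the call fails with EINVAL (22).
    return requested & parse_cpu_list(cpuset)


if __name__ == "__main__":
    cpuset = "40-47"  # CPU list seen in the grep output above
    for mask in ({1, 65}, {2, 66}, {9, 73}):  # masks from the ort warnings
        usable = usable_cpus(mask, cpuset)
        print(sorted(mask), "->", sorted(usable) if usable else "EINVAL")
```

Every mask in the warnings falls entirely outside the container's CPUs, so each pinning attempt is rejected.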
Reproduce steps
- Install the balloons NRI policy to manage CPUs.
helm repo add nri-plugins https://containers.github.io/nri-plugins
helm install balloons nri-plugins/nri-resource-policy-balloons --set patchRuntimeConfig=true
- Replace the default balloons configuration with one that runs tei/teirerank on dedicated CPUs.
cat > chatqna-balloons.yaml << EOF
apiVersion: config.nri/v1alpha1
kind: BalloonsPolicy
metadata:
  name: default
  namespace: kube-system
spec:
  allocatorTopologyBalancing: true
  balloonTypes:
    - name: tgi
      allocatorPriority: high
      minCPUs: 32
      minBalloons: 1
      preferNewBalloons: true
      hideHyperthreads: true
      matchExpressions:
        - key: name
          operator: Equals
          values: ["tgi"]
    - name: embedding
      allocatorPriority: high
      minCPUs: 16
      minBalloons: 2
      preferNewBalloons: true
      hideHyperthreads: true
      matchExpressions:
        - key: name
          operator: In
          values:
            - tei
            - teirerank
    - allocatorPriority: normal
      minCPUs: 14
      hideHyperthreads: false
      name: default
      namespaces:
        - "*"
  log:
    debug: ["policy"]
  pinCPU: true
  pinMemory: false
  reservedPoolNamespaces:
    - kube-system
  reservedResources:
    cpu: "2"
EOF
kubectl delete -n kube-system balloonspolicy default
kubectl create -n kube-system -f chatqna-balloons.yaml
- Deploy chatqna.yaml.
kubectl create -f ChatQnA/kubernetes/manifests/xeon/chatqna.yaml
- Follow logs from chatqna-tei and chatqna-teirerank.
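As a rough sizing check for this reproduction, the Python sketch below adds up the minimum CPUs the balloons policy above requests. It assumes that each balloon created because of `minBalloons` holds at least `minCPUs` logical CPUs, that the built-in `default` balloon exists once, and that the node has two hyperthreads per core. Under those assumptions the node needs at least 80 logical CPUs, and `hideHyperthreads: true` leaves each 16-CPU embedding balloon with 8 visible CPUs, consistent with the `40-47` CPU list seen above.

```python
# Minimum CPU demand of the BalloonsPolicy above (values copied from the YAML).
# Assumptions: each balloon created because of minBalloons holds at least
# minCPUs logical CPUs, the built-in "default" balloon exists once, and the
# node has 2 hyperthreads per physical core.
THREADS_PER_CORE = 2
RESERVED_CPUS = 2  # reservedResources.cpu

BALLOON_TYPES = {
    # name: (minCPUs, balloons created up front, hideHyperthreads)
    "tgi": (32, 1, True),
    "embedding": (16, 2, True),
    "default": (14, 1, False),
}


def min_logical_cpus() -> int:
    # Reserved CPUs plus minCPUs for every balloon that exists from the start.
    return RESERVED_CPUS + sum(
        cpus * count for cpus, count, _ in BALLOON_TYPES.values()
    )


def visible_cpus(name: str) -> int:
    # With hideHyperthreads, a container sees one hyperthread per core.
    cpus, _, hide = BALLOON_TYPES[name]
    return cpus // THREADS_PER_CORE if hide else cpus


if __name__ == "__main__":
    print("minimum logical CPUs:", min_logical_cpus())
    for name in BALLOON_TYPES:
        print(f"{name}: {visible_cpus(name)} CPUs visible per container")
```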
Raw log
No response