Closed
Description
System Info
Image: v1.5 CPU
Model used: intfloat/multilingual-e5-large
Deployment: Docker
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
When using the latest CPU image with ONNX support, running the model intfloat/multilingual-e5-large doesn't work:
docker run --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id intfloat/multilingual-e5-large
cpu-1.5: Pulling from huggingface/text-embeddings-inference
Digest: sha256:0502794a4d86974839e701dadd6d06e693ec78a0f6e87f68c391e88c52154f3f
Status: Image is up to date for ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
2024-07-12T10:41:32.130048Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "int*****/************-**-**rge", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "91e8108076dd", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-07-12T10:41:32.130194Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-07-12T10:41:32.212394Z INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
2024-07-12T10:41:33.091211Z INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-07-12T10:41:33.218900Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-07-12T10:41:33.218946Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-07-12T10:41:33.473482Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-07-12T10:41:41.465955Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
2024-07-12T10:41:41.608869Z WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/intfloat/multilingual-e5-large/resolve/main/model.onnx)
2024-07-12T10:41:41.608925Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
2024-07-12T10:41:42.273395Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 9.0544947s
2024-07-12T10:41:42.865307Z INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
2024-07-12T10:41:42.865553Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 8 tokenization workers
2024-07-12T10:41:45.079783Z INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
Error: Could not create backend
Caused by:
Could not start backend: Failed to create ONNX Runtime session: Deserialize tensor encoder.layer.10.attention.output.dense.bias failed.GetFileLength for /data/models--intfloat--multilingual-e5-large/snapshots/ab10c1a7f42e74530fe7ae5be82e6d4f11a719eb/onnx/model.onnx_data failed:Invalid fd was supplied: -1
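The last error line shows ONNX Runtime trying to open `onnx/model.onnx_data` next to the downloaded `model.onnx` and failing because the file was never fetched. As a quick local check, one can detect up front whether a downloaded ONNX export points at an external data companion: external tensor locations are stored as plain UTF-8 strings inside the protobuf, so a byte scan is a cheap (if crude) heuristic. This is an illustrative sketch, not TEI code; the function names and the `".onnx_data"` marker convention are assumptions:

```python
# Hypothetical pre-flight check: detect whether an ONNX export references
# an external data file (e.g. "model.onnx_data") that must sit next to it
# on disk. ONNX serializes external tensor locations as plain UTF-8
# strings, so searching the raw bytes is a cheap heuristic.
from pathlib import Path


def references_external_data(model_bytes: bytes, marker: bytes = b".onnx_data") -> bool:
    """Return True if the serialized model appears to reference an
    external data file (heuristic, byte-level search)."""
    return marker in model_bytes


def missing_companion_files(model_path: Path) -> list[str]:
    """List companion files the model seems to need but that are absent
    from the model's directory (heuristic, name-based)."""
    blob = model_path.read_bytes()
    if not references_external_data(blob):
        return []
    companion = model_path.name + "_data"  # e.g. "model.onnx_data"
    if (model_path.parent / companion).exists():
        return []
    return [companion]
```

Running this against the cached snapshot above would report `model.onnx_data` as missing, matching the backend error.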
Expected behavior
This issue stems from this specific model storing its weights in an additional file, model.onnx_data, which holds the actual ONNX tensor data. TEI never downloads this file.
The backend should download all files necessary to run the ONNX model.
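The fix amounts to extending the existing fallback (try `model.onnx`, then `onnx/model.onnx`, as seen in the log) so that any external data companion is fetched as well. A minimal sketch of that resolution logic, assuming the companion follows the `<file>_data` naming seen here (`onnx/model.onnx_data`); the function name is hypothetical and not part of TEI:

```python
# Hypothetical sketch of the download resolution the ONNX backend would
# need: after picking `model.onnx` vs `onnx/model.onnx`, also include any
# external data companion named "<file>_data".
def onnx_files_to_download(repo_files: set[str]) -> list[str]:
    """Given a model repo's file listing, return every file the ONNX
    backend must download, including external data companions."""
    for candidate in ("model.onnx", "onnx/model.onnx"):
        if candidate in repo_files:
            wanted = [candidate]
            companion = candidate + "_data"  # e.g. "onnx/model.onnx_data"
            if companion in repo_files:
                wanted.append(companion)
            return wanted
    raise FileNotFoundError("no ONNX export found in repo")
```

For intfloat/multilingual-e5-large this would yield both `onnx/model.onnx` and `onnx/model.onnx_data`, so ONNX Runtime would find the tensor data it expects next to the model file.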
dbc-2024